Goto Lab:

Lab 2

Due Thurs 16 Sept in class

Case study: Radioactivity and Infant Mortality

E. J. Sternglass in 1969 attributed a slowdown in the decrease in infant mortality rates in the US to increased radiation exposure from nuclear weapons testing. He estimated around 400,000 "excess deaths" between 1945 and 1962 were due to this increased exposure.

Sternglass's hypothesis was later examined by Victor Fuchs of Stanford university in 1979/80. This case study in exploratory analysis will use Fuchs's data to examine the Sternglass hypothesis.

The file has 528 observations, each of one corresponding to one row of the file. The first 48 observations are for each one of the continental states and the year 1960, the next 48 are for the same states (ordered in the same way) but for 1961, and so on until 1970. The variables are, from the first to the last column, total infant mortality (TIM), neonatal infant mortality (NIM), postnatal infant mortality (PIM), percent non-white births (NWBR), per capita income (INC), strontium-90 content of milk (SR90), caresium-137 content of milk, number of births (NB), state (STATE), and year (YEAR).

mort.asc

TIM NIM PIM NWBR INC SR90 CS137 NB STATE YEAR

The question of interest here is:
Does exposure to low-level radiation cause an increase in infant mortality?
Some points to think about are:

Association vs causation: can the data actually answer the question of interest?
What kind of data would we need to really answer the question?
Why use radioactivity in milk as a measure of exposure?
Why consider income and % non-white births?
Conclusion you'd make, assuming you could find an association.
Conclusion you'd make, assuming you could not find an association.

Assignment:

You'll be making most of the plots that were shown in class. I realize some may still be fighting with Splus, but stay with it. Here are the plots you'll be turning in:

Histogram of SR90
Histogram of log SR90
TIM vs INC
TIM vs log(NWBR)
TIM vs log(SR90)
TIM vs log(SR90 lagged 1) This year's TIM vs last year's log SR(90)
INC 1963 - INC 1960 vs INC 1960
TIM 1963 - TIM 1960 vs SR90 1963 - SR90 1960
TIM vs year for the 48 states (in just one graph)
SR90 vs year for the 48 states (in just one graph)
Graphs 1-4 on a single page of 4 plots; same with plots 5-8; plots 9-10 on a single page of 2 plots. So you can tell Splus to place 4 plots on each page using:
> par(mfrow=c(2,2))
Here are some hints:

Read the file into a dataframe called mort.
If all else fails, consult the online manuals. Fortune favors those that can gleen useful information from manuals. Also feel free to email me or German (german@stat.duke.edu).
From the previous lab, you should be able to make plots 1 - 5 with relative ease.
Plot 6 requires you to make a vector which contains the previous year's log(SR90). Here's a way to do it:
Make a vector of SR90:
> sr90_mort$SR90
Now make a matrix out of it so that each state corresponds to a row of the matrix, and each year corresponds to a column. Because the data rows correspond to 48 states repeated in the same order for 11 times, this matrix is easy to make:
> sr90mat_matrix(sr90,ncol=11,nrow=48) > sr90mat [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [1,] 10.0 10.0 13.8 29.5 29.0 20.0 14.8 12.0 12.0 12.0 10.0 [2,] 10.0 10.3 13.8 31.5 30.0 22.0 17.0 14.0 13.0 8.0 9.0 [3,] 8.0 8.5 11.0 26.7 25.0 18.3 13.0 13.0 11.0 8.0 9.0 : [46,] 6.8 9.5 14.5 24.2 26.4 20.8 16.6 10.8 10.4 7.0 5.0 [47,] 9.5 12.0 13.5 28.0 30.0 17.5 11.8 8.0 8.0 6.0 6.0 [48,] 3.7 3.9 4.2 12.8 9.7 6.9 5.8 3 2 1.8 2
A lag 1 measurement is the measurement from the previous year. So for 1960 we want the 1959 measurement; for 1961 we want the 1960 measurement; and so on. To make such a matrix for sr90 we create a new matrix where the rows are shifted over by a column. We use cbind which combines columns into matricies. Since we don't have 1959 sr90 measurements, the lag 1 sr90 measurements for 1960 will be missing. In Splus, NA denotes a missing value. We'll add a column of missing values to get the lag 1 sr90 values:
> sr90lag1_cbind(rep(NA,48),sr90mat) > sr90lag1 [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [1,] NA 10.0 10.0 13.8 29.5 29.0 20.0 14.8 12.0 12.0 12.0 10.0 [2,] NA 10.0 10.3 13.8 31.5 30.0 22.0 17.0 14.0 13.0 8.0 9.0 [3,] NA 8.0 8.5 11.0 26.7 25.0 18.3 13.0 13.0 11.0 8.0 9.0 : [46,] NA 6.8 9.5 14.5 24.2 26.4 20.8 16.6 10.8 10.4 7.0 5.0 [47,] NA 9.5 12.0 13.5 28.0 30.0 17.5 11.8 8.0 8.0 6.0 6.0 [48,] NA 3.7 3.9 4.2 12.8 9.7 6.9 5.8 3 2 1.8 2
Since there is no TIM measurement in 1971 to compare the 1970 SR90 measurement to, we can remove the last column of the lag 1 matrix for SR90.
> sr90lag1_sr90lag1[,-12] > sr90lag1 [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [1,] NA 10.0 10.0 13.8 29.5 29.0 20.0 14.8 12.0 12.0 12.0 [2,] NA 10.0 10.3 13.8 31.5 30.0 22.0 17.0 14.0 13.0 8.0 [3,] NA 8.0 8.5 11.0 26.7 25.0 18.3 13.0 13.0 11.0 8.0 : [46,] NA 6.8 9.5 14.5 24.2 26.4 20.8 16.6 10.8 10.4 7.0 [47,] NA 9.5 12.0 13.5 28.0 30.0 17.5 11.8 8.0 8.0 6.0 [48,] NA 3.7 3.9 4.2 12.8 9.7 6.9 5.8 3 2 1.8
Take a look at sr90lag1 to make sure it's the right thing. Now if we make a similar, but unlagged, matrix of TIM, we can see which TIM values match up with the lag 1 SR90 values:
> timmat_matrix(mort$TIM,ncol=11,nrow=48) > timmat [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [1,] 25.50 25.26 25.33 24.14 23.51 22.58 22.36 22.91 21.52 20.15 21.01 [2,] 23.62 24.36 22.09 22.68 22.57 21.09 21.58 20.56 18.83 20.53 18.05 [3,] 24.13 27.17 22.73 24.60 24.66 21.88 22.77 20.92 19.73 19.95 17.60 : [46,] 23.42 22.72 22.80 22.07 22.49 21.43 21.13 19.18 19.69 18.88 18.73 [47,] 23.19 22.88 21.92 21.41 22.38 21.07 21.42 19.36 19.71 17.49 15.92 [48,] 23.31 23.28 22.93 22.33 21.75 22.06 20.84 19.57 18.97 18.3 17.22
This is now easy to plot:
> plot(sr90lag1,timmat,xlab="log SR90 from the previous year", ylab="total infant mortality")
Note that plot treats the matricies sr90lag1 and timmat as though they are long vectors created by combining the columns. Splus can turn a matrix into a vector with the function as.vector()
> as.vector(timmat)
This vector is exactly the same as mort$TIM. This was just an aside. You don't need to know this to make the plot, but it may be helpful some day.
Plots 7 and 8 require you to look at differences. This is quite easy. Again, making matricies helps here. Here's one way to create the vector of income differences:
> incmat_matrix(mort$INC,ncol=11,nrow=48) > inc60_incmat[,1] > incdiff_incmat[,4]-incmat[,1]
Since incmat holds income by state in each column. Column 1 corresponds to 1960 and Column 4 corresponds to 1963. So now the vector incdiff holds the change in income. Likewise you can make vectors incdiff and sr90diff.
Note the plot shown in class was forced to be a square and the x and y-axes were fixed in the plot command. This can be done with the command:
> par(pty="s") # make plot type square > plot(inc60,incdiff,xlab="1960 Income", ylab="1963 Income - 1960 Income", xlim=c(15,40),ylim=c(-25,30))
Now Splus will make square plotting regions until you tell it to stop. You can do this with a par command:
> par(pty="m")
This says make plot type maximal - which is the default. You should be able to figure out how to make plot 8 now.
Use matplot to do these final plots. You've already made the data matricies sr90mat and timmat. All that's left is to make a year matrix and then use matplot. Remember, matplot plots columns of the matricies against each other, but our matricies have each state's data recorded along the rows. The function t() transposes a matrix - making its rows columns. Hence matplot is called below with the two matricies transposed. Below is how to make plot 9:
> yearmat_matrix(mort$YEAR,ncol=11,nrow=48) > matplot(t(yearmat),t(timmat),xlab="year",ylab="total infant mortality", type="l",main="TIM from 1960-1970 by state")
Notice I used type="l", plotting lines only. I think the figure gets too cluttered if you try to add plotting sympols. The lines are good here because they connect data from the same state, and the lines give us a sense of connection in time. I leave it to you to make plot 10.
Good luck!