Goto Lab: | #0 | #1 | #2 |
E. J. Sternglass in 1969 attributed a slowdown in the decrease in infant mortality rates in the US to increased radiation exposure from nuclear weapons testing. He estimated around 400,000 "excess deaths" between 1945 and 1962 were due to this increased exposure.
Sternglass's hypothesis was later examined by Victor Fuchs of Stanford university in 1979/80. This case study in exploratory analysis will use Fuchs's data to examine the Sternglass hypothesis.
The file has 528 observations,
each of one corresponding
to one row of the file. The first 48 observations are for
each one of the continental states and the year 1960, the
next 48 are for the same states (ordered in the same way)
but for 1961, and so on until 1970. The variables are, from
the first to the last column, total infant mortality (TIM),
neonatal infant mortality (NIM), postnatal infant mortality
(PIM), percent non-white births (NWBR), per capita income
(INC), strontium-90 content of milk (SR90), caresium-137
content of milk, number of births (NB), state (STATE),
and year (YEAR).
mort.asc
TIM | NIM | PIM | NWBR | INC | SR90 | CS137 | NB | STATE | YEAR |
The question of interest here is:
Does exposure to low-level radiation cause an increase
in infant mortality?
Some points to think about are:
You'll be making most of the plots that were shown in class. I realize some may still be fighting with Splus, but stay with it. Here are the plots you'll be turning in:
> par(mfrow=c(2,2))Here are some hints:
> sr90_mort$SR90Now make a matrix out of it so that each state corresponds to a row of the matrix, and each year corresponds to a column. Because the data rows correspond to 48 states repeated in the same order for 11 times, this matrix is easy to make:
> sr90mat_matrix(sr90,ncol=11,nrow=48) > sr90mat [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [1,] 10.0 10.0 13.8 29.5 29.0 20.0 14.8 12.0 12.0 12.0 10.0 [2,] 10.0 10.3 13.8 31.5 30.0 22.0 17.0 14.0 13.0 8.0 9.0 [3,] 8.0 8.5 11.0 26.7 25.0 18.3 13.0 13.0 11.0 8.0 9.0 : [46,] 6.8 9.5 14.5 24.2 26.4 20.8 16.6 10.8 10.4 7.0 5.0 [47,] 9.5 12.0 13.5 28.0 30.0 17.5 11.8 8.0 8.0 6.0 6.0 [48,] 3.7 3.9 4.2 12.8 9.7 6.9 5.8 3 2 1.8 2A lag 1 measurement is the measurement from the previous year. So for 1960 we want the 1959 measurement; for 1961 we want the 1960 measurement; and so on. To make such a matrix for sr90 we create a new matrix where the rows are shifted over by a column. We use cbind which combines columns into matricies. Since we don't have 1959 sr90 measurements, the lag 1 sr90 measurements for 1960 will be missing. In Splus, NA denotes a missing value. We'll add a column of missing values to get the lag 1 sr90 values:
> sr90lag1_cbind(rep(NA,48),sr90mat) > sr90lag1 [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [1,] NA 10.0 10.0 13.8 29.5 29.0 20.0 14.8 12.0 12.0 12.0 10.0 [2,] NA 10.0 10.3 13.8 31.5 30.0 22.0 17.0 14.0 13.0 8.0 9.0 [3,] NA 8.0 8.5 11.0 26.7 25.0 18.3 13.0 13.0 11.0 8.0 9.0 : [46,] NA 6.8 9.5 14.5 24.2 26.4 20.8 16.6 10.8 10.4 7.0 5.0 [47,] NA 9.5 12.0 13.5 28.0 30.0 17.5 11.8 8.0 8.0 6.0 6.0 [48,] NA 3.7 3.9 4.2 12.8 9.7 6.9 5.8 3 2 1.8 2Since there is no TIM measurement in 1971 to compare the 1970 SR90 measurement to, we can remove the last column of the lag 1 matrix for SR90.
> sr90lag1_sr90lag1[,-12] > sr90lag1 [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [1,] NA 10.0 10.0 13.8 29.5 29.0 20.0 14.8 12.0 12.0 12.0 [2,] NA 10.0 10.3 13.8 31.5 30.0 22.0 17.0 14.0 13.0 8.0 [3,] NA 8.0 8.5 11.0 26.7 25.0 18.3 13.0 13.0 11.0 8.0 : [46,] NA 6.8 9.5 14.5 24.2 26.4 20.8 16.6 10.8 10.4 7.0 [47,] NA 9.5 12.0 13.5 28.0 30.0 17.5 11.8 8.0 8.0 6.0 [48,] NA 3.7 3.9 4.2 12.8 9.7 6.9 5.8 3 2 1.8Take a look at sr90lag1 to make sure it's the right thing. Now if we make a similar, but unlagged, matrix of TIM, we can see which TIM values match up with the lag 1 SR90 values:
> timmat_matrix(mort$TIM,ncol=11,nrow=48) > timmat [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [1,] 25.50 25.26 25.33 24.14 23.51 22.58 22.36 22.91 21.52 20.15 21.01 [2,] 23.62 24.36 22.09 22.68 22.57 21.09 21.58 20.56 18.83 20.53 18.05 [3,] 24.13 27.17 22.73 24.60 24.66 21.88 22.77 20.92 19.73 19.95 17.60 : [46,] 23.42 22.72 22.80 22.07 22.49 21.43 21.13 19.18 19.69 18.88 18.73 [47,] 23.19 22.88 21.92 21.41 22.38 21.07 21.42 19.36 19.71 17.49 15.92 [48,] 23.31 23.28 22.93 22.33 21.75 22.06 20.84 19.57 18.97 18.3 17.22This is now easy to plot:
> plot(sr90lag1,timmat,xlab="log SR90 from the previous year", ylab="total infant mortality")Note that plot treats the matricies sr90lag1 and timmat as though they are long vectors created by combining the columns. Splus can turn a matrix into a vector with the function as.vector()
> as.vector(timmat)This vector is exactly the same as mort$TIM. This was just an aside. You don't need to know this to make the plot, but it may be helpful some day.
> incmat_matrix(mort$INC,ncol=11,nrow=48) > inc60_incmat[,1] > incdiff_incmat[,4]-incmat[,1]Since incmat holds income by state in each column. Column 1 corresponds to 1960 and Column 4 corresponds to 1963. So now the vector incdiff holds the change in income. Likewise you can make vectors incdiff and sr90diff.
> par(pty="s") # make plot type square > plot(inc60,incdiff,xlab="1960 Income", ylab="1963 Income - 1960 Income", xlim=c(15,40),ylim=c(-25,30))Now Splus will make square plotting regions until you tell it to stop. You can do this with a par command:
> par(pty="m")This says make plot type maximal - which is the default. You should be able to figure out how to make plot 8 now.
> yearmat_matrix(mort$YEAR,ncol=11,nrow=48) > matplot(t(yearmat),t(timmat),xlab="year",ylab="total infant mortality", type="l",main="TIM from 1960-1970 by state")Notice I used type="l", plotting lines only. I think the figure gets too cluttered if you try to add plotting sympols. The lines are good here because they connect data from the same state, and the lines give us a sense of connection in time. I leave it to you to make plot 10.