6. An example case study in Minitab

A good statistical analysis typically includes (i) a statement of the scientific question of interest; (ii) a description of the data; (iii) some informal exploratory data analysis; and (iv) formal inference addressing the question raised in (i) (see the outline). Here is an example:

Last semester we asked students in STAT 110 to fill out questionnaires asking for their age, height, weight, religion, major, attitude about abortion legislation, political leanings, their attitude towards STAT 110 before the first class, and their attitude towards STAT 110 after the first two classes. We have entered these data into the file /afs/acpub/project/sta215/class.data. There are 9 columns, corresponding to the following variables:

AGE (in years); HEIGHT (in inches); WEIGHT (in pounds); RELIGION (1=Catholic, 2=Protestant, 3=Jewish, 4=Other, 5=None); MAJOR (1=psych, 3=bio, 4=pps, 5=soc, 11=other); attitude about abortion legislation: ABVIEW (1="unrestricted pro choice" to 4="unrestricted pro life"); political leaning: POL (1=very conservative to 5=very liberal); attitude towards STA 110 before the first class: BEFORE (0=very negative to 8=very positive); attitude after first class: AFTER (0 to 8).

Copy the file to your account:

	cp /afs/acpub/project/sta215/class.data class.data

You can look at the data by typing more class.data:

   19 70.80   165   2   4   2   3.0   2   8
   21 67.92   140   2   1   1   2.5   2   8
   ....

Then enter Minitab and read in the data:

   minitab
   read 'class.data' c1-c9
   names c1='AGE' c2='HEIGHT' c3='WEIGHT' c4='REL'
   names c5='MAJ' c6='ABVIEW' c7='POL' c8='BEFORE' c9='AFTER'
   let c10='BEFORE' - 'AFTER'
   names c10='ATTDIFF'

The read command reads the data into the Minitab columns c1 through c9, and the names commands label them. The let command defines a new variable, stored in c10, by taking the difference of 'BEFORE' and 'AFTER'. You will need the variable 'ATTDIFF' to answer the questions later.

Looking at these data, what do you think about the following three questions? For each question, your answer should contain three parts: (a) Which variables will you use to answer the question? What is the relevant data model? (b) Use appropriate graphical descriptive statistics. (c) Explain in no more than 50 words how the plots answer the question.
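
Before turning to the questions, it may be worth checking that the data were read as intended. A minimal sketch, assuming the columns were named as above (the exact layout of the output depends on your Minitab release):

   info
   print 'BEFORE' 'AFTER' 'ATTDIFF'

The info command lists each column with its name and the number of values it holds, and print displays the requested columns. For the first two students shown above, BEFORE is 2 and AFTER is 8, so ATTDIFF should be 2 - 8 = -6; a negative ATTDIFF means the attitude became more positive.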

Here are two worked examples to help you with Questions 1, 2 and 3 (in your answers you do not, of course, need to mention the Minitab commands used):

Example 1: Are there any differences across religions in attitudes towards abortion?

(a) To answer this question we will consider the two columns REL and ABVIEW. This is k-sample data: think of the data rearranged in a table with the ABVIEW measurements for students with REL=1 (Catholic) in the first column, REL=2 (Protestant) in the second column, etc.

(b) The appropriate graphical format is shown in the section on descriptive statistics for k-sample data.

Creating such a graph with Minitab is not straightforward, so we will use Minitab only to compute the required sample means and sample standard deviations. We get these with the following Minitab command (don't forget the ";" at the end of the first line: it tells Minitab that a subcommand follows, and the final "." ends the command):

 descriptive 'ABVIEW';
 by 'REL'.

The by subcommand computes the descriptive statistics separately for each level of REL, i.e. for each column of the k-sample data. This is exactly what we need to make the desired plot. Read off the sample means and standard deviations, and draw the graph by hand.

(c) The average responses for the 5 groups are very similar. There are some small differences, but the sample means are so close that we cannot tell whether the differences reflect true effects or are just due to random error.

Example 2: Are older students more conservative than younger ones?

(a) We will look at the variables AGE and POL. This is a correlation-regression data model.

(b) A useful graphical representation is the least squares regression line (see Figure 3.27 in the textbook). Use the command plot 'POL' 'AGE' to get a scatterplot of POL (vertical axis) versus AGE (horizontal axis), and regress 'POL' 1 'AGE' to compute the intercept and slope of the regression line; the 1 tells Minitab that there is one predictor.

(c) The line shows only a slight positive slope, i.e. it suggests that older students are on average slightly more liberal. But the sample correlation coefficient of only 0.14 tells us that this relationship is very weak.
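
The commands in (b) do not print the correlation coefficient itself. One way to obtain it (a sketch; the value 0.14 quoted above is taken from the original analysis and is not recomputed here) is the correlation command:

   correlation 'POL' 'AGE'

This prints the sample correlation between the two columns. With a single predictor, its square corresponds to the R-sq value that regress reports as a percentage.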

Question 1: Are there any differences in political leaning across majors?


Question 2: Have attitudes towards STAT 110 changed after the first class?


Question 3: What weight would you expect for a student with a height of 6.1 ft?
