The main objective of this analysis is to understand how encouragement affects the frequency that children watch Sesame Street. We will use the following variables:

Response:

Predictors:

# read in dataset
sesame <- read_csv("data/sesame.csv")
# mean-center relevant continuous variables, make categorical variables factors
sesame <- sesame %>% 
  mutate(viewcat = as.factor(viewcat), 
         site = as.factor(site), 
         prenumbCent = prenumb - mean(prenumb), 
         preletCent = prelet - mean(prelet), 
         ageCent = age - mean(age),
         viewenc = ifelse(viewenc == 1, "Encouraged", "Not Encouraged"))


Questions

  1. We will build a model to predict how often a child in this study watched Sesame Street. What type of model should we build? Why?

  2. Describe how you would conduct exploratory data analysis. What plots and/or summary statistics would you include? What information would you learn from the exploratory data analysis?

model1 <- multinom(viewcat ~ site + viewenc + prenumbCent + preletCent + ageCent,
                   data = sesame)
## # weights:  40 (27 variable)
## initial  value 332.710647 
## iter  10 value 308.329993
## iter  20 value 282.195646
## iter  30 value 280.866903
## final  value 280.866878 
## converged
kable(tidy(model1, conf.int=TRUE, exponentiate = FALSE),
      format = "markdown")
y.level term estimate std.error statistic p.value conf.low conf.high
2 (Intercept) 2.4476479 0.5948957 4.1144148 0.0000388 1.2816736 3.6136221
2 site2 -0.0562097 0.7812605 -0.0719474 0.9426438 -1.5874521 1.4750327
2 site3 -1.0005051 0.6497524 -1.5398252 0.1236030 -2.2739963 0.2729862
2 site4 -1.8505122 0.6480878 -2.8553417 0.0042991 -3.1207411 -0.5802834
2 site5 -1.7438076 0.8392694 -2.0777686 0.0377307 -3.3887455 -0.0988698
2 viewencNot Encouraged -2.6728847 0.5013879 -5.3309715 0.0000001 -3.6555870 -1.6901824
2 prenumbCent 0.0325903 0.0355537 0.9166519 0.3593251 -0.0370936 0.1022742
2 preletCent 0.0036567 0.0415046 0.0881037 0.9297943 -0.0776909 0.0850043
2 ageCent -0.0341343 0.0417836 -0.8169316 0.4139675 -0.1160287 0.0477600
3 (Intercept) 2.5153851 0.5911066 4.2553830 0.0000209 1.3568374 3.6739328
3 site2 0.2024818 0.7468361 0.2711195 0.7862991 -1.2612899 1.6662536
3 site3 -0.8327374 0.6358129 -1.3097210 0.1902902 -2.0789078 0.4134329
3 site4 -2.4377774 0.6854693 -3.5563625 0.0003760 -3.7812725 -1.0942822
3 site5 -3.6262825 1.2406520 -2.9228844 0.0034681 -6.0579158 -1.1946492
3 viewencNot Encouraged -2.4463374 0.5017349 -4.8757573 0.0000011 -3.4297196 -1.4629551
3 prenumbCent 0.0705815 0.0354470 1.9911869 0.0464603 0.0011068 0.1400563
3 preletCent -0.0151321 0.0410750 -0.3684021 0.7125735 -0.0956376 0.0653734
3 ageCent -0.0329576 0.0417658 -0.7891049 0.4300507 -0.1148170 0.0489019
4 (Intercept) 2.0259213 0.6142830 3.2980262 0.0009737 0.8219488 3.2298938
4 site2 0.9550186 0.7495735 1.2740827 0.2026341 -0.5141185 2.4241557
4 site3 -0.5940721 0.6701628 -0.8864594 0.3753700 -1.9075671 0.7194228
4 site4 -2.3732146 0.7582255 -3.1299588 0.0017483 -3.8593092 -0.8871199
4 site5 -1.6291088 0.8770139 -1.8575632 0.0632311 -3.3480244 0.0898069
4 viewencNot Encouraged -2.3322395 0.5110683 -4.5634591 0.0000050 -3.3339150 -1.3305639
4 prenumbCent 0.0700397 0.0356597 1.9641139 0.0495169 0.0001480 0.1399314
4 preletCent 0.0085247 0.0400768 0.2127080 0.8315547 -0.0700245 0.0870738
4 ageCent -0.0266555 0.0431001 -0.6184563 0.5362746 -0.1111301 0.0578191


  1. Interpret the intercept associated with the odds of viewcat == 2 versus viewcat == 1.

  2. Interpret the effect of the numbers pretest score on the odds of viewership.

  3. The primary objective of the experiment was to understand the effect of encouragement viewenc on viewership. Does encouragement have a significant effect on viewership? If so, describe the effect. Otherwise, explain why not.

  4. We want to test if there are any significant interactions with viewenc and the pretests. We create a model that includes the variables from model1 along with viewenc*preletCent and viewenc*prenumbCent.

model2 <- multinom(viewcat ~ site + viewenc + prenumbCent + preletCent + ageCent + 
                     viewenc*preletCent + viewenc*prenumbCent,
                   data = sesame)
## # weights:  48 (33 variable)
## initial  value 332.710647 
## iter  10 value 309.032326
## iter  20 value 288.174392
## iter  30 value 277.855094
## final  value 277.774974 
## converged

The results from the drop-in-deviance test are shown below. Is there evidence of a significant interaction effect? Explain.

anova(model1, model2, test = "Chisq")
## Likelihood ratio tests of Multinomial Models
## 
## Response: viewcat
##                                                                                                Model
## 1                                                site + viewenc + prenumbCent + preletCent + ageCent
## 2 site + viewenc + prenumbCent + preletCent + ageCent + viewenc * preletCent + viewenc * prenumbCent
##   Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
## 1       693   561.7338                                
## 2       687   555.5499 1 vs 2     6 6.183808 0.4029182
  1. How would you assess the appropriateness of the model flit? Describe the plots, tables, and/or calculations you would create to assess model fit.

References

Data from http://www2.stat.duke.edu/~jerry/sta210/sesamelab.html