The main objective of this analysis is to understand how encouragement affects the frequency that children watch Sesame Street. We will use the following variables:
Response:
viewcat
Predictors:
age
: child’s age in monthsprenumb
: score on numbers pretest (0 to 54)prelet
: score on letters pretest (0 to 58)viewenc
: 1: encouraged to watch, 2: not encouragedsite:
# read in dataset
sesame <- read_csv("data/sesame.csv")
# mean-center relevant continuous variables, make categorical variables factors
sesame <- sesame %>%
mutate(viewcat = as.factor(viewcat),
site = as.factor(site),
prenumbCent = prenumb - mean(prenumb),
preletCent = prelet - mean(prelet),
ageCent = age - mean(age),
viewenc = ifelse(viewenc == 1, "Encouraged", "Not Encouraged"))
We will build a model to predict how often a child in this study watched Sesame Street. What type of model should we build? Why?
Describe how you would conduct exploratory data analysis. What plots and/or summary statistics would you include? What information would you learn from the exploratory data analysis?
model1 <- multinom(viewcat ~ site + viewenc + prenumbCent + preletCent + ageCent,
data = sesame)
## # weights: 40 (27 variable)
## initial value 332.710647
## iter 10 value 308.329993
## iter 20 value 282.195646
## iter 30 value 280.866903
## final value 280.866878
## converged
kable(tidy(model1, conf.int=TRUE, exponentiate = FALSE),
format = "markdown")
y.level | term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|---|
2 | (Intercept) | 2.4476479 | 0.5948957 | 4.1144148 | 0.0000388 | 1.2816736 | 3.6136221 |
2 | site2 | -0.0562097 | 0.7812605 | -0.0719474 | 0.9426438 | -1.5874521 | 1.4750327 |
2 | site3 | -1.0005051 | 0.6497524 | -1.5398252 | 0.1236030 | -2.2739963 | 0.2729862 |
2 | site4 | -1.8505122 | 0.6480878 | -2.8553417 | 0.0042991 | -3.1207411 | -0.5802834 |
2 | site5 | -1.7438076 | 0.8392694 | -2.0777686 | 0.0377307 | -3.3887455 | -0.0988698 |
2 | viewencNot Encouraged | -2.6728847 | 0.5013879 | -5.3309715 | 0.0000001 | -3.6555870 | -1.6901824 |
2 | prenumbCent | 0.0325903 | 0.0355537 | 0.9166519 | 0.3593251 | -0.0370936 | 0.1022742 |
2 | preletCent | 0.0036567 | 0.0415046 | 0.0881037 | 0.9297943 | -0.0776909 | 0.0850043 |
2 | ageCent | -0.0341343 | 0.0417836 | -0.8169316 | 0.4139675 | -0.1160287 | 0.0477600 |
3 | (Intercept) | 2.5153851 | 0.5911066 | 4.2553830 | 0.0000209 | 1.3568374 | 3.6739328 |
3 | site2 | 0.2024818 | 0.7468361 | 0.2711195 | 0.7862991 | -1.2612899 | 1.6662536 |
3 | site3 | -0.8327374 | 0.6358129 | -1.3097210 | 0.1902902 | -2.0789078 | 0.4134329 |
3 | site4 | -2.4377774 | 0.6854693 | -3.5563625 | 0.0003760 | -3.7812725 | -1.0942822 |
3 | site5 | -3.6262825 | 1.2406520 | -2.9228844 | 0.0034681 | -6.0579158 | -1.1946492 |
3 | viewencNot Encouraged | -2.4463374 | 0.5017349 | -4.8757573 | 0.0000011 | -3.4297196 | -1.4629551 |
3 | prenumbCent | 0.0705815 | 0.0354470 | 1.9911869 | 0.0464603 | 0.0011068 | 0.1400563 |
3 | preletCent | -0.0151321 | 0.0410750 | -0.3684021 | 0.7125735 | -0.0956376 | 0.0653734 |
3 | ageCent | -0.0329576 | 0.0417658 | -0.7891049 | 0.4300507 | -0.1148170 | 0.0489019 |
4 | (Intercept) | 2.0259213 | 0.6142830 | 3.2980262 | 0.0009737 | 0.8219488 | 3.2298938 |
4 | site2 | 0.9550186 | 0.7495735 | 1.2740827 | 0.2026341 | -0.5141185 | 2.4241557 |
4 | site3 | -0.5940721 | 0.6701628 | -0.8864594 | 0.3753700 | -1.9075671 | 0.7194228 |
4 | site4 | -2.3732146 | 0.7582255 | -3.1299588 | 0.0017483 | -3.8593092 | -0.8871199 |
4 | site5 | -1.6291088 | 0.8770139 | -1.8575632 | 0.0632311 | -3.3480244 | 0.0898069 |
4 | viewencNot Encouraged | -2.3322395 | 0.5110683 | -4.5634591 | 0.0000050 | -3.3339150 | -1.3305639 |
4 | prenumbCent | 0.0700397 | 0.0356597 | 1.9641139 | 0.0495169 | 0.0001480 | 0.1399314 |
4 | preletCent | 0.0085247 | 0.0400768 | 0.2127080 | 0.8315547 | -0.0700245 | 0.0870738 |
4 | ageCent | -0.0266555 | 0.0431001 | -0.6184563 | 0.5362746 | -0.1111301 | 0.0578191 |
Interpret the intercept associated with the odds of viewcat == 2
versus viewcat == 1
.
Interpret the effect of the numbers pretest score on the odds of viewership.
The primary objective of the experiment was to understand the effect of encouragement viewenc
on viewership. Does encouragement have a significant effect on viewership? If so, describe the effect. Otherwise, explain why not.
We want to test if there are any significant interactions with viewenc
and the pretests. We create a model that includes the variables from model1
along with viewenc*preletCent
and viewenc*prenumbCent
.
model2 <- multinom(viewcat ~ site + viewenc + prenumbCent + preletCent + ageCent +
viewenc*preletCent + viewenc*prenumbCent,
data = sesame)
## # weights: 48 (33 variable)
## initial value 332.710647
## iter 10 value 309.032326
## iter 20 value 288.174392
## iter 30 value 277.855094
## final value 277.774974
## converged
The results from the drop-in-deviance test are shown below. Is there evidence of a significant interaction effect? Explain.
anova(model1, model2, test = "Chisq")
## Likelihood ratio tests of Multinomial Models
##
## Response: viewcat
## Model
## 1 site + viewenc + prenumbCent + preletCent + ageCent
## 2 site + viewenc + prenumbCent + preletCent + ageCent + viewenc * preletCent + viewenc * prenumbCent
## Resid. df Resid. Dev Test Df LR stat. Pr(Chi)
## 1 693 561.7338
## 2 687 555.5499 1 vs 2 6 6.183808 0.4029182
Data from http://www2.stat.duke.edu/~jerry/sta210/sesamelab.html