Multinomial Logistic Regression

The main objective of this analysis is to understand how encouragement affects the frequency that children watch Sesame Street. We will use the following variables:

Questions

We will build a model to predict how often a child in this study watched Sesame Street. What type of model should we build? Why?
Describe how you would conduct exploratory data analysis. What plots and/or summary statistics would you include? What information would you learn from the exploratory data analysis?

model1 <- multinom(viewcat ~ site + viewenc + prenumbCent + preletCent + ageCent,
                   data = sesame)

## # weights:  40 (27 variable)
## initial  value 332.710647 
## iter  10 value 308.329993
## iter  20 value 282.195646
## iter  30 value 280.866903
## final  value 280.866878 
## converged

kable(tidy(model1, conf.int=TRUE, exponentiate = FALSE),
      format = "markdown")

y.level	term	estimate	std.error	statistic	p.value	conf.low	conf.high
2	(Intercept)	2.4476479	0.5948957	4.1144148	0.0000388	1.2816736	3.6136221
2	site2	-0.0562097	0.7812605	-0.0719474	0.9426438	-1.5874521	1.4750327
2	site3	-1.0005051	0.6497524	-1.5398252	0.1236030	-2.2739963	0.2729862
2	site4	-1.8505122	0.6480878	-2.8553417	0.0042991	-3.1207411	-0.5802834
2	site5	-1.7438076	0.8392694	-2.0777686	0.0377307	-3.3887455	-0.0988698
2	viewencNot Encouraged	-2.6728847	0.5013879	-5.3309715	0.0000001	-3.6555870	-1.6901824
2	prenumbCent	0.0325903	0.0355537	0.9166519	0.3593251	-0.0370936	0.1022742
2	preletCent	0.0036567	0.0415046	0.0881037	0.9297943	-0.0776909	0.0850043
2	ageCent	-0.0341343	0.0417836	-0.8169316	0.4139675	-0.1160287	0.0477600
3	(Intercept)	2.5153851	0.5911066	4.2553830	0.0000209	1.3568374	3.6739328
3	site2	0.2024818	0.7468361	0.2711195	0.7862991	-1.2612899	1.6662536
3	site3	-0.8327374	0.6358129	-1.3097210	0.1902902	-2.0789078	0.4134329
3	site4	-2.4377774	0.6854693	-3.5563625	0.0003760	-3.7812725	-1.0942822
3	site5	-3.6262825	1.2406520	-2.9228844	0.0034681	-6.0579158	-1.1946492
3	viewencNot Encouraged	-2.4463374	0.5017349	-4.8757573	0.0000011	-3.4297196	-1.4629551
3	prenumbCent	0.0705815	0.0354470	1.9911869	0.0464603	0.0011068	0.1400563
3	preletCent	-0.0151321	0.0410750	-0.3684021	0.7125735	-0.0956376	0.0653734
3	ageCent	-0.0329576	0.0417658	-0.7891049	0.4300507	-0.1148170	0.0489019
4	(Intercept)	2.0259213	0.6142830	3.2980262	0.0009737	0.8219488	3.2298938
4	site2	0.9550186	0.7495735	1.2740827	0.2026341	-0.5141185	2.4241557
4	site3	-0.5940721	0.6701628	-0.8864594	0.3753700	-1.9075671	0.7194228
4	site4	-2.3732146	0.7582255	-3.1299588	0.0017483	-3.8593092	-0.8871199
4	site5	-1.6291088	0.8770139	-1.8575632	0.0632311	-3.3480244	0.0898069
4	viewencNot Encouraged	-2.3322395	0.5110683	-4.5634591	0.0000050	-3.3339150	-1.3305639
4	prenumbCent	0.0700397	0.0356597	1.9641139	0.0495169	0.0001480	0.1399314
4	preletCent	0.0085247	0.0400768	0.2127080	0.8315547	-0.0700245	0.0870738
4	ageCent	-0.0266555	0.0431001	-0.6184563	0.5362746	-0.1111301	0.0578191

Interpret the intercept associated with the odds of viewcat == 2 versus viewcat == 1.
Interpret the effect of the numbers pretest score on the odds of viewership.
The primary objective of the experiment was to understand the effect of encouragement viewenc on viewership. Does encouragement have a significant effect on viewership? If so, describe the effect. Otherwise, explain why not.
We want to test if there are any significant interactions with viewenc and the pretests. We create a model that includes the variables from model1 along with viewenc*preletCent and viewenc*prenumbCent.

model2 <- multinom(viewcat ~ site + viewenc + prenumbCent + preletCent + ageCent + 
                     viewenc*preletCent + viewenc*prenumbCent,
                   data = sesame)

## # weights:  48 (33 variable)
## initial  value 332.710647 
## iter  10 value 309.032326
## iter  20 value 288.174392
## iter  30 value 277.855094
## final  value 277.774974 
## converged

The results from the drop-in-deviance test are shown below. Is there evidence of a significant interaction effect? Explain.

anova(model1, model2, test = "Chisq")

## Likelihood ratio tests of Multinomial Models
## 
## Response: viewcat
##                                                                                                Model
## 1                                                site + viewenc + prenumbCent + preletCent + ageCent
## 2 site + viewenc + prenumbCent + preletCent + ageCent + viewenc * preletCent + viewenc * prenumbCent
##   Resid. df Resid. Dev   Test    Df LR stat.   Pr(Chi)
## 1       693   561.7338                                
## 2       687   555.5499 1 vs 2     6 6.183808 0.4029182

How would you assess the appropriateness of the model flit? Describe the plots, tables, and/or calculations you would create to assess model fit.

Multinomial Logistic Regression

Sesame Street Analysis

2019-04-01

Questions

References