November 10, 2015

Inference for a proportion

Inference for difference in two proportions

Inference for regression

Testing errors

**Due Thursday:**HW 3

2010 GSS:

gss <- read.csv("https://stat.duke.edu/~mc301/data/gss2010.csv")

Data dictionary at https://gssdataexplorer.norc.org/variables/vfilter

Note that not all questions are asked every year

Another question on the survey is "Do you think the use of marijuana should be made legal or not?". Do these data convincing evidence that majority of Americans think that the use of marijuana should **not** be legal? Note that the variable of interest in the dataset is `grass`

.

grass_summ <- gss %>% filter(!is.na(grass)) %>% summarise(x = sum(grass == "NOT LEGAL"), n = length(grass), p_hat = x / n) grass_summ

## x n p_hat ## 1 656 1259 0.5210485

What are the hypotheses?

Let \(p\) represent people who do **not** think marijuana should be legalized.

\[H_0: p = 0.5\] \[H_A: p > 0.5\]

\[\hat{p} \sim N\left(mean = p, SE = \sqrt{\frac{p (1-p)}{n}} \right)\] \[Z = \frac{obs - null}{SE}\]

se <- sqrt(0.5 * 0.5 / grass_summ$n) (z <- (grass_summ$p_hat - 0.5) / se)

## [1] 1.493699

p-value = P(observed or more extreme outcome | \(H_0\) true)

pnorm(z, lower.tail = FALSE)

## [1] 0.06762719

Assuming \(\alpha = 0.05\), what is the conclusion of the hypothesis test?

What is the equivalent confidence level to this hypothesis test? At this level would you expect a confidence interval to include the null value?

\[point~estimate \pm critical~value \times SE\]

z_star <- qnorm(0.95) se <- sqrt(0.52 * (1 - 0.52) / grass_summ$n) pt_est <- grass_summ$p_hat round(pt_est + c(-1,1) * z_star * se, 3)

## [1] 0.498 0.544

Interpret this interval in context of the data. Does your confidence interval agree with the result of the hypothesis test?

Note that `prop.test`

function uses a different (equivalent) distribution.

# HT prop.test(grass_summ$x, grass_summ$n, p = 0.5, alternative = "greater", correct = FALSE)

## ## 1-sample proportions test without continuity correction ## ## data: grass_summ$x out of grass_summ$n, null probability 0.5 ## X-squared = 2.2311, df = 1, p-value = 0.06763 ## alternative hypothesis: true p is greater than 0.5 ## 95 percent confidence interval: ## 0.4978702 1.0000000 ## sample estimates: ## p ## 0.5210485

# CI prop.test(grass_summ$x, grass_summ$n, correct = FALSE, conf.level = 0.90)$conf.int

## [1] 0.4978702 0.5441364 ## attr(,"conf.level") ## [1] 0.9

Is there a difference between the proportions of people who think marijuana should **not** be legalized based on whether they favor or oppose a law which would require a person to obtain a police permit before he or she could buy a gun?

Let \(p\) represent people who do **not** think marijuana should be legalized.

\[H_0: p_{pro\_gunlaw} = p_{anti\_gunlaw}\] \[H_A: p_{pro\_gunlaw} \ne p_{anti\_gunlaw}\]

Note that the variable identifying people who are pro and anti gun laws in the dataset is `gunlaw`

.

What type of visualization would be appropriate for evaluating this research question?

gss_gun_grss_summ <- gss_gun_grss %>% group_by(gunlaw) %>% summarise(x = sum(grass == "NOT LEGAL"), n = length(grass), p_hat = x / n) gss_gun_grss_summ

## Source: local data frame [2 x 4] ## ## gunlaw x n p_hat ## (fctr) (int) (int) (dbl) ## 1 FAVOR 211 420 0.5023810 ## 2 OPPOSE 65 135 0.4814815

\[(\hat{p}_1 - \hat{p}_2) \sim N\left(mean = (p_1 - p_2), SE = \sqrt{ \frac{p_1 (1 - p_1)}{n_1} + \frac{p_2 (1 - p_2)}{n_2} } \right)\] \[Z = \frac{obs - null}{SE}\]

We need to find a reasonable value for \(p_1\) and \(p_2\) that are equal to each other, and that make sense in the context of these data?

total_suc <- sum(gss_gun_grss_summ$x) total_n <- sum(gss_gun_grss_summ$n) (p_pool <- total_suc / total_n)

## [1] 0.4972973

se <- sqrt( (p_pool * (1-p_pool))/gss_gun_grss_summ$n[1] + (p_pool * (1-p_pool))/gss_gun_grss_summ$n[2] ) (z <- ((gss_gun_grss_summ$p_hat[1] - gss_gun_grss_summ$p_hat[2]) - 0) / se)

## [1] 0.4224902

p-value = P(observed or more extreme outcome | \(H_0\) true)

pnorm(z, lower.tail= FALSE) * 2

## [1] 0.6726672

Assuming \(\alpha = 0.05\), what is the conclusion of the hypothesis test?

What is the equivalent confidence level to this hypothesis test? At this level would you expect a confidence interval to include 0?

\[point~estimate \pm critical~value \times SE\]

The only difference is that SE is calculated using the sample proportions, and not the pooled proportion.

prop.test(x = c(gss_gun_grss_summ$x[1], gss_gun_grss_summ$x[2]), n = c(gss_gun_grss_summ$n[1], gss_gun_grss_summ$n[2]), correct = FALSE)

## ## 2-sample test for equality of proportions without continuity ## correction ## ## data: c(gss_gun_grss_summ$x[1], gss_gun_grss_summ$x[2]) out of c(gss_gun_grss_summ$n[1], gss_gun_grss_summ$n[2]) ## X-squared = 0.1785, df = 1, p-value = 0.6727 ## alternative hypothesis: two.sided ## 95 percent confidence interval: ## -0.07600556 0.11780450 ## sample estimates: ## prop 1 prop 2 ## 0.5023810 0.4814815

We now have been introduced to both simulation based and CLT based methods for statistical inference.

For most simulation based methods you wrote your own code, for CLT based methods we introduced some built in functions.

Take away message: If certain conditions are met CLT based methods may be used for statistical inference. To do so, we would need to know how the standard error is calculated for the given sample statistic of interest.

- What you should know:
- What does standard error mean?
- What does the p-value mean?
- How do we make decisions based on the p-value?

- numerical data -
`t.test`

- testing for one mean: \(H_0: \mu_x = \mu_0\)
- comparing two means (groups
`1`

and`2`

): \(H_0: \mu_1 = \mu_2\)

- categorical data -
`prop.test`

- testing for one proportion: \(H_0: p_x = p_0\)
- comparing two proportions (groups
`1`

and`2`

): \(H_0: p_1 = p_2\)

The GSS gives the following 10 question vocabulary test:

- SPACE (school, noon, captain, room, board, donâ€™t know)
- BROADEN (efface, make level, elapse, embroider, widen, donâ€™t know)
- EMANATE (populate, free, prominent, rival, come, donâ€™t know)
- EDIBLE (auspicious, eligible, fit to eat, sagacious, able to speak, donâ€™t know)
- ANIMOSITY (hatred, animation, disobedience, diversity, friendship, donâ€™t know)
- PACT (puissance, remonstrance, agreement, skillet, pressure, donâ€™t know)
- CLOISTERED (miniature, bunched, arched, malady, secluded, donâ€™t know)
- CAPRICE (value, a star, grimace, whim, inducement, donâ€™t know)
- ACCUSTOM (disappoint, customary, encounter, get used to, business, donâ€™t know)
- ALLUSION (reference, dream, eulogy, illusion, aria, donâ€™t know)

The variable of interest is `wordsum`

:

ggplot(data = gss, aes(x = wordsum)) + geom_histogram(binwidth = 1)

Number of words correct in a vocabulary test (0 - 10) vs. age:

vocab_age <- lm(wordsum ~ age, data = gss) summary(vocab_age)

## ## Call: ## lm(formula = wordsum ~ age, data = gss) ## ## Residuals: ## Min 1Q Median 3Q Max ## -6.1186 -1.1859 0.0562 1.2713 4.3251 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 5.392481 0.159257 33.860 < 2e-16 *** ## age 0.013447 0.003132 4.294 1.88e-05 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 2.053 on 1383 degrees of freedom ## (659 observations deleted due to missingness) ## Multiple R-squared: 0.01315, Adjusted R-squared: 0.01244 ## F-statistic: 18.44 on 1 and 1383 DF, p-value: 1.88e-05

\(H_0: \beta_{age} = 0\), Slope of age is 0

\(H_A: \beta_{age} \ne 0\), Slope of age is different than 0

p-value = 1.88e-05 \(\rightarrow\) Reject \(H_0\) at 5% significance level.

The data provide convincing evidence that the slope of age is different than 0 for predicting score on vocabulary test.

confint(vocab_age, level = 0.95)

## 2.5 % 97.5 % ## (Intercept) 5.080070478 5.7048924 ## age 0.007303302 0.0195906

We are 95% confident that for each year a person is older their vocabulary score is expected to be higher, on average, by 0.0073 to 0.0196 points.

Number of words correct in a vocabulary test (0 - 10) vs. age + unemployed over the last 10 years:

vocab_age_unemp <- lm(wordsum ~ age + unemp, data = gss) summary(vocab_age_unemp)

## ## Call: ## lm(formula = wordsum ~ age + unemp, data = gss) ## ## Residuals: ## Min 1Q Median 3Q Max ## -6.1105 -1.2748 0.0904 1.3782 4.4321 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 5.124384 0.266168 19.252 < 2e-16 *** ## age 0.018261 0.004657 3.921 9.64e-05 *** ## unempYES -0.049498 0.174337 -0.284 0.777 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 2.148 on 734 degrees of freedom ## (1307 observations deleted due to missingness) ## Multiple R-squared: 0.0248, Adjusted R-squared: 0.02215 ## F-statistic: 9.335 on 2 and 734 DF, p-value: 9.926e-05

**Age:**

\(H_0: \beta_{age} = 0\), Slope of age is 0, given that unemployment is in the model \(H_A: \beta_{age} \ne 0\), Slope of age is not 0, given that unemployment is in the model

p-value = 9.64e-05 \(\rightarrow\) Reject \(H_0\) at 5% significance level. The data provide convincing evidence that the slope of age is different than 0 for predicting score on vocabulary test, given that unemployment is in the model.

**Unemployment:**

\(H_0: \beta_{unempYES} = 0\), Slope of unemployment is 0, given that age is in the model \ \(H_A: \beta_{unempYES} \ne 0\), Slope of unemployment is not 0, given that age is in the model

p-value = 0.777 \(\rightarrow\) Fail to reject \(H_0\) at 5% significance level. The data do not provide convincing evidence that the slope of unemployment is different than 0 for predicting score on vocabulary test, given that age is in the model.

confint(vocab_age_unemp, level = 0.95)

## 2.5 % 97.5 % ## (Intercept) 4.601842116 5.64692636 ## age 0.009117995 0.02740331 ## unempYES -0.391757193 0.29276161

All else held constant (i.e. given no change in unemployment status), we are 95% confident that for each year a person is older their vocabulary score is expected to be higher, on average, by 0.0091 to 0.0274 points.

Interpret the confidence interval for the slope of unemployment.

Model selection based on p-value method:

- Backwards elimination: Remove the variable with the highest p-value, re-fit, repeat until all variables are significant at the chosen significance level.
- Forward selection: Start with the variable with the lowest p-value, re-fit, repeat until all no more significant variables left at the chosen sigificance level.

This is used in literature, but is not really recommended!

- Relies on arbitraty significance level cutoff
- Multiple testing!

Instead use adjusted \(R^2\), or AIC, or other crieterion based model selection.

- Type 1: Reject \(H_0\) when you shouldn't have
- P(Type 1 error) = \(\alpha\)

- Type 2: Fail to reject \(H_0\) when you should have
- P(Type 2 error) is harder to calculate, but increases as \(\alpha\) decreases

In a court of law

\(H_0\): Defendant is innocent

\(H_A\): Defendant is guilty

Which is worse: Type 1 or Type 2 error?