Accessing the data

The data sets can be found in the /data folder of your homework repository.

Accessing the assignment repo

Go to the #assignment-links channel on Slack and accept the assignment.

This is a team assignment. Because this is the first assignment you’re working on in your new teams, the first student on each team to create the repo will also need to specify the team name, and the other team members will need to join that team when they create their repos. Please be careful at this stage to make sure you join the correct repo!

Assignment

Movies

  1. Horror movies: With Halloween around the corner, let’s take a look at some horror movies.

    1. How many horror movies are in the movies dataset?

    2. Calculate a 90% confidence interval for the mean audience score of horror movies, and interpret this interval in the context of the data.

    3. Calculate a 90% confidence interval for the mean critics’ score of horror movies, and interpret this interval in the context of the data.

    4. How do the audience scores compare to critics’ scores for horror movies?
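
One way to approach the interval questions above is a percentile bootstrap. The sketch below is only an illustration under stated assumptions: the rows are made up, the field names genre, audience_score, and critics_score are hypothetical and may not match the real movies dataset, and bootstrap_ci is a helper written for this example rather than a function from any package.

```python
import random
import statistics

def bootstrap_ci(values, stat=statistics.mean, level=0.90, reps=5000, seed=1):
    """Percentile bootstrap interval for `stat` of `values` (illustrative helper)."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the statistic for each resample.
    boot_stats = sorted(stat(rng.choices(values, k=n)) for _ in range(reps))
    tail = (1 - level) / 2
    lower = boot_stats[int(tail * reps)]
    upper = boot_stats[int((1 - tail) * reps) - 1]
    return lower, upper

# Made-up rows standing in for the movies data; field names are assumptions.
movies = [
    {"genre": "Horror", "audience_score": 45, "critics_score": 38},
    {"genre": "Horror", "audience_score": 52, "critics_score": 61},
    {"genre": "Horror", "audience_score": 38, "critics_score": 22},
    {"genre": "Horror", "audience_score": 71, "critics_score": 84},
    {"genre": "Horror", "audience_score": 44, "critics_score": 35},
    {"genre": "Horror", "audience_score": 60, "critics_score": 57},
    {"genre": "Horror", "audience_score": 33, "critics_score": 18},
    {"genre": "Horror", "audience_score": 49, "critics_score": 42},
    {"genre": "Drama", "audience_score": 81, "critics_score": 90},
    {"genre": "Comedy", "audience_score": 66, "critics_score": 54},
    {"genre": "Drama", "audience_score": 74, "critics_score": 69},
]

# Keep only the horror movies and count them.
horror = [m for m in movies if m["genre"] == "Horror"]
n_horror = len(horror)

audience = [m["audience_score"] for m in horror]
critics = [m["critics_score"] for m in horror]

# 90% intervals for the mean of each score.
audience_ci = bootstrap_ci(audience, level=0.90)
critics_ci = bootstrap_ci(critics, level=0.90)

print("horror movies:", n_horror)
print("90% CI, mean audience score:", audience_ci)
print("90% CI, mean critics' score:", critics_ci)
```

The fixed seed only makes the sketch reproducible; with real data you would use far more than eight rows, and the interval endpoints will shift slightly with a different seed.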

General Social Survey

  1. Working extra: One of the questions on the GSS is “How many days per month do you work extra hours beyond your usual schedule?”. Data from this question is recorded in the variable moredays.

    1. Visualize the distribution of moredays. Is the mean or the median a better measure of the typical number of days per month Americans work extra hours beyond their usual schedule?

    2. Calculate a 95% confidence interval for the typical number of days per month Americans work extra hours beyond their usual schedule. Interpret the confidence interval in the context of the data.

    3. You (probably) mentioned in your interpretation that you are “95% confident”. What does “95% confident” mean?

    4. Does this interval suggest that the typical number of days per month Americans work extra hours beyond their usual schedule is more than 5?

    5. Now calculate a 90% as well as a 99% confidence interval for the same population parameter. How does the width of the confidence interval change as the confidence level increases?
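
To see how the confidence level relates to interval width, it can help to build one bootstrap distribution and cut intervals of different levels from it. The sketch below assumes a percentile bootstrap of the median and uses made-up moredays values; percentile_interval is a hypothetical helper written for this example, and the choice of the median here is only for illustration.

```python
import random
import statistics

# Made-up moredays responses (days per month), not real GSS data.
moredays = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 8, 10, 12, 15, 20, 30]

print("mean:", statistics.mean(moredays), "median:", statistics.median(moredays))

rng = random.Random(42)
reps = 5000
# One bootstrap distribution of the median, reused for every confidence level.
boot_medians = sorted(
    statistics.median(rng.choices(moredays, k=len(moredays)))
    for _ in range(reps)
)

def percentile_interval(sorted_stats, level):
    """Take the middle `level` share of a sorted bootstrap distribution."""
    tail = (1 - level) / 2
    lower = sorted_stats[int(tail * len(sorted_stats))]
    upper = sorted_stats[int((1 - tail) * len(sorted_stats)) - 1]
    return lower, upper

levels = (0.90, 0.95, 0.99)
intervals = {level: percentile_interval(boot_medians, level) for level in levels}
widths = {level: upper - lower for level, (lower, upper) in intervals.items()}

for level in levels:
    print(f"{level:.0%} CI: {intervals[level]}  width: {widths[level]}")
```

Because all three intervals are cut from the same sorted bootstrap distribution, a higher level can only reach further into the tails, so the widths printed here never shrink as the level goes up.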

  2. Working extra and a variable of your choice:

    1. Pick another variable that you think might be associated with moredays, using the GSS Data Explorer tool to look up what the variables mean.

    2. Follow one of the three instructions listed here depending on the variable you chose:

      • If the variable you picked is numerical, convert it to a categorical variable with two meaningful levels.
      • If the variable you picked is categorical, but has more than two levels, convert it to a variable with two meaningful levels.
      • If the variable you picked is categorical with only two levels, just state the levels.
    3. Split your data into two, based on the two levels of the categorical variable you chose. Construct a 95% confidence interval for the typical number of days per month Americans work extra hours beyond their usual schedule for each of the two levels of the categorical variable.
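
The steps in this question (reduce a variable to two levels, split on it, then build an interval per group) can be sketched as follows. Everything specific here is an assumption made for illustration: age as the chosen variable, the cutoff of 40, the made-up respondents, and the helper names age_group and bootstrap_median_ci.

```python
import random
import statistics

# Made-up respondents; `age` stands in for whichever variable you pick.
respondents = [
    {"age": 23, "moredays": 2}, {"age": 31, "moredays": 8},
    {"age": 35, "moredays": 5}, {"age": 28, "moredays": 0},
    {"age": 39, "moredays": 12}, {"age": 26, "moredays": 4},
    {"age": 44, "moredays": 3}, {"age": 52, "moredays": 0},
    {"age": 61, "moredays": 1}, {"age": 47, "moredays": 10},
    {"age": 58, "moredays": 2}, {"age": 66, "moredays": 0},
    {"age": 50, "moredays": None},  # missing response, dropped below
]

CUTOFF = 40  # assumed threshold for turning age into two levels

def age_group(age):
    """Convert a numerical age into one of two categorical levels."""
    return "under 40" if age < CUTOFF else "40 and over"

# Split moredays by level, skipping respondents with a missing answer.
groups = {}
for r in respondents:
    if r["moredays"] is not None:
        groups.setdefault(age_group(r["age"]), []).append(r["moredays"])

def bootstrap_median_ci(values, level=0.95, reps=5000, seed=7):
    """Percentile bootstrap interval for the median (illustrative helper)."""
    rng = random.Random(seed)
    boot = sorted(
        statistics.median(rng.choices(values, k=len(values)))
        for _ in range(reps)
    )
    tail = (1 - level) / 2
    return boot[int(tail * reps)], boot[int((1 - tail) * reps) - 1]

# One 95% interval per level of the two-level variable.
group_cis = {name: bootstrap_median_ci(vals) for name, vals in groups.items()}
for name, ci in group_cis.items():
    print(f"{name}: n = {len(groups[name])}, 95% CI for median = {ci}")
```

A variable that is already categorical with more than two levels would follow the same pattern, with age_group replaced by a function that maps each original level to one of two combined levels.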

Grading

Total 40 pts
Question 1 10 pts
Question 2 15 pts
Question 3 10 pts
Overall organization, code quality, clarity, commits, etc. 5 pts