Getting Started

A brief outline of the steps you need to get started is shown below. See the Lab 01 Instructions for more details about the steps.

  • Go to the STA210-Sp19 organization on GitHub (https://github.com/STA210-Sp19). Click on the repo with the prefix hw-01. It contains the starter documents you need to complete the lab.
  • Clone the repo in RStudio Cloud
  • Configure git using the use_git_config() function.

Tips

Here are some tips as you complete HW 01:

  • Make sure all your code chunks are informatively named, and these labels are not repeated
  • Periodically Knit your document and commit changes (the more often the better, for example, once per each new task)
  • Push all your changes back to your GitHub repo
  • Once you push your changes back you do not need to do anything else to “submit” your work. And you can of course push multiple times throughout the assignment. At the time of the deadline we will take whatever is in your repo and consider it your final submission, and grade the state of your work at that time (which means even if you made mistakes before then, you wouldn’t be penalized for them as long as the final state of your work is correct).

Packages

We will use the following packages in this assignment:

library(tidyverse)
library(modelr)
library(readr) 
library(broom)

We use functions from the readr package to read in the bikeshare data from a .csv file. The broom package is used to display model output in a tidy format.

Data

For this assignment, you will continue analyzing the bikeshare dataset from Lab 01. The data comes from the Capital Bikeshare in Washington D.C. The Capital Bikeshare is a system in which customers can rent a bike for little cost, ride it around the city, and return it to a station near their destination. You can get more information about the bikeshare on their website, https://www.capitalbikeshare.com/. We will read in the data from the file bikeshare.csv located in the data folder.

bikeshare <- read_csv("data/bikeshare.csv")

This dataset contains information about the number of bike rentals, environmental conditions, and other information about the each day in 2011 and 2012. This anlaysis will focus on the following variables:

season 1: Winter, 2: Spring, 3: Summer, 4: Fall
temp Temperature (in \(^{\circ}C\)) ÷ 41
count total number of bike rentals

Questions

Part 1: Computations & Concepts

The Computations & Concepts section of homework contains short answer questions about the concepts discussed in class. Some of these questions may also require short chunks of code to produce the output needed to answer the question. Answers should be written in complete sentences.

  1. For Part 1 of the assignment, we will analyze the simple linear regression model you created in Lab 01. Before fitting the regression model, we want to do two things to prepare the data:
    • Take a subset that only contains data from winter months.
    • Create a new variable temp_c that is calculated as temp * 41.

Fill in the code below to create a new dataframe with the modifications stated above and assign the dataframe to winter_data.

_____ <- bikeshare %>%
  filter(_____________) %>%
  mutate(_____________)
  1. Fit a regression model with temp_c as the predictor variable and count as the response and assign the results to winter_model. Use the code below to display the model coefficients along with the test statistics and confidence intervals for the coefficients.
winter_model %>%
  tidy(conf.int=TRUE)
  1. Interpret the 95% confidence interval for \(\beta_1\) in the context of the data.

  2. Suppose we now calculate a 90% confidence interval for \(\beta_1\). Would the width of the interval be larger, smaller, or the same as the width of the 95% confidence interval calculated in the previous question? Briefly explain.

  3. Based on the confidence interval, is there a statistically significant linear relationship between the temperature and the number of bike rentals in the winter? Briefly explain your reasoning.

  4. What is the value of the test statistic associated with the null hypothesis \(H_0: \beta_1 = 0\)? Interpret this value in the context of the problem.

  5. Suppose your roommate reads the regression output and says, “the probability that the slope is not 0 is 7.28e-25.” Is your friend correct? Briefly explain.

  6. Use the code below to calculate R2, and interpret this value in the context of the problem.

rsquare(winter_model,winter_data)

Part 2: Data Analysis

The Data Analysis section of homework contains open-ended data analysis questions. Your response should be neatly organized and read as a complete narrative. This means that in addition to addressing the question there should also be data exploration and an analysis of the model assumptions. In short, these questions should be treated as “mini-projects”.

  1. Use simple linear regression to describe the relationship between the temperature (temp_c) and the number of bike rentals (count) for spring season. The description should include discussion about the significance of the relationship and interpretations of any relevant model coefficients. Your response should also include data exploration and a discussion of the assumptions.

    • Include all relevant code and resulting output.
    • All plots should have proper labels for the axes and an informative title.

Grading

Total 60
Questions 1 - 8 30
Question 9 20
Documents neatly organized (Markdown and knitted documents) 5
Grammar and writing quality 3
Regular and informative commit messages 2