Bike sharing systems are new generation of traditional bike rentals where whole process from membership, rental and return back has become automatic. Through these systems, user is able to easily rent a bike from a particular position and return back at another position. Currently, there are about over 500 bike-sharing programs around the world which is composed of over 500 thousands bicycles. Today, there exists great interest in these systems due to their important role in traffic, environmental and health issues.
Apart from interesting real world applications of bike sharing systems, the characteristics of data being generated by these systems make them attractive for the research. Opposed to other transport services such as bus or subway, the duration of travel, departure and arrival position is explicitly recorded in these systems. This feature turns bike sharing system into a virtual sensor network that can be used for sensing mobility in the city. Hence, it is expected that most of important events in the city could be detected via monitoring these data.
Source: UCI Machine Learning Repository - Bike Sharing Dataset
Here are the steps for getting started:
and voila, you’re done! Once you push your changes back you do not need to do anything else to “submit” your work. And you can of course push multiple times throughout the assignment. At the time of the deadline we will take whatever is in your repo and consider it your final submission, and grade the state of your work at that time (which means even if you made mistakes before then, you wouldn’t be penalized for them as long as the final state of your work is correct).
The data include daily bike rental counts (by members and casual users) of Capital Bikeshare in Washington, DC in 2011 and 2012 as well as weather information on these days.
The original data sources are http://capitalbikeshare.com/system-data and http://www.freemeteo.com.
The codebook is below:
Variable name | Description |
---|---|
instant |
record index |
dteday |
date |
season |
season (1:winter, 2:spring, 3:summer, 4:fall) |
yr |
year (0: 2011, 1:2012) |
mnth |
month (1 to 12) |
holiday |
weather day is holiday or not (extracted from http://dchr.dc.gov/page/holiday-schedule) |
weekday |
day of the week |
workingday |
if day is neither weekend nor holiday is 1, otherwise is 0. |
weathersit |
1: Clear, Few clouds, Partly cloudy, Partly cloudy |
2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist | |
3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds | |
4: Heavy Rain + Ice Pallets + Thunderstorm + Mist, Snow + Fog | |
temp |
Normalized temperature in Celsius. The values are divided by 41 (max) |
atemp |
Normalized feeling temperature in Celsius. The values are divided by 50 (max) |
hum |
Normalized humidity. The values are divided by 100 (max) |
windspeed |
Normalized wind speed. The values are divided by 67 (max) |
casual |
Count of casual users |
registered |
Count of registered users |
cnt |
Count of total rental bikes including both casual and registered |
Recode the season
variable to be a factor with meaningful level names as outlined in the codebook, with spring as the baseline level.
Recode the binary variables holiday
and workingday
to be factors with levels no (0) and yes (1), with no as the baseline level.
Recode the yr
variable to be a factor with levels 2011 and 2012, with 2011 as the baseline level.
Recode the weathersit
variable as 1 - clear, 2 - mist, 3 - light precipitation, and 4 - heavy precipitation, with clear as the baseline.
Calculate raw temperature, feeling temperature, humidity, and windspeed as their values given in the dataset multiplied by the maximum raw values stated in the codebook for each variable. Instead of writing over the existing variables, create new ones with concise but informative names.
Check that the sum of casual
and registered
adds up to cnt
for each record.
Fit a linear model predicting total daily bike rentals from daily temperature. Write the linear model, interpret the slope and the intercept in context of the data, and determine and interpret the \(R^2\).
Fit another linear model predicting total daily bike rentals from daily feeling temperature. Write the linear model, interpret the slope and the intercept in context of the data, and determine and interpret the \(R^2\). Is temperature or feeling temperature a better predictor of bike rentals?
Fit a full model predicting total daily bike rentals from season, year, whether the day is holiday or not, whether the day is a workingday or not, the weather category, temperature, feeling temperature, humidity, and windspeed, as well as the interaction between at least one numerical and one categorical variable.
Perform bckward selection using adjusted \(R^2\) as the decision criterion to find the “best” model. Provide the model output for the final model.
Interpret slope coefficients associated with two of the variables in your final model in context of the data. Note: If one of these is categorical with multiple levels, make sure you interpret all of the slope coefficients associated with the levels of the variable.
Based on the final model you found in the previous question, discuss what makes for a good day to bike in DC (as measured by rental bikes being more in demand).
When you’re done, review the .md document on GitHub to make sure you’re happy with the final state of your work. Then go get some rest!
Use the #questions channels on Slack to ask questions or posting on the GitHub Communiy. If your question is about an error you’re getting, make sure to clearly explain what generated the error as well as what the error says.
You are also welcomed to discuss the homework with each other broadly (no sharing code!) as well as ask questions at office hours.
This is an individual assignment. You are welcomed to exchange ideas with classmates and ask questions on the getting help channels discussed above however you may not share your text or code answers directly with classmates.
The Duke Community Standard applies and course academic integrity policies apply. Please review them here. Specifically, the note on sharing / reusing code.