Part 1: Home field advantage?

In team sports, the term home advantage (also called home field/court/ice advantage) describes the psychological advantage that the home team is said to have over the visiting team as a result of playing in familiar facilities and in front of supportive fans. In many sports, such designations may also apply to games played at a neutral site, as the rules of various sports make different provisions for home and visiting teams. In baseball, for instance, the team designated the home team bats second in each inning, whereas the “visiting” team bats first. Additionally, in baseball the playing field itself differs from stadium to stadium (higher fences, higher pitching mounds, etc.), which can affect a team's performance over the season.

Explore and test whether home field advantage exists for each team:

To get a list of unique team names you can use the unique function.
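As a rough sketch of how this could look, the snippet below builds a tiny made-up data frame, lists its teams with unique, and compares each team's home and away records. The column names home_team, away_team, and home_win are hypothetical placeholders (check names(MLB2013) for the real ones), and Fisher's exact test is only one of several reasonable choices of test.

```r
# Toy stand-in for MLB2013; the data and column names are made up.
games <- data.frame(
  home_team = c("ATL", "ATL", "BOS", "BOS", "ATL", "BOS"),
  away_team = c("BOS", "BOS", "ATL", "ATL", "BOS", "ATL"),
  home_win  = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# One entry per team, no matter how many games each team played.
teams <- sort(unique(c(games$home_team, games$away_team)))

# Home and away record for a single team.
home_away_record <- function(team, data) {
  at_home <- data[data$home_team == team, ]
  on_road <- data[data$away_team == team, ]
  c(home_wins  = sum(at_home$home_win),
    home_games = nrow(at_home),
    away_wins  = sum(!on_road$home_win),
    away_games = nrow(on_road))
}

# Matrix with one row per team.
records <- t(sapply(teams, home_away_record, data = games))

# One-sided Fisher's exact test: are the odds of winning higher at home?
home_advantage_p <- function(rec) {
  tab <- matrix(c(rec["home_wins"], rec["home_games"] - rec["home_wins"],
                  rec["away_wins"], rec["away_games"] - rec["away_wins"]),
                nrow = 2)
  fisher.test(tab, alternative = "greater")$p.value
}

p_values <- apply(records, 1, home_advantage_p)
```

With only a handful of toy games the p-values carry no meaning; the point is the shape of the computation. Because the same test is repeated for every team, some adjustment for multiple comparisons (for example p.adjust) may be worth considering.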

Part 2: Trend throughout the season

Explore how the batting/pitching statistics trend as the season progresses. This is a very open-ended question, but as you’re planning your approach, think about how trend can be measured. In your write-up, address this first, before you present your findings. Some interesting questions that can be answered are 1) are trends consistent throughout the season, and 2) are there breaks in the trend before/after the All-Star break?
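To make "measuring trend" concrete, here is a minimal sketch on simulated data: a regression slope as an overall trend, a model that allows a change at a break point, and a moving average for plotting. The variable names (day, hits, after_break) and the position of the break are assumptions chosen for illustration, and other measures of trend are equally defensible.

```r
set.seed(1)  # simulated data only; the numbers say nothing about MLB

# Hypothetical per-game statistic (e.g. hits per game) over 100 game days.
season <- data.frame(day = 1:100)
season$after_break <- season$day > 60   # hypothetical break position
season$hits <- 8 + 0.01 * season$day + rnorm(100)

# 1) Overall trend: slope of the statistic against time.
overall <- lm(hits ~ day, data = season)
coef(overall)["day"]

# 2) Break in trend: allow a different level and slope after the break.
with_break <- lm(hits ~ day * after_break, data = season)
anova(overall, with_break)  # F-test: does allowing a break improve the fit?

# 3) Smoothed view: trailing 7-game moving average (first 6 values are NA).
season$hits_ma7 <- as.numeric(
  stats::filter(season$hits, rep(1 / 7, 7), sides = 1)
)
```

A plot of hits_ma7 against day, with a vertical line at the break, is often a useful first look before any formal test.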

A note on working with dates in R: There are a variety of packages that allow you to work with dates in R. One good option is lubridate. First, install and load the package:
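A typical way to do this, assuming the package is installed from CRAN, is:

```r
# One-time installation from CRAN (uncomment on first use).
# install.packages("lubridate")

# Load the package in every session or Rmd file that uses it.
library(lubridate)
```

Keeping the install.packages call commented out avoids reinstalling the package every time the Rmd file is knit.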


To convert the date variable to a date object that lubridate will understand as years, months, and days, use the ymd function:

MLB2013$ymd = ymd(MLB2013$date)

Note that you can now easily extract year, month, and day information from this field:

MLB2013$y = year(MLB2013$ymd)
MLB2013$m = month(MLB2013$ymd)
MLB2013$d = day(MLB2013$ymd)

You can also do slightly fancier things like extracting the day of the week:

MLB2013$day = wday(MLB2013$ymd, label = TRUE)
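The same calls can be tried on a small stand-alone example before running them on the full data set. The data frame below is made up, and it assumes dates stored as "YYYY-MM-DD" strings; the format in MLB2013.csv may differ, although ymd accepts several year-month-day layouts.

```r
library(lubridate)

# Made-up dates in an assumed "YYYY-MM-DD" format.
dates <- data.frame(date = c("2013-04-01", "2013-07-16", "2013-09-29"),
                    stringsAsFactors = FALSE)

dates$ymd <- ymd(dates$date)   # character -> Date
class(dates$ymd)               # "Date"

year(dates$ymd)                # 2013 2013 2013
month(dates$ymd)               # 4 7 9
day(dates$ymd)                 # 1 16 29

# Without label = TRUE, wday returns a number (1 = Sunday by default).
wday(dates$ymd)                # 2 3 1
wday(dates$ymd, label = TRUE)  # same information as an ordered factor
```

The labelled version is convenient for tables and plots, since the days stay in calendar order rather than alphabetical order.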

For more information, see the lubridate package documentation. Note that you can use this package to calculate durations (which measure the exact amount of time between two points), periods (which accurately track clock times despite leap years, leap seconds, and daylight saving time), or intervals (a protean summary of the time information between two points).
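A short sketch of the three kinds of time span is given below. The two dates are illustrative values chosen for this example (taken here as the 2013 season opener and All-Star Game); they are assumptions, not values read from the data set.

```r
library(lubridate)

# Illustrative dates, not taken from MLB2013.csv.
opening_day <- ymd("2013-03-31")
all_star    <- ymd("2013-07-16")

# Interval: a span anchored to a specific start and end.
first_half <- interval(opening_day, all_star)
time_length(first_half, unit = "day")   # length of the span in days

# Duration: the exact amount of elapsed time, stored in seconds.
as.duration(first_half)

# Period: calendar units, here "one week later on the calendar".
opening_day + days(7)

# Intervals also support membership checks.
ymd("2013-06-01") %within% first_half
```

A membership check like the last line is one possible way to flag games played before or after a given point in the season.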


The data set you will need for this assignment can be found at


As usual, you should complete your analysis in R / RMarkdown, and turn in a fully reproducible report. Submit your Rmd and HTML files on Sakai. Your Rmd file should use MLB2013.csv as the input data, so that I can reproduce your work with the dataset and the code included in your Rmd file.

Honor code:

This is a team assignment. You are welcome to talk across teams (feel free to ask questions, share ideas, or discuss concepts with other teams), but all calculations, R code, and writing must only be shared within the team. Failure to abide by these policies will result in a 0 for everyone involved. If you borrow code from an online source, make sure to cite it using a comment in your code. The comment should be visible in the HTML output.


Besides sharing ideas with each other, you can ask questions on Piazza or come by office hours. If your question is related to a code error, make sure to post an MWE (minimal working example) on Piazza so that others can recreate your issue. Office hours before the assignment is due:

  • Joe and Ken: Monday 4:30 – 5:30pm at 211 Old Chem
  • Dr. Çetinkaya-Rundel: Monday 3:30 – 4:30pm