You’re hired (by the Celtics)!

About three-quarters of the way through the season (immediately after game 62), you are hired by the Boston Celtics to predict the point spread of each of the remaining 20 games, one game ahead at a time. Your task is to build a model using the variables given in the NBA1011 dataset. In addition to using the given variables as is, you might consider augmenting the data for the task, for example with a weighted average or a rolling window over earlier games.
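As one way to picture the rolling-window and weighted-average idea, here is a minimal sketch in base R. The toy data frame and its column names (`game`, `spread`) are hypothetical and are not taken from NBA1011.csv, and the helper `lagged_mean` is made up for illustration. It averages only games played before the current one, on the assumption that a feature for predicting the next game should not use that game's own outcome.

```r
# Toy data: `game` and `spread` are hypothetical column names, not from NBA1011.csv
toy <- data.frame(game = 1:10,
                  spread = c(5, -3, 8, 2, -6, 10, 4, -1, 7, 3))

# Hypothetical helper: (weighted) mean of the k values before position i.
# w has length k; its last element weights the most recent earlier game.
lagged_mean <- function(x, k, w = rep(1, k)) {
  stopifnot(length(w) == k)
  n <- length(x)
  out <- rep(NA_real_, n)          # first k entries stay NA: not enough history
  for (i in seq_len(n)) {
    if (i > k) {
      out[i] <- weighted.mean(x[(i - k):(i - 1)], w)
    }
  }
  out
}

# Plain rolling window over the previous 3 games
toy$spread_prev3 <- lagged_mean(toy$spread, k = 3)

# Weighted version: the most recent game counts three times as much as the oldest
toy$spread_prev3_w <- lagged_mean(toy$spread, k = 3, w = 1:3)
```

The window length and the weights are tuning choices, so the validation games are a natural place to compare alternatives.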

Use games 1-52 as a training set, games 53-62 as a validation set, and games 63-82 as your final test set.
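The split above could be sketched as follows. The data frame here is a stand-in with a hypothetical `game` column numbering the games 1 to 82; the real dataset may identify games differently, so adjust the column name accordingly.

```r
# Stand-in for the real data; `game` is a hypothetical column name
games <- data.frame(game = 1:82)

train <- subset(games, game <= 52)               # games 1-52
valid <- subset(games, game >= 53 & game <= 62)  # games 53-62
test  <- subset(games, game >= 63)               # games 63-82

# Sanity check on the sizes of the three sets
c(train = nrow(train), valid = nrow(valid), test = nrow(test))
```

Splitting by game order rather than at random keeps the evaluation honest for a forecasting task, since later games are never used to predict earlier ones.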


Be creative! Use opponent stats, outside data, etc.

Prize: In addition to a good score and glory, the winning team’s members will win Duke StatSci t-shirts. The winner will be the team with the lowest RMSE.
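For reference, RMSE (root mean squared error) is the square root of the average squared difference between the actual and predicted spreads. A minimal sketch, with a hypothetical function name and made-up numbers:

```r
# Root mean squared error; `rmse` is a hypothetical helper name
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Made-up spreads for illustration only
actual    <- c(3, -5, 10)
predicted <- c(1, -2, 6)
score <- rmse(actual, predicted)
```

Because the errors are squared before averaging, one badly missed game costs more than several small misses.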


The data set you will need for this assignment can be found at


As usual, you should complete your analysis in R / RMarkdown, and turn in a fully reproducible report. Submit your Rmd and HTML files on Sakai. Your Rmd file should use NBA1011.csv as the input data, so that I can reproduce your work with the dataset and the code included in your Rmd file.

Honor code:

This is a team assignment. In this assignment, we want you to limit any specific discussion of model selection to your team only, though you are allowed to have general conversations with other teams. As usual, all calculations, R code, and writing must only be shared within the team. Failure to abide by these policies will result in a 0 for everyone involved. If you borrow code from an online source, make sure to cite it using a comment in your code. The comment should be visible in the HTML output.


Besides sharing ideas among yourselves, you can ask questions on Piazza or come by office hours. If your question is about a code error, make sure to post an MWE (minimal working example) on Piazza so that others can recreate your issue. Office hours before the assignment is due:

  • Joe and Ken: Wednesday 4:45 – 5:45pm in 211 Old Chem
  • Dr. Çetinkaya-Rundel: Wednesday 11:30am – 12:30pm