Badly formatted data

I have scraped the schedule for Duke men’s basketball from the goduke statsgeek website. The resulting data frame is not well formatted and needs a lot of TLC before it is useful for any statistical or data science work. You can load the data frame into R using the following code:

load(url("http://stat.duke.edu/~cr173/Sta112_Fa16/data/duke_sched.Rdata"))

To start, you are best off examining duke_sched in RStudio’s viewer to get a sense of the data.

Data Cleaning

Clean up duke_sched as best you can using stringr, dplyr, and any other tools you are familiar with such that:

All of your code should be reproducible: if I update the scraped results later in the season, you should still be able to produce a clean, updated data frame at the end without revising your code.
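One way to meet the reproducibility requirement is to put every cleaning step inside a single function that takes the raw data frame as its argument. The sketch below is only an illustration under assumptions: raw_sched is a made-up stand-in for the scraped data, and its column names (date, opponent, result), the "at" / "vs." prefixes, and the "W, 94-49" result format (assumed here to list Duke's score first) are hypothetical, so the real duke_sched will likely need different patterns.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-in for the scraped data; the real duke_sched
# may use different column names and string formats.
raw_sched <- data.frame(
  date     = c("Fri, Nov 11 ", " Sat, Nov 12", "Tue, Nov 15", "Sat, Nov 19"),
  opponent = c("  Marist", "at Grand Canyon ", "vs. Kansas", "Penn State"),
  result   = c("W, 94-49", "W, 96-61", "L, 75-77", ""),
  stringsAsFactors = FALSE
)

# All cleaning lives in one function, so rerunning it on a freshly
# scraped data frame gives an updated clean table with no code changes.
clean_sched <- function(df) {
  df %>%
    mutate(
      # Remove stray leading and trailing whitespace
      date     = str_trim(date),
      opponent = str_trim(opponent),
      # Assumption: "at " marks an away game and "vs. " a neutral site
      location = case_when(
        str_detect(opponent, "^at ")    ~ "away",
        str_detect(opponent, "^vs\\. ") ~ "neutral",
        TRUE                            ~ "home"
      ),
      opponent = str_replace(opponent, "^(at|vs\\.) ", ""),
      # Games without a result yet come out as NA in these columns
      outcome  = str_extract(result, "^[WL]"),
      duke_pts = as.integer(str_match(result, "(\\d+)-(\\d+)")[, 2]),
      opp_pts  = as.integer(str_match(result, "(\\d+)-(\\d+)")[, 3])
    ) %>%
    select(date, opponent, location, outcome, duke_pts, opp_pts)
}

clean <- clean_sched(raw_sched)
clean
```

Because nothing in clean_sched refers to specific row numbers or hard-coded values, calling it again after the data are re-scraped should handle newly played games in the same way, as long as the scraped format itself does not change.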

Submission instructions

Your submission should be an R Markdown file in your team App Ex repo, in a folder called AppEx_11_28_2016.Rmd.

Due date

Parts 1 and 2 are due Tuesday, Dec 5th, 5pm