These data were collected from http://www.baseball-reference.com by Joe Futoma and Ken McAlinn (Duke Stat Sci PhD students) as part of the Data Expeditions project sponsored by iiD. The Python code used for collecting these data can be downloaded here.
This data frame contains the following variables (columns):
team
- team name
opponent
- opposing team name
date
- date, yyyymmdd
header
- game type: 0 - regular game, 1 - double header (first game), 2 - double header (second game)
home
- home or away: 0 - away game, 1 - home game
win
- win/loss: 0 - loss, 1 - win
b_XXX
- batting stats
b_AB
- at bats
b_R
- runs scored
b_H
- hits
b_RBI
- runs batted in
b_BB
- bases on balls (walks)
b_SO
- strikeouts
b_BA
- batting average
b_OPS
- on-base plus slugging
b_Pit
- number of pitches
b_Str
- strikes
b_WPA
- win probability added total
b_aLI
- average leverage index (1 - average, >1 - high pressure, <1 - low pressure)
b_WPA+
- win probability added
b_WPA-
- win probability subtracted
b_RE24
- base-out runs added
b_PO
- putouts
b_A
- assists
b_2B
- doubles
b_3B
- triples
b_HR
- home runs
b_LOB
- left on base
b_RISP
- runners in scoring position
b_avg
- batting average
b_SB
- stolen bases
p_XXX
- pitching stats
p_IP
- innings pitched
p_H
- hits
p_R
- runs
p_ER
- earned runs
p_BB
- bases on balls (walks)
p_SO
- strikeouts
p_HR
- home runs
p_ERA
- earned run average
p_BF
- batters faced
p_Pit
- number of pitches
p_Str
- strikes
p_Ctct
- contact percentage
p_StS
- strikes swinging
p_StL
- strikes looking
p_GB
- ground balls
p_FB
- fly balls
p_LD
- line drives
p_Unk
- unknown
p_GSc
- game score
p_IR
- inherited runners
p_IS
- inherited runners scored
p_WPA
- win probability added total
p_aLI
- average leverage index
p_RE24
- base-out runs added
You can find detailed information on what these statistics mean at http://www.baseball-reference.com/, at http://mlb.mlb.com/mlb/official_info/baseball_basics/abbreviations.jsp, or elsewhere on the web.
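The coded columns (date, header, home, win) are easier to read once decoded. The following is a minimal Python sketch, assuming each game is available as a dictionary keyed by the column names listed above. The helper name decode_game and the example row are hypothetical and are not part of the data set or of the collection code.

```python
from datetime import datetime

# Code tables copied from the variable descriptions above.
HEADER_LABELS = {
    0: "regular game",
    1: "double header (first game)",
    2: "double header (second game)",
}
HOME_LABELS = {0: "away", 1: "home"}
WIN_LABELS = {0: "loss", 1: "win"}


def decode_game(row):
    """Return a copy of `row` with the coded columns made readable.

    Hypothetical helper: assumes `row` is a dict keyed by column name.
    """
    decoded = dict(row)
    # `date` is stored as yyyymmdd; str() accepts both int and string input.
    decoded["date"] = datetime.strptime(str(row["date"]), "%Y%m%d").date()
    decoded["header"] = HEADER_LABELS[int(row["header"])]
    decoded["home"] = HOME_LABELS[int(row["home"])]
    decoded["win"] = WIN_LABELS[int(row["win"])]
    return decoded


# Made-up example row, not taken from the data set.
example = {"team": "AAA", "opponent": "BBB", "date": 20140704,
           "header": 1, "home": 0, "win": 1}
game = decode_game(example)
print(game["date"].isoformat(), game["header"], game["home"], game["win"])
```

The lookup tables raise a KeyError on any code outside those documented above, which makes unexpected values visible instead of silently passing them through.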
Reference: “Baseball Reference.” baseball-reference.com.