These data were collected from http://www.baseball-reference.com by Joe Futoma and Ken McAlinn (Duke Stat Sci PhD students) as part of the Data Expeditions project sponsored by iiD. The Python code used for collecting these data can be downloaded here.
This data frame contains the following variables (columns):
team - team name
opponent - opposing team name
date - date, yyyymmdd
header - whether the game was 0 - regular game, 1 - double header (first game), 2 - double header (second game)
home - home or away, 0 - away game, 1 - home game
win - win/loss, 0 - loss, 1 - win

b_XXX - batting stats
b_AB - at bats
b_R - runs scored
b_H - hits
b_RBI - runs batted in
b_BB - bases on balls (walks)
b_SO - strikeouts
b_BA - batting average
b_OPS - on-base plus slugging
b_Pit - number of pitches
b_Str - strikes
b_WPA - win probability added total
b_aLI - average leverage index (1 - average, >1 - high pressure, <1 - low pressure)
b_WPA+ - win probability added
b_WPA- - win probability subtracted
b_RE24 - base-out runs added
b_PO - putouts
b_A - assists
b_2B - doubles
b_3B - triples
b_HR - home runs
b_LOB - left on base
b_RISP - runners in scoring position
b_avg - batting average
b_SB - stolen bases

p_XXX - pitching stats
p_IP - innings pitched
p_H - hits
p_R - runs
p_ER - earned runs
p_BB - bases on balls (walks)
p_SO - strikeouts
p_HR - home runs
p_ERA - earned run average
p_BF - batters faced
p_Pit - number of pitches
p_Str - strikes
p_Ctct - contact percentage
p_StS - strikes swinging
p_StL - strikes looking
p_GB - ground balls
p_FB - fly balls
p_LD - line drives
p_Unk - unknown
p_GSc - game score
p_IR - inherited runners
p_IS - inherited runners who scored
p_WPA - win probability added total
p_aLI - average leverage index
p_RE24 - base-out runs added

You can find detailed information on what these statistics mean at http://www.baseball-reference.com/, at http://mlb.mlb.com/mlb/official_info/baseball_basics/abbreviations.jsp, or elsewhere on the web.
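As an illustration of the coded columns described above (date, header, home, win), the following minimal Python sketch decodes one record into readable values. The sample row, the helper names parse_game_date and decode_row, and the label strings are assumptions made for this example; they are not part of the original collection code, and the sample values are invented.

```python
from datetime import date, datetime

# Hypothetical sample record: the column names follow the variable list above,
# but the values are made up purely for illustration.
row = {
    "team": "AAA",
    "opponent": "BBB",
    "date": 20130704,  # yyyymmdd
    "header": 1,       # 0 - regular, 1 - double header (first), 2 - double header (second)
    "home": 0,         # 0 - away game, 1 - home game
    "win": 1,          # 0 - loss, 1 - win
}

# Labels for the header codes, taken from the variable descriptions.
HEADER_LABELS = {
    0: "regular game",
    1: "double header (first game)",
    2: "double header (second game)",
}


def parse_game_date(value) -> date:
    """Convert a yyyymmdd value (int or str) into a datetime.date."""
    return datetime.strptime(str(value), "%Y%m%d").date()


def decode_row(record: dict) -> dict:
    """Return readable versions of the coded columns in one record."""
    return {
        "date": parse_game_date(record["date"]),
        "header": HEADER_LABELS[int(record["header"])],
        "venue": "home" if int(record["home"]) == 1 else "away",
        "result": "win" if int(record["win"]) == 1 else "loss",
    }


decoded = decode_row(row)
print(decoded)
```

The same decoding could be applied column by column once the full data frame is loaded; how the file is stored and read is not specified here, so that step is left out.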
Reference: “Baseball Reference.” baseball-reference.com.