dplyr

A Grammar of Data Manipulation

dplyr is based on the concepts of functions as verbs that manipulate data frames.

Single table functions / verbs:

  • filter(): pick rows matching criteria

  • slice(): pick rows using index(es)

  • select(): pick columns by name

  • rename(): rename specific columns

  • arrange(): reorder rows

  • mutate(): add new variables

  • transmute(): create new data frame with variables

  • sample_n() / sample_frac(): randomly sample rows

  • summarise(): reduce variables to values

dplyr function rules

  • First argument is a data frame

  • Subsequent arguments say what to do with data frame

  • Always return a data frame

  • Avoid modify in place


Nested:

f( g( h(x), z=1), y=1 )


Piped:

h(x) %>% g(z=1) %>% g(y=1)

Read data

library(dplyr)
library(data.table)
library(lubridate)

park = read.csv("/home/vis/cr173/Sta523/data/parking/NYParkingViolations_small.csv",
                stringsAsFactors=FALSE) %>% 
       as.data.frame() %>%
       tbl_df()

class(park)
## [1] "tbl_df"     "tbl"        "data.frame"

Data

park$Issue.Date = mdy(park$Issue.Date)
park
## Source: local data frame [91,003 x 43]
## 
##    Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type
## 1       1.359e+09  FXX9781                 NY        PAS 2014-02-20             20              SUBN
## 2       7.486e+09  FLZ6021                 NY        PAS 2013-08-12             37              4DSD
## 3       1.354e+09  53902MB                 NY        COM 2013-10-24             14               VAN
## 4       1.342e+09  FYM2426                 NY        PAS 2013-09-16             21               SDN
## 5       1.372e+09  GPV3714                 NY        PAS 2014-06-10             71              SUBN
## 6       1.362e+09   XBAV11                 NJ        PAS 2013-12-27             78               VAN
## 7       7.004e+09  59305JY                 NY        COM 2013-09-18             38              DELV
## 8       1.362e+09  2146518                 IN        PAS 2013-11-22             41               VAN
## 9       7.541e+09  49138M2                 MD        PAS 2013-12-18             38              SUBN
## 10      7.311e+09  GHD5283                 NY        PAS 2013-09-24             46              4DSD
## ..            ...      ...                ...        ...        ...            ...               ...
## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int),
##   Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int),
##   Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr),
##   Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number
##   (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int),
##   Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect.... (chr), From.Hours.In.Effect
##   (chr), To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int),
##   Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr),
##   No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)

filter()

filter(park, Issue.Date > "2013/09/01", Issue.Date < "2014/6/30")
## Source: local data frame [80,761 x 43]
## 
##    Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type
## 1       1.359e+09  FXX9781                 NY        PAS 2014-02-20             20              SUBN
## 2       1.354e+09  53902MB                 NY        COM 2013-10-24             14               VAN
## 3       1.342e+09  FYM2426                 NY        PAS 2013-09-16             21               SDN
## 4       1.372e+09  GPV3714                 NY        PAS 2014-06-10             71              SUBN
## 5       1.362e+09   XBAV11                 NJ        PAS 2013-12-27             78               VAN
## 6       7.004e+09  59305JY                 NY        COM 2013-09-18             38              DELV
## 7       1.362e+09  2146518                 IN        PAS 2013-11-22             41               VAN
## 8       7.541e+09  49138M2                 MD        PAS 2013-12-18             38              SUBN
## 9       7.311e+09  GHD5283                 NY        PAS 2013-09-24             46              4DSD
## 10      7.142e+09  GBR2885                 NY        PAS 2013-11-15             20              SUBN
## ..            ...      ...                ...        ...        ...            ...               ...
## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int),
##   Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int),
##   Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr),
##   Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number
##   (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int),
##   Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect.... (chr), From.Hours.In.Effect
##   (chr), To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int),
##   Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr),
##   No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)

filter(park, Registration.State == "CA" | Registration.State == "AZ")
## Source: local data frame [443 x 43]
## 
##    Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type
## 1       8.001e+09  4GXP803                 CA        PAS 2014-06-13             21              4DSD
## 2       7.555e+09  6RXT702                 CA        PAS 2013-12-03             20              4DSD
## 3       7.502e+09  AE32447                 AZ        PAS 2013-09-04             78              DELV
## 4       7.215e+09   572WWB                 CA        PAS 2013-08-01             21              4DSD
## 5       7.539e+09  AD56343                 AZ        PAS 2014-02-21             78              DELV
## 6       1.356e+09  AD55598                 AZ        PAS 2013-12-20             14               VAN
## 7       7.229e+09  6UXK262                 CA        PAS 2013-09-14             46              SUBN
## 8       7.042e+09  5UIT291                 CA        PAS 2014-04-24             14              4DSD
## 9       7.381e+09  AE85624                 AZ        PAS 2014-06-06             38               VAN
## 10      1.356e+09  AE97488                 AZ        PAS 2013-11-20             14               VAN
## ..            ...      ...                ...        ...        ...            ...               ...
## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int),
##   Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int),
##   Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr),
##   Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number
##   (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int),
##   Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect.... (chr), From.Hours.In.Effect
##   (chr), To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int),
##   Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr),
##   No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)

slice()

slice(park, 3:8)
## Source: local data frame [6 x 43]
## 
##   Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type
## 1      1.354e+09  53902MB                 NY        COM 2013-10-24             14               VAN
## 2      1.342e+09  FYM2426                 NY        PAS 2013-09-16             21               SDN
## 3      1.372e+09  GPV3714                 NY        PAS 2014-06-10             71              SUBN
## 4      1.362e+09   XBAV11                 NJ        PAS 2013-12-27             78               VAN
## 5      7.004e+09  59305JY                 NY        COM 2013-09-18             38              DELV
## 6      1.362e+09  2146518                 IN        PAS 2013-11-22             41               VAN
## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int),
##   Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int),
##   Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr),
##   Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number
##   (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int),
##   Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect.... (chr), From.Hours.In.Effect
##   (chr), To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int),
##   Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr),
##   No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)

slice(park, (n()-5):n())
## Source: local data frame [6 x 43]
## 
##   Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type
## 1      7.293e+09  69752MD                 NY        COM 2013-10-18             21               VAN
## 2      7.099e+09  EUL4171                 NY        PAS 2013-10-18             21              4DSD
## 3      7.225e+09  FYW4417                 NY        PAS 2014-05-20             46              SUBN
## 4      7.254e+09  57224MA                 NY        COM 2013-09-17             82               VAN
## 5      7.278e+09  PRU5820                 GA        PAS 2014-02-26             31              4DSD
## 6      7.857e+09  93951JX                 NY        COM 2013-08-30             69               VAN
## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int),
##   Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int),
##   Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr),
##   Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number
##   (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int),
##   Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect.... (chr), From.Hours.In.Effect
##   (chr), To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int),
##   Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr),
##   No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)

select()

select(park, contains("street"))
## Source: local data frame [91,003 x 5]
## 
##    Street.Code1 Street.Code2 Street.Code3   Street.Name Intersecting.Street
## 1         60810        20390        20490 ROCKAWAY BLVD                    
## 2         28990        14810        14890     Austin St                    
## 3         10810        34770        34790         8 AVE                    
## 4             0        40404        40404 GREENPORT AVE                    
## 5         36630        26230        77930       E 40 ST                    
## 6         54490        41290        61890    MADISON ST                    
## 7         12940         9140        61090       62nd St                    
## 8          8790        54580        22590        37 AVE                    
## 9         58330        65730        60430 Manhattan Ave                    
## 10        35730        14630        14680     E 21st St                    
## ..          ...          ...          ...           ...                 ...

select(park, Street.Code1:Street.Code3)
## Source: local data frame [91,003 x 3]
## 
##    Street.Code1 Street.Code2 Street.Code3
## 1         60810        20390        20490
## 2         28990        14810        14890
## 3         10810        34770        34790
## 4             0        40404        40404
## 5         36630        26230        77930
## 6         54490        41290        61890
## 7         12940         9140        61090
## 8          8790        54580        22590
## 9         58330        65730        60430
## 10        35730        14630        14680
## ..          ...          ...          ...

{.smaller}v

select(park, -(Street.Code1:Street.Code3))
## Source: local data frame [91,003 x 40]
## 
##    Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type
## 1       1.359e+09  FXX9781                 NY        PAS 2014-02-20             20              SUBN
## 2       7.486e+09  FLZ6021                 NY        PAS 2013-08-12             37              4DSD
## 3       1.354e+09  53902MB                 NY        COM 2013-10-24             14               VAN
## 4       1.342e+09  FYM2426                 NY        PAS 2013-09-16             21               SDN
## 5       1.372e+09  GPV3714                 NY        PAS 2014-06-10             71              SUBN
## 6       1.362e+09   XBAV11                 NJ        PAS 2013-12-27             78               VAN
## 7       7.004e+09  59305JY                 NY        COM 2013-09-18             38              DELV
## 8       1.362e+09  2146518                 IN        PAS 2013-11-22             41               VAN
## 9       7.541e+09  49138M2                 MD        PAS 2013-12-18             38              SUBN
## 10      7.311e+09  GHD5283                 NY        PAS 2013-09-24             46              4DSD
## ..            ...      ...                ...        ...        ...            ...               ...
## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Vehicle.Expiration.Date (int),
##   Violation.Location (int), Violation.Precinct (int), Issuer.Precinct (int), Issuer.Code (int),
##   Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr), Time.First.Observed (chr), Violation.County
##   (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number (chr), Street.Name (chr), Intersecting.Street
##   (chr), Date.First.Observed (int), Law.Section (int), Sub.Division (chr), Violation.Legal.Code (chr),
##   Days.Parking.In.Effect.... (chr), From.Hours.In.Effect (chr), To.Hours.In.Effect (chr), Vehicle.Color
##   (chr), Unregistered.Vehicle. (int), Vehicle.Year (int), Meter.Number (chr), Feet.From.Curb (int),
##   Violation.Post.Code (chr), Violation.Description (chr), No.Standing.or.Stopping.Violation (lgl),
##   Hydrant.Violation (lgl), Double.Parking.Violation (lgl)

rename()

rename(park, Days.Parking.In.Effect = Days.Parking.In.Effect....)
## Source: local data frame [91,003 x 43]
## 
##    Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type
## 1       1.359e+09  FXX9781                 NY        PAS 2014-02-20             20              SUBN
## 2       7.486e+09  FLZ6021                 NY        PAS 2013-08-12             37              4DSD
## 3       1.354e+09  53902MB                 NY        COM 2013-10-24             14               VAN
## 4       1.342e+09  FYM2426                 NY        PAS 2013-09-16             21               SDN
## 5       1.372e+09  GPV3714                 NY        PAS 2014-06-10             71              SUBN
## 6       1.362e+09   XBAV11                 NJ        PAS 2013-12-27             78               VAN
## 7       7.004e+09  59305JY                 NY        COM 2013-09-18             38              DELV
## 8       1.362e+09  2146518                 IN        PAS 2013-11-22             41               VAN
## 9       7.541e+09  49138M2                 MD        PAS 2013-12-18             38              SUBN
## 10      7.311e+09  GHD5283                 NY        PAS 2013-09-24             46              4DSD
## ..            ...      ...                ...        ...        ...            ...               ...
## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int),
##   Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int),
##   Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr),
##   Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number
##   (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int),
##   Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect (chr), From.Hours.In.Effect (chr),
##   To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int),
##   Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr),
##   No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)

arrange()

select(park, 1:6) %>% arrange(Issue.Date, Registration.State, Plate.Type, Violation.Code)
## Source: local data frame [91,003 x 6]
## 
##    Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code
## 1       1.365e+09  R595749                 IL        PAS 2000-02-20             20
## 2       1.367e+09  72307MA                 NY        COM 2000-04-05             14
## 3       1.362e+09  63538JM                 NY        COM 2000-11-06             47
## 4       1.344e+09  PRK4769                 GA        PAS 2001-12-23             46
## 5       1.363e+09  49784JG                 NY        COM 2003-12-24             20
## 6       1.366e+09  49715JG                 NY        COM 2010-03-12             21
## 7       1.361e+09  96171MA                 NY        COM 2010-09-11             45
## 8       1.356e+09  GEN6640                 NY        PAS 2010-10-10             20
## 9       1.358e+09  DDF4985                 NY        PAS 2010-10-13             14
## 10      1.356e+09  DXY6498                 NY        PAS 2011-11-02             14
## ..            ...      ...                ...        ...        ...            ...

mutate()

select(park, 2:5) %>% mutate(month = month(Issue.Date), 
                             day = day(Issue.Date),
                             year = year(Issue.Date),
                             wday = wday(Issue.Date, label=TRUE))
## Source: local data frame [91,003 x 8]
## 
##    Plate.ID Registration.State Plate.Type Issue.Date month day year  wday
## 1   FXX9781                 NY        PAS 2014-02-20     2  20 2014 Thurs
## 2   FLZ6021                 NY        PAS 2013-08-12     8  12 2013   Mon
## 3   53902MB                 NY        COM 2013-10-24    10  24 2013 Thurs
## 4   FYM2426                 NY        PAS 2013-09-16     9  16 2013   Mon
## 5   GPV3714                 NY        PAS 2014-06-10     6  10 2014  Tues
## 6    XBAV11                 NJ        PAS 2013-12-27    12  27 2013   Fri
## 7   59305JY                 NY        COM 2013-09-18     9  18 2013   Wed
## 8   2146518                 IN        PAS 2013-11-22    11  22 2013   Fri
## 9   49138M2                 MD        PAS 2013-12-18    12  18 2013   Wed
## 10  GHD5283                 NY        PAS 2013-09-24     9  24 2013  Tues
## ..      ...                ...        ...        ...   ... ...  ...   ...

transmute()

transmute(park,
          month = month(Issue.Date), 
          day = day(Issue.Date),
          year = year(Issue.Date),
          wday = wday(Issue.Date, label=TRUE))
## Source: local data frame [91,003 x 4]
## 
##    month day year  wday
## 1      2  20 2014 Thurs
## 2      8  12 2013   Mon
## 3     10  24 2013 Thurs
## 4      9  16 2013   Mon
## 5      6  10 2014  Tues
## 6     12  27 2013   Fri
## 7      9  18 2013   Wed
## 8     11  22 2013   Fri
## 9     12  18 2013   Wed
## 10     9  24 2013  Tues
## ..   ... ...  ...   ...

distinct()

distinct(park)
## Source: local data frame [91,003 x 43]
## 
##    Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type
## 1       1.359e+09  FXX9781                 NY        PAS 2014-02-20             20              SUBN
## 2       7.486e+09  FLZ6021                 NY        PAS 2013-08-12             37              4DSD
## 3       1.354e+09  53902MB                 NY        COM 2013-10-24             14               VAN
## 4       1.342e+09  FYM2426                 NY        PAS 2013-09-16             21               SDN
## 5       1.372e+09  GPV3714                 NY        PAS 2014-06-10             71              SUBN
## 6       1.362e+09   XBAV11                 NJ        PAS 2013-12-27             78               VAN
## 7       7.004e+09  59305JY                 NY        COM 2013-09-18             38              DELV
## 8       1.362e+09  2146518                 IN        PAS 2013-11-22             41               VAN
## 9       7.541e+09  49138M2                 MD        PAS 2013-12-18             38              SUBN
## 10      7.311e+09  GHD5283                 NY        PAS 2013-09-24             46              4DSD
## ..            ...      ...                ...        ...        ...            ...               ...
## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int),
##   Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int),
##   Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr),
##   Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number
##   (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int),
##   Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect.... (chr), From.Hours.In.Effect
##   (chr), To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int),
##   Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr),
##   No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)

sample_n()

select(park, 1:6) %>% sample_n(10)
## Source: local data frame [10 x 6]
## 
##    Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code
## 1       7.382e+09  S217629                 IL        PAS 2013-11-25             21
## 2       7.213e+09  DME8369                 NY        PAS 2014-01-31             71
## 3       7.789e+09  GGB4259                 NY        PAS 2014-05-24             38
## 4       7.939e+09  58679JZ                 NY        COM 2014-04-01             48
## 5       7.306e+09   W35CRT                 NJ        PAS 2013-08-21             37
## 6       7.888e+09  67835JU                 NY        COM 2013-09-18             31
## 7       7.537e+09  DLV3548                 NY        PAS 2013-11-16             71
## 8       7.422e+09  92979JE                 NY        COM 2014-06-13             19
## 9       7.685e+09  GCM1989                 NY        PAS 2013-12-20             16
## 10      7.099e+09  GJX6784                 NY        PAS 2013-12-25             14

sample_frac()

select(park, 1:6) %>% sample_frac(0.0001)
## Source: local data frame [9 x 6]
## 
##   Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code
## 1      5.072e+09  GBK7354                 NY        PAS 2013-11-20              7
## 2      7.956e+09  EEH8346                 NY        PAS 2014-04-22             37
## 3      7.422e+09  CXB2013                 NY        PAS 2013-12-24             21
## 4      7.528e+09   G79DBS                 NJ        PAS 2013-10-29             14
## 5      7.149e+09  FRW5407                 NY        PAS 2013-09-21             70
## 6      7.028e+09  93050JY                 NY        COM 2013-09-10             19
## 7      7.092e+09  AEW9877                 NY        PAS 2014-06-23             37
## 8      1.360e+09  FEX6097                 NY        PAS 2013-12-04             20
## 9      7.826e+09  54973MD                 NY        COM 2013-10-16             69

summarise()

summarize(park, n(), min(Issue.Date), max(Issue.Date))
## Source: local data frame [1 x 3]
## 
##     n() min(Issue.Date) max(Issue.Date)
## 1 91003      2000-02-20      2031-07-13

group_by()

select(park, 1:6) %>% group_by(Registration.State)
## Source: local data frame [91,003 x 6]
## Groups: Registration.State
## 
##    Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code
## 1       1.359e+09  FXX9781                 NY        PAS 2014-02-20             20
## 2       7.486e+09  FLZ6021                 NY        PAS 2013-08-12             37
## 3       1.354e+09  53902MB                 NY        COM 2013-10-24             14
## 4       1.342e+09  FYM2426                 NY        PAS 2013-09-16             21
## 5       1.372e+09  GPV3714                 NY        PAS 2014-06-10             71
## 6       1.362e+09   XBAV11                 NJ        PAS 2013-12-27             78
## 7       7.004e+09  59305JY                 NY        COM 2013-09-18             38
## 8       1.362e+09  2146518                 IN        PAS 2013-11-22             41
## 9       7.541e+09  49138M2                 MD        PAS 2013-12-18             38
## 10      7.311e+09  GHD5283                 NY        PAS 2013-09-24             46
## ..            ...      ...                ...        ...        ...            ...

summarise() & group_by()

select(park, 1:6) %>% 
group_by(Registration.State) %>%
summarize(n(), min(Issue.Date), max(Issue.Date))
## Source: local data frame [62 x 4]
## 
##    Registration.State   n() min(Issue.Date) max(Issue.Date)
## 1                  99   348      2013-01-14      2014-06-16
## 2                  AB     3      2013-09-13      2014-06-19
## 3                  AK    13      2013-08-08      2014-06-09
## 4                  AL    52      2013-07-29      2014-06-24
## 5                  AR    22      2013-08-02      2014-06-02
## 6                  AZ   250      2013-07-22      2014-06-24
## 7                  BC     5      2013-12-18      2014-06-23
## 8                  CA   193      2013-07-24      2014-06-25
## 9                  CO    52      2013-07-30      2014-06-21
## 10                 CT  1388      2013-07-19      2014-06-25
## 11                 DC    44      2013-08-02      2014-06-18
## 12                 DE   127      2013-06-11      2014-06-24
## 13                 DP    49      2013-08-02      2014-06-18
## 14                 FL  1152      2013-06-26      2014-06-25
## 15                 GA   293      2001-12-23      2014-06-24
## 16                 GV    10      2013-07-26      2014-05-23
## 17                 HI     3      2013-08-02      2014-04-24
## 18                 IA    68      2013-07-25      2014-06-13
## 19                 ID    69      2013-08-07      2014-06-19
## 20                 IL   318      2000-02-20      2014-06-25
## 21                 IN   508      2013-07-21      2014-06-25
## 22                 KS    19      2013-08-12      2014-06-18
## 23                 KY    37      2013-06-25      2014-06-19
## 24                 LA    22      2013-09-04      2014-06-07
## 25                 MA   780      2013-07-17      2014-06-25
## 26                 MD   518      2012-11-06      2014-06-25
## 27                 ME   185      2012-08-26      2014-06-23
## 28                 MI   171      2013-07-30      2014-06-24
## 29                 MN   148      2013-07-29      2014-06-24
## 30                 MO    26      2013-07-28      2014-06-09
## 31                 MS    37      2013-08-02      2014-06-23
## 32                 MT     7      2013-10-08      2014-06-05
## 33                 MX     1      2013-12-21      2013-12-21
## 34                 NB     4      2013-08-16      2014-02-25
## 35                 NC   481      2013-03-24      2016-02-17
## 36                 ND     5      2013-08-27      2014-06-06
## 37                 NE    16      2013-07-30      2014-06-04
## 38                 NH   104      2013-07-24      2029-12-29
## 39                 NJ  8674      2013-01-06      2017-05-21
## 40                 NM    27      2013-07-31      2014-06-20
## 41                 NS     1      2014-04-16      2014-04-16
## 42                 NV    21      2013-09-08      2014-05-22
## 43                 NY 70351      2000-04-05      2031-07-13
## 44                 OH   228      2013-07-23      2014-06-24
## 45                 OK   208      2013-07-12      2014-06-25
## 46                 ON    40      2013-07-19      2014-06-24
## 47                 OR    37      2013-07-30      2014-06-24
## 48                 PA  2263      2013-01-08      2015-05-17
## 49                 PR     2      2013-11-07      2014-02-27
## 50                 QB    49      2013-08-05      2014-06-20
## 51                 RI   123      2013-07-20      2014-06-25
## 52                 SC   196      2013-07-30      2014-06-24
## 53                 SD     5      2013-09-12      2014-04-22
## 54                 TN   165      2013-07-30      2014-06-23
## 55                 TX   251      2013-07-10      2014-06-24
## 56                 UT     9      2013-08-09      2014-04-07
## 57                 VA   628      2013-03-04      2014-12-19
## 58                 VT    79      2013-07-29      2014-06-20
## 59                 WA    56      2013-08-17      2014-06-09
## 60                 WI    41      2013-07-19      2014-06-25
## 61                 WV    19      2013-08-12      2014-06-06
## 62                 WY     2      2013-12-18      2014-04-10

select(park, 1:6) %>% 
filter(Plate.Type != 999) %>%
group_by(Plate.Type, Violation.Code) %>%
summarize(n = n(), n_states = n_distinct(Registration.State))
## Source: local data frame [690 x 4]
## Groups: Plate.Type
## 
##    Plate.Type Violation.Code n n_states
## 1         AGC             19 1        1
## 2         AGR             17 1        1
## 3         AGR             71 1        1
## 4         AGR             82 1        1
## 5         APP             14 4        2
## 6         APP             19 2        2
## 7         APP             31 1        1
## 8         APP             38 2        1
## 9         APP             40 1        1
## 10        APP             46 4        2
## ..        ...            ... .      ...

Exercise

Exercise

Using either the small or large Parking Violation dataset, try to create the following data frames:

  • A geocoding data frame with just three columns: violation precinct, address (where address is house number and street name combined), and intersecting street. You should exclude any entry without an address. Also consider adding a 4th column that is an indicator variable for addresses without a house number.

  • Construct a data frame containing Registration State and a issue date and time combined into a single DateTime column (try paste and ymd_hm).
    • Subset these data to include only datetimes between 12/22/2013 and 1/5/2014 and construct a plot of the timeseries of the frequency of violations over this two week period by state.
    • Use these data (full not subsetted) to answer the questions:
      • What day of the week are you most likely to get a ticket?
      • What hour of the day are you most likely to get a ticket?
      • What hour and day of the week are you most likely to get a ticket?

Acknowledgments

Acknowledgments