dplyr is based on the concepts of functions as verbs that manipulate data frames.
Single table functions / verbs:
filter()
: pick rows matching criteria
slice()
: pick rows using index(es)
select()
: pick columns by name
rename()
: rename specific columns
arrange()
: reorder rows
mutate()
: add new variables
transmute()
: create new data frame with variables
sample_n()
/ sample_frac()
: randomly sample rows
summarise()
: reduce variables to valuesFirst argument is a data frame
Subsequent arguments say what to do with data frame
Always return a data frame
Avoid modify in place
Nested:
f( g( h(x), z=1), y=1 )
Piped:
h(x) %>% g(z=1) %>% g(y=1)
library(dplyr) library(data.table) library(lubridate) park = read.csv("/home/vis/cr173/Sta523/data/parking/NYParkingViolations_small.csv", stringsAsFactors=FALSE) %>% as.data.frame() %>% tbl_df() class(park)
## [1] "tbl_df" "tbl" "data.frame"
park$Issue.Date = mdy(park$Issue.Date) park
## Source: local data frame [91,003 x 43] ## ## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type ## 1 1.359e+09 FXX9781 NY PAS 2014-02-20 20 SUBN ## 2 7.486e+09 FLZ6021 NY PAS 2013-08-12 37 4DSD ## 3 1.354e+09 53902MB NY COM 2013-10-24 14 VAN ## 4 1.342e+09 FYM2426 NY PAS 2013-09-16 21 SDN ## 5 1.372e+09 GPV3714 NY PAS 2014-06-10 71 SUBN ## 6 1.362e+09 XBAV11 NJ PAS 2013-12-27 78 VAN ## 7 7.004e+09 59305JY NY COM 2013-09-18 38 DELV ## 8 1.362e+09 2146518 IN PAS 2013-11-22 41 VAN ## 9 7.541e+09 49138M2 MD PAS 2013-12-18 38 SUBN ## 10 7.311e+09 GHD5283 NY PAS 2013-09-24 46 4DSD ## .. ... ... ... ... ... ... ... ## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int), ## Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int), ## Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr), ## Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number ## (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int), ## Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect.... (chr), From.Hours.In.Effect ## (chr), To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int), ## Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr), ## No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)
filter(park, Issue.Date > "2013/09/01", Issue.Date < "2014/6/30")
## Source: local data frame [80,761 x 43] ## ## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type ## 1 1.359e+09 FXX9781 NY PAS 2014-02-20 20 SUBN ## 2 1.354e+09 53902MB NY COM 2013-10-24 14 VAN ## 3 1.342e+09 FYM2426 NY PAS 2013-09-16 21 SDN ## 4 1.372e+09 GPV3714 NY PAS 2014-06-10 71 SUBN ## 5 1.362e+09 XBAV11 NJ PAS 2013-12-27 78 VAN ## 6 7.004e+09 59305JY NY COM 2013-09-18 38 DELV ## 7 1.362e+09 2146518 IN PAS 2013-11-22 41 VAN ## 8 7.541e+09 49138M2 MD PAS 2013-12-18 38 SUBN ## 9 7.311e+09 GHD5283 NY PAS 2013-09-24 46 4DSD ## 10 7.142e+09 GBR2885 NY PAS 2013-11-15 20 SUBN ## .. ... ... ... ... ... ... ... ## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int), ## Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int), ## Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr), ## Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number ## (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int), ## Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect.... (chr), From.Hours.In.Effect ## (chr), To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int), ## Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr), ## No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)
filter(park, Registration.State == "CA" | Registration.State == "AZ")
## Source: local data frame [443 x 43] ## ## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type ## 1 8.001e+09 4GXP803 CA PAS 2014-06-13 21 4DSD ## 2 7.555e+09 6RXT702 CA PAS 2013-12-03 20 4DSD ## 3 7.502e+09 AE32447 AZ PAS 2013-09-04 78 DELV ## 4 7.215e+09 572WWB CA PAS 2013-08-01 21 4DSD ## 5 7.539e+09 AD56343 AZ PAS 2014-02-21 78 DELV ## 6 1.356e+09 AD55598 AZ PAS 2013-12-20 14 VAN ## 7 7.229e+09 6UXK262 CA PAS 2013-09-14 46 SUBN ## 8 7.042e+09 5UIT291 CA PAS 2014-04-24 14 4DSD ## 9 7.381e+09 AE85624 AZ PAS 2014-06-06 38 VAN ## 10 1.356e+09 AE97488 AZ PAS 2013-11-20 14 VAN ## .. ... ... ... ... ... ... ... ## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int), ## Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int), ## Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr), ## Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number ## (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int), ## Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect.... (chr), From.Hours.In.Effect ## (chr), To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int), ## Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr), ## No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)
slice(park, 3:8)
## Source: local data frame [6 x 43] ## ## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type ## 1 1.354e+09 53902MB NY COM 2013-10-24 14 VAN ## 2 1.342e+09 FYM2426 NY PAS 2013-09-16 21 SDN ## 3 1.372e+09 GPV3714 NY PAS 2014-06-10 71 SUBN ## 4 1.362e+09 XBAV11 NJ PAS 2013-12-27 78 VAN ## 5 7.004e+09 59305JY NY COM 2013-09-18 38 DELV ## 6 1.362e+09 2146518 IN PAS 2013-11-22 41 VAN ## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int), ## Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int), ## Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr), ## Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number ## (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int), ## Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect.... (chr), From.Hours.In.Effect ## (chr), To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int), ## Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr), ## No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)
slice(park, (n()-5):n())
## Source: local data frame [6 x 43] ## ## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type ## 1 7.293e+09 69752MD NY COM 2013-10-18 21 VAN ## 2 7.099e+09 EUL4171 NY PAS 2013-10-18 21 4DSD ## 3 7.225e+09 FYW4417 NY PAS 2014-05-20 46 SUBN ## 4 7.254e+09 57224MA NY COM 2013-09-17 82 VAN ## 5 7.278e+09 PRU5820 GA PAS 2014-02-26 31 4DSD ## 6 7.857e+09 93951JX NY COM 2013-08-30 69 VAN ## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int), ## Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int), ## Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr), ## Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number ## (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int), ## Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect.... (chr), From.Hours.In.Effect ## (chr), To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int), ## Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr), ## No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)
select(park, contains("street"))
## Source: local data frame [91,003 x 5] ## ## Street.Code1 Street.Code2 Street.Code3 Street.Name Intersecting.Street ## 1 60810 20390 20490 ROCKAWAY BLVD ## 2 28990 14810 14890 Austin St ## 3 10810 34770 34790 8 AVE ## 4 0 40404 40404 GREENPORT AVE ## 5 36630 26230 77930 E 40 ST ## 6 54490 41290 61890 MADISON ST ## 7 12940 9140 61090 62nd St ## 8 8790 54580 22590 37 AVE ## 9 58330 65730 60430 Manhattan Ave ## 10 35730 14630 14680 E 21st St ## .. ... ... ... ... ...
select(park, Street.Code1:Street.Code3)
## Source: local data frame [91,003 x 3] ## ## Street.Code1 Street.Code2 Street.Code3 ## 1 60810 20390 20490 ## 2 28990 14810 14890 ## 3 10810 34770 34790 ## 4 0 40404 40404 ## 5 36630 26230 77930 ## 6 54490 41290 61890 ## 7 12940 9140 61090 ## 8 8790 54580 22590 ## 9 58330 65730 60430 ## 10 35730 14630 14680 ## .. ... ... ...
select(park, -(Street.Code1:Street.Code3))
## Source: local data frame [91,003 x 40] ## ## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type ## 1 1.359e+09 FXX9781 NY PAS 2014-02-20 20 SUBN ## 2 7.486e+09 FLZ6021 NY PAS 2013-08-12 37 4DSD ## 3 1.354e+09 53902MB NY COM 2013-10-24 14 VAN ## 4 1.342e+09 FYM2426 NY PAS 2013-09-16 21 SDN ## 5 1.372e+09 GPV3714 NY PAS 2014-06-10 71 SUBN ## 6 1.362e+09 XBAV11 NJ PAS 2013-12-27 78 VAN ## 7 7.004e+09 59305JY NY COM 2013-09-18 38 DELV ## 8 1.362e+09 2146518 IN PAS 2013-11-22 41 VAN ## 9 7.541e+09 49138M2 MD PAS 2013-12-18 38 SUBN ## 10 7.311e+09 GHD5283 NY PAS 2013-09-24 46 4DSD ## .. ... ... ... ... ... ... ... ## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Vehicle.Expiration.Date (int), ## Violation.Location (int), Violation.Precinct (int), Issuer.Precinct (int), Issuer.Code (int), ## Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr), Time.First.Observed (chr), Violation.County ## (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number (chr), Street.Name (chr), Intersecting.Street ## (chr), Date.First.Observed (int), Law.Section (int), Sub.Division (chr), Violation.Legal.Code (chr), ## Days.Parking.In.Effect.... (chr), From.Hours.In.Effect (chr), To.Hours.In.Effect (chr), Vehicle.Color ## (chr), Unregistered.Vehicle. (int), Vehicle.Year (int), Meter.Number (chr), Feet.From.Curb (int), ## Violation.Post.Code (chr), Violation.Description (chr), No.Standing.or.Stopping.Violation (lgl), ## Hydrant.Violation (lgl), Double.Parking.Violation (lgl)
rename(park, Days.Parking.In.Effect = Days.Parking.In.Effect....)
## Source: local data frame [91,003 x 43] ## ## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type ## 1 1.359e+09 FXX9781 NY PAS 2014-02-20 20 SUBN ## 2 7.486e+09 FLZ6021 NY PAS 2013-08-12 37 4DSD ## 3 1.354e+09 53902MB NY COM 2013-10-24 14 VAN ## 4 1.342e+09 FYM2426 NY PAS 2013-09-16 21 SDN ## 5 1.372e+09 GPV3714 NY PAS 2014-06-10 71 SUBN ## 6 1.362e+09 XBAV11 NJ PAS 2013-12-27 78 VAN ## 7 7.004e+09 59305JY NY COM 2013-09-18 38 DELV ## 8 1.362e+09 2146518 IN PAS 2013-11-22 41 VAN ## 9 7.541e+09 49138M2 MD PAS 2013-12-18 38 SUBN ## 10 7.311e+09 GHD5283 NY PAS 2013-09-24 46 4DSD ## .. ... ... ... ... ... ... ... ## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int), ## Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int), ## Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr), ## Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number ## (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int), ## Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect (chr), From.Hours.In.Effect (chr), ## To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int), ## Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr), ## No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)
select(park, 1:6) %>% arrange(Issue.Date, Registration.State, Plate.Type, Violation.Code)
## Source: local data frame [91,003 x 6] ## ## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code ## 1 1.365e+09 R595749 IL PAS 2000-02-20 20 ## 2 1.367e+09 72307MA NY COM 2000-04-05 14 ## 3 1.362e+09 63538JM NY COM 2000-11-06 47 ## 4 1.344e+09 PRK4769 GA PAS 2001-12-23 46 ## 5 1.363e+09 49784JG NY COM 2003-12-24 20 ## 6 1.366e+09 49715JG NY COM 2010-03-12 21 ## 7 1.361e+09 96171MA NY COM 2010-09-11 45 ## 8 1.356e+09 GEN6640 NY PAS 2010-10-10 20 ## 9 1.358e+09 DDF4985 NY PAS 2010-10-13 14 ## 10 1.356e+09 DXY6498 NY PAS 2011-11-02 14 ## .. ... ... ... ... ... ...
select(park, 2:5) %>% mutate(month = month(Issue.Date), day = day(Issue.Date), year = year(Issue.Date), wday = wday(Issue.Date, label=TRUE))
## Source: local data frame [91,003 x 8] ## ## Plate.ID Registration.State Plate.Type Issue.Date month day year wday ## 1 FXX9781 NY PAS 2014-02-20 2 20 2014 Thurs ## 2 FLZ6021 NY PAS 2013-08-12 8 12 2013 Mon ## 3 53902MB NY COM 2013-10-24 10 24 2013 Thurs ## 4 FYM2426 NY PAS 2013-09-16 9 16 2013 Mon ## 5 GPV3714 NY PAS 2014-06-10 6 10 2014 Tues ## 6 XBAV11 NJ PAS 2013-12-27 12 27 2013 Fri ## 7 59305JY NY COM 2013-09-18 9 18 2013 Wed ## 8 2146518 IN PAS 2013-11-22 11 22 2013 Fri ## 9 49138M2 MD PAS 2013-12-18 12 18 2013 Wed ## 10 GHD5283 NY PAS 2013-09-24 9 24 2013 Tues ## .. ... ... ... ... ... ... ... ...
transmute(park, month = month(Issue.Date), day = day(Issue.Date), year = year(Issue.Date), wday = wday(Issue.Date, label=TRUE))
## Source: local data frame [91,003 x 4] ## ## month day year wday ## 1 2 20 2014 Thurs ## 2 8 12 2013 Mon ## 3 10 24 2013 Thurs ## 4 9 16 2013 Mon ## 5 6 10 2014 Tues ## 6 12 27 2013 Fri ## 7 9 18 2013 Wed ## 8 11 22 2013 Fri ## 9 12 18 2013 Wed ## 10 9 24 2013 Tues ## .. ... ... ... ...
distinct(park)
## Source: local data frame [91,003 x 43] ## ## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type ## 1 1.359e+09 FXX9781 NY PAS 2014-02-20 20 SUBN ## 2 7.486e+09 FLZ6021 NY PAS 2013-08-12 37 4DSD ## 3 1.354e+09 53902MB NY COM 2013-10-24 14 VAN ## 4 1.342e+09 FYM2426 NY PAS 2013-09-16 21 SDN ## 5 1.372e+09 GPV3714 NY PAS 2014-06-10 71 SUBN ## 6 1.362e+09 XBAV11 NJ PAS 2013-12-27 78 VAN ## 7 7.004e+09 59305JY NY COM 2013-09-18 38 DELV ## 8 1.362e+09 2146518 IN PAS 2013-11-22 41 VAN ## 9 7.541e+09 49138M2 MD PAS 2013-12-18 38 SUBN ## 10 7.311e+09 GHD5283 NY PAS 2013-09-24 46 4DSD ## .. ... ... ... ... ... ... ... ## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int), ## Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int), ## Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr), ## Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number ## (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int), ## Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect.... (chr), From.Hours.In.Effect ## (chr), To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int), ## Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr), ## No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)
select(park, 1:6) %>% sample_n(10)
## Source: local data frame [10 x 6] ## ## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code ## 1 7.382e+09 S217629 IL PAS 2013-11-25 21 ## 2 7.213e+09 DME8369 NY PAS 2014-01-31 71 ## 3 7.789e+09 GGB4259 NY PAS 2014-05-24 38 ## 4 7.939e+09 58679JZ NY COM 2014-04-01 48 ## 5 7.306e+09 W35CRT NJ PAS 2013-08-21 37 ## 6 7.888e+09 67835JU NY COM 2013-09-18 31 ## 7 7.537e+09 DLV3548 NY PAS 2013-11-16 71 ## 8 7.422e+09 92979JE NY COM 2014-06-13 19 ## 9 7.685e+09 GCM1989 NY PAS 2013-12-20 16 ## 10 7.099e+09 GJX6784 NY PAS 2013-12-25 14
select(park, 1:6) %>% sample_frac(0.0001)
## Source: local data frame [9 x 6] ## ## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code ## 1 5.072e+09 GBK7354 NY PAS 2013-11-20 7 ## 2 7.956e+09 EEH8346 NY PAS 2014-04-22 37 ## 3 7.422e+09 CXB2013 NY PAS 2013-12-24 21 ## 4 7.528e+09 G79DBS NJ PAS 2013-10-29 14 ## 5 7.149e+09 FRW5407 NY PAS 2013-09-21 70 ## 6 7.028e+09 93050JY NY COM 2013-09-10 19 ## 7 7.092e+09 AEW9877 NY PAS 2014-06-23 37 ## 8 1.360e+09 FEX6097 NY PAS 2013-12-04 20 ## 9 7.826e+09 54973MD NY COM 2013-10-16 69
summarize(park, n(), min(Issue.Date), max(Issue.Date))
## Source: local data frame [1 x 3] ## ## n() min(Issue.Date) max(Issue.Date) ## 1 91003 2000-02-20 2031-07-13
select(park, 1:6) %>% group_by(Registration.State)
## Source: local data frame [91,003 x 6] ## Groups: Registration.State ## ## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code ## 1 1.359e+09 FXX9781 NY PAS 2014-02-20 20 ## 2 7.486e+09 FLZ6021 NY PAS 2013-08-12 37 ## 3 1.354e+09 53902MB NY COM 2013-10-24 14 ## 4 1.342e+09 FYM2426 NY PAS 2013-09-16 21 ## 5 1.372e+09 GPV3714 NY PAS 2014-06-10 71 ## 6 1.362e+09 XBAV11 NJ PAS 2013-12-27 78 ## 7 7.004e+09 59305JY NY COM 2013-09-18 38 ## 8 1.362e+09 2146518 IN PAS 2013-11-22 41 ## 9 7.541e+09 49138M2 MD PAS 2013-12-18 38 ## 10 7.311e+09 GHD5283 NY PAS 2013-09-24 46 ## .. ... ... ... ... ... ...
select(park, 1:6) %>% group_by(Registration.State) %>% summarize(n(), min(Issue.Date), max(Issue.Date))
## Source: local data frame [62 x 4] ## ## Registration.State n() min(Issue.Date) max(Issue.Date) ## 1 99 348 2013-01-14 2014-06-16 ## 2 AB 3 2013-09-13 2014-06-19 ## 3 AK 13 2013-08-08 2014-06-09 ## 4 AL 52 2013-07-29 2014-06-24 ## 5 AR 22 2013-08-02 2014-06-02 ## 6 AZ 250 2013-07-22 2014-06-24 ## 7 BC 5 2013-12-18 2014-06-23 ## 8 CA 193 2013-07-24 2014-06-25 ## 9 CO 52 2013-07-30 2014-06-21 ## 10 CT 1388 2013-07-19 2014-06-25 ## 11 DC 44 2013-08-02 2014-06-18 ## 12 DE 127 2013-06-11 2014-06-24 ## 13 DP 49 2013-08-02 2014-06-18 ## 14 FL 1152 2013-06-26 2014-06-25 ## 15 GA 293 2001-12-23 2014-06-24 ## 16 GV 10 2013-07-26 2014-05-23 ## 17 HI 3 2013-08-02 2014-04-24 ## 18 IA 68 2013-07-25 2014-06-13 ## 19 ID 69 2013-08-07 2014-06-19 ## 20 IL 318 2000-02-20 2014-06-25 ## 21 IN 508 2013-07-21 2014-06-25 ## 22 KS 19 2013-08-12 2014-06-18 ## 23 KY 37 2013-06-25 2014-06-19 ## 24 LA 22 2013-09-04 2014-06-07 ## 25 MA 780 2013-07-17 2014-06-25 ## 26 MD 518 2012-11-06 2014-06-25 ## 27 ME 185 2012-08-26 2014-06-23 ## 28 MI 171 2013-07-30 2014-06-24 ## 29 MN 148 2013-07-29 2014-06-24 ## 30 MO 26 2013-07-28 2014-06-09 ## 31 MS 37 2013-08-02 2014-06-23 ## 32 MT 7 2013-10-08 2014-06-05 ## 33 MX 1 2013-12-21 2013-12-21 ## 34 NB 4 2013-08-16 2014-02-25 ## 35 NC 481 2013-03-24 2016-02-17 ## 36 ND 5 2013-08-27 2014-06-06 ## 37 NE 16 2013-07-30 2014-06-04 ## 38 NH 104 2013-07-24 2029-12-29 ## 39 NJ 8674 2013-01-06 2017-05-21 ## 40 NM 27 2013-07-31 2014-06-20 ## 41 NS 1 2014-04-16 2014-04-16 ## 42 NV 21 2013-09-08 2014-05-22 ## 43 NY 70351 2000-04-05 2031-07-13 ## 44 OH 228 2013-07-23 2014-06-24 ## 45 OK 208 2013-07-12 2014-06-25 ## 46 ON 40 2013-07-19 2014-06-24 ## 47 OR 37 2013-07-30 2014-06-24 ## 48 PA 2263 2013-01-08 2015-05-17 ## 49 PR 2 2013-11-07 2014-02-27 ## 50 QB 49 2013-08-05 2014-06-20 ## 51 RI 123 2013-07-20 2014-06-25 ## 52 SC 196 2013-07-30 2014-06-24 ## 53 SD 5 2013-09-12 2014-04-22 ## 54 TN 165 2013-07-30 2014-06-23 ## 55 TX 251 2013-07-10 2014-06-24 ## 56 UT 9 2013-08-09 2014-04-07 ## 57 VA 628 2013-03-04 2014-12-19 ## 58 VT 79 2013-07-29 2014-06-20 ## 59 WA 56 2013-08-17 2014-06-09 ## 60 WI 41 2013-07-19 2014-06-25 ## 61 WV 19 2013-08-12 2014-06-06 ## 62 WY 2 2013-12-18 2014-04-10
select(park, 1:6) %>% filter(Plate.Type != 999) %>% group_by(Plate.Type, Violation.Code) %>% summarize(n = n(), n_states = n_distinct(Registration.State))
## Source: local data frame [690 x 4] ## Groups: Plate.Type ## ## Plate.Type Violation.Code n n_states ## 1 AGC 19 1 1 ## 2 AGR 17 1 1 ## 3 AGR 71 1 1 ## 4 AGR 82 1 1 ## 5 APP 14 4 2 ## 6 APP 19 2 2 ## 7 APP 31 1 1 ## 8 APP 38 2 1 ## 9 APP 40 1 1 ## 10 APP 46 4 2 ## .. ... ... . ...
Using either the small or large Parking Violation dataset, try to create the following data frames:
A geocoding data frame with just three columns: violation precinct, address (where address is house number and street name combined), and intersecting street. You should exclude any entry without an address. Also consider adding a 4th column that is an indicator variable for addresses without a house number.
paste
and ymd_hm
).
Above materials are derived in part from the following sources: