dplyr is built around the idea of functions as verbs that manipulate data frames.
Single table functions / verbs:
filter(): pick rows matching criteria
slice(): pick rows using index(es)
select(): pick columns by name
rename(): rename specific columns
arrange(): reorder rows
mutate(): add new variables
transmute(): create new data frame with variables
sample_n() / sample_frac(): randomly sample rows
summarise(): reduce variables to values

First argument is a data frame
Subsequent arguments say what to do with data frame
Always return a data frame
Do not modify in place
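These rules can be checked on a small example. The sketch below uses a hypothetical toy data frame (not the parking data) and shows that a verb takes the data frame first, returns a new data frame, and leaves its input untouched.

```r
library(dplyr)

# Hypothetical toy data, not the parking data used in these slides
df = data.frame(x = c(3, 1, 2), y = c("a", "b", "c"), stringsAsFactors = FALSE)

# First argument is the data frame, later arguments describe the operation
res = filter(df, x > 1)

class(res)   # the result is still a data frame
nrow(res)    # rows that satisfied the condition
nrow(df)     # the input is unchanged; assign the result to keep it
```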
Nested:
f( g( h(x), z=1), y=1 )
Piped:
h(x) %>% g(z=1) %>% f(y=1)
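The pipe passes the left-hand result as the first argument of the next call, so the nested and piped forms compute the same thing. A minimal sketch, where f, g, and h are hypothetical helper functions defined only for this illustration:

```r
library(dplyr)  # makes %>% available

# Hypothetical helpers, defined only for this illustration
h = function(x) x + 1
g = function(x, z) x * z
f = function(x, y) x - y

x = 1:3

nested = f( g( h(x), z=2), y=1 )       # read inside-out
piped  = h(x) %>% g(z=2) %>% f(y=1)    # read left-to-right

identical(nested, piped)
```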
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(data.table)
##
## Attaching package: 'data.table'
##
## The following objects are masked from 'package:dplyr':
##
## between, last
library(lubridate)
## Loading required package: methods
##
## Attaching package: 'lubridate'
##
## The following objects are masked from 'package:data.table':
##
## hour, mday, month, quarter, wday, week, yday, year
park = read.csv("/home/vis/cr173/Sta523/data/parking/NYParkingViolations_small.csv",
                stringsAsFactors=FALSE) %>%
  as.data.frame() %>%
  tbl_df()
class(park)
## [1] "tbl_df" "tbl" "data.frame"
park$Issue.Date = mdy(park$Issue.Date)
park
## Source: local data frame [91,003 x 43]
##
## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type
## 1 1.359e+09 FXX9781 NY PAS 2014-02-20 20 SUBN
## 2 7.486e+09 FLZ6021 NY PAS 2013-08-12 37 4DSD
## 3 1.354e+09 53902MB NY COM 2013-10-24 14 VAN
## 4 1.342e+09 FYM2426 NY PAS 2013-09-16 21 SDN
## 5 1.372e+09 GPV3714 NY PAS 2014-06-10 71 SUBN
## 6 1.362e+09 XBAV11 NJ PAS 2013-12-27 78 VAN
## 7 7.004e+09 59305JY NY COM 2013-09-18 38 DELV
## 8 1.362e+09 2146518 IN PAS 2013-11-22 41 VAN
## 9 7.541e+09 49138M2 MD PAS 2013-12-18 38 SUBN
## 10 7.311e+09 GHD5283 NY PAS 2013-09-24 46 4DSD
## .. ... ... ... ... ... ... ...
## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int),
## Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int),
## Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr),
## Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number
## (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int),
## Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect.... (chr), From.Hours.In.Effect
## (chr), To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int),
## Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr),
## No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)
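The mdy() call above converts month/day/year strings into proper dates, which is what makes the date comparisons in later slides possible. A small sketch on hypothetical strings (the exact format of the raw Issue.Date column is an assumption here):

```r
library(lubridate)

# Hypothetical raw values in month/day/year order
raw = c("02/20/2014", "08/12/2013")

parsed = mdy(raw)

parsed          # printed in ISO year-month-day form
parsed[1] > parsed[2]   # parsed dates can be compared and sorted
```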
filter(park, Issue.Date > "2013/09/01", Issue.Date < "2014/6/30")
## Source: local data frame [80,761 x 43]
##
## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type
## 1 1.359e+09 FXX9781 NY PAS 2014-02-20 20 SUBN
## 2 1.354e+09 53902MB NY COM 2013-10-24 14 VAN
## 3 1.342e+09 FYM2426 NY PAS 2013-09-16 21 SDN
## 4 1.372e+09 GPV3714 NY PAS 2014-06-10 71 SUBN
## 5 1.362e+09 XBAV11 NJ PAS 2013-12-27 78 VAN
## 6 7.004e+09 59305JY NY COM 2013-09-18 38 DELV
## 7 1.362e+09 2146518 IN PAS 2013-11-22 41 VAN
## 8 7.541e+09 49138M2 MD PAS 2013-12-18 38 SUBN
## 9 7.311e+09 GHD5283 NY PAS 2013-09-24 46 4DSD
## 10 7.142e+09 GBR2885 NY PAS 2013-11-15 20 SUBN
## .. ... ... ... ... ... ... ...
## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int),
## Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int),
## Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr),
## Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number
## (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int),
## Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect.... (chr), From.Hours.In.Effect
## (chr), To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int),
## Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr),
## No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)
filter(park, Registration.State == "CA" | Registration.State == "AZ")
## Source: local data frame [443 x 43]
##
## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type
## 1 8.001e+09 4GXP803 CA PAS 2014-06-13 21 4DSD
## 2 7.555e+09 6RXT702 CA PAS 2013-12-03 20 4DSD
## 3 7.502e+09 AE32447 AZ PAS 2013-09-04 78 DELV
## 4 7.215e+09 572WWB CA PAS 2013-08-01 21 4DSD
## 5 7.539e+09 AD56343 AZ PAS 2014-02-21 78 DELV
## 6 1.356e+09 AD55598 AZ PAS 2013-12-20 14 VAN
## 7 7.229e+09 6UXK262 CA PAS 2013-09-14 46 SUBN
## 8 7.042e+09 5UIT291 CA PAS 2014-04-24 14 4DSD
## 9 7.381e+09 AE85624 AZ PAS 2014-06-06 38 VAN
## 10 1.356e+09 AE97488 AZ PAS 2013-11-20 14 VAN
## .. ... ... ... ... ... ... ...
## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int),
## Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int),
## Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr),
## Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number
## (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int),
## Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect.... (chr), From.Hours.In.Effect
## (chr), To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int),
## Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr),
## No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)
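Two details of filter() are worth spelling out: conditions separated by commas are combined with AND, and a chain of == joined by | can be written more compactly with %in%. A sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data, not the parking data
tickets = data.frame(
  state = c("NY", "CA", "AZ", "NJ", "CA"),
  code  = c(20, 21, 78, 14, 46),
  stringsAsFactors = FALSE
)

# Comma-separated conditions must all hold (logical AND)
and_rows = filter(tickets, state == "CA", code > 30)

# These two calls select the same rows
or_rows = filter(tickets, state == "CA" | state == "AZ")
in_rows = filter(tickets, state %in% c("CA", "AZ"))
```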
slice(park, 3:8)
## Source: local data frame [6 x 43]
##
## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type
## 1 1.354e+09 53902MB NY COM 2013-10-24 14 VAN
## 2 1.342e+09 FYM2426 NY PAS 2013-09-16 21 SDN
## 3 1.372e+09 GPV3714 NY PAS 2014-06-10 71 SUBN
## 4 1.362e+09 XBAV11 NJ PAS 2013-12-27 78 VAN
## 5 7.004e+09 59305JY NY COM 2013-09-18 38 DELV
## 6 1.362e+09 2146518 IN PAS 2013-11-22 41 VAN
## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int),
## Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int),
## Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr),
## Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number
## (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int),
## Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect.... (chr), From.Hours.In.Effect
## (chr), To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int),
## Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr),
## No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)
slice(park, (n()-5):n())
## Source: local data frame [6 x 43]
##
## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type
## 1 7.293e+09 69752MD NY COM 2013-10-18 21 VAN
## 2 7.099e+09 EUL4171 NY PAS 2013-10-18 21 4DSD
## 3 7.225e+09 FYW4417 NY PAS 2014-05-20 46 SUBN
## 4 7.254e+09 57224MA NY COM 2013-09-17 82 VAN
## 5 7.278e+09 PRU5820 GA PAS 2014-02-26 31 4DSD
## 6 7.857e+09 93951JX NY COM 2013-08-30 69 VAN
## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int),
## Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int),
## Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr),
## Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number
## (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int),
## Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect.... (chr), From.Hours.In.Effect
## (chr), To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int),
## Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr),
## No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)
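slice() works on row positions, and n() inside it refers to the number of rows, which is how the last six rows were picked above. A sketch on a hypothetical ten-row data frame:

```r
library(dplyr)

# Hypothetical toy data
df = data.frame(id = 1:10)

middle = slice(df, 3:8)           # rows 3 through 8
tail3  = slice(df, (n()-2):n())   # last three rows; n() is the row count
rest   = slice(df, -(1:2))        # negative indices drop rows
```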
select(park, contains("street"))
## Source: local data frame [91,003 x 5]
##
## Street.Code1 Street.Code2 Street.Code3 Street.Name Intersecting.Street
## 1 60810 20390 20490 ROCKAWAY BLVD
## 2 28990 14810 14890 Austin St
## 3 10810 34770 34790 8 AVE
## 4 0 40404 40404 GREENPORT AVE
## 5 36630 26230 77930 E 40 ST
## 6 54490 41290 61890 MADISON ST
## 7 12940 9140 61090 62nd St
## 8 8790 54580 22590 37 AVE
## 9 58330 65730 60430 Manhattan Ave
## 10 35730 14630 14680 E 21st St
## .. ... ... ... ... ...
select(park, Street.Code1:Street.Code3)
## Source: local data frame [91,003 x 3]
##
## Street.Code1 Street.Code2 Street.Code3
## 1 60810 20390 20490
## 2 28990 14810 14890
## 3 10810 34770 34790
## 4 0 40404 40404
## 5 36630 26230 77930
## 6 54490 41290 61890
## 7 12940 9140 61090
## 8 8790 54580 22590
## 9 58330 65730 60430
## 10 35730 14630 14680
## .. ... ... ...
select(park, -(Street.Code1:Street.Code3))
## Source: local data frame [91,003 x 40]
##
## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type
## 1 1.359e+09 FXX9781 NY PAS 2014-02-20 20 SUBN
## 2 7.486e+09 FLZ6021 NY PAS 2013-08-12 37 4DSD
## 3 1.354e+09 53902MB NY COM 2013-10-24 14 VAN
## 4 1.342e+09 FYM2426 NY PAS 2013-09-16 21 SDN
## 5 1.372e+09 GPV3714 NY PAS 2014-06-10 71 SUBN
## 6 1.362e+09 XBAV11 NJ PAS 2013-12-27 78 VAN
## 7 7.004e+09 59305JY NY COM 2013-09-18 38 DELV
## 8 1.362e+09 2146518 IN PAS 2013-11-22 41 VAN
## 9 7.541e+09 49138M2 MD PAS 2013-12-18 38 SUBN
## 10 7.311e+09 GHD5283 NY PAS 2013-09-24 46 4DSD
## .. ... ... ... ... ... ... ...
## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Vehicle.Expiration.Date (int),
## Violation.Location (int), Violation.Precinct (int), Issuer.Precinct (int), Issuer.Code (int),
## Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr), Time.First.Observed (chr), Violation.County
## (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number (chr), Street.Name (chr), Intersecting.Street
## (chr), Date.First.Observed (int), Law.Section (int), Sub.Division (chr), Violation.Legal.Code (chr),
## Days.Parking.In.Effect.... (chr), From.Hours.In.Effect (chr), To.Hours.In.Effect (chr), Vehicle.Color
## (chr), Unregistered.Vehicle. (int), Vehicle.Year (int), Meter.Number (chr), Feet.From.Curb (int),
## Violation.Post.Code (chr), Violation.Description (chr), No.Standing.or.Stopping.Violation (lgl),
## Hydrant.Violation (lgl), Double.Parking.Violation (lgl)
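The select() calls above use three styles: a helper (contains), a range (a:b), and negation. Note that contains() ignores case by default, which is why "street" matched Street.Name. A sketch on a hypothetical one-row data frame whose column names imitate the parking data:

```r
library(dplyr)

# Hypothetical toy data with parking-style column names
df = data.frame(Street.Code1 = 1, Street.Code2 = 2, Street.Name = "A ST",
                Plate.ID = "X1", stringsAsFactors = FALSE)

by_helper = select(df, contains("street"))            # case-insensitive match
by_prefix = select(df, starts_with("Street.Code"))    # prefix match
by_range  = select(df, Street.Code1:Street.Code2)     # contiguous columns
by_drop   = select(df, -Plate.ID)                     # everything except Plate.ID

names(by_helper)
names(by_prefix)
```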
rename(park, Days.Parking.In.Effect = Days.Parking.In.Effect....)
## Source: local data frame [91,003 x 43]
##
## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type
## 1 1.359e+09 FXX9781 NY PAS 2014-02-20 20 SUBN
## 2 7.486e+09 FLZ6021 NY PAS 2013-08-12 37 4DSD
## 3 1.354e+09 53902MB NY COM 2013-10-24 14 VAN
## 4 1.342e+09 FYM2426 NY PAS 2013-09-16 21 SDN
## 5 1.372e+09 GPV3714 NY PAS 2014-06-10 71 SUBN
## 6 1.362e+09 XBAV11 NJ PAS 2013-12-27 78 VAN
## 7 7.004e+09 59305JY NY COM 2013-09-18 38 DELV
## 8 1.362e+09 2146518 IN PAS 2013-11-22 41 VAN
## 9 7.541e+09 49138M2 MD PAS 2013-12-18 38 SUBN
## 10 7.311e+09 GHD5283 NY PAS 2013-09-24 46 4DSD
## .. ... ... ... ... ... ... ...
## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int),
## Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int),
## Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr),
## Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number
## (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int),
## Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect (chr), From.Hours.In.Effect (chr),
## To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int),
## Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr),
## No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)
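rename() takes new_name = old_name pairs and, like the other verbs, returns a new data frame, so the result has to be assigned to keep the new name. A sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data with an awkward read.csv-style column name
df = data.frame(Days.... = c("Y", "N"), Code = c(20, 21),
                stringsAsFactors = FALSE)

# new name on the left, existing name on the right
renamed = rename(df, Days = Days....)

names(renamed)   # the new name appears in the result
names(df)        # the original data frame keeps the old name
```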
select(park, 1:6) %>% arrange(Issue.Date, Registration.State, Plate.Type, Violation.Code)
## Source: local data frame [91,003 x 6]
##
## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code
## 1 1.365e+09 R595749 IL PAS 2000-02-20 20
## 2 1.367e+09 72307MA NY COM 2000-04-05 14
## 3 1.362e+09 63538JM NY COM 2000-11-06 47
## 4 1.344e+09 PRK4769 GA PAS 2001-12-23 46
## 5 1.363e+09 49784JG NY COM 2003-12-24 20
## 6 1.366e+09 49715JG NY COM 2010-03-12 21
## 7 1.361e+09 96171MA NY COM 2010-09-11 45
## 8 1.356e+09 GEN6640 NY PAS 2010-10-10 20
## 9 1.358e+09 DDF4985 NY PAS 2010-10-13 14
## 10 1.356e+09 DXY6498 NY PAS 2011-11-02 14
## .. ... ... ... ... ... ...
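arrange() sorts by the first column given and uses later columns only to break ties; wrapping a column in desc() reverses its order. A sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data
df = data.frame(state = c("NY", "NJ", "NY", "CA"),
                code  = c(20, 14, 46, 21),
                stringsAsFactors = FALSE)

# Sort by state ascending, then by code descending within each state
sorted = arrange(df, state, desc(code))
sorted
```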
select(park, 2:5) %>% mutate(month = month(Issue.Date),
                             day = day(Issue.Date),
                             year = year(Issue.Date),
                             wday = wday(Issue.Date, label=TRUE))
## Source: local data frame [91,003 x 8]
##
## Plate.ID Registration.State Plate.Type Issue.Date month day year wday
## 1 FXX9781 NY PAS 2014-02-20 2 20 2014 Thurs
## 2 FLZ6021 NY PAS 2013-08-12 8 12 2013 Mon
## 3 53902MB NY COM 2013-10-24 10 24 2013 Thurs
## 4 FYM2426 NY PAS 2013-09-16 9 16 2013 Mon
## 5 GPV3714 NY PAS 2014-06-10 6 10 2014 Tues
## 6 XBAV11 NJ PAS 2013-12-27 12 27 2013 Fri
## 7 59305JY NY COM 2013-09-18 9 18 2013 Wed
## 8 2146518 IN PAS 2013-11-22 11 22 2013 Fri
## 9 49138M2 MD PAS 2013-12-18 12 18 2013 Wed
## 10 GHD5283 NY PAS 2013-09-24 9 24 2013 Tues
## .. ... ... ... ... ... ... ... ...
transmute(park,
          month = month(Issue.Date),
          day = day(Issue.Date),
          year = year(Issue.Date),
          wday = wday(Issue.Date, label=TRUE))
## Source: local data frame [91,003 x 4]
##
## month day year wday
## 1 2 20 2014 Thurs
## 2 8 12 2013 Mon
## 3 10 24 2013 Thurs
## 4 9 16 2013 Mon
## 5 6 10 2014 Tues
## 6 12 27 2013 Fri
## 7 9 18 2013 Wed
## 8 11 22 2013 Fri
## 9 12 18 2013 Wed
## 10 9 24 2013 Tues
## .. ... ... ... ...
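The difference between the two verbs is only in which columns are kept: mutate() appends the new variables to the existing ones, while transmute() returns just the new ones. A sketch on hypothetical toy data:

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data
df = data.frame(id = 1:2, date = as.Date(c("2014-02-20", "2013-08-12")))

added    = mutate(df, month = month(date), year = year(date))
only_new = transmute(df, month = month(date), year = year(date))

names(added)      # original columns plus the new ones
names(only_new)   # only the newly created columns
```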
distinct(park)
## Source: local data frame [91,003 x 43]
##
## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code Vehicle.Body.Type
## 1 1.359e+09 FXX9781 NY PAS 2014-02-20 20 SUBN
## 2 7.486e+09 FLZ6021 NY PAS 2013-08-12 37 4DSD
## 3 1.354e+09 53902MB NY COM 2013-10-24 14 VAN
## 4 1.342e+09 FYM2426 NY PAS 2013-09-16 21 SDN
## 5 1.372e+09 GPV3714 NY PAS 2014-06-10 71 SUBN
## 6 1.362e+09 XBAV11 NJ PAS 2013-12-27 78 VAN
## 7 7.004e+09 59305JY NY COM 2013-09-18 38 DELV
## 8 1.362e+09 2146518 IN PAS 2013-11-22 41 VAN
## 9 7.541e+09 49138M2 MD PAS 2013-12-18 38 SUBN
## 10 7.311e+09 GHD5283 NY PAS 2013-09-24 46 4DSD
## .. ... ... ... ... ... ... ...
## Variables not shown: Vehicle.Make (chr), Issuing.Agency (chr), Street.Code1 (int), Street.Code2 (int),
## Street.Code3 (int), Vehicle.Expiration.Date (int), Violation.Location (int), Violation.Precinct (int),
## Issuer.Precinct (int), Issuer.Code (int), Issuer.Command (chr), Issuer.Squad (chr), Violation.Time (chr),
## Time.First.Observed (chr), Violation.County (chr), Violation.In.Front.Of.Or.Opposite (chr), House.Number
## (chr), Street.Name (chr), Intersecting.Street (chr), Date.First.Observed (int), Law.Section (int),
## Sub.Division (chr), Violation.Legal.Code (chr), Days.Parking.In.Effect.... (chr), From.Hours.In.Effect
## (chr), To.Hours.In.Effect (chr), Vehicle.Color (chr), Unregistered.Vehicle. (int), Vehicle.Year (int),
## Meter.Number (chr), Feet.From.Curb (int), Violation.Post.Code (chr), Violation.Description (chr),
## No.Standing.or.Stopping.Violation (lgl), Hydrant.Violation (lgl), Double.Parking.Violation (lgl)
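Since distinct(park) returns all 91,003 rows, the data appear to contain no fully duplicated rows. distinct() can also be restricted to specific columns. A sketch on hypothetical toy data; which other columns are retained in the column-restricted form has varied across dplyr versions, so only the selected column is inspected here:

```r
library(dplyr)

# Hypothetical toy data with one fully duplicated row
df = data.frame(state = c("NY", "NY", "NJ", "NY"),
                type  = c("PAS", "PAS", "PAS", "COM"),
                stringsAsFactors = FALSE)

all_cols = distinct(df)          # drops rows that repeat on every column
by_state = distinct(df, state)   # one row per distinct state

nrow(all_cols)
by_state$state
```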
select(park, 1:6) %>% sample_n(10)
## Source: local data frame [10 x 6]
##
## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code
## 1 8.003e+09 XS713J NJ PAS 2014-05-22 40
## 2 7.577e+09 358873 DE PAS 2014-03-31 38
## 3 1.358e+09 T497253C NY SRF 2013-10-30 20
## 4 7.094e+09 GJN3390 NY PAS 2014-03-11 71
## 5 7.578e+09 FFY7197 NY PAS 2014-05-03 71
## 6 7.800e+09 GCR5402 NY PAS 2013-09-18 38
## 7 7.890e+09 21946MC NY COM 2013-09-13 14
## 8 7.208e+09 61768MD NY COM 2013-11-13 37
## 9 7.099e+09 FRY2659 NY PAS 2013-10-20 74
## 10 7.856e+09 OL9734H NJ PAS 2013-07-29 14
select(park, 1:6) %>% sample_frac(0.0001)
## Source: local data frame [9 x 6]
##
## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code
## 1 1.365e+09 GCX1342 NY PAS 2014-03-31 20
## 2 7.952e+09 AH679W NJ PAS 2014-05-16 38
## 3 7.958e+09 EWS4764 NY PAS 2014-06-11 20
## 4 7.653e+09 FGD4132 NY PAS 2013-11-14 37
## 5 7.611e+09 3T28J NY OMT 2014-04-17 23
## 6 7.553e+09 KK6H4K MO PAS 2014-02-21 38
## 7 7.751e+09 DAF8479 NY PAS 2013-12-21 20
## 8 7.620e+09 ESS7643 NY PAS 2014-05-13 38
## 9 7.731e+09 GFM7968 NY PAS 2014-01-17 16
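Both sampling verbs draw rows at random, so results change from run to run unless the random seed is fixed. The fraction is converted to a whole number of rows, which is consistent with 0.0001 of 91,003 rows giving 9 rows above. A sketch on hypothetical toy data; the seed value is arbitrary:

```r
library(dplyr)

# Hypothetical toy data
df = data.frame(id = 1:1000)

set.seed(523)                 # arbitrary seed, for reproducibility
s1 = sample_n(df, 10)         # exactly 10 rows
s2 = sample_frac(df, 0.01)    # 1% of 1000 rows

set.seed(523)                 # same seed, same sample
s1_again = sample_n(df, 10)

identical(s1$id, s1_again$id)
```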
summarize(park, n(), min(Issue.Date), max(Issue.Date))
## Source: local data frame [1 x 3]
##
## n() min(Issue.Date) max(Issue.Date)
## 1 91003 2000-02-20 2031-07-13
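Unnamed summaries get the expression itself as the column name, as in the output above; naming them makes the result easier to use downstream. Summaries like these also work as a quick sanity check: a maximum issue date in 2031 presumably points to data entry errors. A sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data, including one implausible future date
df = data.frame(date = as.Date(c("2013-09-16", "2014-06-10", "2031-07-13")))

# name = expression gives the summary column a usable name
res = summarize(df, n = n(), first = min(date), last = max(date))

names(res)
res
```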
select(park, 1:6) %>% group_by(Registration.State)
## Source: local data frame [91,003 x 6]
## Groups: Registration.State
##
## Summons.Number Plate.ID Registration.State Plate.Type Issue.Date Violation.Code
## 1 1.359e+09 FXX9781 NY PAS 2014-02-20 20
## 2 7.486e+09 FLZ6021 NY PAS 2013-08-12 37
## 3 1.354e+09 53902MB NY COM 2013-10-24 14
## 4 1.342e+09 FYM2426 NY PAS 2013-09-16 21
## 5 1.372e+09 GPV3714 NY PAS 2014-06-10 71
## 6 1.362e+09 XBAV11 NJ PAS 2013-12-27 78
## 7 7.004e+09 59305JY NY COM 2013-09-18 38
## 8 1.362e+09 2146518 IN PAS 2013-11-22 41
## 9 7.541e+09 49138M2 MD PAS 2013-12-18 38
## 10 7.311e+09 GHD5283 NY PAS 2013-09-24 46
## .. ... ... ... ... ... ...
select(park, 1:6) %>%
  group_by(Registration.State) %>%
  summarize(n(), min(Issue.Date), max(Issue.Date))
## Source: local data frame [62 x 4]
##
## Registration.State n() min(Issue.Date) max(Issue.Date)
## 1 99 348 2013-01-14 2014-06-16
## 2 AB 3 2013-09-13 2014-06-19
## 3 AK 13 2013-08-08 2014-06-09
## 4 AL 52 2013-07-29 2014-06-24
## 5 AR 22 2013-08-02 2014-06-02
## 6 AZ 250 2013-07-22 2014-06-24
## 7 BC 5 2013-12-18 2014-06-23
## 8 CA 193 2013-07-24 2014-06-25
## 9 CO 52 2013-07-30 2014-06-21
## 10 CT 1388 2013-07-19 2014-06-25
## 11 DC 44 2013-08-02 2014-06-18
## 12 DE 127 2013-06-11 2014-06-24
## 13 DP 49 2013-08-02 2014-06-18
## 14 FL 1152 2013-06-26 2014-06-25
## 15 GA 293 2001-12-23 2014-06-24
## 16 GV 10 2013-07-26 2014-05-23
## 17 HI 3 2013-08-02 2014-04-24
## 18 IA 68 2013-07-25 2014-06-13
## 19 ID 69 2013-08-07 2014-06-19
## 20 IL 318 2000-02-20 2014-06-25
## 21 IN 508 2013-07-21 2014-06-25
## 22 KS 19 2013-08-12 2014-06-18
## 23 KY 37 2013-06-25 2014-06-19
## 24 LA 22 2013-09-04 2014-06-07
## 25 MA 780 2013-07-17 2014-06-25
## 26 MD 518 2012-11-06 2014-06-25
## 27 ME 185 2012-08-26 2014-06-23
## 28 MI 171 2013-07-30 2014-06-24
## 29 MN 148 2013-07-29 2014-06-24
## 30 MO 26 2013-07-28 2014-06-09
## 31 MS 37 2013-08-02 2014-06-23
## 32 MT 7 2013-10-08 2014-06-05
## 33 MX 1 2013-12-21 2013-12-21
## 34 NB 4 2013-08-16 2014-02-25
## 35 NC 481 2013-03-24 2016-02-17
## 36 ND 5 2013-08-27 2014-06-06
## 37 NE 16 2013-07-30 2014-06-04
## 38 NH 104 2013-07-24 2029-12-29
## 39 NJ 8674 2013-01-06 2017-05-21
## 40 NM 27 2013-07-31 2014-06-20
## 41 NS 1 2014-04-16 2014-04-16
## 42 NV 21 2013-09-08 2014-05-22
## 43 NY 70351 2000-04-05 2031-07-13
## 44 OH 228 2013-07-23 2014-06-24
## 45 OK 208 2013-07-12 2014-06-25
## 46 ON 40 2013-07-19 2014-06-24
## 47 OR 37 2013-07-30 2014-06-24
## 48 PA 2263 2013-01-08 2015-05-17
## 49 PR 2 2013-11-07 2014-02-27
## 50 QB 49 2013-08-05 2014-06-20
## 51 RI 123 2013-07-20 2014-06-25
## 52 SC 196 2013-07-30 2014-06-24
## 53 SD 5 2013-09-12 2014-04-22
## 54 TN 165 2013-07-30 2014-06-23
## 55 TX 251 2013-07-10 2014-06-24
## 56 UT 9 2013-08-09 2014-04-07
## 57 VA 628 2013-03-04 2014-12-19
## 58 VT 79 2013-07-29 2014-06-20
## 59 WA 56 2013-08-17 2014-06-09
## 60 WI 41 2013-07-19 2014-06-25
## 61 WV 19 2013-08-12 2014-06-06
## 62 WY 2 2013-12-18 2014-04-10
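group_by() on its own only records the grouping; the effect shows up when a later verb such as summarize() is applied once per group, giving one output row per group. A sketch on hypothetical toy data:

```r
library(dplyr)

# Hypothetical toy data
df = data.frame(state = c("NY", "NJ", "NY", "NY", "NJ"),
                code  = c(20, 14, 46, 21, 38),
                stringsAsFactors = FALSE)

by_state = df %>%
  group_by(state) %>%
  summarize(n = n(), max_code = max(code))   # evaluated within each state

by_state
```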
select(park, 1:6) %>%
  filter(Plate.Type != "999") %>%
  group_by(Plate.Type, Violation.Code) %>%
  summarize(n = n(), n_states = n_distinct(Registration.State))
## Source: local data frame [690 x 4]
## Groups: Plate.Type
##
## Plate.Type Violation.Code n n_states
## 1 AGC 19 1 1
## 2 AGR 17 1 1
## 3 AGR 71 1 1
## 4 AGR 82 1 1
## 5 APP 14 4 2
## 6 APP 19 2 2
## 7 APP 31 1 1
## 8 APP 38 2 1
## 9 APP 40 1 1
## 10 APP 46 4 2
## .. ... ... . ...
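The "Groups: Plate.Type" line in the output above shows that summarizing a data frame grouped by two variables removes only the last grouping variable. A sketch on hypothetical toy data that also uses n_distinct() to count unique values per group:

```r
library(dplyr)

# Hypothetical toy data
df = data.frame(type  = c("PAS", "PAS", "PAS", "COM"),
                code  = c(20, 20, 21, 14),
                state = c("NY", "NJ", "NY", "NY"),
                stringsAsFactors = FALSE)

res = df %>%
  group_by(type, code) %>%
  summarize(n = n(), n_states = n_distinct(state))

group_vars(res)           # the result is still grouped by type
flat = ungroup(res)       # drop the remaining grouping when it is not wanted
```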
Using either the small or large Parking Violation dataset, try to create the following data frames:
A geocoding data frame with just three columns: violation precinct, address (where address is house number and street name combined), and intersecting street. You should exclude any entry without an address. Also consider adding a 4th column that is an indicator variable for addresses without a house number.
(... paste and ymd_hm).
Above materials are derived in part from the following sources: