dplyr is based on the concepts of functions as verbs that manipulate data frames.
Single data frame functions / verbs:
tbl_df()
: add the tbl_df
classfilter()
: pick rows matching criteriaslice()
: pick rows using index(es)select()
: pick columns by namerename()
: rename specific columnsarrange()
: reorder rowsmutate()
: add new variablestransmute()
: create new data frame with variablesdistinct()
: filter for unique rowssample_n()
/ sample_frac()
: randomly sample rowssummarise()
: reduce variables to valuesFirst argument is a data frame
Subsequent arguments say what to do with data frame
Always return a data frame
Don’t modify in place
We will demonstrate dplyr’s functionality using the nycflights13 data.
suppressMessages(library(dplyr))
library(nycflights13)
flights = flights %>% tbl_df()
str(flights)
## Classes 'tbl_df', 'tbl' and 'data.frame': 336776 obs. of 16 variables:
## $ year : int 2013 2013 2013 2013 2013 2013 2013 2013 2013 2013 ...
## $ month : int 1 1 1 1 1 1 1 1 1 1 ...
## $ day : int 1 1 1 1 1 1 1 1 1 1 ...
## $ dep_time : int 517 533 542 544 554 554 555 557 557 558 ...
## $ dep_delay: num 2 4 2 -1 -6 -4 -5 -3 -3 -2 ...
## $ arr_time : int 830 850 923 1004 812 740 913 709 838 753 ...
## $ arr_delay: num 11 20 33 -18 -25 12 19 -14 -8 8 ...
## $ carrier : chr "UA" "UA" "AA" "B6" ...
## $ tailnum : chr "N14228" "N24211" "N619AA" "N804JB" ...
## $ flight : int 1545 1714 1141 725 461 1696 507 5708 79 301 ...
## $ origin : chr "EWR" "LGA" "JFK" "JFK" ...
## $ dest : chr "IAH" "IAH" "MIA" "BQN" ...
## $ air_time : num 227 227 160 183 116 150 158 53 140 138 ...
## $ distance : num 1400 1416 1089 1576 762 ...
## $ hour : num 5 5 5 5 5 5 5 5 5 5 ...
## $ minute : num 17 33 42 44 54 54 55 57 57 58 ...
flights
## Source: local data frame [336,776 x 16]
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time
## (int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr) (int) (chr) (chr) (dbl)
## 1 2013 1 1 517 2 830 11 UA N14228 1545 EWR IAH 227
## 2 2013 1 1 533 4 850 20 UA N24211 1714 LGA IAH 227
## 3 2013 1 1 542 2 923 33 AA N619AA 1141 JFK MIA 160
## 4 2013 1 1 544 -1 1004 -18 B6 N804JB 725 JFK BQN 183
## 5 2013 1 1 554 -6 812 -25 DL N668DN 461 LGA ATL 116
## 6 2013 1 1 554 -4 740 12 UA N39463 1696 EWR ORD 150
## 7 2013 1 1 555 -5 913 19 B6 N516JB 507 EWR FLL 158
## 8 2013 1 1 557 -3 709 -14 EV N829AS 5708 LGA IAD 53
## 9 2013 1 1 557 -3 838 -8 B6 N593JB 79 JFK MCO 140
## 10 2013 1 1 558 -2 753 8 AA N3ALAA 301 LGA ORD 138
## .. ... ... ... ... ... ... ... ... ... ... ... ... ...
## Variables not shown: distance (dbl), hour (dbl), minute (dbl)
flights %>% filter(month == 3)
## Source: local data frame [28,834 x 16]
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time
## (int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr) (int) (chr) (chr) (dbl)
## 1 2013 3 1 4 125 318 142 B6 N706JB 11 JFK FLL 166
## 2 2013 3 1 50 52 526 48 B6 N794JB 707 JFK SJU 198
## 3 2013 3 1 117 152 223 149 B6 N328JB 608 JFK PWM 48
## 4 2013 3 1 454 -6 633 -15 US N177US 1117 EWR CLT 79
## 5 2013 3 1 505 -10 746 -24 UA N527UA 475 EWR IAH 199
## 6 2013 3 1 521 -9 813 -14 UA N76523 1714 LGA IAH 213
## 7 2013 3 1 537 -3 856 6 AA N5EPAA 1141 JFK MIA 173
## 8 2013 3 1 541 -4 1014 -9 B6 N653JB 725 JFK BQN 191
## 9 2013 3 1 549 -11 639 -24 US N749US 2114 LGA BOS 31
## 10 2013 3 1 550 -10 747 -14 EV N760EV 4911 EWR DTW 89
## .. ... ... ... ... ... ... ... ... ... ... ... ... ...
## Variables not shown: distance (dbl), hour (dbl), minute (dbl)
flights %>% filter(month == 3, day <= 7)
## Source: local data frame [6,530 x 16]
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time
## (int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr) (int) (chr) (chr) (dbl)
## 1 2013 3 1 4 125 318 142 B6 N706JB 11 JFK FLL 166
## 2 2013 3 1 50 52 526 48 B6 N794JB 707 JFK SJU 198
## 3 2013 3 1 117 152 223 149 B6 N328JB 608 JFK PWM 48
## 4 2013 3 1 454 -6 633 -15 US N177US 1117 EWR CLT 79
## 5 2013 3 1 505 -10 746 -24 UA N527UA 475 EWR IAH 199
## 6 2013 3 1 521 -9 813 -14 UA N76523 1714 LGA IAH 213
## 7 2013 3 1 537 -3 856 6 AA N5EPAA 1141 JFK MIA 173
## 8 2013 3 1 541 -4 1014 -9 B6 N653JB 725 JFK BQN 191
## 9 2013 3 1 549 -11 639 -24 US N749US 2114 LGA BOS 31
## 10 2013 3 1 550 -10 747 -14 EV N760EV 4911 EWR DTW 89
## .. ... ... ... ... ... ... ... ... ... ... ... ... ...
## Variables not shown: distance (dbl), hour (dbl), minute (dbl)
flights %>% filter(dest == "LAX" | dest == "RDU", month==3)
## Source: local data frame [1,935 x 16]
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time
## (int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr) (int) (chr) (chr) (dbl)
## 1 2013 3 1 607 -3 832 -53 UA N557UA 797 JFK LAX 292
## 2 2013 3 1 608 -7 737 -13 MQ N723MQ 4518 LGA RDU 64
## 3 2013 3 1 623 -7 753 -17 EV N14959 5667 EWR RDU 64
## 4 2013 3 1 629 -3 844 -68 UA N77518 1702 EWR LAX 294
## 5 2013 3 1 657 -3 953 -41 DL N712TW 763 JFK LAX 326
## 6 2013 3 1 714 -1 939 -58 B6 N804JB 671 JFK LAX 297
## 7 2013 3 1 716 6 958 -37 VX N636VA 399 JFK LAX 315
## 8 2013 3 1 727 -3 1007 -53 AA N319AA 33 JFK LAX 315
## 9 2013 3 1 803 -7 923 -32 MQ N853MQ 4406 JFK RDU 67
## 10 2013 3 1 823 -1 954 -20 EV N22909 4548 EWR RDU 66
## .. ... ... ... ... ... ... ... ... ... ... ... ... ...
## Variables not shown: distance (dbl), hour (dbl), minute (dbl)
flights %>% slice(1:10)
## Source: local data frame [10 x 16]
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time
## (int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr) (int) (chr) (chr) (dbl)
## 1 2013 1 1 517 2 830 11 UA N14228 1545 EWR IAH 227
## 2 2013 1 1 533 4 850 20 UA N24211 1714 LGA IAH 227
## 3 2013 1 1 542 2 923 33 AA N619AA 1141 JFK MIA 160
## 4 2013 1 1 544 -1 1004 -18 B6 N804JB 725 JFK BQN 183
## 5 2013 1 1 554 -6 812 -25 DL N668DN 461 LGA ATL 116
## 6 2013 1 1 554 -4 740 12 UA N39463 1696 EWR ORD 150
## 7 2013 1 1 555 -5 913 19 B6 N516JB 507 EWR FLL 158
## 8 2013 1 1 557 -3 709 -14 EV N829AS 5708 LGA IAD 53
## 9 2013 1 1 557 -3 838 -8 B6 N593JB 79 JFK MCO 140
## 10 2013 1 1 558 -2 753 8 AA N3ALAA 301 LGA ORD 138
## Variables not shown: distance (dbl), hour (dbl), minute (dbl)
flights %>% slice((n()-5):n())
## Source: local data frame [6 x 16]
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time
## (int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr) (int) (chr) (chr) (dbl)
## 1 2013 9 30 NA NA NA NA EV N740EV 5274 LGA BNA NA
## 2 2013 9 30 NA NA NA NA 9E 3393 JFK DCA NA
## 3 2013 9 30 NA NA NA NA 9E 3525 LGA SYR NA
## 4 2013 9 30 NA NA NA NA MQ N535MQ 3461 LGA BNA NA
## 5 2013 9 30 NA NA NA NA MQ N511MQ 3572 LGA CLE NA
## 6 2013 9 30 NA NA NA NA MQ N839MQ 3531 LGA RDU NA
## Variables not shown: distance (dbl), hour (dbl), minute (dbl)
flights %>% select(year, month, day)
## Source: local data frame [336,776 x 3]
##
## year month day
## (int) (int) (int)
## 1 2013 1 1
## 2 2013 1 1
## 3 2013 1 1
## 4 2013 1 1
## 5 2013 1 1
## 6 2013 1 1
## 7 2013 1 1
## 8 2013 1 1
## 9 2013 1 1
## 10 2013 1 1
## .. ... ... ...
flights %>% select(-year, -month, -day)
## Source: local data frame [336,776 x 13]
##
## dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time distance hour minute
## (int) (dbl) (int) (dbl) (chr) (chr) (int) (chr) (chr) (dbl) (dbl) (dbl) (dbl)
## 1 517 2 830 11 UA N14228 1545 EWR IAH 227 1400 5 17
## 2 533 4 850 20 UA N24211 1714 LGA IAH 227 1416 5 33
## 3 542 2 923 33 AA N619AA 1141 JFK MIA 160 1089 5 42
## 4 544 -1 1004 -18 B6 N804JB 725 JFK BQN 183 1576 5 44
## 5 554 -6 812 -25 DL N668DN 461 LGA ATL 116 762 5 54
## 6 554 -4 740 12 UA N39463 1696 EWR ORD 150 719 5 54
## 7 555 -5 913 19 B6 N516JB 507 EWR FLL 158 1065 5 55
## 8 557 -3 709 -14 EV N829AS 5708 LGA IAD 53 229 5 57
## 9 557 -3 838 -8 B6 N593JB 79 JFK MCO 140 944 5 57
## 10 558 -2 753 8 AA N3ALAA 301 LGA ORD 138 733 5 58
## .. ... ... ... ... ... ... ... ... ... ... ... ... ...
flights %>% select(year:day)
## Source: local data frame [336,776 x 3]
##
## year month day
## (int) (int) (int)
## 1 2013 1 1
## 2 2013 1 1
## 3 2013 1 1
## 4 2013 1 1
## 5 2013 1 1
## 6 2013 1 1
## 7 2013 1 1
## 8 2013 1 1
## 9 2013 1 1
## 10 2013 1 1
## .. ... ... ...
flights %>% select(-(year:day))
## Source: local data frame [336,776 x 13]
##
## dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time distance hour minute
## (int) (dbl) (int) (dbl) (chr) (chr) (int) (chr) (chr) (dbl) (dbl) (dbl) (dbl)
## 1 517 2 830 11 UA N14228 1545 EWR IAH 227 1400 5 17
## 2 533 4 850 20 UA N24211 1714 LGA IAH 227 1416 5 33
## 3 542 2 923 33 AA N619AA 1141 JFK MIA 160 1089 5 42
## 4 544 -1 1004 -18 B6 N804JB 725 JFK BQN 183 1576 5 44
## 5 554 -6 812 -25 DL N668DN 461 LGA ATL 116 762 5 54
## 6 554 -4 740 12 UA N39463 1696 EWR ORD 150 719 5 54
## 7 555 -5 913 19 B6 N516JB 507 EWR FLL 158 1065 5 55
## 8 557 -3 709 -14 EV N829AS 5708 LGA IAD 53 229 5 57
## 9 557 -3 838 -8 B6 N593JB 79 JFK MCO 140 944 5 57
## 10 558 -2 753 8 AA N3ALAA 301 LGA ORD 138 733 5 58
## .. ... ... ... ... ... ... ... ... ... ... ... ... ...
flights %>% select(contains("dep"),
contains("arr"))
## Source: local data frame [336,776 x 5]
##
## dep_time dep_delay arr_time arr_delay carrier
## (int) (dbl) (int) (dbl) (chr)
## 1 517 2 830 11 UA
## 2 533 4 850 20 UA
## 3 542 2 923 33 AA
## 4 544 -1 1004 -18 B6
## 5 554 -6 812 -25 DL
## 6 554 -4 740 12 UA
## 7 555 -5 913 19 B6
## 8 557 -3 709 -14 EV
## 9 557 -3 838 -8 B6
## 10 558 -2 753 8 AA
## .. ... ... ... ... ...
flights %>% select(contains("dep"),
starts_with("arr"))
## Source: local data frame [336,776 x 4]
##
## dep_time dep_delay arr_time arr_delay
## (int) (dbl) (int) (dbl)
## 1 517 2 830 11
## 2 533 4 850 20
## 3 542 2 923 33
## 4 544 -1 1004 -18
## 5 554 -6 812 -25
## 6 554 -4 740 12
## 7 555 -5 913 19
## 8 557 -3 709 -14
## 9 557 -3 838 -8
## 10 558 -2 753 8
## .. ... ... ... ...
flights %>% rename(tail_number = tailnum)
## Source: local data frame [336,776 x 16]
##
## year month day dep_time dep_delay arr_time arr_delay carrier tail_number flight origin dest air_time
## (int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr) (int) (chr) (chr) (dbl)
## 1 2013 1 1 517 2 830 11 UA N14228 1545 EWR IAH 227
## 2 2013 1 1 533 4 850 20 UA N24211 1714 LGA IAH 227
## 3 2013 1 1 542 2 923 33 AA N619AA 1141 JFK MIA 160
## 4 2013 1 1 544 -1 1004 -18 B6 N804JB 725 JFK BQN 183
## 5 2013 1 1 554 -6 812 -25 DL N668DN 461 LGA ATL 116
## 6 2013 1 1 554 -4 740 12 UA N39463 1696 EWR ORD 150
## 7 2013 1 1 555 -5 913 19 B6 N516JB 507 EWR FLL 158
## 8 2013 1 1 557 -3 709 -14 EV N829AS 5708 LGA IAD 53
## 9 2013 1 1 557 -3 838 -8 B6 N593JB 79 JFK MCO 140
## 10 2013 1 1 558 -2 753 8 AA N3ALAA 301 LGA ORD 138
## .. ... ... ... ... ... ... ... ... ... ... ... ... ...
## Variables not shown: distance (dbl), hour (dbl), minute (dbl)
flights %>% filter(month==3,day==2) %>% arrange(origin, dest)
## Source: local data frame [765 x 16]
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time
## (int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr) (int) (chr) (chr) (dbl)
## 1 2013 3 2 1336 7 1426 -6 EV N11535 4263 EWR ALB 32
## 2 2013 3 2 628 -1 837 -12 DL N357NB 575 EWR ATL 109
## 3 2013 3 2 637 -3 903 -12 EV N16919 4209 EWR ATL 120
## 4 2013 3 2 743 -2 945 -25 DL N338NB 807 EWR ATL 106
## 5 2013 3 2 857 -3 1117 -9 DL N306DQ 485 EWR ATL 110
## 6 2013 3 2 1027 -3 1234 -13 DL N301DQ 2343 EWR ATL 109
## 7 2013 3 2 1134 -11 1332 -27 DL N301NB 401 EWR ATL 102
## 8 2013 3 2 1412 -3 1636 6 DL N368NB 935 EWR ATL 108
## 9 2013 3 2 1633 -3 1848 -20 EV N16183 3273 EWR ATL 108
## 10 2013 3 2 1655 -5 1857 -27 DL N305DQ 2042 EWR ATL 103
## .. ... ... ... ... ... ... ... ... ... ... ... ... ...
## Variables not shown: distance (dbl), hour (dbl), minute (dbl)
flights %>% filter(month==3,day==2) %>% arrange(desc(origin), dest)
## Source: local data frame [765 x 16]
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time
## (int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr) (int) (chr) (chr) (dbl)
## 1 2013 3 2 555 -5 822 7 FL N928AT 345 LGA ATL 113
## 2 2013 3 2 556 -4 822 -7 DL N623DL 461 LGA ATL 110
## 3 2013 3 2 758 -2 1009 -22 DL N680DA 2047 LGA ATL 105
## 4 2013 3 2 808 -2 1022 -21 FL N996AT 361 LGA ATL 107
## 5 2013 3 2 828 -7 1050 -15 MQ N510MQ 4610 LGA ATL 110
## 6 2013 3 2 857 -3 1126 -16 DL N663DN 1747 LGA ATL 110
## 7 2013 3 2 957 -3 1224 -10 DL N942DL 1847 LGA ATL 105
## 8 2013 3 2 1158 -2 1411 -19 MQ N511MQ 4658 LGA ATL 105
## 9 2013 3 2 1159 -1 1429 -9 DL N910DE 1947 LGA ATL 107
## 10 2013 3 2 1259 -1 1510 -25 DL N902DE 781 LGA ATL 106
## .. ... ... ... ... ... ... ... ... ... ... ... ... ...
## Variables not shown: distance (dbl), hour (dbl), minute (dbl)
library(lubridate)
## Loading required package: methods
flights %>% select(1:3) %>% mutate(date = paste(month,day,year,sep="/") %>% mdy())
## Source: local data frame [336,776 x 4]
##
## year month day date
## (int) (int) (int) (time)
## 1 2013 1 1 2013-01-01
## 2 2013 1 1 2013-01-01
## 3 2013 1 1 2013-01-01
## 4 2013 1 1 2013-01-01
## 5 2013 1 1 2013-01-01
## 6 2013 1 1 2013-01-01
## 7 2013 1 1 2013-01-01
## 8 2013 1 1 2013-01-01
## 9 2013 1 1 2013-01-01
## 10 2013 1 1 2013-01-01
## .. ... ... ... ...
flights %>% select(1:3) %>% transmute(date = paste(month,day,year,sep="/") %>% mdy())
## Source: local data frame [336,776 x 1]
##
## date
## (time)
## 1 2013-01-01
## 2 2013-01-01
## 3 2013-01-01
## 4 2013-01-01
## 5 2013-01-01
## 6 2013-01-01
## 7 2013-01-01
## 8 2013-01-01
## 9 2013-01-01
## 10 2013-01-01
## .. ...
flights %>% select(origin, dest) %>% distinct() %>% arrange(origin,dest)
## Source: local data frame [224 x 2]
##
## origin dest
## (chr) (chr)
## 1 EWR ALB
## 2 EWR ANC
## 3 EWR ATL
## 4 EWR AUS
## 5 EWR AVL
## 6 EWR BDL
## 7 EWR BNA
## 8 EWR BOS
## 9 EWR BQN
## 10 EWR BTV
## .. ... ...
flights %>% sample_n(10)
## Source: local data frame [10 x 16]
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time
## (int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr) (int) (chr) (chr) (dbl)
## 1 2013 12 12 1936 31 2123 23 9E N292PQ 2931 JFK PIT 74
## 2 2013 3 12 2233 93 2331 71 MQ N521MQ 3744 EWR ORD 102
## 3 2013 11 6 1529 -1 1905 4 DL N187DN 417 JFK LAX 354
## 4 2013 11 18 1953 -7 2111 -6 9E N8960A 2950 JFK BWI 44
## 5 2013 8 1 818 -3 926 -12 UA N17730 1199 EWR BOS 40
## 6 2013 6 25 2058 90 2313 102 EV N12569 4480 EWR CLT 82
## 7 2013 8 30 813 -2 1058 -23 UA N825UA 504 EWR PDX 327
## 8 2013 1 29 1453 -7 1633 -22 MQ N725MQ 4429 LGA CMH 79
## 9 2013 5 23 1816 241 1941 226 AA N4WNAA 337 LGA ORD 116
## 10 2013 8 31 1054 -1 1433 -12 B6 N644JB 403 JFK SJU 198
## Variables not shown: distance (dbl), hour (dbl), minute (dbl)
flights %>% sample_frac(0.001)
## Source: local data frame [337 x 16]
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time
## (int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr) (int) (chr) (chr) (dbl)
## 1 2013 9 1 822 -7 1108 -31 AA N3KNAA 1938 LGA MIA 146
## 2 2013 2 7 2228 143 108 117 UA N826UA 754 EWR TPA 147
## 3 2013 8 1 1111 -14 1205 -26 B6 N236JB 2380 EWR BOS 36
## 4 2013 8 3 1701 2 1834 2 UA N71411 1190 EWR ORD 127
## 5 2013 2 28 738 33 953 13 MQ N737MQ 4534 LGA XNA 179
## 6 2013 10 12 1706 -9 1935 -46 UA N461UA 387 EWR PDX 309
## 7 2013 5 17 1902 -3 2200 -24 B6 N635JB 111 JFK FLL 140
## 8 2013 5 27 1506 -4 1742 -33 UA N451UA 490 EWR AUS 198
## 9 2013 8 21 1107 -8 1341 -23 UA N512UA 703 JFK LAX 313
## 10 2013 10 20 1802 2 2040 10 DL N342NW 926 EWR ATL 112
## .. ... ... ... ... ... ... ... ... ... ... ... ... ...
## Variables not shown: distance (dbl), hour (dbl), minute (dbl)
flights %>% mutate(date = paste(month,day,year,sep="/") %>% mdy()) %>%
summarize(n(), min(date), max(date))
## Source: local data frame [1 x 3]
##
## n() min(date) max(date)
## (int) (time) (time)
## 1 336776 2013-01-01 2013-12-31
flights %>% group_by(origin)
## Source: local data frame [336,776 x 16]
## Groups: origin [3]
##
## year month day dep_time dep_delay arr_time arr_delay carrier tailnum flight origin dest air_time
## (int) (int) (int) (int) (dbl) (int) (dbl) (chr) (chr) (int) (chr) (chr) (dbl)
## 1 2013 1 1 517 2 830 11 UA N14228 1545 EWR IAH 227
## 2 2013 1 1 533 4 850 20 UA N24211 1714 LGA IAH 227
## 3 2013 1 1 542 2 923 33 AA N619AA 1141 JFK MIA 160
## 4 2013 1 1 544 -1 1004 -18 B6 N804JB 725 JFK BQN 183
## 5 2013 1 1 554 -6 812 -25 DL N668DN 461 LGA ATL 116
## 6 2013 1 1 554 -4 740 12 UA N39463 1696 EWR ORD 150
## 7 2013 1 1 555 -5 913 19 B6 N516JB 507 EWR FLL 158
## 8 2013 1 1 557 -3 709 -14 EV N829AS 5708 LGA IAD 53
## 9 2013 1 1 557 -3 838 -8 B6 N593JB 79 JFK MCO 140
## 10 2013 1 1 558 -2 753 8 AA N3ALAA 301 LGA ORD 138
## .. ... ... ... ... ... ... ... ... ... ... ... ... ...
## Variables not shown: distance (dbl), hour (dbl), minute (dbl)
flights %>% group_by(origin) %>%
mutate(date = paste(month,day,year,sep="/") %>% mdy()) %>%
summarize(n(), min(date), max(date))
## Source: local data frame [3 x 4]
##
## origin n() min(date) max(date)
## (chr) (int) (time) (time)
## 1 EWR 120835 2013-01-01 2013-12-31
## 2 JFK 111279 2013-01-01 2013-12-31
## 3 LGA 104662 2013-01-01 2013-12-31
Which plane (check the tail number) flew out of each New York airport the most?
What was the shortest flight out of each airport in terms of distance? In terms of duration?
How many flights to Los Angeles (LAX) did each of the legacy carriers (AA, UA, DL or US) have in May from JFK, and what was their average duration?
Which date should you fly on if you want to have the lowest possible average depature delay? What about arrival delay?
Create a time series plot of each of the legacy carriers’ average departure delay by day and origin airport.