library(DBI)
library(dbplyr)
library(dplyr)
library(nycflights13)
library(Lahman)
Create an in-memory database and add airports
from nycflights13
to the database.
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, df = nycflights13::airports, "airports")
airports_db <- tbl(con, "airports")
Also, add some baseball data from the Lahman
package to our in-memory database.
dbWriteTable(con, "batting", Batting)
dbWriteTable(con, "pitching", Pitching)
dbWriteTable(con, "teams", Teams)
What are the corresponding SQL verbs based on the dplyr
structure below?
airport_car <- airports_db %>%
filter(lat >= 33.7666, lat <= 36.588, lon >= -84.3201, lon <= -75.4129) %>%
arrange(desc(alt)) %>%
select(name, alt)
airport_car %>%
show_query()
#> <SQL>
#> SELECT `name`, `alt`
#> FROM (SELECT *
#> FROM (SELECT *
#> FROM `airports`
#> WHERE ((`lat` >= 33.7666) AND (`lat` <= 36.588) AND (`lon` >= -84.3201) AND (`lon` <= -75.4129)))
#> ORDER BY `alt` DESC)
Add Salaries
from package Lahman
as a table to your in-memory database.
dbWriteTable(con, "salaries", Salaries)
Compute the team salaries for each team in 2016 and display the 5 teams with the highest payroll. Which team had the lowest payroll in that year?
dbGetQuery(con, paste("SELECT teamID, SUM(salary) as payroll",
"FROM salaries",
"WHERE yearID = 2016",
"GROUP BY teamID",
"ORDER BY payroll DESC",
"LIMIT 5"))
Which 10 teams had the highest winning percentage since 1990? Hint: https://www.w3schools.com/sql/func_sqlserver_cast.asp
dbGetQuery(con, paste("SELECT yearID, teamID, W, L, CAST(W as FLOAT) / G as wpct",
"FROM teams",
"WHERE yearID >= 1990",
"GROUP BY yearID, teamID",
"ORDER BY wpct DESC",
"LIMIT 10"))
How would you combine the batting and salaries tables to match up the players and years? Take a look at ?dplyr::join
. Try to combine the R data frame objects Batting
and Salaries
.
left_join(Batting, Salaries, by = c("playerID", "yearID"))