Low level package for interfacing R with Database management systems (DBMS) that provides a common interface to achieve the following functionality:
Provides the implementation necessary to use DBI to interface with SQLite databases.
library(RSQLite) ## Loading required package: DBI con = dbConnect(RSQLite::SQLite(), ":memory:") str(con) ## Formal class 'SQLiteConnection' [package "RSQLite"] with 5 slots ## ..@ Id :<externalptr> ## ..@ dbname : chr ":memory:" ## ..@ loadable.extensions: logi TRUE ## ..@ flags : int 6 ## ..@ vfs : chr ""
employees = data.frame(name = c("Alice","Bob","Carol","Dave","Eve","Frank"), email = c("alice@company.com", "bob@company.com", "carol@company.com", "dave@company.com", "eve@company.com", "frank@comany.com"), salary = c(52000, 40000, 30000, 33000, 44000, 37000), dept = c("Accounting", "Accounting","Sales", "Accounting","Sales","Sales"), stringsAsFactors = FALSE)
dbWriteTable(con, "employees", employees) ## [1] TRUE dbListTables(con) ## [1] "employees"
dbWriteTable(con, "employs", employees) ## [1] TRUE dbListTables(con) ## [1] "employees" "employs" dbRemoveTable(con,"employs") ## [1] TRUE dbListTables(con) ## [1] "employees"
res = dbSendQuery(con, "SELECT * FROM employees") dbFetch(res) ## name email salary dept ## 1 Alice alice@company.com 52000 Accounting ## 2 Bob bob@company.com 40000 Accounting ## 3 Carol carol@company.com 30000 Sales ## 4 Dave dave@company.com 33000 Accounting ## 5 Eve eve@company.com 44000 Sales ## 6 Frank frank@comany.com 37000 Sales dbClearResult(res) ## [1] TRUE
dbDisconnect(con) ## [1] TRUE
The following is specific to SQLite
sqlite> .tables employees
sqlite> .schema employees CREATE TABLE employees ( "name" TEXT, "email" TEXT, "salary" REAL, "dept" TEXT );
sqlite> .indices employees
sqlite> SELECT * FROM employees; Alice|alice@company.com|52000.0|Accounting Bob|bob@company.com|40000.0|Accounting Carol|carol@company.com|30000.0|Sales Dave|dave@company.com|33000.0|Accounting Eve|eve@company.com|44000.0|Sales Frank|frank@comany.com|37000.0|Sales
We can make this table output a little nicer with some additonal SQLite options:
sqlite> .mode column sqlite> .headers on
sqlite> SELECT * FROM employees; name email salary dept ---------- ----------------- ---------- ---------- Alice alice@company.com 52000.0 Accounting Bob bob@company.com 40000.0 Accounting Carol carol@company.com 30000.0 Sales Dave dave@company.com 33000.0 Accounting Eve eve@company.com 44000.0 Sales Frank frank@comany.com 37000.0 Sales
We can subset for certain columns (and rename them) using SELECT
sqlite> SELECT name AS first_name, salary FROM employees; first_name salary ---------- ---------- Alice 52000.0 Bob 40000.0 Carol 30000.0 Dave 33000.0 Eve 44000.0 Frank 37000.0
We can sort our results by adding ORDER BY
to our SELECT
statement
sqlite> SELECT name AS first_name, salary FROM employees ORDER BY salary; first_name salary ---------- ---------- Carol 30000.0 Dave 33000.0 Frank 37000.0 Bob 40000.0 Eve 44000.0 Alice 52000.0
We can sort in the opposite order by adding DESC
SELECT name AS first_name, salary FROM employees ORDER BY salary DESC; first_name salary ---------- ---------- Alice 52000.0 Eve 44000.0 Bob 40000.0 Frank 37000.0 Dave 33000.0 Carol 30000.0
We can filter rows by adding WHERE
to our statements
sqlite> SELECT * FROM employees WHERE salary < 40000; name email salary dept ---------- ----------------- ---------- ---------- Carol carol@company.com 30000.0 Sales Dave dave@company.com 33000.0 Accounting Frank frank@comany.com 37000.0 Sales sqlite> SELECT * FROM employees WHERE salary < 40000 AND dept = "Sales"; name email salary dept ---------- ----------------- ---------- ---------- Carol carol@company.com 30000.0 Sales Frank frank@comany.com 37000.0 Sales
We can create groups for the purpose of summarizing using GROUP BY
. As with dplyr it is not terribly useful by itself.
sqlite> SELECT * FROM employees GROUP BY dept; name email salary dept ---------- ---------------- ---------- ---------- Dave dave@company.com 33000.0 Accounting Frank frank@comany.com 37000.0 Sales sqlite> SELECT dept, AVG(salary) AS mean_salary, COUNT(*) AS n FROM employees GROUP BY dept; dept mean_salary n ---------- ---------------- ---------- Accounting 41666.6666666667 3 Sales 37000.0 3
We can limit the number of rows we get by using LIMIT
.
sqlite> SELECT * FROM employees LIMIT 3; name email salary dept ---------- ----------------- ---------- ---------- Alice alice@company.com 52000.0 Accounting Bob bob@company.com 40000.0 Accounting Carol carol@company.com 30000.0 Sales sqlite> SELECT * FROM employees ORDER BY name DESC LIMIT 3; name email salary dept ---------- ---------------- ---------- ---------- Frank frank@comany.com 37000.0 Sales Eve eve@company.com 44000.0 Sales Dave dave@company.com 33000.0 Accounting
sqlite> .mode csv sqlite> .import phone.csv phone sqlite> .tables employees phone sqlite> .mode column sqlite> SELECT * FROM phone; name phone ---------- ------------ Bob 919 555-1111 Carol 919 555-2222 Eve 919 555-3333 Frank 919 555-4444
By default SQLite uses a CROSS JOIN
which is not terribly useful
sqlite> SELECT * FROM employees JOIN phone; name email salary dept name phone ---------- ----------------- ---------- ---------- ---------- ------------ Alice alice@company.com 52000.0 Accounting Bob 919 555-1111 Alice alice@company.com 52000.0 Accounting Carol 919 555-2222 Alice alice@company.com 52000.0 Accounting Eve 919 555-3333 Alice alice@company.com 52000.0 Accounting Frank 919 555-4444 Bob bob@company.com 40000.0 Accounting Bob 919 555-1111 Bob bob@company.com 40000.0 Accounting Carol 919 555-2222 Bob bob@company.com 40000.0 Accounting Eve 919 555-3333 Bob bob@company.com 40000.0 Accounting Frank 919 555-4444 Carol carol@company.com 30000.0 Sales Bob 919 555-1111 Carol carol@company.com 30000.0 Sales Carol 919 555-2222 Carol carol@company.com 30000.0 Sales Eve 919 555-3333 Carol carol@company.com 30000.0 Sales Frank 919 555-4444 Dave dave@company.com 33000.0 Accounting Bob 919 555-1111 Dave dave@company.com 33000.0 Accounting Carol 919 555-2222 Dave dave@company.com 33000.0 Accounting Eve 919 555-3333 Dave dave@company.com 33000.0 Accounting Frank 919 555-4444 Eve eve@company.com 44000.0 Sales Bob 919 555-1111 Eve eve@company.com 44000.0 Sales Carol 919 555-2222 Eve eve@company.com 44000.0 Sales Eve 919 555-3333 Eve eve@company.com 44000.0 Sales Frank 919 555-4444 Frank frank@comany.com 37000.0 Sales Bob 919 555-1111 Frank frank@comany.com 37000.0 Sales Carol 919 555-2222 Frank frank@comany.com 37000.0 Sales Eve 919 555-3333 Frank frank@comany.com 37000.0 Sales Frank 919 555-4444
If you want SQLite to find the columns to merge on automatically then we prefix the join with NATURAL
.
sqlite> SELECT * FROM employees NATURAL JOIN phone; name email salary dept phone ---------- --------------- ---------- ---------- ------------ Bob bob@company.com 40000.0 Accounting 919 555-1111 Carol carol@company.c 30000.0 Sales 919 555-2222 Eve eve@company.com 44000.0 Sales 919 555-3333 Frank frank@comany.co 37000.0 Sales 919 555-4444
sqlite> SELECT * FROM employees JOIN phone ON employees.name = phone.name; name email salary dept name phone ---------- --------------- ---------- ---------- ---------- ------------ Bob bob@company.com 40000.0 Accounting Bob 919 555-1111 Carol carol@company.c 30000.0 Sales Carol 919 555-2222 Eve eve@company.com 44000.0 Sales Eve 919 555-3333 Frank frank@comany.co 37000.0 Sales Frank 919 555-4444
sqlite> SELECT * FROM employees NATURAL LEFT JOIN phone; name email salary dept phone ---------- ----------------- ---------- ---------- ---------- Alice alice@company.com 52000.0 Accounting Bob bob@company.com 40000.0 Accounting 919 555-11 Carol carol@company.com 30000.0 Sales 919 555-22 Dave dave@company.com 33000.0 Accounting Eve eve@company.com 44000.0 Sales 919 555-33 Frank frank@comany.com 37000.0 Sales 919 555-44
sqlite> SELECT * FROM employees LEFT JOIN phone ON employees.name = phone.name; name email salary dept name phone ---------- ----------------- ---------- ---------- ---------- ---------- Alice alice@company.com 52000.0 Accounting Bob bob@company.com 40000.0 Accounting Bob 919 555-11 Carol carol@company.com 30000.0 Sales Carol 919 555-22 Dave dave@company.com 33000.0 Accounting Eve eve@company.com 44000.0 Sales Eve 919 555-33 Frank frank@comany.com 37000.0 Sales Frank 919 555-44
Currently SQLite does not support RIGHT JOIN
or OUTER JOIN
.
sqlite> CREATE INDEX index_name ON employees (name); sqlite> .indices index_name sqlite> CREATE INDEX index_name_email ON employees (name,email); sqlite> .indices index_name index_name_email
We can nest tables within tables for the purpose of queries.
SELECT * FROM (SELECT * FROM employees NATURAL LEFT JOIN phone) WHERE phone IS NULL; name email salary dept phone ---------- ----------------- ---------- ---------- ---------- Alice alice@company.com 52000.0 Accounting Dave dave@company.com 33000.0 Accounting
sqlite> SELECT * FROM (SELECT * FROM employees NATURAL LEFT JOIN phone) WHERE phone IS NOT NULL; name email salary dept phone ---------- --------------- ---------- ---------- ------------ Bob bob@company.com 40000.0 Accounting 919 555-1111 Carol carol@company.c 30000.0 Sales 919 555-2222 Eve eve@company.com 44000.0 Sales 919 555-3333 Frank frank@comany.co 37000.0 Sales 919 555-4444
Lets try to create a table that has a new column - abv_avg
which contains how much more (or less) than the average, for their department, each person is paid.
Hint - This will require joining a subquery.
employees.sqlite
is available in /data/Sta523
on saxon.