Low level package for interfacing R with Database management systems (DBMS) that provides a common interface to achieve the following functionality:
Provides the implementation necessary to use DBI to interface with SQLite databases.
library(RSQLite)
## Loading required package: DBI
con = dbConnect(RSQLite::SQLite(), ":memory:")
str(con)
## Formal class 'SQLiteConnection' [package "RSQLite"] with 5 slots
## ..@ Id :<externalptr>
## ..@ dbname : chr ":memory:"
## ..@ loadable.extensions: logi TRUE
## ..@ flags : int 6
## ..@ vfs : chr ""
employees = data.frame(name = c("Alice","Bob","Carol","Dave","Eve","Frank"),
email = c("alice@company.com", "bob@company.com",
"carol@company.com", "dave@company.com",
"eve@company.com", "frank@comany.com"),
salary = c(52000, 40000, 30000, 33000, 44000, 37000),
dept = c("Accounting", "Accounting","Sales",
"Accounting","Sales","Sales"),
stringsAsFactors = FALSE)
dbWriteTable(con, "employees", employees)
## [1] TRUE
dbListTables(con)
## [1] "employees"
dbWriteTable(con, "employs", employees)
## [1] TRUE
dbListTables(con)
## [1] "employees" "employs"
dbRemoveTable(con,"employs")
## [1] TRUE
dbListTables(con)
## [1] "employees"
res = dbSendQuery(con, "SELECT * FROM employees")
dbFetch(res)
## name email salary dept
## 1 Alice alice@company.com 52000 Accounting
## 2 Bob bob@company.com 40000 Accounting
## 3 Carol carol@company.com 30000 Sales
## 4 Dave dave@company.com 33000 Accounting
## 5 Eve eve@company.com 44000 Sales
## 6 Frank frank@comany.com 37000 Sales
dbClearResult(res)
## [1] TRUE
dbDisconnect(con)
## [1] TRUE
The following is specific to SQLite
sqlite> .tables
employees
sqlite> .schema employees
CREATE TABLE employees
( "name" TEXT,
"email" TEXT,
"salary" REAL,
"dept" TEXT
);
sqlite> .indices employees
sqlite> SELECT * FROM employees;
Alice|alice@company.com|52000.0|Accounting
Bob|bob@company.com|40000.0|Accounting
Carol|carol@company.com|30000.0|Sales
Dave|dave@company.com|33000.0|Accounting
Eve|eve@company.com|44000.0|Sales
Frank|frank@comany.com|37000.0|Sales
We can make this table output a little nicer with some additonal SQLite options:
sqlite> .mode column
sqlite> .headers on
sqlite> SELECT * FROM employees;
name email salary dept
---------- ----------------- ---------- ----------
Alice alice@company.com 52000.0 Accounting
Bob bob@company.com 40000.0 Accounting
Carol carol@company.com 30000.0 Sales
Dave dave@company.com 33000.0 Accounting
Eve eve@company.com 44000.0 Sales
Frank frank@comany.com 37000.0 Sales
We can subset for certain columns (and rename them) using SELECT
sqlite> SELECT name AS first_name, salary FROM employees;
first_name salary
---------- ----------
Alice 52000.0
Bob 40000.0
Carol 30000.0
Dave 33000.0
Eve 44000.0
Frank 37000.0
We can sort our results by adding ORDER BY
to our SELECT
statement
sqlite> SELECT name AS first_name, salary FROM employees ORDER BY salary;
first_name salary
---------- ----------
Carol 30000.0
Dave 33000.0
Frank 37000.0
Bob 40000.0
Eve 44000.0
Alice 52000.0
We can sort in the opposite order by adding DESC
SELECT name AS first_name, salary FROM employees ORDER BY salary DESC;
first_name salary
---------- ----------
Alice 52000.0
Eve 44000.0
Bob 40000.0
Frank 37000.0
Dave 33000.0
Carol 30000.0
We can filter rows by adding WHERE
to our statements
sqlite> SELECT * FROM employees WHERE salary < 40000;
name email salary dept
---------- ----------------- ---------- ----------
Carol carol@company.com 30000.0 Sales
Dave dave@company.com 33000.0 Accounting
Frank frank@comany.com 37000.0 Sales
sqlite> SELECT * FROM employees WHERE salary < 40000 AND dept = "Sales";
name email salary dept
---------- ----------------- ---------- ----------
Carol carol@company.com 30000.0 Sales
Frank frank@comany.com 37000.0 Sales
We can create groups for the purpose of summarizing using GROUP BY
. As with dplyr it is not terribly useful by itself.
sqlite> SELECT * FROM employees GROUP BY dept;
name email salary dept
---------- ---------------- ---------- ----------
Dave dave@company.com 33000.0 Accounting
Frank frank@comany.com 37000.0 Sales
sqlite> SELECT dept, AVG(salary) AS mean_salary, COUNT(*) AS n FROM employees GROUP BY dept;
dept mean_salary n
---------- ---------------- ----------
Accounting 41666.6666666667 3
Sales 37000.0 3
We can limit the number of rows we get by using LIMIT
.
sqlite> SELECT * FROM employees LIMIT 3;
name email salary dept
---------- ----------------- ---------- ----------
Alice alice@company.com 52000.0 Accounting
Bob bob@company.com 40000.0 Accounting
Carol carol@company.com 30000.0 Sales
sqlite> SELECT * FROM employees ORDER BY name DESC LIMIT 3;
name email salary dept
---------- ---------------- ---------- ----------
Frank frank@comany.com 37000.0 Sales
Eve eve@company.com 44000.0 Sales
Dave dave@company.com 33000.0 Accounting
sqlite> .mode csv
sqlite> .import phone.csv phone
sqlite> .tables
employees phone
sqlite> .mode column
sqlite> SELECT * FROM phone;
name phone
---------- ------------
Bob 919 555-1111
Carol 919 555-2222
Eve 919 555-3333
Frank 919 555-4444
By default SQLite uses a CROSS JOIN
which is not terribly useful
sqlite> SELECT * FROM employees JOIN phone;
name email salary dept name phone
---------- ----------------- ---------- ---------- ---------- ------------
Alice alice@company.com 52000.0 Accounting Bob 919 555-1111
Alice alice@company.com 52000.0 Accounting Carol 919 555-2222
Alice alice@company.com 52000.0 Accounting Eve 919 555-3333
Alice alice@company.com 52000.0 Accounting Frank 919 555-4444
Bob bob@company.com 40000.0 Accounting Bob 919 555-1111
Bob bob@company.com 40000.0 Accounting Carol 919 555-2222
Bob bob@company.com 40000.0 Accounting Eve 919 555-3333
Bob bob@company.com 40000.0 Accounting Frank 919 555-4444
Carol carol@company.com 30000.0 Sales Bob 919 555-1111
Carol carol@company.com 30000.0 Sales Carol 919 555-2222
Carol carol@company.com 30000.0 Sales Eve 919 555-3333
Carol carol@company.com 30000.0 Sales Frank 919 555-4444
Dave dave@company.com 33000.0 Accounting Bob 919 555-1111
Dave dave@company.com 33000.0 Accounting Carol 919 555-2222
Dave dave@company.com 33000.0 Accounting Eve 919 555-3333
Dave dave@company.com 33000.0 Accounting Frank 919 555-4444
Eve eve@company.com 44000.0 Sales Bob 919 555-1111
Eve eve@company.com 44000.0 Sales Carol 919 555-2222
Eve eve@company.com 44000.0 Sales Eve 919 555-3333
Eve eve@company.com 44000.0 Sales Frank 919 555-4444
Frank frank@comany.com 37000.0 Sales Bob 919 555-1111
Frank frank@comany.com 37000.0 Sales Carol 919 555-2222
Frank frank@comany.com 37000.0 Sales Eve 919 555-3333
Frank frank@comany.com 37000.0 Sales Frank 919 555-4444
If you want SQLite to find the columns to merge on automatically then we prefix the join with NATURAL
.
sqlite> SELECT * FROM employees NATURAL JOIN phone;
name email salary dept phone
---------- --------------- ---------- ---------- ------------
Bob bob@company.com 40000.0 Accounting 919 555-1111
Carol carol@company.c 30000.0 Sales 919 555-2222
Eve eve@company.com 44000.0 Sales 919 555-3333
Frank frank@comany.co 37000.0 Sales 919 555-4444
sqlite> SELECT * FROM employees JOIN phone ON employees.name = phone.name;
name email salary dept name phone
---------- --------------- ---------- ---------- ---------- ------------
Bob bob@company.com 40000.0 Accounting Bob 919 555-1111
Carol carol@company.c 30000.0 Sales Carol 919 555-2222
Eve eve@company.com 44000.0 Sales Eve 919 555-3333
Frank frank@comany.co 37000.0 Sales Frank 919 555-4444
sqlite> SELECT * FROM employees NATURAL LEFT JOIN phone;
name email salary dept phone
---------- ----------------- ---------- ---------- ----------
Alice alice@company.com 52000.0 Accounting
Bob bob@company.com 40000.0 Accounting 919 555-11
Carol carol@company.com 30000.0 Sales 919 555-22
Dave dave@company.com 33000.0 Accounting
Eve eve@company.com 44000.0 Sales 919 555-33
Frank frank@comany.com 37000.0 Sales 919 555-44
sqlite> SELECT * FROM employees LEFT JOIN phone ON employees.name = phone.name;
name email salary dept name phone
---------- ----------------- ---------- ---------- ---------- ----------
Alice alice@company.com 52000.0 Accounting
Bob bob@company.com 40000.0 Accounting Bob 919 555-11
Carol carol@company.com 30000.0 Sales Carol 919 555-22
Dave dave@company.com 33000.0 Accounting
Eve eve@company.com 44000.0 Sales Eve 919 555-33
Frank frank@comany.com 37000.0 Sales Frank 919 555-44
Currently SQLite does not support RIGHT JOIN
or OUTER JOIN
.
sqlite> CREATE INDEX index_name ON employees (name);
sqlite> .indices
index_name
sqlite> CREATE INDEX index_name_email ON employees (name,email);
sqlite> .indices
index_name
index_name_email
We can nest tables within tables for the purpose of queries.
SELECT * FROM (SELECT * FROM employees NATURAL LEFT JOIN phone) WHERE phone IS NULL;
name email salary dept phone
---------- ----------------- ---------- ---------- ----------
Alice alice@company.com 52000.0 Accounting
Dave dave@company.com 33000.0 Accounting
sqlite> SELECT * FROM (SELECT * FROM employees NATURAL LEFT JOIN phone) WHERE phone IS NOT NULL;
name email salary dept phone
---------- --------------- ---------- ---------- ------------
Bob bob@company.com 40000.0 Accounting 919 555-1111
Carol carol@company.c 30000.0 Sales 919 555-2222
Eve eve@company.com 44000.0 Sales 919 555-3333
Frank frank@comany.co 37000.0 Sales 919 555-4444
Lets try to create a table that has a new column - abv_avg
which contains how much more (or less) than the average, for their department, each person is paid.
Hint - This will require joining a subquery.
employees.sqlite
is available in /data/Sta523
on saxon.