---
title: "Data types in R"
author: "Colin Rundel"
date: "2018-08-30"
output:
  xaringan::moon_reader:
    css: "slides.css"
    lib_dir: libs
    nature:
      highlightStyle: github
      highlightLines: true
      countIncrementalSlides: false
---
exclude: true
```{r, message=FALSE, warning=FALSE, include=FALSE}
options(
  htmltools.dir.version = FALSE, # for blogdown
  width = 80
)
htmltools::tagList(rmarkdown::html_dependency_font_awesome())
```
---
class: middle
count: false
# Atomic Vectors
---
## Atomic Vectors
R has six atomic vector types:
`typeof` | `mode` | `storage.mode`
:-----------|:------------|:----------------
logical | logical | logical
double | numeric | double
integer | numeric | integer
character | character | character
complex | complex | complex
raw | raw | raw
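
---
## Complex and raw

The rest of these slides focus on the four common types. As a quick check (the values `1+2i` and `as.raw(255)` are arbitrary examples), the two remaining types report the type and mode listed in the table:

```{r}
# complex: numbers with a real and an imaginary part
typeof(1+2i)
mode(1+2i)
# raw: individual bytes, created here with as.raw()
typeof(as.raw(255))
mode(as.raw(255))
```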
---
## Vector types
`logical` - boolean values `TRUE` and `FALSE`
.pull-left[
```{r}
typeof(TRUE)
```
]
.pull-right[
```{r}
mode(TRUE)
```
]
`character` - text strings
.pull-left[
```{r}
typeof("hello")
typeof('world')
```
]
.pull-right[
```{r}
mode("hello")
mode('world')
```
]
---
`double` - floating point numerical values (default numerical type)
.pull-left[
```{r}
typeof(1.33)
typeof(7)
```
]
.pull-right[
```{r}
mode(1.33)
mode(7)
```
]
`integer` - integer numerical values (indicated with an `L`)
.pull-left[
```{r}
typeof( 7L )
typeof( 1:3 )
```
]
.pull-right[
```{r}
mode( 7L )
mode( 1:3 )
```
]
---
## Concatenation
Atomic vectors can be constructed using the concatenate, `c()`, function.
```{r}
c(1,2,3)
```
--
```{r}
c("Hello", "World!")
```
--
```{r}
c(1,c(2, c(3)))
```
**Note** - atomic vectors are *always* flat.
---
class: split-thirds
## Testing types
* `typeof(x)` - returns a character vector (length 1) of the *type* of object `x`.
* `mode(x)` - returns a character vector (length 1) of the *mode* of object `x`.
* `storage.mode(x)` - returns a character vector (length 1) of the *storage mode* of object `x`.
.col1[
```{r}
typeof(1)
typeof(1L)
typeof("A")
typeof(TRUE)
```
]
.col2[
```{r}
mode(1)
mode(1L)
mode("A")
mode(TRUE)
```
]
.col3[
```{r}
storage.mode(1)
storage.mode(1L)
storage.mode("A")
storage.mode(TRUE)
```
]
---
## Logical Predicates
* `is.logical(x)` - returns `TRUE` if `x` has *type* logical.
* `is.character(x)` - returns `TRUE` if `x` has *type* character.
* `is.double(x)` - returns `TRUE` if `x` has *type* double.
* `is.integer(x)` - returns `TRUE` if `x` has *type* integer.
* `is.numeric(x)` - returns `TRUE` if `x` has *mode* numeric.
.col1[
```{r}
is.integer(1)
is.integer(1L)
is.integer(3:7)
```
]
.col2[
```{r}
is.double(1)
is.double(1L)
is.double(3:7)
```
]
.col3[
```{r}
is.numeric(1)
is.numeric(1L)
is.numeric(3:7)
```
]
---
## Other useful predicates
* `is.atomic(x)` - returns `TRUE` if `x` is an *atomic vector*.
* `is.vector(x)` - returns `TRUE` if `x` is either type of vector (i.e. either an *atomic vector* or a *list*) and has no attributes other than names.
```{r}
is.atomic(c(1,2,3))
is.vector(c(1,2,3))
is.atomic(list(1,2,3))
is.vector(list(1,2,3))
```
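---
## `is.vector` and attributes

`is.vector` is stricter than "atomic vector or list": it also returns `FALSE` when `x` carries any attribute other than names. A small illustration (the `"units"` attribute is an arbitrary, made-up example):

```{r}
x = c(a = 1, b = 2)      # names are allowed
is.vector(x)
attr(x, "units") = "km"  # any other attribute ...
is.vector(x)             # ... makes is.vector() return FALSE
is.atomic(x)             # x is still an atomic vector
```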
---
## Type Coercion
R is a dynamically typed language: it will automatically convert (coerce) between most types without raising warnings or errors.
```{r}
c(1,"Hello")
```
--
```{r}
c(FALSE, 3L)
```
--
```{r}
c(1.2, 3L)
```
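---
## Checking the result type

`typeof` can confirm which type each of the combined vectors above ended up as:

```{r}
typeof(c(1, "Hello"))  # number converted to text
typeof(c(FALSE, 3L))   # logical converted to integer
typeof(c(1.2, 3L))     # integer converted to double
```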
---
## Operator coercion
Functions and operators will attempt to coerce objects to an appropriate type.
```{r}
3.1+1L
```
--
```{r}
log(TRUE)
```
--
```{r}
TRUE & 7
```
--
```{r}
FALSE | !5
```
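---
## Counting with logicals

A common use of implicit coercion: arithmetic functions treat `TRUE` as 1 and `FALSE` as 0, so `sum` counts the elements that meet a condition and `mean` gives their proportion. The vector `x` below is made-up data:

```{r}
x = c(3, 8, 1, 9)  # made-up example values
x > 5              # logical vector
sum(x > 5)         # number of elements greater than 5
mean(x > 5)        # proportion of elements greater than 5
```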
---
## Explicit Coercion
Most of the `is` functions we just saw have an `as` variant which can be used for *explicit* coercion.
.pull-left[
```{r}
as.logical(5.2)
as.character(TRUE)
as.integer(pi)
```
]
.pull-right[
```{r}
as.numeric(FALSE)
as.double("7.2")
as.double("one")
```
]
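---
## More explicit coercion

A few more cases worth knowing (the input values are arbitrary examples): `as.integer` truncates toward zero rather than rounding, and `as.logical` only recognizes certain spellings of true and false in strings.

```{r}
as.integer(c(2.9, -2.9))                   # truncation: 2 and -2
as.logical(c("TRUE", "true", "T", "yes"))  # "yes" is not recognized
as.logical(0:2)                            # 0 is FALSE, nonzero is TRUE
```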
---
## Missing Values
R uses `NA` to represent missing values in its data structures; what may not be obvious is that there are different `NA`s for the different types.
.pull-left[
```{r}
typeof(NA)
typeof(NA+1)
typeof(NA+1L)
```
]
.pull-right[
```{r}
typeof(NA_character_)
typeof(NA_real_)
typeof(NA_integer_)
```
]
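---
## `NA` in combined vectors

Because a plain `NA` is logical, combining it with other values converts it to the missing value of the resulting type (the values below are arbitrary examples):

```{r}
typeof(c(NA, 1L))                       # integer
typeof(c(NA, "a"))                      # character
typeof(c(NA_integer_, 1.5))             # double
identical(c(NA, "a")[1], NA_character_) # the NA became a character NA
```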
---
## Stickiness of Missing Values
Because `NA`s represent missing values, it makes sense that most calculations using them should also be missing.
.pull-left[
```{r}
1 + NA
1 / NA
NA * 5
```
]
.pull-right[
```{r}
mean(c(1,2,3,NA))
sqrt(NA)
3^NA
```
]
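---
## Exceptions and `na.rm`

Not every calculation involving `NA` is missing: when the result would be the same whatever the missing value is, R returns that result. Many summary functions also take an `na.rm` argument that drops missing values first.

```{r}
mean(c(1, 2, 3, NA), na.rm = TRUE)  # NA removed before averaging
NA^0                                # x^0 is 1 for any x
NA | TRUE                           # TRUE regardless of the NA
NA & FALSE                          # FALSE regardless of the NA
```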
---
## Conditionals and missing values
`NA`s can be problematic in some cases, particularly for control flow.
```{r error=TRUE}
1 == NA
```
--
```{r error=TRUE}
if (2 != NA)
  "Here"
```
--
```{r error=TRUE}
if (all(c(1,2,NA,4) >= 1))
  "There"
```
--
```{r error=TRUE}
if (any(c(1,2,NA,4) >= 1))
  "There"
```
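---
## Safer conditionals

Two ways to keep an `NA` out of an `if` condition, using the same example vector as above (which option is appropriate depends on how missing values should be treated): drop them with `na.rm`, or wrap the condition in `isTRUE`, which returns `FALSE` for `NA`.

```{r}
x = c(1, 2, NA, 4)
all(x >= 1, na.rm = TRUE)  # NA ignored: TRUE
isTRUE(all(x >= 1))        # NA treated as not TRUE: FALSE
if (isTRUE(all(x >= 1))) "There" else "Not sure"
```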
---
## Testing for `NA`
To explicitly test whether a value is missing, it is necessary to use `is.na` (often along with `any` or `all`).
.pull-left[
```{r}
is.na(NA)
is.na(1)
is.na(c(1,2,3,NA))
```
]
.pull-right[
```{r}
any(is.na(c(1,2,3,NA)))
all(is.na(c(1,2,3,NA)))
```
]
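---
## Working with `is.na`

Comparing with `==` cannot detect missing values, while `is.na` combines naturally with `sum` and subsetting (the vector `x` is an arbitrary example):

```{r}
x = c(1, 2, 3, NA)
x == NA        # every comparison is NA
sum(is.na(x))  # number of missing values
x[!is.na(x)]   # only the non-missing values
```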
---
## Other Special (double) values
* `NaN` - Not a number
* `Inf` - Positive infinity
* `-Inf` - Negative infinity
.pull-left[
```{r}
pi / 0
0 / 0
1/0 + 1/0
```
]
.pull-right[
```{r}
1/0 - 1/0
NaN / NA
NaN * NA
```
]
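---
## Arithmetic with infinity

`Inf` and `-Inf` behave like limits in ordinary arithmetic, and doubles that overflow become `Inf` (the values below are arbitrary examples):

```{r}
1 / Inf                   # 0
exp(-Inf)                 # 0
Inf > 1e308               # larger than any finite double
.Machine$double.xmax * 2  # overflow produces Inf
```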
---
## Testing for `Inf` and `NaN`
`Inf` can be compared with `==` like any other value, but comparisons involving `NaN` return `NA`, so `NaN` has the same testing issues as `NA`. There are convenience functions for testing for both.
.pull-left[
```{r}
NA
1/0+1/0
1/0-1/0
1/0-1/0
```
]
.pull-right[
```{r}
is.finite(NA)
is.finite(1/0+1/0)
is.finite(1/0-1/0)
is.nan(1/0-1/0)
```
]
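---
## `NA` versus `NaN`

`NaN` is treated as a kind of missing value, but the reverse does not hold. Note also that `is.infinite` and `is.finite` are both `FALSE` for missing values, so they are complements only for non-missing input:

```{r}
is.na(NaN)                        # TRUE: NaN counts as missing
is.nan(NA)                        # FALSE: NA is not NaN
is.infinite(c(-Inf, 0, Inf, NA))
is.finite(c(-Inf, 0, Inf, NA))
```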
---
## Coercion for infinity and NaN
First, remember that `Inf`, `-Inf`, and `NaN` have type double; however, their coercion behavior is not the same as for other double values.
```{r}
as.integer(Inf)
as.integer(NaN)
```
.pull-left[
```{r}
as.logical(Inf)
as.logical(NaN)
```
]
.pull-right[
```{r}
as.character(Inf)
as.character(NaN)
```
]
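---
## Converting strings to special values

Going the other way, `as.numeric` parses the strings `"Inf"`, `"-Inf"` and `"NaN"` back into the corresponding double values, so these values survive a round trip through `as.character`:

```{r}
as.numeric("Inf")
as.numeric("-Inf")
as.numeric("NaN")
as.numeric(as.character(-Inf)) == -Inf  # round trip
```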
---
## Exercise 1
**Part 1**
What is the type of the following vectors? Explain why they have that type.
* `c(1, NA+1L, "C")`
* `c(1L / 0, NA)`
* `c(1:3, 5)`
* `c(3L, NaN+1L)`
* `c(NA, TRUE)`
**Part 2**
Considering only the four (common) data types, what is R's implicit type conversion hierarchy (from highest priority to lowest priority)?
*Hint* - think about the pairwise interactions between types.
---
class: middle
count: false
# Generic Vectors
---
## Lists
Lists are _generic vectors_, in that they are one-dimensional (i.e. have a length) and can contain any type of R object.
```{r}
list("A", c(TRUE,FALSE), (1:4)/2, function(x) x^2)
```
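---
## Inspecting a list

A list has type `list`, while each element keeps its own type. This reuses the example list above; note that functions defined in R have type `closure`:

```{r}
x = list("A", c(TRUE, FALSE), (1:4)/2, function(x) x^2)
length(x)          # 4 elements
typeof(x)
sapply(x, typeof)  # type of each element
```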
---
## Structure
Often we want a more compact representation of a complex object; the `str` function is useful for this particular task.
```{r}
str( list("A", c(TRUE,FALSE), (1:4)/2, function(x) x^2) )
```
---
## Recursive lists
Lists can contain other lists, meaning they don't have to be flat.
```{r}
str( list(1, list(2, list(3, 4), 5)) )
```
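---
## Length of nested lists

`length` only counts the top-level elements of a list; `lengths` reports the length of each element, and `unlist` flattens everything. Using the nested list from above:

```{r}
x = list(1, list(2, list(3, 4), 5))
length(x)          # 2 top-level elements
lengths(x)         # 1 and 3
length(unlist(x))  # 5 values in total
```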
---
## List Coercion
If needed, a vector will be coerced to a list (as a list is more generic).
```{r}
str( c(1, list(4, list(6, 7))) )
```
--
We can coerce a list into an atomic vector using `unlist` - the usual type coercion rules then apply to determine its type.
```{r}
unlist(list(1:3, list(4:5, 6)))
unlist( list(1, list(2, list(3, "Hello"))) )
```
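---
## `as.list` versus `list`

`as.list` coerces a vector into a list with one element per value, while `list` wraps the whole vector as a single element:

```{r}
str(as.list(1:3))  # list of 3 integers
str(list(1:3))     # list of 1 integer vector
```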
---
## Named lists
Because of their more complex structure, we often want to name the elements of a list (we can also do this with vectors). This can make reading and accessing the list more straightforward.
```{r}
str(list(A = 1, B = list(C = 2, D = 3)))
list("knock knock" = "who's there?")
names(list(ABC=1, DEF=list(H=2, I=3)))
```
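---
## Accessing named elements

Names allow elements to be retrieved with `$`, and `unlist` combines nested names with a `.`. Using the list from above:

```{r}
x = list(A = 1, B = list(C = 2, D = 3))
x$B$D      # nested access by name
unlist(x)  # names become A, B.C and B.D
```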
---
## Exercise 2
Represent the following JSON data as a list in R.
```json
{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address":
  {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": 10021
  },
  "phoneNumber":
  [
    {
      "type": "home",
      "number": "212 555-1239"
    },
    {
      "type": "fax",
      "number": "646 555-4567"
    }
  ]
}
```
---
class: middle
count: false
# Functions
---
## When to use functions
The goal of a function should be to encapsulate a *small* *reusable* piece of code.
* Name should make it clear what the function does (think in terms of simple verbs).
* Functionality should be simple enough to be quickly understood.
* The smaller and more modular the code the easier it will be to reuse elsewhere.
* It is better to change code in one location than in many.
---
## Function Parts
The two parts of a function are the arguments (`formals`) and the code (`body`).
```{r}
gcd = function(long1, lat1, long2, lat2) {
  R = 6371 # Earth mean radius in km
  # great circle distance in km (coordinates in radians)
  acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(long2-long1)) * R
}
```
--
.pull-left[
```{r}
formals(gcd)
```
]
.pull-right[
```{r}
body(gcd)
```
]
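---
## Function environments

Besides `formals` and `body`, every R function also records the environment it was created in, which is what the scoping rules later in these slides rely on. A sketch using a hypothetical one-line function `sq`:

```{r}
sq = function(x) x^2  # hypothetical example function
names(formals(sq))
body(sq)
environment(sq)       # typically the global environment at top level
```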
---
## Return values
There are two ways of returning values in R: explicit or implicit return values.
*Explicit* - includes one or more `return` statements
```{r}
f = function(x) {
  return(x*x)
}
```
*Implicit* - value of the last statement is returned.
```{r}
f = function(x) {
  x*x
}
```
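---
## Early returns

A common use of an explicit `return` is leaving a function early, while the final value is still returned implicitly. `safe_sqrt` is a hypothetical example:

```{r}
safe_sqrt = function(x) {
  if (x < 0) {
    return(NA_real_)  # explicit: exit early for negative input
  }
  sqrt(x)             # implicit: last value is returned
}
safe_sqrt(4)
safe_sqrt(-1)
```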
---
## Returning multiple values
If we want a function to return more than one value, we can group things using either a vector or a list.
```{r}
f = function(x) {
  c(x, x^2, x^3)
}
f(2)
f(2:3)
```
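---
## Returning a named list

With a vector, the results for `f(2:3)` above are run together; a named list keeps each result separate and labeled. A sketch (the names `value`, `square` and `cube` are arbitrary choices):

```{r}
f = function(x) {
  list(value = x, square = x^2, cube = x^3)
}
res = f(2:3)
res$square
str(res)
```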
---
## Argument names
When defining a function we are also implicitly defining names for its arguments; when calling the function, we can use these names to pass arguments in a different order.
```{r}
f = function(x,y,z) {
  paste0("x=",x," y=",y," z=",z)
}
```
.pull-left[
```{r,error=TRUE}
f(1,2,3)
f(z=1,x=2,y=3)
```
]
.pull-right[
```{r,error=TRUE}
f(y=2,1,3)
f(y=2,1,x=3)
```
]
```{r,error=TRUE}
f(1,2,3,m=1)
```
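---
## Partial matching

R also accepts unambiguous abbreviations of argument names (partial matching). This is convenient interactively but makes code harder to read, so full names are generally preferable in scripts. `h` is a hypothetical example:

```{r}
h = function(value, other = 0) {
  value + other
}
h(val = 1, oth = 2)  # names abbreviated: 3
h(oth = 2, 1)        # partial name first, then position: 3
```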
---
## Argument defaults
It is also possible to give function arguments default values so that they don't need to be provided every time the function is called.
```{r error=TRUE}
f = function(x,y=1,z=1) {
  paste0("x=",x," y=",y," z=",z)
}
```
```{r error=TRUE}
f()
f(x=3)
f(y=2,2)
```
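---
## Defaults can use other arguments

Default values are evaluated inside the function when they are needed, so a default can refer to other arguments of the same function. A small sketch:

```{r}
f = function(x, y = x * 2) {
  paste0("x=", x, " y=", y)
}
f(1)         # y defaults to 2 * x
f(1, y = 5)
```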
---
## Scope
R has generous scoping rules: if it can't find a variable in the function's body, it will look for it in the next higher scope (the environment where the function was defined), and so on.
```{r}
y = 1
f = function(x) {
  x+y
}
f(3)
```
```{r}
g = function(x) {
  y=2
  x+y
}
g(3)
```
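---
## Where the lookup happens

The "next higher scope" is the environment where the function was *defined*, not where it is called (lexical scoping). In this sketch `f` finds the global `y` even when called from inside `g`:

```{r}
y = 1
f = function() {
  y
}
g = function() {
  y = 2
  f()  # f was defined globally, so it sees the global y
}
g()
```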
---
##
Additionally, variables defined within a scope only persist for the duration of that scope, and do not overwrite variables at higher scopes (unless you use the super-assignment operator `<<-`, *which you shouldn't*).
```{r}
x = 1
y = 1
z = 1
f = function() {
  y = 2
  g = function() {
    z = 3
    return(x + y + z)
  }
  return(g())
}
f()
c(x,y,z)
```
---
## Lazy evaluation
Arguments to R functions are lazily evaluated, meaning they are not evaluated until they are used.
```{r, error=TRUE}
f = function(x) {
  cat("Hello world!\n")
  x
}
f(stop())
```
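---
## Unused arguments

A consequence of lazy evaluation: an argument that is never used is never evaluated, so even an expression that would raise an error is harmless. `g` is a hypothetical example:

```{r}
g = function(x, y) {
  x  # y is never used, so it is never evaluated
}
g(1, stop("this is never run"))
```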
---
## Everything is a function
```{r}
`+`
typeof(`+`)
x = 4:1
`+`(x,2)
```
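---
## Calling operators as functions

Indexing and even control flow can also be called in function form using backticks, and operators can be passed to other functions. The values are arbitrary examples:

```{r}
`[`(c(10, 20, 30), 2)    # same as c(10, 20, 30)[2]
`if`(TRUE, "yes", "no")  # same as if (TRUE) "yes" else "no"
sapply(1:3, `-`)         # unary minus applied to each element
```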
---
## Getting Help
Prefixing any function name with a `?` will open the related help file for that function.
```{r, eval=FALSE}
?`+`
?sum
```
For functions not in the base package, you can generally see their implementation by entering the function name without parentheses (or using the `body` function).
```{r}
lm
```
---
## Less Helpful Examples
```{r}
list
`[`
sum
`+`
```
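---
## Primitive functions

Functions such as `sum` and `` `+` `` print as a call to `.Primitive(...)` rather than as R code because they are implemented in C. `args` still shows their arguments, while `formals` returns `NULL` for them:

```{r}
is.primitive(sum)
args(sum)
formals(sum)  # NULL for primitives
```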
---
## Acknowledgments
Above materials are derived in part from the following sources:
* Hadley Wickham - [Advanced R](http://adv-r.had.co.nz/)
* [R Language Definition](http://stat.ethz.ch/R-manual/R-devel/doc/manual/R-lang.html)