---
title: "Data types in R"
author: "Colin Rundel"
date: "2018-08-30"
output:
  xaringan::moon_reader:
    css: "slides.css"
    lib_dir: libs
    nature:
      highlightStyle: github
      highlightLines: true
      countIncrementalSlides: false
---
exclude: true

```{r, message=FALSE, warning=FALSE, include=FALSE}
options(
  htmltools.dir.version = FALSE, # for blogdown
  width=80
)
htmltools::tagList(rmarkdown::html_dependency_font_awesome())
```

---
class: middle
count: false

# Atomic Vectors

---

## Atomic Vectors

R has six atomic vector types: 

<br/>

  `typeof`  |  `mode`     |  `storage.mode`
:-----------|:------------|:----------------
logical     |  logical    |  logical
double      |  numeric    |  double
integer     |  numeric    |  integer
character   |  character  |  character
complex     |  complex    |  complex
raw         |  raw        |  raw
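
---

## Atomic vectors - checking the table

The table can be reproduced directly. The following is a minimal sketch; `examples` and `types` are illustrative names, not part of the original material.

```{r}
# One example value for each of the six atomic types
examples = list(TRUE, 1.5, 2L, "a", 1+2i, as.raw(255))

# Collect type, mode, and storage mode for each value
types = data.frame(
  typeof       = sapply(examples, typeof),
  mode         = sapply(examples, mode),
  storage.mode = sapply(examples, storage.mode)
)
types
```

Only `double` and `integer` differ between columns: both report a *mode* of numeric.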

---

## Vector types

`logical` - boolean values `TRUE` and `FALSE`

.pull-left[
```{r}
typeof(TRUE)
```
]

.pull-right[
```{r}
mode(TRUE)
```
]

<br/>

`character` - text strings

<div>

.pull-left[
```{r}
typeof("hello")
typeof('world')
```
]

.pull-right[
```{r}
mode("hello")
mode('world')
```
]

</div>

---

`double` - floating point numerical values (default numerical type)

.pull-left[
```{r}
typeof(1.33)
typeof(7)
```
]

.pull-right[
```{r}
mode(1.33)
mode(7)
```
]

<br/>

`integer` - integer numerical values (indicated with an `L`)

<div>

.pull-left[
```{r}
typeof( 7L )
typeof( 1:3 )
```
]

.pull-right[
```{r}
mode( 7L )
mode( 1:3 )
```
]

</div>

---

## Concatenation

Atomic vectors can be constructed using the concatenate, `c()`, function.

```{r}
c(1,2,3)
```

--

```{r}
c("Hello", "World!")
```

--

```{r}
c(1,c(2, c(3)))
```

**Note** - atomic vectors are *always* flat.

---
class: split-thirds

## Testing types

* `typeof(x)` - returns a character vector (length 1) of the *type* of object `x`.

* `mode(x)` - returns a character vector (length 1) of the *mode* of object `x`.

* `storage.mode(x)` - returns a character vector (length 1) of the *storage mode* of object `x`.

.col1[
```{r}
typeof(1)
typeof(1L)
typeof("A")
typeof(TRUE)
```
]

.col2[
```{r}
mode(1)
mode(1L)
mode("A")
mode(TRUE)
```
]

.col3[
```{r}
storage.mode(1)
storage.mode(1L)
storage.mode("A")
storage.mode(TRUE)
```
]

---

## Logical Predicates

* `is.logical(x)` - returns `TRUE` if `x` has *type* logical.

* `is.character(x)` - returns `TRUE` if `x` has *type* character.

* `is.double(x)` - returns `TRUE` if `x` has *type* double.

* `is.integer(x)` - returns `TRUE` if `x` has *type* integer.

* `is.numeric(x)` - returns `TRUE` if `x` has *mode* numeric.

.col1[
```{r}
is.integer(1)
is.integer(1L)
is.integer(3:7)
```
]

.col2[
```{r}
is.double(1)
is.double(1L)
is.double(3:7)
```
]

.col3[
```{r}
is.numeric(1)
is.numeric(1L)
is.numeric(3:7)
```
]


---

## Other useful predicates

* `is.atomic(x)` - returns `TRUE` if `x` is an *atomic vector*.

* `is.vector(x)` - returns `TRUE` if `x` is either type of vector (i.e. either *atomic vector* or *list*).

```{r}
is.atomic(c(1,2,3))
is.vector(c(1,2,3))
is.atomic(list(1,2,3))
is.vector(list(1,2,3))
```
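
---

## A caveat for `is.vector`

One detail worth knowing: `is.vector` returns `FALSE` for a vector that carries any attribute other than names. The sketch below uses a made-up `"units"` attribute purely for illustration.

```{r}
x = c(1, 2, 3)
is.vector(x)              # TRUE

# Names are allowed
names(x) = c("a", "b", "c")
is.vector(x)              # TRUE

# Any other attribute makes is.vector() return FALSE
attr(x, "units") = "cm"
is.vector(x)              # FALSE
is.atomic(x)              # TRUE
```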


---

## Type Coercion

R is a dynamically typed language -- it will automatically convert between most types without raising warnings or errors.

```{r}
c(1,"Hello")
```

--

```{r}
c(FALSE, 3L)
```

--

```{r}
c(1.2, 3L)
```

---

## Operator coercion

Functions and operators will attempt to coerce objects to an appropriate type.

```{r}
3.1+1L
```

--

```{r}
log(TRUE)
```

--

```{r}
TRUE & 7
```

--

```{r}
FALSE | !5
```
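
---

## Operator coercion, spelled out

As a rough sketch, each implicit conversion in the examples above can be compared with its explicit equivalent. The `as.logical` calls are written out only to show what R appears to do implicitly.

```{r}
# Arithmetic: the integer is promoted to double
typeof(3.1 + 1L)                              # "double"

# Math functions: TRUE is treated as 1
identical(log(TRUE), log(1))                  # TRUE

# Logical operators: any nonzero number is treated as TRUE
identical(TRUE & 7, TRUE & as.logical(7))     # TRUE
identical(!5, !as.logical(5))                 # TRUE
as.logical(c(-1, 0, 0.5))                     # TRUE FALSE TRUE
```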


---

## Explicit Coercion

Most of the `is` functions we just saw have an `as` variant which can be used for *explicit* coercion.

.pull-left[
```{r}
as.logical(5.2)
as.character(TRUE)
as.integer(pi)
```
]

.pull-right[
```{r}
as.numeric(FALSE)
as.double("7.2")
as.double("one")
```
]
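
---

## Explicit coercion - edge cases

A few behaviors of the `as` functions that are easy to miss, shown as a small sketch. `suppressWarnings` is used here only to keep the output quiet.

```{r}
# Conversion to integer truncates toward zero (it does not round)
as.integer(c(2.9, -2.9))                     # 2 -2

# Only certain strings are recognized as logical values
as.logical(c("TRUE", "true", "T", "yes"))    # TRUE TRUE TRUE NA

# A failed conversion gives NA with a warning
suppressWarnings(as.double("one"))           # NA
```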


---

## Missing Values

R uses `NA` to represent missing values in its data structures. What may not be obvious is that there are different `NA`s for the different types.

.pull-left[
```{r}
typeof(NA)
typeof(NA+1)
typeof(NA+1L)
```
]

.pull-right[
```{r}
typeof(NA_character_)
typeof(NA_real_)
typeof(NA_integer_)
```
]
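
---

## Typed `NA`s inside vectors

One way to see the typed `NA`s at work is to look at what ends up stored in a vector. This is a small sketch; `x` and `y` are arbitrary names.

```{r}
x = c(1L, NA)
typeof(x)                        # "integer"
identical(x[2], NA_integer_)     # TRUE

y = c("a", NA)
typeof(y)                        # "character"
identical(y[2], NA_character_)   # TRUE

# The plain NA is logical, so it is not identical to the typed versions
identical(NA, NA_integer_)       # FALSE
```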

---

## Stickiness of Missing Values

Because `NA`s represent missing values, it makes sense that any calculation using them should also be missing.

.pull-left[
```{r}
1 + NA
1 / NA
NA * 5
```
]

.pull-right[
```{r}
mean(c(1,2,3,NA))
sqrt(NA)
3^NA
```
]
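
---

## Working around stickiness

Many summary functions accept an `na.rm` argument that drops missing values first. There are also a few operations whose result does not depend on the missing value, so they do not return `NA`. A brief sketch:

```{r}
x = c(1, 2, 3, NA)
mean(x, na.rm = TRUE)   # 2
sum(x, na.rm = TRUE)    # 6

# The result is the same whatever the missing value is
NA^0                    # 1
NA & FALSE              # FALSE
NA | TRUE               # TRUE
```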

---

## Conditionals and missing values

`NA`s can be problematic in some cases (particularly for control flow)

```{r error=TRUE}
1 == NA
```

--

```{r error=TRUE}
if (2 != NA)
  "Here"
```

--
```{r error=TRUE}
if (all(c(1,2,NA,4) >= 1))
  "There"
```

--

```{r error=TRUE}
if (any(c(1,2,NA,4) >= 1))
  "There"
```
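
---

## Guarding conditionals against `NA`

One possible way to keep `if` from failing on a missing condition is to wrap the test in `isTRUE`, or to drop missing values with `na.rm`. This is a sketch of the idea rather than a general recommendation.

```{r}
x = c(1, 2, NA, 4)

all(x >= 1)                  # NA
all(x >= 1, na.rm = TRUE)    # TRUE
isTRUE(all(x >= 1))          # FALSE

# isTRUE() turns NA into FALSE, so the condition is always valid
if (isTRUE(2 != NA)) "Here" else "Skipped"
```

Which option is appropriate depends on whether a missing value should count as a failed test or simply be ignored.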


---

## Testing for `NA`

To explicitly test if a value is missing it is necessary to use `is.na` (often along with `any` or `all`).

.pull-left[
```{r}
is.na(NA)
is.na(1)
is.na(c(1,2,3,NA))
```
]

.pull-right[
```{r}
any(is.na(c(1,2,3,NA)))
all(is.na(c(1,2,3,NA)))
```
]


---

## Other Special (double) values

* `NaN` - Not a number

* `Inf` - Positive infinity

* `-Inf` - Negative infinity

.pull-left[
```{r}
pi / 0
0 / 0
1/0 + 1/0
```
]

.pull-right[
```{r}
1/0 - 1/0
NaN / NA
NaN * NA
```
]


---

## Testing for `Inf` and `NaN`

`NaN` and `Inf` don't have the same testing issues that `NA` has, but there are still convenience functions for testing for these values.

.pull-left[
```{r}
NA
1/0+1/0
1/0-1/0
1/0-1/0
```
]

.pull-right[
```{r}
is.finite(NA)
is.finite(1/0+1/0)
is.finite(1/0-1/0)
is.nan(1/0-1/0)
```
]
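
---

## Comparing the special value predicates

Applying each predicate to the same vector makes the differences easier to see. This is a small sketch; note in particular that `is.na` is also `TRUE` for `NaN`, while `is.nan` is `FALSE` for `NA`.

```{r}
x = c(1, NA, NaN, Inf, -Inf)

is.finite(x)     # TRUE  FALSE FALSE FALSE FALSE
is.infinite(x)   # FALSE FALSE FALSE TRUE  TRUE
is.nan(x)        # FALSE FALSE TRUE  FALSE FALSE
is.na(x)         # FALSE TRUE  TRUE  FALSE FALSE
```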


---

## Coercion for infinity and NaN

First, remember that `Inf`, `-Inf`, and `NaN` have type double. However, their coercion behavior is not the same as for other double values.

```{r}
as.integer(Inf)
as.integer(NaN)
```

.pull-left[
```{r}
as.logical(Inf)
as.logical(NaN)
```
]

.pull-right[
```{r}
as.character(Inf)
as.character(NaN)
```
]

---

## Exercise 1

**Part 1**

What is the type of the following vectors? Explain why they have that type.

* `c(1, NA+1L, "C")`
* `c(1L / 0, NA)`
* `c(1:3, 5)`
* `c(3L, NaN+1L)`
* `c(NA, TRUE)`


**Part 2**

Considering only the four (common) data types, what is R's implicit type conversion hierarchy (from highest priority to lowest priority)? 

*Hint* - think about the pairwise interactions between types.

---
class: middle
count: false

# Generic Vectors


---

## Lists

Lists are _generic vectors_, in that they are 1 dimensional (i.e. have a length) and can contain any type of R object.

```{r}
list("A", c(TRUE,FALSE), (1:4)/2, function(x) x^2)
```


---

## Structure

Often we want a more compact representation of a complex object. The `str` function is useful for this task.

```{r}
str( list("A", c(TRUE,FALSE), (1:4)/2, function(x) x^2) )
```


---

## Recursive lists

Lists can contain other lists, meaning they don't have to be flat

```{r}
str( list(1, list(2, list(3, 4), 5)) )
```


---

## List Coercion

By default a vector will be coerced to a list (as a list is more generic) if needed

```{r}
str( c(1, list(4, list(6, 7))) )
```

--

We can coerce a list into an atomic vector using `unlist` - the usual type coercion rules then apply to determine its type.

```{r}
unlist(list(1:3, list(4:5, 6)))
unlist( list(1, list(2, list(3, "Hello"))) )
```


---

## Named lists

Because of their more complex structure we often want to name the elements of a list (we can also do this with vectors). This can make reading and accessing the list more straightforward.

```{r}
str(list(A = 1, B = list(C = 2, D = 3)))
list("knock knock" = "who's there?")
names(list(ABC=1, DEF=list(H=2, I=3)))
```
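
---

## Accessing named elements

As a brief preview of why names help, elements of a named list can be pulled out with `$` or `[[`. The list `x` below is just an illustration.

```{r}
x = list(A = 1, B = list(C = 2, D = 3))

x$A             # 1
x[["B"]]$D      # 3
x$B[["C"]]      # 2
names(x$B)      # "C" "D"
```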


---

## Exercise 2

Represent the following JSON data as a list in R.

```json
{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": 
  {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": 10021
  },
  "phoneNumber": 
  [
    {
      "type": "home",
      "number": "212 555-1239"
    },
    {
      "type": "fax",
      "number": "646 555-4567"
    }
  ]
}
```

---
class: middle
count: false

# Functions

---

## When to use functions

The goal of a function should be to encapsulate a *small* *reusable* piece of code.

* Name should make it clear what the function does (think in terms of simple verbs).

* Functionality should be simple enough to be quickly understood.

* The smaller and more modular the code the easier it will be to reuse elsewhere.

* Better to change code in one location than code everywhere.

---

## Function Parts

The two parts of a function are the arguments (`formals`) and the code (`body`).

```{r}
gcd = function(long1, lat1, long2, lat2) {
  R = 6371 # Earth mean radius in km
  # distance in km (longitudes and latitudes must be in radians)
  acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(long2-long1)) * R
}
```

--

.pull-left[
```{r}
formals(gcd)
```
]

.pull-right[
```{r}
body(gcd)
```
]
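
---

## Using the function

Because `sin` and `cos` work in radians, coordinates given in degrees need converting first. The sketch below uses two hypothetical helpers, `deg2rad` and `gcd_deg`, that wrap the same formula as `gcd`; they are not part of the original material.

```{r}
# Hypothetical helper: convert degrees to radians
deg2rad = function(deg) deg * pi / 180

# Hypothetical wrapper: same formula as gcd(), but takes degrees
gcd_deg = function(long1, lat1, long2, lat2) {
  R = 6371 # Earth mean radius in km
  long1 = deg2rad(long1); lat1 = deg2rad(lat1)
  long2 = deg2rad(long2); lat2 = deg2rad(lat2)
  acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(long2-long1)) * R
}

# Two points on the equator, half the way around the globe
gcd_deg(0, 0, 180, 0)
pi * 6371
```

The two results should agree, since half a great circle has length pi times the radius.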

---

## Return values

There are two ways of returning values in R: explicit or implicit return values.

<br/>

*Explicit* - includes one or more `return` statements

```{r}
f = function(x) {
  return(x*x)
}
```

<br/>

*Implicit* - value of the last statement is returned.

```{r}
f = function(x) {
  x*x
}
```

---

## Returning multiple values

If we want a function to return more than one value we can group things using either a vector or a list.

```{r}
f = function(x) {
  c(x, x^2, x^3)
}

f(2)
f(2:3)
```
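
---

## Returning multiple values in a list

The vector version above flattens everything into one sequence. A list keeps the pieces separate and lets us name them. This is a minimal sketch; `f_list` is an illustrative name.

```{r}
f_list = function(x) {
  list(value = x, square = x^2, cube = x^3)
}

str(f_list(2:3))
```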

---

## Argument names

When defining a function we are also implicitly defining names for the arguments. When calling the function we can use these names to pass arguments in a different order.


```{r}
f = function(x,y,z) {
  paste0("x=",x," y=",y," z=",z)
}
```

.pull-left[
```{r,error=TRUE}
f(1,2,3)
f(z=1,x=2,y=3)
```
]

.pull-right[
```{r,error=TRUE}
f(y=2,1,3)
f(y=2,1,x=3)
```
]

```{r,error=TRUE}
f(1,2,3,m=1)
```

---

## Argument defaults

It is also possible to give function arguments default values so that they don't need to be provided every time the function is called.

```{r error=TRUE}
f = function(x,y=1,z=1) {
  paste0("x=",x," y=",y," z=",z)
}
```

```{r error=TRUE}
f()
f(x=3)
f(y=2,2)
```

---

## Scope

R has generous scoping rules: if it can't find a variable in the function's body, it will look for it in the next higher scope, and so on.

```{r}
y = 1
f = function(x) {
  x+y
}
f(3)
```

```{r}
g = function(x) {
  y=2
  x+y
}
g(3)
```

---

## 

Additionally, variables defined within a scope only persist for the duration of that scope, and do not overwrite variables at higher scopes (unless you use the global assignment operator `<<-`, *which you shouldn't*)

```{r}
x = 1
y = 1
z = 1
f = function() {
    y = 2
    g = function() {
      z = 3
      return(x + y + z)
    }
    return(g())
}
f()
c(x,y,z)
```

---

## Lazy evaluation

Arguments to R functions are lazily evaluated, meaning they are not evaluated until they are used.

```{r, error=TRUE}
f = function(x)
{
  cat("Hello world!\n")
  x
}

f(stop())
```
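
---

## Lazy evaluation - unused arguments

A consequence of lazy evaluation is that an argument which is never used is never evaluated at all. The sketch below uses two illustrative functions, `f_unused` and `f_forced`; `force` is a base R function that evaluates its argument immediately.

```{r}
# y is never used, so stop() is never called
f_unused = function(x, y) {
  x * 2
}
f_unused(2, stop("never evaluated"))

# force() evaluates y right away, so the error is triggered
f_forced = function(x, y) {
  force(y)
  x * 2
}
tryCatch(
  f_forced(2, stop("evaluated")),
  error = function(e) conditionMessage(e)
)
```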

---

## Everything is a function

```{r}
`+`
typeof(`+`)
x = 4:1
`+`(x,2)
```
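
---

## More operators as functions

The same idea applies to other operators and even to control flow. A short sketch, using backticks to refer to each one as an ordinary function:

```{r}
x = 4:1

`[`(x, 2)                         # same as x[2]
`*`(x, 2)                         # same as x * 2
sapply(1:3, `+`, 10)              # pass `+` to another function
`if`(x[1] > 3, "big", "small")    # same as if (x[1] > 3) "big" else "small"
```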

---

## Getting Help

Prefixing any function name with a `?` will open the related help file for that function.

```{r, eval=FALSE}
?`+`
?sum
```

For functions not in the base package, you can generally see their implementation by entering the function name without parentheses (or using the `body` function).

```{r}
lm
```

---

## Less Helpful Examples

```{r}
list

`[`

sum

`+`
```


---

## Acknowledgments

Above materials are derived in part from the following sources:

* Hadley Wickham - [Advanced R](http://adv-r.had.co.nz/)
* [R Language Definition](http://stat.ethz.ch/R-manual/R-devel/doc/manual/R-lang.html)