class: center, middle, inverse, title-slide

# Data types in R
### Colin Rundel
### 2018-08-30

---
exclude: true

---
class: middle
count: false

# Atomic Vectors

---

## Atomic Vectors

R has six atomic vector types:

<br/>

`typeof`    | `mode`      | `storage.mode`
:-----------|:------------|:----------------
logical     | logical     | logical
double      | numeric     | double
integer     | numeric     | integer
character   | character   | character
complex     | complex     | complex
raw         | raw         | raw

---

## Vector types

`logical` - boolean values `TRUE` and `FALSE`

.pull-left[
```r
typeof(TRUE)
```

```
## [1] "logical"
```
]

.pull-right[
```r
mode(TRUE)
```

```
## [1] "logical"
```
]

<br/>

`character` - text strings

<div>
.pull-left[
```r
typeof("hello")
```

```
## [1] "character"
```

```r
typeof('world')
```

```
## [1] "character"
```
]

.pull-right[
```r
mode("hello")
```

```
## [1] "character"
```

```r
mode('world')
```

```
## [1] "character"
```
]
</div>

---

`double` - floating point numerical values (default numerical type)

.pull-left[
```r
typeof(1.33)
```

```
## [1] "double"
```

```r
typeof(7)
```

```
## [1] "double"
```
]

.pull-right[
```r
mode(1.33)
```

```
## [1] "numeric"
```

```r
mode(7)
```

```
## [1] "numeric"
```
]

<br/>

`integer` - integer numerical values (indicated with an `L`)

<div>
.pull-left[
```r
typeof( 7L )
```

```
## [1] "integer"
```

```r
typeof( 1:3 )
```

```
## [1] "integer"
```
]

.pull-right[
```r
mode( 7L )
```

```
## [1] "numeric"
```

```r
mode( 1:3 )
```

```
## [1] "numeric"
```
]
</div>

---

## Concatenation

Atomic vectors can be constructed using the concatenate, `c()`, function.

```r
c(1,2,3)
```

```
## [1] 1 2 3
```

--

```r
c("Hello", "World!")
```

```
## [1] "Hello"  "World!"
```

--

```r
c(1,c(2, c(3)))
```

```
## [1] 1 2 3
```

**Note** - atomic vectors are *always* flat.

---
class: split-thirds

## Testing types

* `typeof(x)` - returns a character vector (length 1) of the *type* of object `x`.

* `mode(x)` - returns a character vector (length 1) of the *mode* of object `x`.
* `storage.mode(x)` - returns a character vector (length 1) of the *storage mode* of object `x`.

.col1[
```r
typeof(1)
```

```
## [1] "double"
```

```r
typeof(1L)
```

```
## [1] "integer"
```

```r
typeof("A")
```

```
## [1] "character"
```

```r
typeof(TRUE)
```

```
## [1] "logical"
```
]

.col2[
```r
mode(1)
```

```
## [1] "numeric"
```

```r
mode(1L)
```

```
## [1] "numeric"
```

```r
mode("A")
```

```
## [1] "character"
```

```r
mode(TRUE)
```

```
## [1] "logical"
```
]

.col3[
```r
storage.mode(1)
```

```
## [1] "double"
```

```r
storage.mode(1L)
```

```
## [1] "integer"
```

```r
storage.mode("A")
```

```
## [1] "character"
```

```r
storage.mode(TRUE)
```

```
## [1] "logical"
```
]

---

## Logical Predicates

* `is.logical(x)` - returns `TRUE` if `x` has *type* logical.

* `is.character(x)` - returns `TRUE` if `x` has *type* character.

* `is.double(x)` - returns `TRUE` if `x` has *type* double.

* `is.integer(x)` - returns `TRUE` if `x` has *type* integer.

* `is.numeric(x)` - returns `TRUE` if `x` has *mode* numeric.

.col1[
```r
is.integer(1)
```

```
## [1] FALSE
```

```r
is.integer(1L)
```

```
## [1] TRUE
```

```r
is.integer(3:7)
```

```
## [1] TRUE
```
]

.col2[
```r
is.double(1)
```

```
## [1] TRUE
```

```r
is.double(1L)
```

```
## [1] FALSE
```

```r
is.double(3:8)
```

```
## [1] FALSE
```
]

.col3[
```r
is.numeric(1)
```

```
## [1] TRUE
```

```r
is.numeric(1L)
```

```
## [1] TRUE
```

```r
is.numeric(3:7)
```

```
## [1] TRUE
```
]

---

## Other useful predicates

* `is.atomic(x)` - returns `TRUE` if `x` is an *atomic vector*.

* `is.vector(x)` - returns `TRUE` if `x` is either type of vector (i.e. either *atomic vector* or *list*).

```r
is.atomic(c(1,2,3))
```

```
## [1] TRUE
```

```r
is.vector(c(1,2,3))
```

```
## [1] TRUE
```

```r
is.atomic(list(1,2,3))
```

```
## [1] FALSE
```

```r
is.vector(list(1,2,3))
```

```
## [1] TRUE
```

---

## Type Coercion

R is a dynamically typed language -- it will automatically convert between most types without raising warnings or errors.
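
One practical consequence of this silent conversion, sketched here with a made-up vector `x` (the names `x`, `big`, `n_big`, and `p_big` are illustrative and not from the original slides): logical values are treated as 0 and 1 by arithmetic functions, so `sum()` counts `TRUE`s and `mean()` gives their proportion.

```r
# Hypothetical data, not from the original slides
x = c(3, 8, 1, 9, 4)

# The comparison produces a logical vector
big = x > 3

# sum() coerces TRUE/FALSE to 1/0, so this counts the TRUE values
n_big = sum(big)

# mean() coerces the same way, giving the proportion of TRUE values
p_big = mean(big)
```

No `as.integer()` call is needed here; the coercion happens implicitly, which is convenient but also means a type mix-up will not announce itself.
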
```r
c(1,"Hello")
```

```
## [1] "1"     "Hello"
```

--

```r
c(FALSE, 3L)
```

```
## [1] 0 3
```

--

```r
c(1.2, 3L)
```

```
## [1] 1.2 3.0
```

---

## Operator coercion

Functions and operators will attempt to coerce objects to an appropriate type.

```r
3.1+1L
```

```
## [1] 4.1
```

--

```r
log(TRUE)
```

```
## [1] 0
```

--

```r
TRUE & 7
```

```
## [1] TRUE
```

--

```r
FALSE | !5
```

```
## [1] FALSE
```

---

## Explicit Coercion

Most of the `is` functions we just saw have an `as` variant which can be used for *explicit* coercion.

.pull-left[
```r
as.logical(5.2)
```

```
## [1] TRUE
```

```r
as.character(TRUE)
```

```
## [1] "TRUE"
```

```r
as.integer(pi)
```

```
## [1] 3
```
]

.pull-right[
```r
as.numeric(FALSE)
```

```
## [1] 0
```

```r
as.double("7.2")
```

```
## [1] 7.2
```

```r
as.double("one")
```

```
## Warning: NAs introduced by coercion
```

```
## [1] NA
```
]

---

## Missing Values

R uses `NA` to represent missing values in its data structures. What may not be obvious is that there are different `NA`s for the different types.

.pull-left[
```r
typeof(NA)
```

```
## [1] "logical"
```

```r
typeof(NA+1)
```

```
## [1] "double"
```

```r
typeof(NA+1L)
```

```
## [1] "integer"
```
]

.pull-right[
```r
typeof(NA_character_)
```

```
## [1] "character"
```

```r
typeof(NA_real_)
```

```
## [1] "double"
```

```r
typeof(NA_integer_)
```

```
## [1] "integer"
```
]

---

## Stickiness of Missing Values

Because `NA`s represent missing values, it makes sense that any calculation using them should also be missing.
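
When the missing values should be skipped rather than propagated, many base R summary functions accept an `na.rm` argument. A minimal sketch with a made-up vector `v` (the variable names are illustrative, not from the original slides):

```r
# Hypothetical data, not from the original slides
v = c(1, 2, 3, NA)

# By default the NA propagates to the result
m_default = mean(v)

# na.rm = TRUE drops the missing values before computing
m_clean = mean(v, na.rm = TRUE)
s_clean = sum(v, na.rm = TRUE)
```

Here `m_default` is `NA`, while `m_clean` and `s_clean` are computed from the three observed values only.
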
.pull-left[
```r
1 + NA
```

```
## [1] NA
```

```r
1 / NA
```

```
## [1] NA
```

```r
NA * 5
```

```
## [1] NA
```
]

.pull-right[
```r
mean(c(1,2,3,NA))
```

```
## [1] NA
```

```r
sqrt(NA)
```

```
## [1] NA
```

```r
3^NA
```

```
## [1] NA
```
]

---

## Conditionals and missing values

`NA`s can be problematic in some cases (particularly for control flow).

```r
1 == NA
```

```
## [1] NA
```

--

```r
if (2 != NA)
  "Here"
```

```
## Error in if (2 != NA) "Here": missing value where TRUE/FALSE needed
```

--

```r
if (all(c(1,2,NA,4) >= 1))
  "There"
```

```
## Error in if (all(c(1, 2, NA, 4) >= 1)) "There": missing value where TRUE/FALSE needed
```

--

```r
if (any(c(1,2,NA,4) >= 1))
  "There"
```

```
## [1] "There"
```

---

## Testing for `NA`

To explicitly test if a value is missing it is necessary to use `is.na` (often along with `any` or `all`).

.pull-left[
```r
is.na(NA)
```

```
## [1] TRUE
```

```r
is.na(1)
```

```
## [1] FALSE
```

```r
is.na(c(1,2,3,NA))
```

```
## [1] FALSE FALSE FALSE  TRUE
```
]

.pull-right[
```r
any(is.na(c(1,2,3,NA)))
```

```
## [1] TRUE
```

```r
all(is.na(c(1,2,3,NA)))
```

```
## [1] FALSE
```
]

---

## Other Special (double) values

* `NaN` - Not a number

* `Inf` - Positive infinity

* `-Inf` - Negative infinity

.pull-left[
```r
pi / 0
```

```
## [1] Inf
```

```r
0 / 0
```

```
## [1] NaN
```

```r
1/0 + 1/0
```

```
## [1] Inf
```
]

.pull-right[
```r
1/0 - 1/0
```

```
## [1] NaN
```

```r
NaN / NA
```

```
## [1] NaN
```

```r
NaN * NA
```

```
## [1] NaN
```
]

---

## Testing for `Inf` and `NaN`

`NaN` and `Inf` don't have the same testing issues that `NA` has, but there are still convenience functions for testing for these values.

.pull-left[
```r
NA
```

```
## [1] NA
```

```r
1/0+1/0
```

```
## [1] Inf
```

```r
1/0-1/0
```

```
## [1] NaN
```

```r
1/0-1/0
```

```
## [1] NaN
```
]

.pull-right[
```r
is.finite(NA)
```

```
## [1] FALSE
```

```r
is.finite(1/0+1/0)
```

```
## [1] FALSE
```

```r
is.finite(1/0-1/0)
```

```
## [1] FALSE
```

```r
is.nan(1/0-1/0)
```

```
## [1] TRUE
```
]

---

## Coercion for infinity and NaN

First remember that `Inf`, `-Inf`, and `NaN` have type double; however, their coercion behavior is not the same as for other double values.

```r
as.integer(Inf)
```

```
## Warning: NAs introduced by coercion to integer range
```

```
## [1] NA
```

```r
as.integer(NaN)
```

```
## [1] NA
```

.pull-left[
```r
as.logical(Inf)
```

```
## [1] TRUE
```

```r
as.logical(NaN)
```

```
## [1] NA
```
]

.pull-right[
```r
as.character(Inf)
```

```
## [1] "Inf"
```

```r
as.character(NaN)
```

```
## [1] "NaN"
```
]

---

## Exercise 1

**Part 1**

What is the type of the following vectors? Explain why they have that type.

* `c(1, NA+1L, "C")`
* `c(1L / 0, NA)`
* `c(1:3, 5)`
* `c(3L, NaN+1L)`
* `c(NA, TRUE)`

**Part 2**

Considering only the four (common) data types, what is R's implicit type conversion hierarchy (from highest priority to lowest priority)?

*Hint* - think about the pairwise interactions between types.

---
class: middle
count: false

# Generic Vectors

---

## Lists

Lists are _generic vectors_, in that they are 1 dimensional (i.e. have a length) and can contain any type of R object.

```r
list("A", c(TRUE,FALSE), (1:4)/2, function(x) x^2)
```

```
## [[1]]
## [1] "A"
##
## [[2]]
## [1]  TRUE FALSE
##
## [[3]]
## [1] 0.5 1.0 1.5 2.0
##
## [[4]]
## function (x)
## x^2
```

---

## Structure

Often we want a more compact representation of a complex object; the `str` function is useful for this particular task.

```r
str( list("A", c(TRUE,FALSE), (1:4)/2, function(x) x^2) )
```

```
## List of 4
##  $ : chr "A"
##  $ : logi [1:2] TRUE FALSE
##  $ : num [1:4] 0.5 1 1.5 2
##  $ :function (x)
##   ..- attr(*, "srcref")= 'srcref' int [1:8] 1 40 1 54 40 54 1 1
##   .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fd61d7afe70>
```

---

## Recursive lists

Lists can contain other lists, meaning they don't have to be flat.

```r
str( list(1, list(2, list(3, 4), 5)) )
```

```
## List of 2
##  $ : num 1
##  $ :List of 3
##   ..$ : num 2
##   ..$ :List of 2
##   .. ..$ : num 3
##   .. ..$ : num 4
##   ..$ : num 5
```

---

## List Coercion

By default a vector will be coerced to a list (as a list is more generic) if needed.

```r
str( c(1, list(4, list(6, 7))) )
```

```
## List of 3
##  $ : num 1
##  $ : num 4
##  $ :List of 2
##   ..$ : num 6
##   ..$ : num 7
```

--

We can coerce a list into an atomic vector using `unlist` - the usual type coercion rules then apply to determine its type.

```r
unlist(list(1:3, list(4:5, 6)))
```

```
## [1] 1 2 3 4 5 6
```

```r
unlist( list(1, list(2, list(3, "Hello"))) )
```

```
## [1] "1"     "2"     "3"     "Hello"
```

---

## Named lists

Because of their more complex structure we often want to name the elements of a list (we can also do this with vectors). This can make reading and accessing the list more straightforward.

```r
str(list(A = 1, B = list(C = 2, D = 3)))
```

```
## List of 2
##  $ A: num 1
##  $ B:List of 2
##   ..$ C: num 2
##   ..$ D: num 3
```

```r
list("knock knock" = "who's there?")
```

```
## $`knock knock`
## [1] "who's there?"
```

```r
names(list(ABC=1, DEF=list(H=2, I=3)))
```

```
## [1] "ABC" "DEF"
```

---

## Exercise 2

Represent the following JSON data as a list in R.

```json
{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": 10021
  },
  "phoneNumber": [
    {
      "type": "home",
      "number": "212 555-1239"
    },
    {
      "type": "fax",
      "number": "646 555-4567"
    }
  ]
}
```

---
class: middle
count: false

# Functions

---

## When to use functions

The goal of a function should be to encapsulate a *small* *reusable* piece of code.

* Name should make it clear what the function does (think in terms of simple verbs).
* Functionality should be simple enough to be quickly understood.

* The smaller and more modular the code the easier it will be to reuse elsewhere.

* Better to change code in one location than code everywhere.

---

## Function Parts

The two parts of a function are the arguments (`formals`) and the code (`body`).

```r
gcd = function(long1, lat1, long2, lat2) {
  R = 6371 # Earth mean radius in km

  # distance in km
  acos(sin(lat1)*sin(lat2) + cos(lat1)*cos(lat2) * cos(long2-long1)) * R
}
```

--

.pull-left[
```r
formals(gcd)
```

```
## $long1
##
##
## $lat1
##
##
## $long2
##
##
## $lat2
```
]

.pull-right[
```r
body(gcd)
```

```
## {
##     R = 6371
##     acos(sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(long2 -
##         long1)) * R
## }
```
]

---

## Return values

There are two ways of returning values in R: explicit or implicit return values.

<br/>

*Explicit* - includes one or more `return` statements

```r
f = function(x) {
  return(x*x)
}
```

<br/>

*Implicit* - value of the last statement is returned.

```r
f = function(x) {
  x*x
}
```

---

## Returning multiple values

If we want a function to return more than one value we can group things using either a vector or a list.

```r
f = function(x) {
  c(x, x^2, x^3)
}

f(2)
```

```
## [1] 2 4 8
```

```r
f(2:3)
```

```
## [1]  2  3  4  9  8 27
```

---

## Argument names

When defining a function we are also implicitly defining names for the arguments; when calling the function we can use these names to pass arguments in a different order.
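
To see why the names matter, here is a small sketch using a hypothetical function `divide` (not part of the original slides) whose result depends on argument order:

```r
# Hypothetical function for illustration only
divide = function(numerator, denominator) {
  numerator / denominator
}

# Positional matching: the first value is the numerator
by_position = divide(10, 2)

# Named matching: order no longer matters
by_name = divide(denominator = 2, numerator = 10)

# Swapping positional arguments changes the result
swapped = divide(2, 10)
```

`by_position` and `by_name` agree, while `swapped` does not, so naming arguments protects a call against accidental reordering.
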
```r
f = function(x,y,z) {
  paste0("x=",x," y=",y," z=",z)
}
```

.pull-left[
```r
f(1,2,3)
```

```
## [1] "x=1 y=2 z=3"
```

```r
f(z=1,x=2,y=3)
```

```
## [1] "x=2 y=3 z=1"
```
]

.pull-right[
```r
f(y=2,1,3)
```

```
## [1] "x=1 y=2 z=3"
```

```r
f(y=2,1,x=3)
```

```
## [1] "x=3 y=2 z=1"
```
]

```r
f(1,2,3,m=1)
```

```
## Error in f(1, 2, 3, m = 1): unused argument (m = 1)
```

---

## Argument defaults

It is also possible to give function arguments default values so that they don't need to be provided every time the function is called.

```r
f = function(x,y=1,z=1) {
  paste0("x=",x," y=",y," z=",z)
}
```

```r
f()
```

```
## Error in paste0("x=", x, " y=", y, " z=", z): argument "x" is missing, with no default
```

```r
f(x=3)
```

```
## [1] "x=3 y=1 z=1"
```

```r
f(y=2,2)
```

```
## [1] "x=2 y=2 z=1"
```

---

## Scope

R has generous scoping rules: if it can't find a variable in the function's body, it will look for it in the next higher scope, and so on.

```r
y = 1
f = function(x) {
  x+y
}
f(3)
```

```
## [1] 4
```

```r
g = function(x) {
  y=2
  x+y
}
g(3)
```

```
## [1] 5
```

---

##

Additionally, variables defined within a scope only persist for the duration of that scope, and do not overwrite variables at higher scopes (unless you use the global assignment operator `<<-`, *which you shouldn't*).

```r
x = 1
y = 1
z = 1
f = function() {
  y = 2
  g = function() {
    z = 3
    return(x + y + z)
  }
  return(g())
}
f()
```

```
## [1] 6
```

```r
c(x,y,z)
```

```
## [1] 1 1 1
```

---

## Lazy evaluation

Arguments to R functions are lazily evaluated, meaning they are not evaluated until they are used.

```r
f = function(x) {
  cat("Hello world!\n")
  x
}

f(stop())
```

```
## Hello world!
```

```
## Error in f(stop()):
```

---

## Everything is a function

```r
`+`
```

```
## function (e1, e2)  .Primitive("+")
```

```r
typeof(`+`)
```

```
## [1] "builtin"
```

```r
x = 4:1
`+`(x,2)
```

```
## [1] 6 5 4 3
```

---

## Getting Help

Prefixing any function name with a `?` will open the related help file for that function.

```r
?`+`
?sum
```

For functions not in the base package, you can generally see their implementation by entering the function name without parentheses (or using the `body` function).

```r
lm
```

```
## function (formula, data, subset, weights, na.action, method = "qr",
##     model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE,
##     contrasts = NULL, offset, ...)
## {
##     ret.x <- x
##     ret.y <- y
##     cl <- match.call()
##     mf <- match.call(expand.dots = FALSE)
##     m <- match(c("formula", "data", "subset", "weights", "na.action",
##         "offset"), names(mf), 0L)
##     mf <- mf[c(1L, m)]
##     mf$drop.unused.levels <- TRUE
##     mf[[1L]] <- quote(stats::model.frame)
##     mf <- eval(mf, parent.frame())
##     if (method == "model.frame")
##         return(mf)
##     else if (method != "qr")
##         warning(gettextf("method = '%s' is not supported. Using 'qr'",
##             method), domain = NA)
##     mt <- attr(mf, "terms")
##     y <- model.response(mf, "numeric")
##     w <- as.vector(model.weights(mf))
##     if (!is.null(w) && !is.numeric(w))
##         stop("'weights' must be a numeric vector")
##     offset <- as.vector(model.offset(mf))
##     if (!is.null(offset)) {
##         if (length(offset) != NROW(y))
##             stop(gettextf("number of offsets is %d, should equal %d (number of observations)",
##                 length(offset), NROW(y)), domain = NA)
##     }
##     if (is.empty.model(mt)) {
##         x <- NULL
##         z <- list(coefficients = if (is.matrix(y)) matrix(, 0,
##             3) else numeric(), residuals = y, fitted.values = 0 *
##             y, weights = w, rank = 0L, df.residual = if (!is.null(w)) sum(w !=
##             0) else if (is.matrix(y)) nrow(y) else length(y))
##         if (!is.null(offset)) {
##             z$fitted.values <- offset
##             z$residuals <- y - offset
##         }
##     }
##     else {
##         x <- model.matrix(mt, mf, contrasts)
##         z <- if (is.null(w))
##             lm.fit(x, y, offset = offset, singular.ok = singular.ok,
##                 ...)
##         else lm.wfit(x, y, w, offset = offset, singular.ok = singular.ok,
##             ...)
##     }
##     class(z) <- c(if (is.matrix(y)) "mlm", "lm")
##     z$na.action <- attr(mf, "na.action")
##     z$offset <- offset
##     z$contrasts <- attr(x, "contrasts")
##     z$xlevels <- .getXlevels(mt, mf)
##     z$call <- cl
##     z$terms <- mt
##     if (model)
##         z$model <- mf
##     if (ret.x)
##         z$x <- x
##     if (ret.y)
##         z$y <- y
##     if (!qr)
##         z$qr <- NULL
##     z
## }
## <bytecode: 0x7fd61c21a778>
## <environment: namespace:stats>
```

---

## Less Helpful Examples

```r
list
```

```
## function (...)  .Primitive("list")
```

```r
`[`
```

```
## .Primitive("[")
```

```r
sum
```

```
## function (..., na.rm = FALSE)  .Primitive("sum")
```

```r
`+`
```

```
## function (e1, e2)  .Primitive("+")
```

---

# Acknowledgments

## Acknowledgments

Above materials are derived in part from the following sources:

* Hadley Wickham - [Advanced R](http://adv-r.had.co.nz/)

* [R Language Definition](http://stat.ethz.ch/R-manual/R-devel/doc/manual/R-lang.html)