class: center, middle, inverse, title-slide # Fundamentals of R ## Programming for Statistical Science ### Shawn Santo --- ## Supplementary materials Full video lecture available in Zoom Cloud Recordings Companion videos - [RStudio Tour](https://warpwire.duke.edu/w/4zEEAA/) - [Vectors](https://warpwire.duke.edu/w/rzEEAA/) - [Operators, vectorization, and length coercion](https://warpwire.duke.edu/w/rTEEAA/) - [Control flow](https://warpwire.duke.edu/w/pzEEAA/) - [Error action](https://warpwire.duke.edu/w/qzEEAA/) - [Loops](https://warpwire.duke.edu/w/sTEEAA/) .small-text[ Videos were created for STA 323 & 523 - Summer 2020 ] Additional resources - [Google’s R Style Guide](https://google.github.io/styleguide/Rguide.html) - [Hadley's R Style Guide](http://r-pkgs.had.co.nz/style.html) - [Sections 3.1 – 3.2](https://adv-r.hadley.nz/vectors-chap.html) Advanced R - [Chapter 5](https://adv-r.hadley.nz/control-flow.html) Advanced R --- class: inverse, center, middle # Vectors --- ## Vectors The fundamental building block of data in R is a vector (collections of related values, objects, other data structures, etc). -- R has two types of vectors: * **atomic** vectors - homogeneous collections of the *same* type (e.g. all logical values, all numbers, or all character strings). * **generic** vectors - heterogeneous collections of *any* type of R object, even other lists (meaning they can have a hierarchical/tree-like structure). <br><br> I will use the term component or element when referring to a value inside a vector. --- ## Vector interrelationships .middle.center[ <img src="images/vector_types.png" height="400px"> ] .small-text[ *Source*: https://r4ds.had.co.nz/vectors.html ] --- ## Atomic vectors R has six atomic vector types: .center[ `logical`, `integer`, `double`, `character`, `complex`, `raw` ] In this course we will mostly work with the first four. You will rarely work with the last two types - complex and raw. ```r x <- c(T, F, TRUE, FALSE) typeof(x) ``` ``` #> [1] "logical" ``` ```r y <- c("a", "few", "more", "slides") typeof(y) ``` ``` #> [1] "character" ``` --- ## Coercion hierarchy If you try to combine components of different types into a single atomic vector, R will try to coerce all elements so they can be represented as the simplest type. .center[ `character` `\(\rightarrow\)` `double` `\(\rightarrow\)` `integer` `\(\rightarrow\)` `logical` ] ```r x <- c(T, 5, F, 0, 1) y <- c("a", 1, T) z <- c(3.0, 4L, 0L) ``` -- .pull-left[ ```r x ``` ``` #> [1] 1 5 0 0 1 ``` ```r y ``` ``` #> [1] "a" "1" "TRUE" ``` ```r z ``` ``` #> [1] 3 4 0 ``` ] .pull-right[ ```r typeof(x) ``` ``` #> [1] "double" ``` ```r typeof(y) ``` ``` #> [1] "character" ``` ```r typeof(z) ``` ``` #> [1] "double" ``` ] --- ## Concatenation One way to construct atomic vectors is with function `c()`. ```r c(1, 0, 1, 1, 6) ``` ``` #> [1] 1 0 1 1 6 ``` ```r c(c(3, 4), c(10, TRUE)) ``` ``` #> [1] 3 4 10 1 ``` ```r c(pi) ``` ``` #> [1] 3.141593 ``` --- class: inverse, center, middle # Operators, vectorization, and length coercion --- ## Logical (Boolean) operators | Operator | Operation | Vectorized? |:-----------------------------:|:-------------:|:------------: | <code>x | y</code> | or | Yes | `x & y` | and | Yes | `!x` | not | Yes | <code>x || y</code> | or | No | `x && y` | and | No |`xor(x,y)` | exclusive or | Yes <br><br> -- What do we mean if we say a function or operation is vectorized? --- ## Boolean examples ```r x <- c(T, F, T, T) y <- c(F, F, T, F) ``` -- .pull-left[ ```r !x ``` ``` #> [1] FALSE TRUE FALSE FALSE ``` ```r x | y ``` ``` #> [1] TRUE FALSE TRUE TRUE ``` ```r x || y ``` ``` #> [1] TRUE ``` ] -- .pull-right[ ```r x & y ``` ``` #> [1] FALSE FALSE TRUE FALSE ``` ```r x && y ``` ``` #> [1] FALSE ``` ```r xor(x, y) ``` ``` #> [1] TRUE FALSE FALSE TRUE ``` ] --- ## Comparison operators | Operator | Comparison | Vectorized? |:----------:|:--------------------------:|:----------------: | `x < y` | less than | Yes | `x > y` | greater than | Yes | `x <= y` | less than or equal to | Yes | `x >= y` | greater than or equal to | Yes | `x != y` | not equal to | Yes | `x == y` | equal to | Yes | `x %in% y` | contains | Yes (over `x`) --- ## Comparison examples ```r x <- c(4, 10, -5) y <- c(0, 51, 9 / 5) z <- c("four", "for", "4") ``` -- .pull-left[ ```r x > y ``` ``` #> [1] TRUE FALSE FALSE ``` ```r x != y ``` ``` #> [1] TRUE TRUE TRUE ``` ] -- .pull-right[ ```r x == z ``` ``` #> [1] FALSE FALSE FALSE ``` ```r x %in% z ``` ``` #> [1] TRUE FALSE FALSE ``` ] --- ## What else is vectorized? - Most of the mathematical operators - Many functions in `base` R and created by user's in packages ```r a <- c(0, -3, sqrt(75)) b <- c(1, 3, 2) ``` .pull-left[ ```r a + b ``` ``` #> [1] 1.00000 0.00000 10.66025 ``` ```r a ^ b ``` ``` #> [1] 0 -27 75 ``` ] .pull-right[ ```r rnorm(n = 3, mean = a, sd = b) ``` ``` #> [1] -0.6483697 1.6219890 6.7336622 ``` ```r exp(a / b) ``` ``` #> [1] 1.0000000 0.3678794 75.9539335 ``` ] --- ## Length coercion (vector recycling) The shorter of two atomic vectors in an operation is recycled until it is the same length as the longer atomic vector. ```r x <- c(2, 4, 6) y <- c(1, 1, 1, 2, 2) ``` .small-text[ ```r x > y ``` ``` #> [1] TRUE TRUE TRUE FALSE TRUE ``` ```r x == y ``` ``` #> [1] FALSE FALSE FALSE TRUE FALSE ``` ```r 10 / x ``` ``` #> [1] 5.000000 2.500000 1.666667 ``` ] --- class: inverse, center, middle # Control flow --- ## Conditional control flow Conditional (choice) control flow is governed by `if` and `switch()`. .pull-left[ ```r if (condition) { # code to run # when condition is # TRUE } ``` ] .pull-right[ ```r if (TRUE) { print("The condition must have been true!") } ``` ] --- ## `if` examples ```r if (1 > 0) { print("Yes, 1 is greater than 0.") } ``` ``` #> [1] "Yes, 1 is greater than 0." ``` -- ```r x <- c(1, 2, 3, 4) if (3 %in% x) { print("Yes, 3 is in x.") } ``` ``` #> [1] "Yes, 3 is in x." ``` -- ```r if (-6) { print("Other types are coerced to logical if possible.") } ``` ``` #> [1] "Other types are coerced to logical if possible." ``` --- ## More `if` examples ```r if (c(F, T, T)) { print("How many logical values can if handle?") } ``` ``` #> Warning in if (c(F, T, T)) {: the condition has length > 1 and only the first #> element will be used ``` -- ```r x <- c(1, 2, 3, 4) if (x %in% 3) { print("This works?") } ``` ```r if (c(1, 0, 1)) { print("Other types are coerced to logical if possible.") } ``` ``` #> [1] "Other types are coerced to logical if possible." ``` <br/> .small-text[ I suppressed warnings in the last two examples. ] --- ## `if` is not vectorized To remedy this potential problem of a non-vectorized `if`, you can 1. try to collapse a logical vector of length greater than 1 to a logical vector of length 1 with functions - `any()` - `all()` 2. use a vectorized conditional function such as `ifelse()` or `dplyr::case_when()`. --- ## Functions `any()` and `all()` ```r x <- c(-5, 0, 5, 10, 15) any(x >= 5) ``` ``` #> [1] TRUE ``` ```r all(x >= 5) ``` ``` #> [1] FALSE ``` <br/><br/> Functions `any()` and `all()` require a logical vector as input. --- ## Vectorized `if` ```r z <- c(-4:-1, 1:3) z ``` ``` #> [1] -4 -3 -2 -1 1 2 3 ``` ```r ifelse(test = z < 0, yes = "neg", no = "pos") ``` ``` #> [1] "neg" "neg" "neg" "neg" "pos" "pos" "pos" ``` <br> -- ```r set.seed(532) x <- rnorm(n = 4, mean = 0, sd = 1) x ``` ``` #> [1] 3.105059 -1.329432 -1.466140 -0.345289 ``` ```r ifelse(test = abs(x) > 3, yes = "outlier", no = "no outlier") ``` ``` #> [1] "outlier" "no outlier" "no outlier" "no outlier" ``` --- ## Nested conditionals .pull-left[ ```r if (condition_one) { ## ## Code to run ## } else if (condition_two) { ## ## Code to run ## } else { ## ## Code to run ## } ``` ] .pull-right[ ```r x <- 0 if (x < 0) { "Negative" } else if (x > 0) { "Positive" } else { "Zero" } ``` ``` #> [1] "Zero" ``` ] --- class: inverse, middle, center # Error action --- ## Execute error action Functions `stop()` and `stopifnot()` execute an error action. These are useful if you want to validate inputs or function arguments. ```r x <- -1 if (x < 0) { stop("Negative numbers not allowed!") } ``` ``` #> Error in eval(expr, envir, enclos): Negative numbers not allowed! ``` <br> -- ```r x <- c(3, 9, 28) stopifnot(any(x >= 0), all(x %% 3 == 0)) ``` ``` #> Error: all(x%%3 == 0) is not TRUE ``` If any of the expressions in function `stopifnot()` are not `TRUE`, then function `stop()` is called and an error message is shown. --- ## Exercises 1. What does each of the following return? Run the code to check your answer. ```r if (1 == "1") "coercion works" else "no coercion " ifelse(5 > c(1, 10, 2), "hello", "olleh") ``` <br> 2. Consider two vectors, `x` and `y`, each of length one. Write a set of conditionals that satisfy the following. - If `x` is positive and `y` is negative or `y` is positive and `x` is negative, print "knits". - If `x` divided by `y` is positive, print "stink". - Stop execution if `x` or `y` are zero. Test your code with various `x` and `y` values. Where did you place the stop execution code? ??? ## Solutions 1. .solution[ ```r if (1 == "1") "coercion works" else "no coercion " ``` ``` #> [1] "coercion works" ``` ```r ifelse(5 > c(1, 10, 2), "hello", "olleh") ``` ``` #> [1] "hello" "olleh" "hello" ``` ] 2. .solution[ ```r x <- 4 y <- -10 if (x == 0 | y == 0) { stop("One of x or y is 0!") } else if (x / y > 0) { print("stink") } else { print("knits") } ``` ``` #> [1] "knits" ``` ] --- class: inverse, middle, center # Loops --- ## Loop types R supports three types of loops: `for`, `while`, and `repeat`. ```r for (item in vector) { ## ## Iterate this code ## } ``` ```r while (we_have_a_true_condition) { ## ## Iterate this code ## } ``` ```r repeat { ## ## Iterate this code ## } ``` In the `repeat` loop we will need a `break` statement to end iteration. --- ## `for` loop A `for` loop allows you to iterate code over items in a vector. ```r k <- 0 for (i in c(2, 4, 6, 8)) { print(i ^ 2) k <- k + i ^ 2 } ``` ``` #> [1] 4 #> [1] 16 #> [1] 36 #> [1] 64 ``` -- ```r k ``` ``` #> [1] 120 ``` -- ```r for (i in c(2, 4, 6, 8)) { i ^ 2 } ``` *Automatic printing is turned off inside loops.* ??? .tiny[ ```r i <- 10 for (i in 1:3) { print("Today is Tuesday.") } ``` ``` #> [1] "Today is Tuesday." #> [1] "Today is Tuesday." #> [1] "Today is Tuesday." ``` ```r i ``` ``` #> [1] 3 ``` ] Variable `i` is overwritten when `for` is executed. Thus, `i` is assigned to the current environment. Convention is to use `i`, `j`, `k` as loop items. It will be best to avoid using these as named objects elsewhere in your code. --- ## `while` loop A `while` loop will iterate code until a given condition is `FALSE`. ```r i <- 1 res <- rep(0, 10) i ``` ``` #> [1] 1 ``` ```r res ``` ``` #> [1] 0 0 0 0 0 0 0 0 0 0 ``` -- ```r while (i <= 10) { res[i] <- i ^ 2 i <- i + 1 } res ``` ``` #> [1] 1 4 9 16 25 36 49 64 81 100 ``` --- ## `repeat` loop A `repeat` loop will iterate code until a `break` statement is executed. ```r i <- 1 res <- rep(NA, 10) repeat { res[i] <- i ^ 2 i <- i + 1 if (i > 10) {break} } res ``` ``` #> [1] 1 4 9 16 25 36 49 64 81 100 ``` --- ## Loop keywords: `next` and `break` - `next` exits the current iteration and advances the looping index - `break` exits the loop - Both `break` and `next` apply only to the innermost of nested loops. ```r for (i in 1:10) { * if (i %% 2 == 0) {next} print(paste("Number ", i, " is odd.")) * if (i %% 7 == 0) {break} } ``` -- ``` #> [1] "Number 1 is odd." #> [1] "Number 3 is odd." #> [1] "Number 5 is odd." #> [1] "Number 7 is odd." ``` --- ## Ancillary loop functions You may want to loop over indices of an object as opposed to the object's values. To do this, consider using one of `length()`, `seq()`, `seq_along()`, and `seq_len()`. .pull-left[ ```r 4:7 ``` ``` #> [1] 4 5 6 7 ``` ```r length(4:7) ``` ``` #> [1] 4 ``` ```r seq(4, 7) ``` ``` #> [1] 4 5 6 7 ``` ] .pull-right[ ```r seq_along(4:7) ``` ``` #> [1] 1 2 3 4 ``` ```r seq_len(length(4:7)) ``` ``` #> [1] 1 2 3 4 ``` ```r seq(4, 7, by = 2) ``` ``` #> [1] 4 6 ``` ] <br> Iterating over `seq_along(x)` is a better option than `1:length(x)`. --- ## Loop tips 1. Preallocate your output object when possible. 2. Don't use a `while` or `repeat` loop if a `for` loop is possible. 3. Don't use any type of loop if vectorization is possible. -- .pull-left[ Slow... ```r a <- c() for (i in seq_len(10)) { a <- c(a, i ^ 3) } ``` ] -- .pull-right[ Faster... ```r a <- numeric(10) for (i in seq_len(10)) { a[i] <- i ^ 3 } ``` ] -- Even faster... ```r (1:10) ^ 3 ``` --- ## Exercises 1. Consider the vector `x` below. ```r x <- c(3, 4, 12, 19, 23, 49, 100, 63, 70) ``` Write R code that prints the perfect squares in `x`. <br><br> 2. Consider `z <- c(-1, .5, 0, .5, 1)`. Write R code that prints the smallest non-negative integer `\(k\)` satisfying the inequality `$$\lvert cos(k) - z \rvert < 0.001$$` for each component of `z`. ??? ## Solution 1. .solution[ ```r x <- c(3, 4, 12, 19, 23, 49, 100, 63, 70) for (i in x) { if (sqrt(i) %% 1) { next } print(i) } ``` ``` #> [1] 4 #> [1] 49 #> [1] 100 ``` ] 2. .solution[ ```r for (z in c(-1, .5, 0, .5, 1)) { k <- 0 while (abs(cos(k) - z) >= .001) { k <- k + 1 } print(k) } ``` ``` #> [1] 22 #> [1] 21766 #> [1] 40459 #> [1] 21766 #> [1] 0 ``` ] --- ## References 1. Grolemund, G., & Wickham, H. (2019). R for Data Science. https://r4ds.had.co.nz/ 2. Wickham, H. (2019). Advanced R. https://adv-r.hadley.nz/