class: center, middle, inverse, title-slide

# Data structures and subsetting
## Statistical Computing & Programming
### Shawn Santo

---

## Supplementary materials

Full video lecture available in Zoom Cloud Recordings

Additional resources

- [Sections 3.3 - 3.4](https://adv-r.hadley.nz/vectors-chap.html#attributes) Advanced R
- [Chapter 4](https://adv-r.hadley.nz/subsetting.html) Advanced R

---

## To-do list

Before the next lab

- complete the Introductory Survey,
- join our GitHub organization: https://github.com/sta323-523-sp21,
- watch the warpwire video on subsetting data structures.

---

class: inverse, center, middle

# Recall

---

## Atomic vector creation

We can use functions such as `c()`, `vector()`, and `:` to create atomic vectors.

```r
c(5, 10, pi, 0, -sqrt(3))
```

```
#> [1]  5.000000 10.000000  3.141593  0.000000 -1.732051
```

```r
vector(mode = "character", length = 4)
```

```
#> [1] "" "" "" ""
```

```r
vector(mode = "integer", length = 3)
```

```
#> [1] 0 0 0
```

```r
-10:-3
```

```
#> [1] -10  -9  -8  -7  -6  -5  -4  -3
```

---

## Generic vector creation

Function `list()` allows us to create a generic vector.

```r
x <- list(
  a = -100:100,
  b = list(lower = letters, upper = LETTERS),
  cars_data = cars
)
str(x)
```

```
#> List of 3
#>  $ a        : int [1:201] -100 -99 -98 -97 -96 -95 -94 -93 -92 -91 ...
#>  $ b        :List of 2
#>   ..$ lower: chr [1:26] "a" "b" "c" "d" ...
#>   ..$ upper: chr [1:26] "A" "B" "C" "D" ...
#>  $ cars_data:'data.frame': 50 obs. of  2 variables:
#>   ..$ speed: num [1:50] 4 4 7 7 8 9 10 10 10 11 ...
#>   ..$ dist : num [1:50] 2 10 4 22 16 10 18 26 34 17 ...
```

---

class: inverse, center, middle

# Attributes

---

## Data structures

You may have heard of factors, matrices, arrays, and date-times. These are just atomic vectors with special attributes.

- Attributes attach metadata to an object.

--

- Function `attr()` can retrieve and modify a single attribute.
```r
attr(x, which)           # get attribute
attr(x, which) <- value  # set / modify attribute
```

--

- Function `attributes()` can retrieve and set attributes en masse.

```r
attributes(x)           # get attributes
attributes(x) <- value  # set / modify attributes
```

---

## Attribute: `names`

Get or set the names of an object.

**One option:**

```r
x <- 1:4
attributes(x)
```

```
#> NULL
```

```r
attr(x = x, which = "names") <- c("a", "b", "c", "d")
attributes(x)
```

```
#> $names
#> [1] "a" "b" "c" "d"
```

```r
x
```

```
#> a b c d 
#> 1 2 3 4
```

---

**Another option:**

```r
a <- 1:4
names(a) <- c("a", "b", "c", "d")
attributes(a)
```

```
#> $names
#> [1] "a" "b" "c" "d"
```

```r
a
```

```
#> a b c d 
#> 1 2 3 4
```

<br/>

Either method is okay to use, but since the replacement function option exists, it is best to stick with that.

---

## Attribute: `dim`

Get or set the dimension of an object.

```r
z <- 1:9
z
```

```
#> [1] 1 2 3 4 5 6 7 8 9
```

```r
attr(x = z, which = "dim") <- c(3, 3)
attributes(z)
```

```
#> $dim
#> [1] 3 3
```

```r
z
```

```
#>      [,1] [,2] [,3]
#> [1,]    1    4    7
#> [2,]    2    5    8
#> [3,]    3    6    9
```

--

We have a 3 x 3 matrix.

---

```r
y <- matrix(z, nrow = 3, ncol = 3)
attributes(y)
```

```
#> $dim
#> [1] 3 3
```

```r
y
```

```
#>      [,1] [,2] [,3]
#> [1,]    1    4    7
#> [2,]    2    5    8
#> [3,]    3    6    9
```

---

## Exercise

Create a 3 x 3 x 2 array using the `dim` attribute with the vector below.

```r
x <- c(5, 1, 5, 5, 1, 1, 5, 3, 2, 3, 2, 6, 4, 4, 1, 2, 1, 3)
```

<br/>

Try to create the same array using function `array()`. What do you notice about how the array object is populated?

???
## Solution

.tiny[
```r
x <- c(5, 1, 5, 5, 1, 1, 5, 3, 2, 3, 2, 6, 4, 4, 1, 2, 1, 3)
attr(x = x, which = "dim") <- c(3, 3, 2)
x
```

```
#> , , 1
#> 
#>      [,1] [,2] [,3]
#> [1,]    5    5    5
#> [2,]    1    1    3
#> [3,]    5    1    2
#> 
#> , , 2
#> 
#>      [,1] [,2] [,3]
#> [1,]    3    4    2
#> [2,]    2    4    1
#> [3,]    6    1    3
```

```r
attributes(x)
```

```
#> $dim
#> [1] 3 3 2
```

```r
array(x, dim = c(3, 3, 2))
```

```
#> , , 1
#> 
#>      [,1] [,2] [,3]
#> [1,]    5    5    5
#> [2,]    1    1    3
#> [3,]    5    1    2
#> 
#> , , 2
#> 
#>      [,1] [,2] [,3]
#> [1,]    3    4    2
#> [2,]    2    4    1
#> [3,]    6    1    3
```
]

---

## Factors

Factors are built on top of integer vectors with two attributes: `class` and `levels`. Factors are how R stores and represents categorical data.

A quick way to create a categorical variable as a factor is with function `factor()`.

```r
x <- factor(c("walk", "single", "double", "triple", "home run"))
x
```

```
#> [1] walk     single   double   triple   home run
#> Levels: double home run single triple walk
```

--

```r
typeof(x)
```

```
#> [1] "integer"
```

```r
attributes(x)
```

```
#> $levels
#> [1] "double"   "home run" "single"   "triple"   "walk"    
#> 
#> $class
#> [1] "factor"
```

---

## Ordered factors

To induce an ordering we can use function `ordered()` as opposed to `factor()`.

```r
y <- ordered(c("walk", "single", "double", "triple", "home run"),
             levels = c("walk", "single", "double", "triple", "home run"))
y
```

```
#> [1] walk     single   double   triple   home run
#> Levels: walk < single < double < triple < home run
```

--

```r
attributes(y)
```

```
#> $levels
#> [1] "walk"     "single"   "double"   "triple"   "home run"
#> 
#> $class
#> [1] "ordered" "factor"
```

```r
str(y)
```

```
#>  Ord.factor w/ 5 levels "walk"<"single"<..: 1 2 3 4 5
```

---

## Exercise

Create a factor vector based on the vector of airport codes below. Try to do it without using function `factor()`.
```r
airports <- c("RDU", "ABE", "DTW", "GRR", "RDU", "GRR",
              "GNV", "JFK", "JFK", "SFO", "DTW")
```

Assume all the possible levels are

```r
c("RDU", "ABE", "DTW", "GRR", "GNV", "JFK", "SFO")
```

*Hint*: Think about what type of object factors are built on.

<br/>

What if the possible levels are

```r
c("RDU", "ABE", "DTW", "GRR", "GNV", "JFK", "SFO", "GSO", "ORD", "PHL")
```

???

## Solution

.tiny[
```r
z <- as.integer(c(1,2,3,4,1,4,5,6,6,7,3))
attr(x = z, which = "levels") <- c("RDU", "ABE", "DTW", "GRR", "GNV", "JFK", "SFO")
attr(x = z, which = "class") <- "factor"
z
```

```
#>  [1] RDU ABE DTW GRR RDU GRR GNV JFK JFK SFO DTW
#> Levels: RDU ABE DTW GRR GNV JFK SFO
```

```r
attributes(z)
```

```
#> $levels
#> [1] "RDU" "ABE" "DTW" "GRR" "GNV" "JFK" "SFO"
#> 
#> $class
#> [1] "factor"
```
]

---

## Matrices and arrays

- Homogeneous in their type.

- Matrices are populated based on column major ordering (use `byrow` argument to change this).

- Arrays can have one, two, or more dimensions.

---

## Data frames

Data frames are built on top of lists with attributes: `names`, `row.names`, and `class`. Here the class is `data.frame`.

```r
typeof(longley)
```

```
#> [1] "list"
```

```r
attributes(longley)
```

```
#> $names
#> [1] "GNP.deflator" "GNP"          "Unemployed"   "Armed.Forces" "Population"  
#> [6] "Year"         "Employed"    
#> 
#> $class
#> [1] "data.frame"
#> 
#> $row.names
#>  [1] 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961
#> [16] 1962
```

--

Here `names` refers to variable names.

---

## Data frame characteristics

- Data frames can be heterogeneous across columns.

- Data frames are rectangular in structure (not always tidy).

- They have column names and row names.

- Data frames can be subset by name or position.

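
The characteristics listed above can be checked directly. The following is a minimal sketch; the data frame `flights` and its column names are hypothetical and are not part of the lecture data.

```r
# Hypothetical example data; names and values are illustrative only
flights <- data.frame(
  airport = c("RDU", "DTW", "SFO"),   # character column
  delay   = c(12.5, 3.0, 47.2),       # double column
  on_time = c(FALSE, TRUE, FALSE),    # logical column
  stringsAsFactors = FALSE            # keep characters as characters in R < 4.0
)

# Heterogeneous across columns: each column keeps its own type
sapply(flights, typeof)

# Rectangular: every column has the same number of rows
dim(flights)

# Column names and (automatically generated) row names
names(flights)
row.names(flights)

# Subset by name or by position; both select the same column
identical(flights[["delay"]], flights[[2]])
```
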
---

## Data frame creation by setting attributes

Start with a list

```r
x <- list(c("48501", "48507", "48505"),
          c(3, 4, 21),
          c(2, 1, 2))
str(x)
```

```
#> List of 3
#>  $ : chr [1:3] "48501" "48507" "48505"
#>  $ : num [1:3] 3 4 21
#>  $ : num [1:3] 2 1 2
```

--

Add attributes

```r
attributes(x) <- list(class = "data.frame",
                      names = c("zip", "lead_value", "time"),
                      row.names = 1:3)
```

---

Then we have a data frame

```r
x
```

```
#>     zip lead_value time
#> 1 48501          3    2
#> 2 48507          4    1
#> 3 48505         21    2
```

```r
str(x)
```

```
#> 'data.frame': 3 obs. of  3 variables:
#>  $ zip       : chr  "48501" "48507" "48505"
#>  $ lead_value: num  3 4 21
#>  $ time      : num  2 1 2
```

Of course, we could have used function `data.frame()` to create our data frame object. There is also function `tibble::tibble()`, which creates a tibble object. A tibble is similar to a data frame but has two additional class components (`tbl_df` and `tbl`).

--

```r
tibble::tibble(x)
```

```
#> # A tibble: 3 x 3
#>   zip   lead_value  time
#>   <chr>      <dbl> <dbl>
#> 1 48501          3     2
#> 2 48507          4     1
#> 3 48505         21     2
```

---

## Length coercion

Length coercion (recycling) works slightly differently for data frames.

.pull-left[
```r
data.frame(x = 1:3, y = c("a"))
```

```
#>   x y
#> 1 1 a
#> 2 2 a
#> 3 3 a
```
]

.pull-right[
```r
data.frame(x = 1:3, y = c("a","b"))
```

```
#> Error in
#> data.frame(x = 1:3,
#> y = c("a", "b")) :
#> arguments imply differing number of
#> rows: 3, 2
```
]

If the length of the longest vector is not a multiple of the length of a shorter vector, an error will occur.

--

<br/>

What do you think will happen here?
```r
data.frame(num = 1:6,
           treatment = c(0, 10, 20),
           type = c("a", "b"))
```

---

## Summary

.small-text[

| Data Structure | Built On              | Attribute(s)                  | Quick creation          |
|----------------|-----------------------|-------------------------------|-------------------------|
| Matrix, Array  | Atomic vector         | `dim`                         | `matrix()`, `array()`   |
| Factor         | Atomic integer vector | `class`, `levels`             | `factor()`, `ordered()` |
| Date           | Atomic double vector  | `class`                       | `as.Date()`             |
| Date-times     | Atomic double vector  | `class`                       | `as.POSIXct()`          |
| Data frame     | List                  | `class`, `names`, `row.names` | `data.frame()`          |

`as.POSIXlt()` also creates date-times, but those objects are built on a list rather than a double vector.

]

---

class: inverse, center, middle

# Subsetting

---

## Subsetting techniques

R has three operators (functions) for subsetting the vectors we've discussed:

1. `[`

2. `[[`

3. `$`

Which one you use will depend on the object you are working with, its attributes, and what you want as a result.

We can subset with

- numeric values
- logicals
- `NULL`, `NA`
- character values

---

## Numeric (positive) subsetting

**Indexing begins at 1, not 0.**

.tiny[
```r
x <- c("NC", "SC", "VA", "TN")
y <- list(states = x, rank = 1:4, message = "")
```
]

--

.tiny.pull-left[

**Atomic vector**

```r
x[1]
```

```
#> [1] "NC"
```

```r
x[c(1, 3)]
```

```
#> [1] "NC" "VA"
```

```r
x[c(1:5)]
```

```
#> [1] "NC" "SC" "VA" "TN" NA
```

```r
x[c(2.2, 3.9)]
```

```
#> [1] "SC" "VA"
```
]

.tiny.pull-right[

**List**

```r
str(y[1])
```

```
#> List of 1
#>  $ states: chr [1:4] "NC" "SC" "VA" "TN"
```

```r
str(y[c(1, 3)])
```

```
#> List of 2
#>  $ states : chr [1:4] "NC" "SC" "VA" "TN"
#>  $ message: chr ""
```

```r
str(y[c(1:4)])
```

```
#> List of 4
#>  $ states : chr [1:4] "NC" "SC" "VA" "TN"
#>  $ rank   : int [1:4] 1 2 3 4
#>  $ message: chr ""
#>  $ NA     : NULL
```
]

---

## Numeric (negative) subsetting

.tiny[
```r
x <- c("NC", "SC", "VA", "TN")
y <- list(states = x, rank = 1:4, message = "")
```
]

.tiny.pull-left[

**Atomic vector**

```r
x[-1]
```

```
#> [1] "SC" "VA" "TN"
```

```r
x[-c(1,
3)]
```

```
#> [1] "SC" "TN"
```

```r
x[c(-1, 3)]
```

```
#> Error in x[c(-1, 3)]: only 0's may be mixed with negative subscripts
```

```r
*x[-c(2.2, 3.9)]
```

```
#> [1] "NC" "TN"
```
]

.tiny.pull-right[

**List**

```r
str(y[-1])
```

```
#> List of 2
#>  $ rank   : int [1:4] 1 2 3 4
#>  $ message: chr ""
```

```r
str(y[-c(1, 3)])
```

```
#> List of 1
#>  $ rank: int [1:4] 1 2 3 4
```

```r
str(y[c(-1, 3)])
```

```
#> Error in y[c(-1, 3)]: only 0's may be mixed with negative subscripts
```

```r
*str(y[-c(2.2, 3.9)])
```

```
#> List of 2
#>  $ states : chr [1:4] "NC" "SC" "VA" "TN"
#>  $ message: chr ""
```
]

---

## Logical subsetting

Logical subsetting returns the elements that correspond to `TRUE` in the logical vector. The logical vector is expected to have the same length as the vector being subset; a shorter logical vector is recycled.

.tiny.pull-left[

**Atomic vector**

```r
x <- c(1, 4, 7, 12)
x[c(TRUE, TRUE, FALSE, TRUE)]
```

```
#> [1]  1  4 12
```

```r
x[c(TRUE, FALSE)]
```

```
#> [1] 1 7
```

```r
x[x %% 2 == 0]
```

```
#> [1]  4 12
```
]

.tiny.pull-right[

**List**

```r
y <- list(1, 4, 7, 12)
str(y[c(TRUE, TRUE, FALSE, TRUE)])
```

```
#> List of 3
#>  $ : num 1
#>  $ : num 4
#>  $ : num 12
```

```r
str(y[c(TRUE, FALSE)])
```

```
#> List of 2
#>  $ : num 1
#>  $ : num 7
```

```r
str(y[y %% 2 == 0])
```

```
#> Error in y%%2: non-numeric
#> argument to binary operator
```
]

---

## Empty subsetting

Returns the original vector.

```r
x <- c(1,4,7)
x[]
```

```
#> [1] 1 4 7
```

```r
y <- list(1,4,7)
str(y[])
```

```
#> List of 3
#>  $ : num 1
#>  $ : num 4
#>  $ : num 7
```

---

## Zero subsetting

Returns an empty vector of the same type as the vector being subset.

```r
x <- c(1,4,7)
y <- list(1,4,7)
```

.pull-left[
```r
x[0]
```

```
#> numeric(0)
```

```r
str(y[0])
```

```
#>  list()
```
]

.pull-right[
```r
x[c(0,1)]
```

```
#> [1] 1
```

```r
y[c(0,1)]
```

```
#> [[1]]
#> [1] 1
```
]

---

## Character subsetting

If a vector has names, you can select elements whose names correspond to the character vector.
.pull-left[

**Atomic vector**

```r
x <- c(a = 1, b = 4, c = 7)
x["a"]
```

```
#> a 
#> 1
```

```r
x[c("a", "a")]
```

```
#> a a 
#> 1 1
```

```r
x[c("c", "b")]
```

```
#> c b 
#> 7 4
```
]

.pull-right[

**List**

```r
y <- list(a = 1, b = 4, c = 7)
str(y["a"])
```

```
#> List of 1
#>  $ a: num 1
```

```r
str(y[c("a", "a")])
```

```
#> List of 2
#>  $ a: num 1
#>  $ a: num 1
```

```r
str(y[c("c", "b")])
```

```
#> List of 2
#>  $ c: num 7
#>  $ b: num 4
```
]

---

## Missing and NULL subsetting

.pull-left[

**Atomic vector**

```r
x <- c(1, 4, 7)
x[NA]
```

```
#> [1] NA NA NA
```

```r
x[NULL]
```

```
#> numeric(0)
```

```r
x[c(1, NA)]
```

```
#> [1]  1 NA
```
]

.pull-right[

**List**

```r
y <- list(1, 4, 7)
str(y[NA])
```

```
#> List of 3
#>  $ : NULL
#>  $ : NULL
#>  $ : NULL
```

```r
str(y[NULL])
```

```
#>  list()
```

```r
str(y[c(1, NA)])
```

```
#> List of 2
#>  $ : num 1
#>  $ : NULL
```
]

---

## Exercise

Consider the vectors `x` and `y` below.

```r
x <- letters[1:5]
y <- list(i = 1:5, j = -3:3, k = rep(0, 4))
```

What is the difference between subsetting with `[` and `[[` using integers? Try various positive numeric indices.

---

## Understanding `[` vs. `[[` with lists

.center[
<img src="images/shopping_cart.png" width="400" height="400">
]

--

How do you get a shopping cart with only the cheese and bananas?

--

How do you get the bananas out of the cart?

---

## Using `$` for subsetting lists

The `$` operator only works with named lists and works similarly to `[[`.

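
As a small sketch of how `$` relates to `[[`, the code below uses a hypothetical list named `prices` (not part of the lecture). It assumes base R behavior only: `$` partially matches names, `[[` matches exactly unless `exact = FALSE` is given, and `$` raises an error on atomic vectors.

```r
# Hypothetical named list for illustration
prices <- list(open = 10, close = 12, volume = 500)

# With a full name, `$` and `[[` return the same element
identical(prices$close, prices[["close"]])

# `$` partially matches names; `[[` is exact unless exact = FALSE
prices$vol                       # partial match on "volume"
prices[["vol"]]                  # NULL, there is no element named "vol"
prices[["vol", exact = FALSE]]   # partial match on "volume"

# `$` is not defined for atomic vectors, even named ones;
# tryCatch() captures the error message instead of stopping
v <- c(open = 10, close = 12)
res <- tryCatch(v$open, error = function(e) conditionMessage(e))
res
```
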
.tiny.pull-left[
```r
x <- list(a = 1:3, ab = 4:6, abc = 7:9)
x
```

```
#> $a
#> [1] 1 2 3
#> 
#> $ab
#> [1] 4 5 6
#> 
#> $abc
#> [1] 7 8 9
```

```r
x$a
```

```
#> [1] 1 2 3
```

```r
x$ab
```

```
#> [1] 4 5 6
```
]

.tiny.pull-right[
```r
y <- list(a = 1:3, abc = 4:6, abde = 7:9)
y
```

```
#> $a
#> [1] 1 2 3
#> 
#> $abc
#> [1] 4 5 6
#> 
#> $abde
#> [1] 7 8 9
```

```r
y$a
```

```
#> [1] 1 2 3
```

```r
*y$abd
```

```
#> [1] 7 8 9
```
]

---

class: inverse, center, middle

# Subsetting matrices, arrays, and data frames

---

## Subsetting matrices and arrays

```r
(x <- matrix(1:6, nrow = 2, ncol = 3))
```

```
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
#> [2,]    2    4    6
```

.pull-left[
```r
x[1, 3]
```

```
#> [1] 5
```

```r
x[1:2, 1:2]
```

```
#>      [,1] [,2]
#> [1,]    1    3
#> [2,]    2    4
```
]

.pull-right[
```r
x[, 1:2]
```

```
#>      [,1] [,2]
#> [1,]    1    3
#> [2,]    2    4
```

```r
x[-1, -3]
```

```
#> [1] 2 4
```
]

---

## Do I always get a matrix (array) in return?

.pull-left[
```r
x[1, ]
```

```
#> [1] 1 3 5
```

```r
attributes(x[1, ])
```

```
#> NULL
```
]

.pull-right[
```r
x[, 2]
```

```
#> [1] 3 4
```

```r
attributes(x[, 2])
```

```
#> NULL
```
]

--

For matrices and arrays `[` has an argument `drop = TRUE` that coerces the result to the lowest possible dimension.

--

.tiny[
```r
x[1, , drop = FALSE]
```

```
#>      [,1] [,2] [,3]
#> [1,]    1    3    5
```

```r
attributes(x[1, , drop = FALSE])
```

```
#> $dim
#> [1] 1 3
```
]

---

## Preserving vs simplifying subsetting

Type            |  Simplifying             |  Preserving
:----------------|:-------------------------|:-----------------------------------------------------
Atomic Vector   |  `x[[1]]`                |  `x[1]`
List            |  `x[[1]]`                |  `x[1]`
Matrix / Array  |  `x[1, ]` <br/> `x[, 1]` |  `x[1, , drop=FALSE]` <br/> `x[, 1, drop=FALSE]`
Factor          |  `x[1:4, drop=TRUE]`     |  `x[1:4]`
Data frame      |  `x[, 1]` <br/> `x[[1]]` |  `x[, 1, drop=FALSE]` <br/> `x[1]`

By preserving we mean retaining the attributes. It is good practice to use `drop = FALSE` when subsetting an n-dimensional object, where `\(n > 1\)`.

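
To illustrate why `drop = FALSE` is recommended, here is a minimal sketch. The helper `col_means()` is hypothetical and only serves to show that code written for matrices can break when a single-column subset silently becomes a plain vector.

```r
# Hypothetical helper: column means of selected columns of a matrix
col_means <- function(m, cols) {
  sub <- m[, cols, drop = FALSE]   # always a matrix, even for one column
  colMeans(sub)
}

m <- matrix(1:6, nrow = 2, ncol = 3)

col_means(m, 1:2)   # two columns selected
col_means(m, 3)     # a single column still works

# Without drop = FALSE the single-column subset is a plain vector,
# and colMeans() fails because its input has fewer than two dimensions
res <- tryCatch(colMeans(m[, 3]), error = function(e) "error")
res
```
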
<br/>

The drop argument for factors controls whether the levels are preserved or not. It defaults to `drop = FALSE`.

---

## Subsetting data frames

Recall that data frames are lists with attributes `class`, `names`, and `row.names`. Thus, they can be subset using `[`, `[[`, and `$`. They also support matrix-style subsetting (specify rows and columns to subset).

```r
df <- data.frame(coin = c("BTC", "ETH", "XRP"),
                 price = c(10417.04, 172.52, .26),
                 vol = c(21.29, 8.07, 1.23)
)
```

--

What will the following return?

.pull-left[
```r
df[1]
df[c(1, 3)]
df[1:2, 3]
df[, "price"]
```
]

.pull-right[
```r
df[[1]]
df[["vol"]]
df[[c(1, 3)]]
df[[1, 3]]
```
]

???

What will the following return?

.tiny[
.pull-left[
```r
df[1]
```

```
#>   coin
#> 1  BTC
#> 2  ETH
#> 3  XRP
```

```r
df[c(1, 3)]
```

```
#>   coin   vol
#> 1  BTC 21.29
#> 2  ETH  8.07
#> 3  XRP  1.23
```

```r
df[1:2, 3]
```

```
#> [1] 21.29  8.07
```

```r
df[, "price"]
```

```
#> [1] 10417.04   172.52     0.26
```
]

.pull-right[
```r
df[[1]]
```

```
#> [1] "BTC" "ETH" "XRP"
```

```r
df[["vol"]]
```

```
#> [1] 21.29  8.07  1.23
```

```r
df[[c(1, 3)]]
```

```
#> [1] "XRP"
```

```r
df[[1, 3]]
```

```
#> [1] 21.29
```
]
]

---

class: inverse, center, middle

# Subsetting extras

---

## Subassignment

Indexing can occur on the right-hand-side of an expression for extraction or on the left-hand-side for replacement.

```r
x <- c(1, 4, 7)
```

```r
x[2] <- 2
x
```

```
#> [1] 1 2 7
```

--

```r
x[x %% 2 != 0] <- x[x %% 2 != 0] + 1
x
```

```
#> [1] 2 2 8
```

--

```r
x[c(1, 1, 1, 1)] <- c(0, 7, 2, 3)
```

What is `x` now?

--

```r
x
```

```
#> [1] 3 2 8
```

???

Subassignment is done sequentially, so if an index is specified more than once, the last value assigned to that index is the one that remains.

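
A minimal sketch of this behavior, using a hypothetical vector `v` and list `l` that are not part of the slides:

```r
v <- c(10, 20, 30)

# Index 1 appears three times; values are assigned in order,
# so only the last one (3) remains in position 1
v[c(1, 1, 1)] <- c(1, 2, 3)
v

# The same rule applies to lists indexed by a repeated name
l <- list(a = "x", b = "y")
l[c("a", "a")] <- list("first", "second")
l$a
```
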
---

.pull-left[
```r
x <- 1:6
x[c(2, NA)] <- 1
x
```

```
#> [1] 1 1 3 4 5 6
```

```r
x <- 1:6
x[c(TRUE, NA)] <- 1
x
```

```
#> [1] 1 2 1 4 1 6
```
]

.pull-right[
```r
x <- 1:6
x[c(-1, -3)] <- 3
x
```

```
#> [1] 1 3 3 3 3 3
```

```r
x <- 1:6
x[] <- 6:1
x
```

```
#> [1] 6 5 4 3 2 1
```
]

---

## Adding list and data frame elements

```r
df <- data.frame(
  x = rnorm(4),
  y = rt(4, df = 1)
)
```

--

.tiny[
```r
df$z <- rchisq(4, df = 1)
df
```

```
#>            x          y         z
#> 1 -0.5518461  5.7648271 0.7712077
#> 2 -0.9270803 -0.4806014 1.4487278
#> 3 -1.0078601  3.3526089 1.4287586
#> 4  1.4708991  3.5458261 2.3065770
```
]

--

.tiny[
```r
df["a"] <- rexp(4)
df
```

```
#>            x          y         z         a
#> 1 -0.5518461  5.7648271 0.7712077 0.3581307
#> 2 -0.9270803 -0.4806014 1.4487278 0.8275527
#> 3 -1.0078601  3.3526089 1.4287586 1.7513987
#> 4  1.4708991  3.5458261 2.3065770 0.7897827
```
]

---

## Removing list and data frame elements

.tiny[
```r
df <- data.frame(coin = c("BTC", "ETH", "XRP"),
                 price = c(10417.04, 172.52, .26),
                 vol = c(21.29, 8.07, 1.23)
)
```
]

.tiny[
```r
df["coin"] <- NULL
str(df)
```

```
#> 'data.frame': 3 obs. of  2 variables:
#>  $ price: num  10417.04 172.52 0.26
#>  $ vol  : num  21.29 8.07 1.23
```

```r
df[[1]] <- NULL
str(df)
```

```
#> 'data.frame': 3 obs. of  1 variable:
#>  $ vol: num  21.29 8.07 1.23
```

```r
df$vol <- NULL
str(df)
```

```
#> 'data.frame': 3 obs. of  0 variables
```
]

---

## Exercises

Use the built-in data frame `longley` to answer the following questions.

1. Which year was the percentage of people employed relative to the population highest? Return the result as a data frame.

2. The Korean War took place from 1950 - 1953. Filter the data frame so it only contains data from those years.

3. In which years did the number of people in the armed forces exceed the number of people unemployed? Give the result as an atomic vector.

???

## Solutions

1.
.tiny[
```r
longley[which.max(longley$Employed / longley$Population), "Year", drop=FALSE]
```

```
#>      Year
#> 1956 1956
```
]

2.
.tiny[
```r
longley[longley$Year %in% 1950:1953, ]
```

```
#>      GNP.deflator     GNP Unemployed Armed.Forces Population Year Employed
#> 1950         89.5 284.599      335.1        165.0    110.929 1950   61.187
#> 1951         96.2 328.975      209.9        309.9    112.075 1951   63.221
#> 1952         98.1 346.999      193.2        359.4    113.270 1952   63.639
#> 1953         99.0 365.385      187.0        354.7    115.094 1953   64.989
```
]

3.
.tiny[
```r
longley$Year[longley$Armed.Forces > longley$Unemployed]
```

```
#> [1] 1951 1952 1953 1955 1956
```
]

---

## References

1. Wickham, H. (2021). Advanced R. https://adv-r.hadley.nz/