Exercise 1

Problem

Create a 3 x 3 x 2 array using the dim attribute with the vector below.

x <- c(5, 1, 5, 5, 1, 1, 5, 3, 2, 3, 2, 6, 4, 4, 1, 2, 1, 3)

Try to create the same array using the function array(). What do you notice about how the array is populated?

Solution

x <- c(5, 1, 5, 5, 1, 1, 5, 3, 2, 3, 2, 6, 4, 4, 1, 2, 1, 3)

# set dim attribute
attr(x = x, which = "dim") <- c(3, 3, 2)
x

#> , , 1
#> 
#>      [,1] [,2] [,3]
#> [1,]    5    5    5
#> [2,]    1    1    3
#> [3,]    5    1    2
#> 
#> , , 2
#> 
#>      [,1] [,2] [,3]
#> [1,]    3    4    2
#> [2,]    2    4    1
#> [3,]    6    1    3

attributes(x)

#> $dim
#> [1] 3 3 2

array(x, dim = c(3, 3, 2))

#> , , 1
#> 
#>      [,1] [,2] [,3]
#> [1,]    5    5    5
#> [2,]    1    1    3
#> [3,]    5    1    2
#> 
#> , , 2
#> 
#>      [,1] [,2] [,3]
#> [1,]    3    4    2
#> [2,]    2    4    1
#> [3,]    6    1    3

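Both approaches produce the same array, which is populated in column-major order. The values fill the first column of the first matrix from top to bottom, then its second and third columns, and only then move on to the second matrix. The table below maps each vector index to its position in the array.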

vector index	row	column	matrix
1	1	1	1
2	2	1	1
3	3	1	1
4	1	2	1
5	2	2	1
6	3	2	1
7	1	3	1
8	2	3	1
9	3	3	1
10	1	1	2
11	2	1	2
12	3	1	2
13	1	2	2
14	2	2	2
15	3	2	2
16	1	3	2
17	2	3	2
18	3	3	2
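The table can also be reproduced in R instead of by hand. The sketch below uses base R's arrayInd() to convert each vector index into its (row, column, matrix) position, then checks the column-major rule on one element. The names a and pos are illustrative only.

```r
x <- c(5, 1, 5, 5, 1, 1, 5, 3, 2, 3, 2, 6, 4, 4, 1, 2, 1, 3)
a <- array(x, dim = c(3, 3, 2))

# arrayInd() turns linear (vector) indices into row, column, and matrix positions
pos <- arrayInd(seq_along(x), .dim = dim(a))
colnames(pos) <- c("row", "column", "matrix")
pos

# Column-major rule for a 3 x 3 x 2 array:
# vector index = row + (column - 1) * 3 + (matrix - 1) * 9
a[2, 3, 2] == x[2 + (3 - 1) * 3 + (2 - 1) * 9]   # compares with x[17]

# Indexing the array with the position matrix recovers x in its original order
all(a[pos] == x)
```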

Exercise 2

Problem

Create a factor vector based on the vector of airport codes below. Try to do it without using the function factor().

airports <- c("RDU", "ABE", "DTW", "GRR", "RDU", "GRR", "GNV",
             "JFK", "JFK", "SFO", "DTW")

Assume all the possible levels are

c("RDU", "ABE", "DTW", "GRR", "GNV", "JFK", "SFO")

Hint: Think about what type of object factors are built on.

What if the possible levels are

c("RDU", "ABE", "DTW", "GRR", "GNV", "JFK", "SFO", "GSO", "ORD", "PHL")

Solution

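Factors are built on atomic integer vectors that carry a levels attribute and a class attribute of "factor". First, we'll create an integer vector in which each value is the position of the corresponding airport code in the vector of levels.

# position of each airport code in the vector of levels
z <- as.integer(c(1, 2, 3, 4, 1, 4, 5, 6, 6, 7, 3))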

# set levels
attr(x = z, which = "levels") <- c("RDU", "ABE", "DTW", 
                                   "GRR", "GNV", "JFK", "SFO")

# set class to be factor
attr(x = z, which = "class") <- "factor"
z

#>  [1] RDU ABE DTW GRR RDU GRR GNV JFK JFK SFO DTW
#> Levels: RDU ABE DTW GRR GNV JFK SFO

attributes(z)

#> $levels
#> [1] "RDU" "ABE" "DTW" "GRR" "GNV" "JFK" "SFO"
#> 
#> $class
#> [1] "factor"
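To confirm that the hand-built z really behaves as a factor, one quick check is to inspect its underlying type and compare it with what factor() produces for the same levels. The name lvls is illustrative shorthand for the seven levels given in the problem.

```r
airports <- c("RDU", "ABE", "DTW", "GRR", "RDU", "GRR", "GNV",
              "JFK", "JFK", "SFO", "DTW")
lvls <- c("RDU", "ABE", "DTW", "GRR", "GNV", "JFK", "SFO")

# Rebuild z exactly as above
z <- as.integer(c(1, 2, 3, 4, 1, 4, 5, 6, 6, 7, 3))
attr(z, "levels") <- lvls
attr(z, "class") <- "factor"

typeof(z)    # "integer": a factor is stored as integer codes
unclass(z)   # the codes, with the levels attribute still attached
identical(z, factor(airports, levels = lvls))
```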

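If the possible levels are the ten codes above, the integer vector does not need to change: the three new codes (GSO, ORD, and PHL) come after the original seven, so every airport keeps the same position in the vector of levels. Only the levels attribute needs updating to list all ten codes, and the resulting factor has three levels that do not occur in the data. Had the new codes been inserted earlier in the levels vector, the integer values would need to be recomputed to match the new positions.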
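A sketch of the ten-level case follows. Rather than typing the codes by hand, it uses base R's match() to look up each airport's position in the levels vector (a swapped-in shortcut, not the approach used above) and structure() to attach the attributes. The names lvls10, codes, and f are illustrative.

```r
airports <- c("RDU", "ABE", "DTW", "GRR", "RDU", "GRR", "GNV",
              "JFK", "JFK", "SFO", "DTW")
lvls10 <- c("RDU", "ABE", "DTW", "GRR", "GNV", "JFK", "SFO",
            "GSO", "ORD", "PHL")

# Position of each airport code in the levels vector; these are the
# same codes as in the seven-level case because the new levels come last
codes <- match(airports, lvls10)

# Attach the levels and class attributes to the integer codes
f <- structure(codes, levels = lvls10, class = "factor")

nlevels(f)   # 10
table(f)     # GSO, ORD, and PHL appear as levels with zero counts
identical(f, factor(airports, levels = lvls10))
```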

Exercise 3

Problem

Consider the vectors x and y below.

x <- letters[1:5]
y <- list(i = 1:5, j = -3:3, k = rep(0, 4))

What is the difference between subsetting with [ and [[ using integers? Try various indices.

Solution

Let’s look at atomic vectors first.

x[[4]]

#> [1] "d"

x[4]

#> [1] "d"

x[[1:3]]

#> Error in x[[1:3]]: attempt to select more than one element in vectorIndex

x[1:3]

#> [1] "a" "b" "c"

For an unnamed atomic vector, subsetting with a single index gives the same result with [ and [[. However, [[ can select only one element, so x[[1:3]] is an error, whereas x[1:3] returns the first three elements.
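The similarity between [ and [[ for a single index holds only in the simple case above. The sketch below tries two more indices worth comparing: an out-of-range position, and a vector with names. The vector named is an illustrative addition, not part of the exercise.

```r
x <- letters[1:5]

# [ with an out-of-range index returns NA ...
x[6]
# ... while [[ signals an error, captured here so the script keeps running
out_of_range <- tryCatch(x[[6]], error = function(e) e)
inherits(out_of_range, "error")

# [ keeps names; [[ drops them
named <- c(a = 1, b = 2, c = 3)
named[2]
named[[2]]
```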

Doing the same for y yields

y[[2]]

#> [1] -3 -2 -1  0  1  2  3

y[2]

#> $j
#> [1] -3 -2 -1  0  1  2  3

y[[1:2]]

#> [1] 2

y[1:2]

#> $i
#> [1] 1 2 3 4 5
#> 
#> $j
#> [1] -3 -2 -1  0  1  2  3

y[[1:3]]

#> Error in y[[1:3]]: recursive indexing failed at level 2

y[1:3]

#> $i
#> [1] 1 2 3 4 5
#> 
#> $j
#> [1] -3 -2 -1  0  1  2  3
#> 
#> $k
#> [1] 0 0 0 0

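For the list y, [ always returns a list (here a sub-list that keeps the component names), while [[ returns the component itself, in this case an integer vector. As with atomic vectors, we should not use [[ with an index vector of length greater than one. The result of y[[1:2]] seems to work, but it is not doing what you might think: for lists, [[ with a vector index performs recursive indexing, so y[[1:2]] is equivalent to y[[1]][[2]], the second element of y$i. Likewise, y[[1:3]] fails because y[[1]][[2]] is the single value 2, which has no third element.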
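A short sketch of the list behavior described above: the class of the result for each operator, and recursive indexing with [[ written out explicitly.

```r
y <- list(i = 1:5, j = -3:3, k = rep(0, 4))

# [ returns a list; [[ returns the component itself
class(y[2])    # "list"
class(y[[2]])  # "integer"

# A vector index in [[ walks into the list one level per element
identical(y[[1:2]], y[[1]][[2]])  # TRUE
y[[c(2, 4)]]   # fourth element of y$j, which is 0
```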

Exercise 4

Problem

Consider the atomic vector x.

set.seed(73961)
x <- sample(1:100, size = 100, replace = TRUE)
x

#>   [1]  13  99  21  78   5   2  11  99  46  97  57  53  97  12  41  50  59  58
#>  [19]   1  63  43  89  31  23  51  35  22  34  82  29  92  44  67  62  59  48
#>  [37]  49  20   9  36  77  94  86  54  11  79  49  37  29  52  30   4  23  65
#>  [55]  20  15  28   4  43  16  49  69  23  36  25  93   2  21  68   6  37  94
#>  [73]  99  40  11  26  93  56  78   9  61  60  91  20  74  55  97  98  45  98
#>  [91]  74  46  94  38  26  78  49  25  21 100

Use subsetting to

select every third value from x beginning at position 6;
remove all values with an odd index;
replace all numbers divisible by 3 or 5 with 0.

Solution

Recall that x is

#>   [1]  13  99  21  78   5   2  11  99  46  97  57  53  97  12  41  50  59  58
#>  [19]   1  63  43  89  31  23  51  35  22  34  82  29  92  44  67  62  59  48
#>  [37]  49  20   9  36  77  94  86  54  11  79  49  37  29  52  30   4  23  65
#>  [55]  20  15  28   4  43  16  49  69  23  36  25  93   2  21  68   6  37  94
#>  [73]  99  40  11  26  93  56  78   9  61  60  91  20  74  55  97  98  45  98
#>  [91]  74  46  94  38  26  78  49  25  21 100

Part 1

x[seq(6, length(x), by = 3)]

#>  [1]  2 46 53 41 58 43 23 22 29 67 48  9 94 11 37 30 65 28 16 23 93 68 94 11 56
#> [26] 61 20 97 98 94 78 21

Part 2

x[-seq(1, length(x), 2)]

#>  [1]  99  78   2  99  97  53  12  50  58  63  89  23  35  34  29  44  62  48  20
#> [20]  36  94  54  79  37  52   4  65  15   4  16  69  36  93  21   6  94  40  26
#> [39]  56   9  60  20  55  98  98  46  38  78  25 100
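A shorter alternative for Part 2 relies on recycling of a logical index: c(FALSE, TRUE) is repeated along x, so odd positions are dropped and even positions are kept. The name evens is illustrative. The exact values drawn by sample() can depend on the R version's random-number settings, but the equality check does not.

```r
set.seed(73961)
x <- sample(1:100, size = 100, replace = TRUE)

# c(FALSE, TRUE) is recycled to length 100: drop odd positions, keep even ones
evens <- x[c(FALSE, TRUE)]
identical(evens, x[-seq(1, length(x), 2)])
```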

Part 3

# x %% 3 is 0 for multiples of 3, and !0 is TRUE; likewise for 5
x[!(x %% 3) | !(x %% 5)] <- 0
x

#>   [1] 13  0  0  0  0  2 11  0 46 97  0 53 97  0 41  0 59 58  1  0 43 89 31 23  0
#>  [26]  0 22 34 82 29 92 44 67 62 59  0 49  0  0  0 77 94 86  0 11 79 49 37 29 52
#>  [51]  0  4 23  0  0  0 28  4 43 16 49  0 23  0  0  0  2  0 68  0 37 94  0  0 11
#>  [76] 26  0 56  0  0 61  0 91  0 74  0 97 98  0 98 74 46 94 38 26  0 49  0  0  0
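The solution above overwrites x. If the original values are still needed, base R's replace() returns a modified copy instead. This sketch spells out the divisibility test with == 0, which is equivalent to the ! form used above; the names divisible and x0 are illustrative.

```r
set.seed(73961)
x <- sample(1:100, size = 100, replace = TRUE)

# TRUE where x is a multiple of 3 or of 5
divisible <- x %% 3 == 0 | x %% 5 == 0

# Copy of x with those values set to 0; x itself is unchanged
x0 <- replace(x, divisible, 0)
x0
```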

Exercise 5

Problem

Consider the list given below.

x <- jsonlite::fromJSON('{
  "id" : "8671703e-aab7-47a8-818e-a83a91278658",
  "index" : 1,
  "period" : 1,
  "timestamp" : "00:00:00.000",
  "minute" : 0,
  "second" : 0,
  "type" : {
    "id" : 35,
    "name" : "Starting XI"
  },
  "possession" : 1,
  "possession_team" : {
    "id" : 746,
    "name" : "Manchester City WFC"
  }}')

str(x)

#> List of 9
#>  $ id             : chr "8671703e-aab7-47a8-818e-a83a91278658"
#>  $ index          : int 1
#>  $ period         : int 1
#>  $ timestamp      : chr "00:00:00.000"
#>  $ minute         : int 0
#>  $ second         : int 0
#>  $ type           :List of 2
#>   ..$ id  : int 35
#>   ..$ name: chr "Starting XI"
#>  $ possession     : int 1
#>  $ possession_team:List of 2
#>   ..$ id  : int 746
#>   ..$ name: chr "Manchester City WFC"

Subset x to obtain the period, minute, and second as a list.
Subset x to obtain the possession_team name as a character vector.
Subset x to obtain a list with type as the first component of the list.

Solution

Part 1

x[c("period", "minute", "second")]

#> $period
#> [1] 1
#> 
#> $minute
#> [1] 0
#> 
#> $second
#> [1] 0

Part 2

x$possession_team$name

#> [1] "Manchester City WFC"
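The $ operator is one of several ways to reach the nested name. The sketch below shows two [[ forms: chaining [[ by name, and a single [[ with a character vector, which R treats as recursive indexing (the same mechanism behind y[[1:2]] in Exercise 3). So that it runs without jsonlite, the list is rebuilt by hand to match the str() output above.

```r
# Hand-built list with the same structure as the fromJSON() result above
x <- list(
  id = "8671703e-aab7-47a8-818e-a83a91278658",
  index = 1L,
  period = 1L,
  timestamp = "00:00:00.000",
  minute = 0L,
  second = 0L,
  type = list(id = 35L, name = "Starting XI"),
  possession = 1L,
  possession_team = list(id = 746L, name = "Manchester City WFC")
)

# Chained [[ by name
x[["possession_team"]][["name"]]

# One [[ with a character vector: recursive indexing, one level per name
x[[c("possession_team", "name")]]
```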

Part 3

x["type"]

#> $type
#> $type$id
#> [1] 35
#> 
#> $type$name
#> [1] "Starting XI"
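Here x["type"] returns a list whose first (and only) component is type. If the question is instead read as keeping every component but moving type to the front, one possible sketch combines [ with setdiff(); this reading is an assumption. The list is again rebuilt by hand to match the str() output, and x_reordered is an illustrative name.

```r
# Hand-built list with the same structure as the fromJSON() result above
x <- list(
  id = "8671703e-aab7-47a8-818e-a83a91278658",
  index = 1L,
  period = 1L,
  timestamp = "00:00:00.000",
  minute = 0L,
  second = 0L,
  type = list(id = 35L, name = "Starting XI"),
  possession = 1L,
  possession_team = list(id = 746L, name = "Manchester City WFC")
)

# Put "type" first, followed by the other components in their original order
x_reordered <- x[c("type", setdiff(names(x), "type"))]
names(x_reordered)
```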

Exercise 6

Problem

Use the built-in data frame longley to answer the following questions.

In which year was the percentage of people employed relative to the population highest? Return the result as a data frame.
The Korean War took place from 1950 to 1953. Filter the data frame so it only contains data from those years.
In which years did the number of people in the armed forces exceed the number of people unemployed? Give the result as an atomic vector.

Solution

Part 1

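# which.max() picks the row with the highest Employed / Population ratio;
# drop = FALSE keeps the single selected column as a data frame
longley[which.max(longley$Employed / longley$Population),
        "Year", drop = FALSE]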
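The drop = FALSE argument is what makes Part 1 return a data frame. Selecting a single column with [ otherwise simplifies the result to an atomic vector, as the sketch below shows; the object names are illustrative.

```r
# Ratio of employed people to population for each year
ratio <- longley$Employed / longley$Population
best_row <- which.max(ratio)   # index of the first maximum

as_df  <- longley[best_row, "Year", drop = FALSE]  # one-row data frame
as_vec <- longley[best_row, "Year"]                # default drop = TRUE: atomic vector

class(as_df)   # "data.frame"
class(as_vec)
```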

Part 2

longley[longley$Year %in% 1950:1953, ]
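Base R's subset() expresses the same filter without repeating the data frame name. It is meant as an interactive convenience; its documentation recommends standard [ subsetting when programming. The object names below are illustrative.

```r
korea_bracket <- longley[longley$Year %in% 1950:1953, ]
korea_subset  <- subset(longley, Year %in% 1950:1953)

nrow(korea_subset)   # 4 rows, one per year from 1950 through 1953
all.equal(korea_bracket, korea_subset)
```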

Part 3

longley$Year[longley$Armed.Forces > longley$Unemployed]

#> [1] 1951 1952 1953 1955 1956

Exercises: Data structures and subsetting

Shawn Santo
