R has several different subsetting operators ([, [[, and $). The behavior of these operators will depend on the object they are being used with.

In general there are 6 different data types that can be used to subset:

• Positive integers
• Negative integers
• Logical values
• Empty
• Zero
• Character values (names)

## Subsetting Vectors

## Vectors - Positive Ints

Returns elements at the given location - note R uses a 1-based indexing scheme.

x = c(1,4,7)
x[c(1,3)]
## [1] 1 7
x[c(1,1)]
## [1] 1 1
x[c(1.9,2.1)]
## [1] 1 4

y = list(1,4,7)
str( y[c(1,3)] )
## List of 2
##  $ : num 1
##  $ : num 7
str( y[c(1,1)] )
## List of 2
##  $ : num 1
##  $ : num 1
str( y[c(1.9,2.1)] )
## List of 2
##  $ : num 1
##  $ : num 4

## Vectors - Negative Ints

Excludes elements at the given location

x = c(1,4,7)
x[-1]
## [1] 4 7
x[-c(1,3)]
## [1] 4
x[c(-1,2)]
## Error: only 0's may be mixed with negative subscripts

y = list(1,4,7)
str( y[-1] )
## List of 2
##  $ : num 4
##  $ : num 7
str( y[-c(1,3)] )
## List of 1
##  $ : num 4
y[c(-1,2)]
## Error: only 0's may be mixed with negative subscripts
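
A few edge cases of integer subsetting, sketched for illustration only: fractional indices are truncated toward zero, zeros are silently dropped, and negative indices can be computed at run time. The vector and variable names here are made up for the example.

```r
x = c(10, 20, 30, 40, 50)

# Fractional indices are truncated toward zero, not rounded
trunc_idx = x[c(1.9, 2.1)]      # same as x[c(1, 2)]

# Zeros are dropped, so they can be mixed with positive or negative indices
zero_pos = x[c(0, 2)]           # same as x[2]
zero_neg = x[c(0, -1)]          # same as x[-1]

# Exclude positions computed at run time; which() returns positive positions
drop_big = x[-which(x > 30)]    # removes the 4th and 5th elements

# Mixing positive and negative indices is an error
mixed = tryCatch(x[c(-1, 2)], error = function(e) "error")
```

Note that `x[-which(cond)]` returns an empty vector when no element satisfies `cond`, because `-integer(0)` is still `integer(0)`; the logical form `x[!cond]` avoids this.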

## Vectors - Logical Values

Returns elements that correspond to TRUE in the logical vector. The logical vector is expected to have the same length as the vector being subset; if it is shorter, it is recycled.

x = c(1,4,7,12)
x[c(TRUE,TRUE,FALSE,TRUE)]
## [1]  1  4 12
x[c(TRUE,FALSE)]
## [1] 1 7
x[x %% 2 == 0]
## [1]  4 12
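
The sketch below illustrates two details of logical subsetting: recycling of a short logical vector, and what happens when the logical index contains NA. The data are made up for the example.

```r
x = c(1, 4, NA, 12)

# A shorter logical vector is recycled: c(TRUE, FALSE) selects positions 1 and 3
recycled = x[c(TRUE, FALSE)]

# An NA in the logical index yields an NA in the result
is_even = x %% 2 == 0           # FALSE TRUE NA TRUE
with_na = x[is_even]            # 4 NA 12

# which() keeps only the positions that are TRUE, dropping the NA
without_na = x[which(is_even)]  # 4 12
```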

y = list(1,4,7,12)
str( y[c(TRUE,TRUE,FALSE,TRUE)] )
## List of 3
##  $ : num 1
##  $ : num 4
##  $ : num 12
str( y[c(TRUE,FALSE)] )
## List of 2
##  $ : num 1
##  $ : num 7
str( y[y %% 2 == 0] )
## Error: non-numeric argument to binary operator

## Vectors - Empty

Returns the original vector.

x = c(1,4,7)
x[]
## [1] 1 4 7
y = list(1,4,7)
str(y[])
## List of 3
##  $ : num 1
##  $ : num 4
##  $ : num 7

## Vectors - Zero

Returns an empty vector

x = c(1,4,7)
x[0]
## numeric(0)
y = list(1,4,7)
str(y[0])
##  list()

## Vectors - Character Values

If the vector has names, select elements whose names correspond to the character vector.

x = c(a=1,b=4,c=7)
x["a"]
## a
## 1
x[c("b","c")]
## b c
## 4 7
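
Names may be repeated in a character index, which lets a named vector act as a simple lookup table. A minimal sketch, using hypothetical codes and labels:

```r
# Hypothetical lookup table: names are codes, values are labels
lookup = c(s = "small", m = "medium", l = "large")

codes = c("m", "l", "m", "s")

# Each code is matched exactly against names(lookup); repeats are allowed
labels = lookup[codes]

# unname() drops the names carried over from the lookup table
unname(labels)
```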

y = list(a=1,b=4,c=7)
str(y["a"])
## List of 1
##  $ a: num 1
str(y[c("b","c")])
## List of 2
##  $ b: num 4
##  $ c: num 7

## Vectors - Out of bound subsetting

x = c(1,4,7)
x[4]
## [1] NA
x["a"]
## [1] NA
y = list(1,4,7)
str(y[4])
## List of 1
##  $ : NULL
str(y["a"])
## List of 1
##  $ : NULL

## Vectors - Missing and NULL

x = c(1,4,7)
x[NA]
## [1] NA NA NA
x[NULL]
## numeric(0)
y = list(1,4,7)
str(y[NA])
## List of 3
##  $ : NULL
##  $ : NULL
##  $ : NULL
str(y[NULL])
##  list()
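
The three NAs returned by `x[NA]` come from the type of the index: a bare `NA` is logical and is recycled to the length of `x`, whereas `NA_integer_` is a single integer index. A short sketch of the difference:

```r
x = c(1, 4, 7)

# NA is a logical value, so it is recycled like any other logical index
logical_na = x[NA]              # three NAs

# NA_integer_ is an integer index of length one
integer_na = x[NA_integer_]     # one NA

# NULL is treated like integer(0), which selects nothing
null_idx = x[NULL]
```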

## Vectors - [ vs. [[

[[ subsets like [ except that it can only select a single element. Note that for lists the returned value is the element itself, which may not be a list (more on this later).

x = c(1,4,7)
x[[1]]
## [1] 1
y = list(1,4,7)
y[2]
## [[1]]
## [1] 4
y[[2]]
## [1] 4
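
The difference matters most for lists. The sketch below, with a made-up list, compares the type of the result and the out-of-bounds behavior of the two operators.

```r
y = list(a = 1, b = "text", c = 1:3)

# [ always returns a list, even for a single element
sub_list = y["c"]
class(sub_list)                 # "list"

# [[ extracts the element itself
element = y[["c"]]
class(element)                  # "integer"

# Out of bounds: [ returns a list holding NULL, [[ signals an error
oob_single = y[4]
oob_double = tryCatch(y[[4]], error = function(e) "error")
```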

## Vectors - [[ vs. $

$ is equivalent to [[ for character subsetting of lists; by default it uses partial matching (exact=FALSE).

x = c("abc"=1, "def"=5)
x$abc
## Error: $ operator is invalid for atomic vectors
y = list("abc"=1, "def"=5)
y$abc
## [1] 1
y$d
## [1] 5
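
Two practical consequences, sketched with the same kind of list: `[[` matches names exactly unless told otherwise, and `$` takes its argument literally, so a name stored in a variable needs `[[`. The variable `name` is introduced only for this illustration.

```r
y = list(abc = 1, def = 5)

# $ uses partial matching by default
y$d                              # 5, matches "def"

# [[ uses exact matching unless exact = FALSE is given
y[["d"]]                         # NULL
y[["d", exact = FALSE]]          # 5

# $ does not evaluate its argument; use [[ when the name is in a variable
name = "abc"
y$name                           # NULL, looks for an element called "name"
y[[name]]                        # 1
```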

## Logical operators and comparisons

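| Operator | Elementwise | Comparison | Elementwise |
|----------|-------------|------------|-------------|
| x \| y | True | x < y | True |
| x & y | True | x > y | True |
| !x | True | x <= y | True |
| x \|\| y | False | x >= y | True |
| x && y | False | x != y | True |
| xor(x,y) | True | x == y | True |
| x %in% y | True (for x) | | |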
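
A brief sketch of the distinction, using made-up logical vectors: the elementwise operators return one result per element and are the ones to use inside `[`, while `&&` and `||` work on single values.

```r
x = c(TRUE, FALSE, NA)
y = c(TRUE, TRUE, FALSE)

# Elementwise operators return one result per element
x & y                            # TRUE FALSE FALSE
x | y                            # TRUE TRUE NA
xor(x, y)                        # FALSE TRUE NA

# %in% returns one result per element of its left-hand side
c(1, 5) %in% c(1, 2, 3)          # TRUE FALSE

# && and || are meant for single values, e.g. in if() conditions;
# any() and all() reduce a logical vector to one value first
if (any(y) && !all(y)) {
  status = "mixed"
} else {
  status = "uniform"
}
```

Recent versions of R signal an error when `&&` or `||` is given a vector of length greater than one, so reducing with `any()` or `all()` first is the safer habit.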

## Exercise 1

Below are 100 values,

x = c(56, 3, 17, 2, 4, 9, 6, 5, 19, 5, 2, 3, 5, 0, 13, 12, 6, 31, 10, 21, 8, 4, 1, 1, 2, 5, 16, 1, 3, 8, 1,
3, 4, 8, 5, 2, 8, 6, 18, 40, 10, 20, 1, 27, 2, 11, 14, 5, 7, 0, 3, 0, 7, 0, 8, 10, 10, 12, 8, 82,
21, 3, 34, 55, 18, 2, 9, 29, 1, 4, 7, 14, 7, 1, 2, 7, 4, 74, 5, 0, 3, 13, 2, 8, 1, 6, 13, 7, 1, 10,
5, 2, 4, 4, 14, 15, 4, 17, 1, 9)

write down how you would create a subset to accomplish each of the following:

• Select every third value starting at position 2 in x.

• Remove all values with an odd index (e.g. 1, 3, etc.)

• Select only the values that are primes. (You may assume all values are less than 100)

• Remove every 4th value, but only if it is odd.

## Matrices and Arrays

Atomic vectors can be treated as multidimensional (2 or more) objects by adding a dim attribute.

x = 1:8
dim(x) = c(2,4)
x
##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8
matrix(1:8, nrow=2, ncol=4)
##      [,1] [,2] [,3] [,4]
## [1,]    1    3    5    7
## [2,]    2    4    6    8
x = 1:8
attr(x,"dim") = c(2,2,2)
x
## , , 1
##
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
##
## , , 2
##
##      [,1] [,2]
## [1,]    5    7
## [2,]    6    8
x = array(1:8,c(2,2,2))
x
## , , 1
##
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
##
## , , 2
##
##      [,1] [,2]
## [1,]    5    7
## [2,]    6    8
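
Because a matrix is just a vector with a dim attribute, its elements are stored column by column. The following sketch shows how a row and column pair maps to a position in the underlying vector; the names `i`, `j`, and `linear` are only illustrative.

```r
x = 1:8
dim(x) = c(2, 4)

# Elements fill the matrix column by column (column-major order)
i = 2; j = 3
by_position = x[i, j]

# The same element via its position in the underlying vector
linear = (j - 1) * nrow(x) + i
by_linear = x[linear]

# Removing the dim attribute gives back the plain vector
dim(x) = NULL
x
```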

## Naming dimensions

x = array(1:8,c(2,2,2))
colnames(x) = LETTERS[1:2]
rownames(x) = LETTERS[3:4]
dimnames(x)[[3]] = LETTERS[5:6]
x
## , , E
##
##   A B
## C 1 3
## D 2 4
##
## , , F
##
##   A B
## C 5 7
## D 6 8
str(x)
##  int [1:2, 1:2, 1:2] 1 2 3 4 5 6 7 8
##  - attr(*, "dimnames")=List of 3
##   ..$ : chr [1:2] "C" "D"
##   ..$ : chr [1:2] "A" "B"
##   ..$ : chr [1:2] "E" "F"

## Subsetting Matrices

x = matrix(1:6, nrow=2, ncol=3, dimnames=list(c("A","B"),c("M","N","O")))
x[1,3]
## [1] 5
x[1:2, 1:2]
##   M N
## A 1 3
## B 2 4
x[1:2,]
##   M N O
## A 1 3 5
## B 2 4 6
x[, 1:2]
##   M N
## A 1 3
## B 2 4
x[-1,-3]
## M N
## 2 4
x[2,-1]
## N O
## 4 6
x["A","M"]
## [1] 1
x["A", c("M","O")]
## M O
## 1 5
x["B",]
## M N O
## 2 4 6
x[, "C"]
## Error: subscript out of bounds
x[1,"M"]
## [1] 1
x["B"]
## [1] NA
x[1]
## [1] 1
x[-1]
## [1] 2 3 4 5 6

## Preserving Subsetting

x = matrix(1:6, nrow=2, ncol=3, dimnames=list(c("A","B"),c("M","N","O")))
x[1, , drop=FALSE]
##   M N O
## A 1 3 5
x[, 2, drop=FALSE]
##   N
## A 3
## B 4

## Preserving vs Simplifying Subsets

| | Simplifying | Preserving |
|------------|-------------|------------|
| Vector | x[[1]] | x[1] |
| List | x[[1]] | x[1] |
| Array | x[1, ] or x[, 1] | x[1, , drop = FALSE] or x[, 1, drop = FALSE] |
| Factor | x[1:4, drop = TRUE] | x[1:4] |
| Data frame | x[, 1] or x[[1]] | x[, 1, drop = FALSE] or x[1] |

## Factor Subsetting

(x = factor(c("BS", "MS", "PhD", "MS")))
## [1] BS  MS  PhD MS
## Levels: BS MS PhD
x[1:2]
## [1] BS MS
## Levels: BS MS PhD
x[1:2, drop=TRUE]
## [1] BS MS
## Levels: BS MS

## Data Frame Subsetting

df = data.frame(a = 1:2, b = 3:4)
str(df[1])
## 'data.frame':    2 obs. of  1 variable:
##  $ a: int  1 2
str(df[[1]])
##  int [1:2] 1 2
str(df[, "a", drop = FALSE])
## 'data.frame':    2 obs. of  1 variable:
##  $ a: int  1 2
str(df[, "a"])
##  int [1:2] 1 2
str(df["a"])
## 'data.frame':    2 obs. of  1 variable:
##  $ a: int  1 2
str(df[c("a","b","a")])
## 'data.frame':    2 obs. of  3 variables:
##  $ a  : int  1 2
##  $ b  : int  3 4
##  $ a.1: int  1 2
str(df[c(FALSE,TRUE)])
## 'data.frame':    2 obs. of  1 variable:
##  $ b: int  3 4
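
Data frames accept both list-style (one index) and matrix-style (two indices) subsetting. A minimal sketch with a made-up data frame, including row selection by a logical condition:

```r
df = data.frame(a = 1:4, b = c(3.5, 1.2, 4.8, 0.7))

# One index: list-style subsetting, selects columns
cols = df["b"]

# Two indices: matrix-style subsetting, rows first and then columns
rows = df[df$b > 1, ]            # rows where b exceeds 1, all columns
cell = df[2, "b"]                # a single value

# Selecting one column matrix-style simplifies to a vector unless drop = FALSE
simplified = df[df$a > 2, "b"]
preserved  = df[df$a > 2, "b", drop = FALSE]
```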

## Subsetting and assignment

Subsets can also be used with assignment to update specific values within an object.

x = c(1, 4, 7)
x[2] = 2
x
## [1] 1 2 7
x[x %% 2 != 0] = x[x %% 2 != 0] + 1
x
## [1] 2 2 8
x[c(1,1)] = c(2,3)
x
## [1] 3 2 8
x = 1:6
x[c(2,NA)] = 1
x
## [1] 1 1 3 4 5 6
x[c(TRUE,NA)] = 1
x
## [1] 1 1 1 4 1 6
x[c(-1,-3)] = 3
x
## [1] 1 3 1 3 3 3
x[] = 6:1
x
## [1] 6 5 4 3 2 1
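
Two further behaviors of subset assignment, sketched with made-up objects: an empty subset on the left-hand side keeps the object's attributes, and assigning past the end of a vector extends it.

```r
m = matrix(1:6, nrow = 2)

# Empty subsetting on the left-hand side replaces the contents but keeps
# the attributes (here the dimensions)
m[] = 0
kept_dim = dim(m)                # 2 3

# Plain assignment replaces the whole object
m = 0
lost_dim = dim(m)                # NULL

# Assigning past the end of a vector extends it, padding with NA
v = c(1, 4, 7)
v[5] = 10
v                                # 1 4 7 NA 10
```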

## Deleting list (df) elements

df = data.frame(a = 1:2, b = TRUE, c = c("A", "B"))
df[["b"]] = NULL
str(df)
## 'data.frame':    2 obs. of  2 variables:
##  $ a: int  1 2
##  $ c: Factor w/ 2 levels "A","B": 1 2
df[,"c"] = NULL
str(df)
## 'data.frame':    2 obs. of  1 variable:
##  $ a: int  1 2
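
The same idea applies to plain lists. Since assigning NULL removes an element, storing an actual NULL needs a different form; the sketch below, with a made-up list, shows both.

```r
y = list(a = 1, b = 2, c = 3)

# Assigning NULL with [[ (or $) removes the element
y[["b"]] = NULL
names(y)                         # "a" "c"

# To store an actual NULL in the list, assign list(NULL) with [
y["c"] = list(NULL)
names(y)                         # "a" "c"
is.null(y[["c"]])                # TRUE
```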

## Exercise 2

Load the course eval data set using the following command:

d = read.csv("~cr173/Sta523/data/evals.csv")

This data frame contains the following variables (columns):

• cls_val - students' average course value rating
• prof_val - students' average professor value rating
• rank - professor's rank (0 - teaching, 1 - tenure track, 2 - tenured)
• gender - professor's gender (0 - male, 1 - female)
• cls_level - class level (0 - lower division, 1 - upper division)

Some of the values in the data frame are missing; they have been coded using the value -999. Make sure that they are properly treated as NAs.

Use subsetting to replace the values of the categorical variables with the appropriate character strings (do not use factors).