R has several different subsetting operators ([, [[, and $).
The behavior of these operators will depend on the object they are being used with.
In general, there are 6 different kinds of subscript that can be used to subset:
Positive integers
Negative integers
Logical values
Empty
Zero
Character values (names)
Positive integers: return the elements at the given locations (note that R uses 1-based, not 0-based, indexing). Non-integer values are truncated toward zero.
x = c(1,4,7)
x[c(1,3)]
## [1] 1 7
x[c(1,1)]
## [1] 1 1
x[c(1.9,2.1)]
## [1] 1 4
y = list(1,4,7)
str( y[c(1,3)] )
## List of 2
## $ : num 1
## $ : num 7
str( y[c(1,1)] )
## List of 2
## $ : num 1
## $ : num 1
str( y[c(1.9,2.1)] )
## List of 2
## $ : num 1
## $ : num 4
Negative integers: exclude the elements at the given locations. Positive and negative subscripts cannot be mixed.
x = c(1,4,7)
x[-1]
## [1] 4 7
x[-c(1,3)]
## [1] 4
x[c(-1,2)]
## Error in x[c(-1, 2)]: only 0's may be mixed with negative subscripts
y = list(1,4,7)
str( y[-1] )
## List of 2
## $ : num 4
## $ : num 7
str( y[-c(1,3)] )
## List of 1
## $ : num 4
y[c(-1,2)]
## Error in y[c(-1, 2)]: only 0's may be mixed with negative subscripts
Logical values: return the elements that correspond to TRUE in the logical vector. The logical vector is expected to have the same length as the vector being subsetted; a shorter one is recycled.
x = c(1,4,7,12)
x[c(TRUE,TRUE,FALSE,TRUE)]
## [1] 1 4 12
x[c(TRUE,FALSE)]
## [1] 1 7
x[x %% 2 == 0]
## [1] 4 12
y = list(1,4,7,12)
str( y[c(TRUE,TRUE,FALSE,TRUE)] )
## List of 3
## $ : num 1
## $ : num 4
## $ : num 12
str( y[c(TRUE,FALSE)] )
## List of 2
## $ : num 1
## $ : num 7
str( y[y %% 2 == 0] )
## Error in y%%2: non-numeric argument to binary operator
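The last call fails because %% is not defined for lists. One possible workaround, sketched here under the assumption that every element of y is a single number, is to build the logical vector one element at a time with sapply(), or to use Filter(). The names is_even and even are only illustrative.

```r
y = list(1, 4, 7, 12)

# sapply() applies the test to each element and returns a logical vector
is_even = sapply(y, function(v) v %% 2 == 0)
is_even
## [1] FALSE  TRUE FALSE  TRUE

# The logical vector can then be used with [ as usual
str( y[is_even] )
## List of 2
##  $ : num 4
##  $ : num 12

# Filter() combines the test and the subset in one step
even = Filter(function(v) v %% 2 == 0, y)
```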
Empty: returns the original vector.
x = c(1,4,7)
x[]
## [1] 1 4 7
y = list(1,4,7)
str(y[])
## List of 3
## $ : num 1
## $ : num 4
## $ : num 7
Zero: returns an empty vector of the same type as the vector being subsetted.
x = c(1,4,7)
x[0]
## numeric(0)
y = list(1,4,7)
str(y[0])
## list()
Character values: if the vector has names, select the elements whose names match the character vector.
x = c(a=1,b=4,c=7)
x["a"]
## a
## 1
x[c("b","c")]
## b c
## 4 7
y = list(a=1,b=4,c=7)
str(y["a"])
## List of 1
## $ a: num 1
str(y[c("b","c")])
## List of 2
## $ b: num 4
## $ c: num 7
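Unlike integer subscripts, character subscripts cannot be negated. The sketch below shows one way to exclude elements by name, and what happens when names are duplicated; the vectors x and z are made up for illustration.

```r
x = c(a=1, b=4, c=7)

# Negative character subscripts are not allowed: -"a" is an error
res = tryCatch(x[-"a"], error = function(e) "error")
res
## [1] "error"

# Exclude by name with a logical test on names(x) instead
x[!(names(x) %in% "a")]
## b c
## 4 7

# With duplicated names only the first match is returned
z = c(a=1, a=2)
z["a"]
## a
## 1
```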
Out-of-bounds subscripts return NA for atomic vectors and NULL elements for lists.
x = c(1,4,7)
x[4]
## [1] NA
x["a"]
## [1] NA
x[c(1,4)]
## [1] 1 NA
y = list(1,4,7)
str(y[4])
## List of 1
## $ : NULL
str(y["a"])
## List of 1
## $ : NULL
str(y[c(1,4)])
## List of 2
## $ : num 1
## $ : NULL
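A missing (NA) subscript returns NA for atomic vectors and NULL elements for lists; a NULL subscript returns an empty vector.
x = c(1,4,7)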
x[NA]
## [1] NA NA NA
x[NULL]
## numeric(0)
x[c(1,NA)]
## [1] 1 NA
y = list(1,4,7)
str(y[NA])
## List of 3
## $ : NULL
## $ : NULL
## $ : NULL
str(y[NULL])
## list()
str(y[c(1,NA)])
## List of 2
## $ : num 1
## $ : NULL
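The three NAs returned by x[NA] can be surprising. A plain NA is a logical value, so it is treated as a logical subscript and recycled. The short sketch below contrasts it with an integer NA and shows which() as one way to drop NAs from a logical subscript; the name keep is only illustrative.

```r
x = c(1,4,7)

# NA on its own is logical, so it is recycled to the length of x
typeof(NA)
## [1] "logical"
length(x[NA])
## [1] 3

# An integer NA is a single unknown position, so one NA comes back
x[NA_integer_]
## [1] NA

# A logical subscript containing NA yields NA at that position
keep = c(TRUE, NA, FALSE)
x[keep]
## [1]  1 NA

# which() keeps only the positions that are TRUE, dropping the NA
x[which(keep)]
## [1] 1
```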
[[ subsets like [, except that it can only select a single value. Note that for lists the returned value may not be a list (more on this later).
x = c(1,4,7)
x[[1]]
## [1] 1
y = list(1,4,7)
y[2]
## [[1]]
## [1] 4
y[[2]]
## [1] 4
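To make the difference between [ and [[ on lists more concrete, here is a small sketch using a nested list invented for illustration. It also shows that [[ with a vector subscript indexes a list recursively, and that an out-of-bounds position is an error for [[ rather than a NULL element.

```r
y = list(a = 1, b = list(c = "x", d = 4))

# [ always returns a list, [[ returns the element itself
class(y["a"])
## [1] "list"
class(y[["a"]])
## [1] "numeric"

# For lists, a vector subscript in [[ indexes recursively
y[[c("b", "d")]]
## [1] 4
y[["b"]][["d"]]
## [1] 4

# An out-of-bounds position is an error for [[
oob = tryCatch(y[[4]], error = function(e) "error")
oob
## [1] "error"
```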
$ is equivalent to [[ for character subsetting of lists, except that by default it uses partial matching (exact=FALSE).
x = c("abc"=1, "def"=5)
x$abc
## Error in x$abc: $ operator is invalid for atomic vectors
y = list("abc"=1, "def"=5)
y$abc
## [1] 1
y$d
## [1] 5
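The following sketch compares $ with [[ on the same list. It illustrates three points: [[ matches exactly unless exact = FALSE is supplied, $ cannot look up a name stored in a variable, and an ambiguous prefix matches nothing. The list z and the variable name are hypothetical.

```r
y = list(abc = 1, def = 5)

# [[ matches names exactly unless exact = FALSE is given
y[["d"]]
## NULL
y[["d", exact = FALSE]]
## [1] 5

# $ takes its argument literally, so it looks for an element called "name"
name = "abc"
y$name
## NULL
y[[name]]
## [1] 1

# Partial matching returns NULL when the prefix is ambiguous
z = list(abc = 1, abd = 2)
z$ab
## NULL
```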
| op | Vectorized | Comp | Vectorized |
|---|---|---|---|
| x \| y | True | x < y | True |
| x & y | True | x > y | True |
| !x | True | x <= y | True |
| x \|\| y | False | x >= y | True |
| x && y | False | x != y | True |
| xor(x,y) | True | x == y | True |
| | | x %in% y | True (for x) |
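A brief sketch of what "vectorized" means in the table: the vectorized operators return one result per element, which is what a logical subscript needs, while && and || are intended for single values (recent versions of R signal an error when they are given longer vectors). The example vectors are arbitrary.

```r
x = c(TRUE, FALSE, TRUE)
y = c(TRUE, TRUE, FALSE)

# Vectorized operators work element by element
x & y
## [1]  TRUE FALSE FALSE
x | y
## [1] TRUE TRUE TRUE
xor(x, y)
## [1] FALSE  TRUE  TRUE

# && and || are meant for single values, e.g. in an if() condition
x[1] && y[1]
## [1] TRUE

# %in% returns one value per element of its left-hand side
c(1, 5, 9) %in% c(5, 6, 7)
## [1] FALSE  TRUE FALSE
```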
Below are 100 values,
x = c(56, 3, 17, 2, 4, 9, 6, 5, 19, 5, 2, 3, 5, 0, 13, 12, 6, 31, 10, 21, 8, 4, 1, 1, 2, 5, 16, 1, 3, 8, 1,
3, 4, 8, 5, 2, 8, 6, 18, 40, 10, 20, 1, 27, 2, 11, 14, 5, 7, 0, 3, 0, 7, 0, 8, 10, 10, 12, 8, 82,
21, 3, 34, 55, 18, 2, 9, 29, 1, 4, 7, 14, 7, 1, 2, 7, 4, 74, 5, 0, 3, 13, 2, 8, 1, 6, 13, 7, 1, 10,
5, 2, 4, 4, 14, 15, 4, 17, 1, 9)
write down how you would create a subset to accomplish each of the following:
Select every third value starting at position 2 in x.
Remove all values with an odd index (e.g. 1, 3, etc.)
Select only the values that are primes. (You may assume all values are less than 100)
Remove every 4th value, but only if it is odd.
Atomic vectors can be treated as multidimensional (2 or more) objects by adding a dim attribute.
x = 1:8
dim(x) = c(2,4)
x
## [,1] [,2] [,3] [,4]
## [1,] 1 3 5 7
## [2,] 2 4 6 8
matrix(1:8, nrow=2, ncol=4)
## [,1] [,2] [,3] [,4]
## [1,] 1 3 5 7
## [2,] 2 4 6 8
x = 1:8
attr(x,"dim") = c(2,2,2)
x
## , , 1
##
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
##
## , , 2
##
## [,1] [,2]
## [1,] 5 7
## [2,] 6 8
x = array(1:8,c(2,2,2))
x
## , , 1
##
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
##
## , , 2
##
## [,1] [,2]
## [1,] 5 7
## [2,] 6 8
x = array(1:8,c(2,2,2))
rownames(x) = LETTERS[1:2]
colnames(x) = LETTERS[3:4]
dimnames(x)[[3]] = LETTERS[5:6]
x
## , , E
##
## C D
## A 1 3
## B 2 4
##
## , , F
##
## C D
## A 5 7
## B 6 8
str(x)
## int [1:2, 1:2, 1:2] 1 2 3 4 5 6 7 8
## - attr(*, "dimnames")=List of 3
## ..$ : chr [1:2] "A" "B"
## ..$ : chr [1:2] "C" "D"
## ..$ : chr [1:2] "E" "F"
(x = matrix(1:6, nrow=2, ncol=3, dimnames=list(c("A","B"),c("M","N","O"))))
## M N O
## A 1 3 5
## B 2 4 6
x[1,3]
## [1] 5
x[1:2, 1:2]
## M N
## A 1 3
## B 2 4
x[, 1:2]
## M N
## A 1 3
## B 2 4
x[-1,-3]
## M N
## 2 4
x["A","M"]
## [1] 1
x["A", c("M","O")]
## M O
## 1 5
x[, "C"]
## Error in x[, "C"]: subscript out of bounds
x[1,"M"]
## [1] 1
x["B",]
## M N O
## 2 4 6
x["B"]
## [1] NA
x[-1]
## [1] 2 3 4 5 6
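The last two results follow from the fact that a matrix is still an atomic vector underneath: with a single subscript, [ ignores the dim attribute and indexes the values in column-major order. A minimal sketch, which also shows subsetting with a two-column index matrix (idx is a name chosen for illustration):

```r
x = matrix(1:6, nrow=2, ncol=3, dimnames=list(c("A","B"),c("M","N","O")))

# A single subscript indexes the underlying vector in column-major order
x[5]
## [1] 5
x[5] == x[1,3]
## [1] TRUE

# A logical matrix works the same way and returns a plain vector
x[x > 3]
## [1] 4 5 6

# A two-column numeric matrix selects one element per row: (1,3) and (2,1)
idx = cbind(c(1,2), c(3,1))
x[idx]
## [1] 5 2
```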
By default, R's [ operator is a preserving subset operator: the returned object has the same type as the parent. Confusingly, when used with a matrix or array, [ becomes a simplifying operator (it does not preserve type); this behavior can be controlled with the drop argument.
x = matrix(1:6, nrow=2, ncol=3, dimnames=list(c("A","B"),c("M","N","O")))
x[1, ]
## M N O
## 1 3 5
x[1, , drop=TRUE]
## M N O
## 1 3 5
x[1, , drop=FALSE]
## M N O
## A 1 3 5
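The printed output only hints at the difference, so it can help to inspect the dim attribute directly. A small sketch using the same matrix:

```r
x = matrix(1:6, nrow=2, ncol=3, dimnames=list(c("A","B"),c("M","N","O")))

# Simplified result: the dim attribute is gone, leaving a named vector
dim(x[1, ])
## NULL

# Preserved result: still a 1 x 3 matrix
dim(x[1, , drop=FALSE])
## [1] 1 3

# drop only removes dimensions of length 1, so this stays a matrix
dim(x[1:2, 1:2])
## [1] 2 2
```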
| | Simplifying | Preserving |
|---|---|---|
| Vector | x[[1]] | x[1] |
| List | x[[1]] | x[1] |
| Array | x[1, ], x[, 1] | x[1, , drop = FALSE], x[, 1, drop = FALSE] |
| Factor | x[1:4, drop = TRUE] | x[1:4] |
| Data frame | x[, 1], x[[1]] | x[, 1, drop = FALSE], x[1] |
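The factor and data frame rows have not been demonstrated so far, so here is a brief sketch of both. The objects f and df are invented for illustration.

```r
# Factor: drop = TRUE removes unused levels
f = factor(c("a","b","c","a"))
levels(f[1:2])
## [1] "a" "b" "c"
levels(f[1:2, drop = TRUE])
## [1] "a" "b"

# Data frame: [, j] and [[ simplify to a vector, drop = FALSE and [ do not
df = data.frame(p = 1:3, q = c("u","v","w"))
class(df[, 1])
## [1] "integer"
class(df[[1]])
## [1] "integer"
class(df[, 1, drop = FALSE])
## [1] "data.frame"
class(df[1])
## [1] "data.frame"
```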