Dimensions | Homogeneous | Heterogeneous |
---|---|---|
1d | Vector (atomic vector) | List (generic vector) |
2d | Matrix | Data Frame |
nd | Array | — |
R has six basic atomic vector types, but for now we’ll only focus on the first four:
logical
double
integer
character
complex
raw
logical - boolean values TRUE and FALSE
typeof(TRUE)
## [1] "logical"
character - character strings
typeof("hello")
## [1] "character"
typeof('world')
## [1] "character"
double - floating point numerical values (default numerical type)
typeof(1.335)
## [1] "double"
typeof(7)
## [1] "double"
integer - integer numerical values (indicated with an L suffix)
typeof( 7L )
## [1] "integer"
typeof( 1:3 )
## [1] "integer"
Vectors can be constructed using the c() function.
c(1,2,3)
## [1] 1 2 3
c("Hello", "World!")
## [1] "Hello" "World!"
c(1,c(2, c(3)))
## [1] 1 2 3
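R is a dynamically typed language: when values of different types are combined in one vector, it will happily coerce them to the most general type present (logical, then integer, then double, then character) without complaint.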
c(1,"Hello")
## [1] "1" "Hello"
c(FALSE, 3L)
## [1] 0 3
c(1.2, 3L)
## [1] 1.2 3.0
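The conversions above follow a fixed ordering, which can be checked directly with typeof(). The following is a small sketch; the variable names are illustrative only.

```r
# Mixed-type vectors are coerced to the most general type present.
lgl_int = c(TRUE, 1L)     # logical + integer
int_dbl = c(1L, 1.5)      # integer + double
dbl_chr = c(1.5, "a")     # double + character

typeof(lgl_int)   # "integer"
typeof(int_dbl)   # "double"
typeof(dbl_chr)   # "character"

# Explicit coercion uses the as.*() family; values that cannot be
# converted become NA (R also issues a warning, suppressed here).
seven = as.integer("7")
not_a_number = suppressWarnings(as.integer("seven"))
```

Coercion only ever moves toward the more general type, so converting back (for example from character to integer) has to be requested explicitly.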
R uses NA to represent missing values in its data structures.
typeof(NA)
## [1] "logical"
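NA is logical by default, but it takes on the type of whatever vector it is placed in. R also provides typed missing values. A brief sketch (the vector name scores is made up for illustration):

```r
typeof(NA_integer_)     # "integer"
typeof(NA_real_)        # "double"
typeof(NA_character_)   # "character"

# NA adopts the type of the vector that contains it
scores = c(3.5, NA, 4.1)
typeof(scores)          # "double"
is.na(scores)           # FALSE TRUE FALSE

# Summaries involving NA are NA unless missing values are dropped
sum(scores)                 # NA
sum(scores, na.rm = TRUE)   # 7.6
```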
NaN - Not a number
Inf - Positive infinity
-Inf - Negative infinity
pi / 0
## [1] Inf
0 / 0
## [1] NaN
1/0 + 1/0
## [1] Inf
1/0 - 1/0
## [1] NaN
NaN / NA
## [1] NaN
NaN * NA
## [1] NaN
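These special values cannot be found reliably with ==, so R provides dedicated test functions. A short sketch of how they differ, using an illustrative vector vals:

```r
vals = c(1, NA, NaN, Inf, -Inf)

is.na(vals)         # TRUE for both NA and NaN
is.nan(vals)        # TRUE only for NaN
is.finite(vals)     # TRUE only for the ordinary number 1
is.infinite(vals)   # TRUE for Inf and -Inf
```

Note that is.na() treats NaN as missing, while is.nan() does not treat NA as NaN.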
What is the type of the following vectors? Explain why they have that type.
c(1, NA+1L, "C")
c(1L / 0, NA)
c(1:3, 5)
c(3L, NaN+1L)
c(NA, TRUE)
Lists are generic vectors: 1d and can contain any combination of R objects.
mylist = list("A", 1:4, c(TRUE,FALSE), (1:4)/2)
mylist
## [[1]]
## [1] "A"
##
## [[2]]
## [1] 1 2 3 4
##
## [[3]]
## [1] TRUE FALSE
##
## [[4]]
## [1] 0.5 1.0 1.5 2.0
str(mylist)
## List of 4
## $ : chr "A"
## $ : int [1:4] 1 2 3 4
## $ : logi [1:2] TRUE FALSE
## $ : num [1:4] 0.5 1 1.5 2
Lists can even contain other lists, meaning they don’t have to be flat.
str( list(1, list(2, list(3))) )
## List of 2
## $ : num 1
## $ :List of 2
## ..$ : num 2
## ..$ :List of 1
## .. ..$ : num 3
Because of their more complex structure we often want to name the elements of a list (we can also do this with vectors). This can make reading and accessing the list more straightforward.
myotherlist = list(A = "hello", B = 1:4, "knock knock" = "who's there?")
str(myotherlist)
## List of 3
## $ A : chr "hello"
## $ B : int [1:4] 1 2 3 4
## $ knock knock: chr "who's there?"
names(myotherlist)
## [1] "A" "B" "knock knock"
myotherlist$B
## [1] 1 2 3 4
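Names that are not syntactic R names, such as "knock knock", need a little extra care. The sketch below shows two ways to reach that element and, as mentioned above, names on an atomic vector (the vector v is illustrative):

```r
myotherlist = list(A = "hello", B = 1:4, "knock knock" = "who's there?")

# $ needs backticks when the name contains a space
myotherlist$`knock knock`

# [[ ]] takes the name as an ordinary string
myotherlist[["knock knock"]]

# names can also be attached to atomic vectors
v = c(a = 1, b = 4, c = 7)
names(v)   # "a" "b" "c"
```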
The data frame is the most commonly used data structure in R. It is just a list of equal-length vectors (usually atomic, but generic vectors work as well). Each vector is treated as a column and the elements of the vectors as rows.
Most often a data frame will be constructed by reading in from a file, but we can also create them from scratch.
df = data.frame(x = 1:3, y = c("a", "b", "c"))
str(df)
## 'data.frame': 3 obs. of 2 variables:
## $ x: int 1 2 3
## $ y: Factor w/ 3 levels "a","b","c": 1 2 3
attributes(df)
## $names
## [1] "x" "y"
##
## $row.names
## [1] 1 2 3
##
## $class
## [1] "data.frame"
class(df$x)
## [1] "integer"
class(df$y)
## [1] "factor"
In versions of R before 4.0.0, data.frame() converts character vectors into factors by default, as in the output above; since R 4.0.0 the default is to leave them as character. Sometimes the conversion is useful, sometimes it isn’t; either way it is important to know what type/class you are working with. This behavior is controlled by the stringsAsFactors argument.
df = data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
str(df)
## 'data.frame': 3 obs. of 2 variables:
## $ x: int 1 2 3
## $ y: chr "a" "b" "c"
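Because a data frame is a list of equal-length vectors, the usual list tools work on it. A minimal sketch, with stringsAsFactors set explicitly so the result should not depend on the R version in use:

```r
df = data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

is.list(df)         # TRUE: a data frame is built on a list
typeof(df)          # "list"
length(df)          # 2, the number of columns
nrow(df)            # 3, the common length of the columns
sapply(df, class)   # class of each column
```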
We can add rows or columns to a data frame using rbind and cbind respectively.
df = data.frame(x = 1:3, y = c("a","b","c"))
cbind(df, z=TRUE)
## x y z
## 1 1 a TRUE
## 2 2 b TRUE
## 3 3 c TRUE
rbind(df, c(4,"b"))
## x y
## 1 1 a
## 2 2 b
## 3 3 c
## 4 4 b
df1 = data.frame(x = 1:3, y = c("a","b","c"))
df2 = data.frame(m = 3:1, n = c(TRUE,TRUE,FALSE))
df3 = cbind(df1,df2)
df3
## x y m n
## 1 1 a 3 TRUE
## 2 2 b 2 TRUE
## 3 3 c 1 FALSE
str(df3)
## 'data.frame': 3 obs. of 4 variables:
## $ x: int 1 2 3
## $ y: Factor w/ 3 levels "a","b","c": 1 2 3
## $ m: int 3 2 1
## $ n: logi TRUE TRUE FALSE
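One caveat with the rbind example above: c(4, "b") is a character vector, so the new x value arrives as the string "4", which can change the type of that column. A more type-safe sketch, assuming the new row is built as a one-row data frame with matching column names (new_row and df4 are illustrative names):

```r
df = data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

# Build the new row with the same column names and types
new_row = data.frame(x = 4L, y = "b", stringsAsFactors = FALSE)

df4 = rbind(df, new_row)
str(df4)   # x stays integer, y stays character
```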
R has several different subsetting operators ([, [[, and $).
The behavior of these operators will depend on the object they are being used with.
There are 4 main data types that can be used to subset:
Inclusion (positive integers)
Exclusion (negative integers)
Logical values
Character values (names)
Returns elements at the given location. Note that R uses a 1-based indexing scheme.
x = c(8,4,7)
x[c(1,3)]
## [1] 8 7
x[c(1,1)]
## [1] 8 8
Excludes elements at the given location.
x = c(8,4,7)
x[-1]
## [1] 4 7
x[-c(1,3)]
## [1] 4
Returns elements that correspond to TRUE in the logical vector.
x = c(-10,4,7,12)
x[c(TRUE,TRUE,FALSE,TRUE)]
## [1] -10 4 12
x[x > 3]
## [1] 4 7 12
x[x < -2 | x > 4]
## [1] -10 7 12
x[x < -2 & x > 4]
## numeric(0)
op | meaning | comp | meaning |
---|---|---|---|
x \| y | or | x < y | less than |
x & y | and | x > y | greater than |
!x | not x | x <= y | less than or equal to |
%% | mod | x >= y | greater than or equal to |
 | | x != y | not equal to |
 | | x == y | equal to |
 | | x %in% y | x in y |
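A few of these operators in use, as a sketch on the same vector as the logical subsetting examples:

```r
x = c(-10, 4, 7, 12)

x[x %% 2 == 0]       # even values: -10 4 12
x[x %in% c(4, 12)]   # values that appear in the set: 4 12
x[!(x > 3)]          # negation of a condition: -10
x[x >= 4 & x != 7]   # combined conditions: 4 12
```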
If the vector has names, select elements whose names correspond to the character vector.
x = c(a=1,b=4,c=7)
x["a"]
## a
## 1
x[c("b","c")]
## b c
## 4 7
x = c(1,4,7)
x[4]
## [1] NA
x["a"]
## [1] NA
x = c(8,4,7)
x[1]
## [1] 8
x[[1]]
## [1] 8
y = list(8,4,7)
y[2]
## [[1]]
## [1] 4
y[[2]]
## [1] 4
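For a list the two operators return different kinds of object: [ keeps the list container, while [[ extracts the element itself. A short sketch of the difference, including how each handles a position past the end (oob is an illustrative name):

```r
y = list(8, 4, 7)

class(y[2])     # "list": [ returns a shorter list
class(y[[2]])   # "numeric": [[ returns the element

y[[2]] + 1      # 5
# y[2] + 1 would fail, since arithmetic is not defined for a list

# [ tolerates a position past the end, [[ does not
length(y[4])    # 1: a list holding NULL
oob = tryCatch(y[[4]], error = function(e) "error")
oob             # "error"
```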
Below are 100 values:
x = c(56, 3, 17, 2, 4, 9, 6, 5, 19, 5, 2, 3, 5, 0, 13, 12, 6, 31, 10, 21, 8, 4, 1, 1, 2, 5, 16, 1, 3, 8, 1,
3, 4, 8, 5, 2, 8, 6, 18, 40, 10, 20, 1, 27, 2, 11, 14, 5, 7, 0, 3, 0, 7, 0, 8, 10, 10, 12, 8, 82,
21, 3, 34, 55, 18, 2, 9, 29, 1, 4, 7, 14, 7, 1, 2, 7, 4, 74, 5, 0, 3, 13, 2, 8, 1, 6, 13, 7, 1, 10,
5, 2, 4, 4, 14, 15, 4, 17, 1, 9)
Write down how you would create a subset to accomplish each of the following:
Select all observations with values greater than or equal to 40
Select all observations with values less than 30 or greater than 50
Select all observations with values between 35 and 75
Remove all observations with an odd index (e.g. 1, 3, etc.)
(x = factor(c("BS", "MS", "PhD", "MS")))
## [1] BS MS PhD MS
## Levels: BS MS PhD
x[1:2]
## [1] BS MS
## Levels: BS MS PhD
x[1:2, drop=TRUE]
## [1] BS MS
## Levels: BS MS
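Subsetting a factor keeps all of its original levels unless they are dropped explicitly. Besides drop=TRUE, base R's droplevels() does the same job after the fact. A small sketch:

```r
x = factor(c("BS", "MS", "PhD", "MS"))

levels(x[1:2])               # "BS" "MS" "PhD"
levels(droplevels(x[1:2]))   # "BS" "MS"

# Unused levels still appear in summaries, with a count of 0
table(x[1:2])
```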
df = data.frame(a = 1:2, b = 3:4, c = 5:6)
df[1,]
## a b c
## 1 1 3 5
df[,-2]
## a c
## 1 1 5
## 2 2 6
df[, c("a","b")]
## a b
## 1 1 3
## 2 2 4
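When a single column is selected with the matrix-style [ , ], the result is simplified to a vector by default. The sketch below contrasts the ways of selecting one column:

```r
df = data.frame(a = 1:2, b = 3:4, c = 5:6)

df[, "a"]                 # integer vector: 1 2
df[, "a", drop = FALSE]   # one-column data frame
df["a"]                   # list-style [ also keeps the data frame
df[["a"]]                 # the column itself, same as df$a
```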
Subsets can also be used with assignment to update specific values within an object.
x = c(1, 4, 7)
x[2] = 2
x
## [1] 1 2 7
x[1] = x[1] + 1
x
## [1] 2 2 7
x = c(1,2,1,3,2,1,2,1,3)
x[x == 1] = "male"
x[x == 2] = "female"
x[x == 3] = "other"
str(x)
## chr [1:9] "male" "female" "male" "other" "female" "male" ...
x = factor(x, levels = c("male","female","other")); str(x)
## Factor w/ 3 levels "male","female",..: 1 2 1 3 2 1 2 1 3
y = x[x != "other"]; str(y)
## Factor w/ 3 levels "male","female",..: 1 2 1 2 1 2 1
w = x[x != "other", drop = TRUE]; str(w)
## Factor w/ 2 levels "male","female": 1 2 1 2 1 2 1
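The recoding above goes through an intermediate character vector. As an alternative sketch, factor() can map the numeric codes to labels in one step through its levels and labels arguments (codes and sex are illustrative names):

```r
codes = c(1, 2, 1, 3, 2, 1, 2, 1, 3)

# levels gives the values to look for, labels gives their names
sex = factor(codes, levels = 1:3, labels = c("male", "female", "other"))
str(sex)

# drop the unused level after filtering
no_other = droplevels(sex[sex != "other"])
levels(no_other)   # "male" "female"
```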
See HW2.
Above materials are derived in part from the following sources: