Dimensions | Homogeneous | Heterogeneous |
---|---|---|
1d | Vector (atomic vector) | List (generic vector) |
2d | Matrix | Data Frame |
nd | Array | — |
R has six atomic vector types:
typeof | mode | storage.mode |
---|---|---|
logical | logical | logical |
double | numeric | double |
integer | numeric | integer |
character | character | character |
complex | complex | complex |
raw | raw | raw |
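The three columns of the table can be checked directly in R. The following is a minimal sketch; `describe` is a hypothetical helper written for this illustration and is not part of base R.

```r
# Hypothetical helper (not part of base R): report the three
# classifications of a single value as a named character vector.
describe <- function(x) {
  c(typeof = typeof(x), mode = mode(x), storage.mode = storage.mode(x))
}

# One example value per atomic type, in the same order as the table.
examples <- list(TRUE, 1.5, 1L, "a", 1i, as.raw(1))

# sapply() returns one column per value; transpose to get one row per value.
type_table <- t(sapply(examples, describe))
print(type_table)
```

Each row of `type_table` should correspond to one row of the table above.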
For now we’ll only worry about the first four.
logical - boolean values TRUE and FALSE
typeof(TRUE)
## [1] "logical"
character - character strings
typeof("hello")
## [1] "character"
typeof('world')
## [1] "character"
double - floating point numerical values (default numerical type)
typeof(1.33)
## [1] "double"
typeof(7)
## [1] "double"
integer - integer numerical values (indicated with an L)
typeof( 7L )
## [1] "integer"
typeof( 1:3 )
## [1] "integer"
Atomic vectors can be constructed using the c() function. Note that vectors are always flat.
c(1,2,3)
## [1] 1 2 3
c("Hello", "World!")
## [1] "Hello" "World!"
c(1,c(2, c(3)))
## [1] 1 2 3
typeof(x) - returns a character vector of the type of object x.
is.logical(x) - returns TRUE if x has type logical.
is.character(x) - returns TRUE if x has type character.
is.double(x) - returns TRUE if x has type double.
is.integer(x) - returns TRUE if x has type integer.
is.numeric(x) - returns TRUE if x has mode numeric.
is.numeric(7L)
## [1] TRUE
is.numeric(7)
## [1] TRUE
typeof(7L)
## [1] "integer"
typeof(7)
## [1] "double"
is.atomic(x) - returns TRUE if x is an atomic vector.
is.vector(x) - returns TRUE if x is any type of vector (e.g. atomic vector or list) with no attributes other than names.
is.atomic(c(1,2,3))
## [1] TRUE
is.vector(c(1,2,3))
## [1] TRUE
is.atomic(list(1,2,3))
## [1] FALSE
is.vector(list(1,2,3))
## [1] TRUE
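One caveat is worth seeing in code: is.vector() returns FALSE once an object carries any attribute other than names, even though the object is still atomic. A small sketch follows; the attribute name "units" is arbitrary and chosen only for illustration.

```r
x <- c(a = 1, b = 2)          # names are the one attribute is.vector() tolerates
plain <- is.vector(x)         # TRUE

attr(x, "units") <- "cm"      # "units" is an arbitrary, made-up attribute name
tagged <- is.vector(x)        # FALSE: an extra attribute is now present
still_atomic <- is.atomic(x)  # TRUE: the underlying data are unchanged

print(c(plain = plain, tagged = tagged, still_atomic = still_atomic))
```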
R is a dynamically typed language – it will happily convert between the various types without complaint.
c(1,"Hello")
## [1] "1" "Hello"
c(FALSE, 3L)
## [1] 0 3
c(1.2, 3L)
## [1] 1.2 3.0
Functions and operators will attempt to coerce objects to an appropriate type
3.1+1L
## [1] 4.1
log(TRUE)
## [1] 0
TRUE & 7
## [1] TRUE
FALSE | !5
## [1] FALSE
Most of the is functions we just saw have an as variant which can be used for explicit coercion.
as.logical(5.2)
## [1] TRUE
as.character(TRUE)
## [1] "TRUE"
as.integer(pi)
## [1] 3
as.numeric(FALSE)
## [1] 0
as.double("7.2")
## [1] 7.2
as.double("one")
## Warning: NAs introduced by coercion
## [1] NA
R uses NA to represent missing values in its data structures. What may not be obvious is that there are different NAs for the different types.
typeof(NA)
## [1] "logical"
typeof(NA+1)
## [1] "double"
typeof(NA+1L)
## [1] "integer"
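Base R also provides a typed missing-value constant for each of the common non-logical types, which avoids relying on arithmetic to get an NA of a particular type. A brief sketch:

```r
# Plain NA is logical; the other common types have their own constants.
na_values <- list(NA, NA_integer_, NA_real_, NA_character_)

na_types <- sapply(na_values, typeof)
print(na_types)
# "logical" "integer" "double" "character"

# All of them are still recognised as missing.
all_missing <- all(sapply(na_values, is.na))
print(all_missing)
```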
Because NAs represent missing values it makes sense that any calculation using them should also be missing.
1 / NA
## [1] NA
NA * 5
## [1] NA
mean(c(1,2,3,NA))
## [1] NA
3^NA
## [1] NA
This makes sense but can be problematic in some cases (particularly for control flow)
1 == NA
## [1] NA
if (2 != NA)
print("Here")
## Error in if (2 != NA) print("Here"): missing value where TRUE/FALSE needed
To explicitly test if a value is missing it is necessary to use is.na (often along with any or all).
is.na(NA)
## [1] TRUE
is.na(1)
## [1] FALSE
is.na(c(1,2,3,NA))
## [1] FALSE FALSE FALSE TRUE
any(is.na(c(1,2,3,NA)))
## [1] TRUE
all(is.na(c(1,2,3,NA)))
## [1] FALSE
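Putting these pieces together, the sketch below shows one way to keep missing values from propagating into a summary or breaking a condition. It uses only base R (`na.rm`, `is.na`, `isTRUE`); the variable names are made up for illustration.

```r
x <- c(1, 2, 3, NA)

mean_all <- mean(x)                # NA: the missing value propagates
mean_obs <- mean(x, na.rm = TRUE)  # 2: NAs are dropped before averaging

# Test for missingness first, so if() always receives TRUE or FALSE.
value <- NA
status <- if (is.na(value)) "missing" else "present"

# isTRUE() turns an NA comparison result into FALSE instead of an error.
equals_two <- isTRUE(value == 2)

print(list(mean_all = mean_all, mean_obs = mean_obs,
           status = status, equals_two = equals_two))
```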
NaN - Not a number
Inf - Positive infinity
-Inf - Negative infinity
pi / 0
## [1] Inf
0 / 0
## [1] NaN
1/0 + 1/0
## [1] Inf
1/0 - 1/0
## [1] NaN
NaN / NA
## [1] NaN
NaN * NA
## [1] NA
Inf does not have the same testing issues that NA has, but NaN does (NaN == NaN is NA), so there are convenience functions for testing for these values: is.finite, is.infinite, and is.nan.
is.finite(-Inf)
## [1] FALSE
is.finite(1/0+1/0)
## [1] FALSE
is.nan(1/0-1/0)
## [1] TRUE
Remember that Inf, -Inf, and NaN are exclusively of type double. However, their coercion behavior is not the same as for other double values.
as.integer(Inf)
## Warning: NAs introduced by coercion to integer range
## [1] NA
as.logical(Inf)
## [1] TRUE
as.logical(NaN)
## [1] NA
as.character(Inf)
## [1] "Inf"
as.character(NaN)
## [1] "NaN"
Part 1
What is the type of the following vectors? Explain why they have that type.
c(1, NA+1L, "C")
c(1L / 0, NA)
c(1:3, 5)
c(3L, NaN+1L)
c(NA, TRUE)
Part 2
Considering only the four (common) data types, what is R’s implicit type conversion hierarchy (from highest priority to lowest priority)? Hint - think about the pairwise interactions between types.
Lists are generic vectors, in that they are 1d and can contain any combination of R objects.
list("A", c(TRUE,FALSE), (1:4)/2, function(x) x^2)
## [[1]]
## [1] "A"
##
## [[2]]
## [1] TRUE FALSE
##
## [[3]]
## [1] 0.5 1.0 1.5 2.0
##
## [[4]]
## function (x)
## x^2
str( list("A", c(TRUE,FALSE), (1:4)/2, function(x) x^2) )
## List of 4
## $ : chr "A"
## $ : logi [1:2] TRUE FALSE
## $ : num [1:4] 0.5 1 1.5 2
## $ :function (x)
## ..- attr(*, "srcref")=Class 'srcref' atomic [1:8] 1 40 1 54 40 54 1 1
## .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fe7cdd44120>
Lists can even contain other lists, meaning they don’t have to be flat
str( list(1, list(2, list(3))) )
## List of 2
## $ : num 1
## $ :List of 2
## ..$ : num 2
## ..$ :List of 1
## .. ..$ : num 3
By default a vector will be coerced to a list (as a list is more generic) if needed
str( c(1:3,list(4,5,list(6,7))) )
## List of 6
## $ : int 1
## $ : int 2
## $ : int 3
## $ : num 4
## $ : num 5
## $ :List of 2
## ..$ : num 6
## ..$ : num 7
We can force a list back to a vector using unlist
unlist(list(1:3, list(4:5, 6)))
## [1] 1 2 3 4 5 6
unlist( list(1, list(2, list(3, "Hello"))) )
## [1] "1" "2" "3" "Hello"
Note that this also has the effect of flattening the list (since atomic vectors must be flat).
Because of their more complex structure we often want to name the elements of a list (we can also do this with vectors). This can make reading and accessing the list more straightforward.
str(list(A = 1, B = list(C = 2, D = 3)))
## List of 2
## $ A: num 1
## $ B:List of 2
## ..$ C: num 2
## ..$ D: num 3
list("knock knock" = "who's there?")
## $`knock knock`
## [1] "who's there?"
names(list(ABC=1, DEF=list(H=2, I=3)))
## [1] "ABC" "DEF"
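Names pay off mainly when extracting elements. A short sketch of access by name, using `$` and `[[ ]]` on a list like the one above (the variable names are chosen for illustration):

```r
nested <- list(A = 1, B = list(C = 2, D = 3))

a_value <- nested$A             # 1
d_value <- nested[["B"]]$D      # 3; [[ ]] and $ can be chained
inner_names <- names(nested$B)  # "C" "D"

# Named atomic vectors can be indexed by name as well.
v <- c(x = 10, y = 20)
y_value <- v[["y"]]             # 20, with the name dropped

print(c(a_value, d_value, y_value))
```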
Represent the following JSON data as a list in R.
{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address":
  {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": 10021
  },
  "phoneNumber":
  [
    {
      "type": "home",
      "number": "212 555-1239"
    },
    {
      "type": "fax",
      "number": "646 555-4567"
    }
  ]
}
Attributes are arbitrary metadata that can be attached to objects in R. Some are special (e.g. class, comment, dim, dimnames, names, etc.) and change the way in which an object is treated by R.
Attributes are a named list that is attached to every R object. They can be accessed (get and set) individually via attr and collectively via attributes.
(x = c(L=1,M=2,N=3))
## L M N
## 1 2 3
attr(x,"names") = c("A","B","C")
x
## A B C
## 1 2 3
names(x)
## [1] "A" "B" "C"
str(x)
## Named num [1:3] 1 2 3
## - attr(*, "names")= chr [1:3] "A" "B" "C"
attributes(x)
## $names
## [1] "A" "B" "C"
str(attributes(x))
## List of 1
## $ names: chr [1:3] "A" "B" "C"
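To illustrate the difference between a special attribute and an arbitrary one, the sketch below sets dim on one vector and a made-up attribute on another. The attribute name "note" is hypothetical and has no meaning to R.

```r
# Special attribute: setting dim makes R treat the vector as a matrix.
m <- 1:6
attr(m, "dim") <- c(2L, 3L)
m_is_matrix <- is.matrix(m)   # TRUE
m_type <- typeof(m)           # still "integer"; only the treatment changed

# Arbitrary attribute: stored with the object, but R ignores it.
v <- 1:6
attr(v, "note") <- "made up for illustration"
v_is_matrix <- is.matrix(v)   # FALSE
v_attr_names <- names(attributes(v))

print(m)
print(v_attr_names)
```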
Factor objects are how R stores data for categorical variables (fixed # of discrete values).
(x = factor(c("BS", "MS", "PhD", "MS")))
## [1] BS MS PhD MS
## Levels: BS MS PhD
str(x)
## Factor w/ 3 levels "BS","MS","PhD": 1 2 3 2
typeof(x)
## [1] "integer"
A factor is just an integer vector with two attributes: class and levels.
attributes(x)
## $levels
## [1] "BS" "MS" "PhD"
##
## $class
## [1] "factor"
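One way to see the integer codes at work is to decode them by hand: each code is simply an index into the levels. A minimal sketch using the same factor:

```r
x <- factor(c("BS", "MS", "PhD", "MS"))

codes <- as.integer(x)        # the underlying integer vector: 1 2 3 2
decoded <- levels(x)[codes]   # look each code up in the levels attribute

print(codes)
print(decoded)

# Decoding by hand agrees with the built-in conversion.
same <- identical(decoded, as.character(x))
print(same)
```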
Construct a factor variable (without using factor, as.factor, or related functions) that contains the weather forecast for the next 7 days.
There should be 4 levels - sun, clouds, rain, snow.
Find the weekly forecast from Weather Underground.
Start with an integer vector and add the appropriate attributes.
What would you need to do if I decided that I’d prefer to have only three levels: sun/cloud, rain, snow?
Above materials are derived in part from the following sources: