Data structures and dimensionality

Dimensions	Homogeneous	Heterogeneous
1d	Vector (atomic vector)	List (generic vector)
2d	Matrix	Data Frame
nd	Array	—

Vectors

Atomic Vectors

R has six atomic vector types:

typeof	mode	storage.mode
logical	logical	logical
double	numeric	double
integer	numeric	integer
character	character	character
complex	complex	complex
raw	raw	raw

For now we’ll only worry about the first four.
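The typeof, mode and storage.mode columns of the table correspond to three base R functions of those names. They mostly agree. The main difference is that mode() reports both integers and doubles as "numeric". A small illustration:

```r
# typeof() and storage.mode() report the storage type;
# mode() merges integer and double into "numeric"
x_int <- 1L
x_dbl <- 1.5

typeof(x_int)        # "integer"
mode(x_int)          # "numeric"
storage.mode(x_int)  # "integer"

typeof(x_dbl)        # "double"
mode(x_dbl)          # "numeric"
```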

Vector types

logical - boolean values TRUE and FALSE

typeof(TRUE)

## [1] "logical"

character - character strings

typeof("hello")

## [1] "character"

typeof('world')

## [1] "character"

double - floating point numerical values (default numerical type)

typeof(1.33)

## [1] "double"

typeof(7)

## [1] "double"

integer - integer numerical values (indicated with an L)

typeof( 7L )

## [1] "integer"

typeof( 1:3 )

## [1] "integer"
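Because 7 and 7L have different types, they compare as equal values but are not identical objects. The following sketch uses identical(), which also compares the storage type:

```r
a <- 7L   # integer
b <- 7    # double

a == b           # TRUE: == coerces both to double before comparing
identical(a, b)  # FALSE: identical() also checks the type

identical(1:3, c(1L, 2L, 3L))  # TRUE: the : operator produces integers
```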

Concatenation

Atomic vectors can be constructed using the c() function. Note that vectors are always flat: nested calls to c() still produce a single vector.

c(1,2,3)

## [1] 1 2 3

c("Hello", "World!")

## [1] "Hello"  "World!"

c(1,c(2, c(3)))

## [1] 1 2 3
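Since c() always flattens its arguments, the nesting in the last example leaves no trace. The result is an ordinary length-3 vector, indistinguishable from c(1, 2, 3):

```r
nested <- c(1, c(2, c(3)))
flat   <- c(1, 2, 3)

length(nested)           # 3
identical(nested, flat)  # TRUE: no nesting survives
```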

Testing types

typeof(x) - returns a character string giving the type of object x.

is.logical(x) - returns TRUE if x has type logical.

is.character(x) - returns TRUE if x has type character.

is.double(x) - returns TRUE if x has type double.

is.integer(x) - returns TRUE if x has type integer.

is.numeric(x) - returns TRUE if x has mode numeric.

is.numeric(7L)

## [1] TRUE

is.numeric(7)

## [1] TRUE

typeof(7L)

## [1] "integer"

typeof(7)

## [1] "double"

is.atomic(x) - returns TRUE if x is an atomic vector.

is.vector(x) - returns TRUE if x is a vector of any type (atomic vector or list) that has no attributes other than names.

is.atomic(c(1,2,3))

## [1] TRUE

is.vector(c(1,2,3))

## [1] TRUE

is.atomic(list(1,2,3))

## [1] FALSE

is.vector(list(1,2,3))

## [1] TRUE
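One caveat about is.vector(): it returns TRUE only when the object carries no attributes other than names. Adding a dim attribute, which turns the vector into a matrix, is enough to make it return FALSE, even though the underlying data is still atomic. A small sketch:

```r
m <- 1:4
dim(m) <- c(2, 2)   # now a 2x2 matrix

is.atomic(m)   # TRUE: the data is still an atomic vector
is.vector(m)   # FALSE: the dim attribute disqualifies it

v <- c(a = 1, b = 2)
is.vector(v)   # TRUE: names are the one attribute allowed
```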

Coercion

R is dynamically typed and will implicitly coerce values between the various types without complaint.

c(1,"Hello")

## [1] "1"     "Hello"

c(FALSE, 3L)

## [1] 0 3

c(1.2, 3L)

## [1] 1.2 3.0

Operator coercion

Functions and operators will attempt to coerce objects to an appropriate type.

3.1+1L

## [1] 4.1

log(TRUE)

## [1] 0

TRUE & 7

## [1] TRUE

FALSE | !5

## [1] FALSE
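A common, deliberate use of this coercion is counting. Arithmetic functions treat TRUE as 1 and FALSE as 0, so sum() of a logical vector counts the TRUE values and mean() gives their proportion. The vector below is made-up data:

```r
flags <- c(TRUE, FALSE, TRUE, TRUE)

sum(flags)    # 3: number of TRUE values
mean(flags)   # 0.75: proportion of TRUE values
```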

Explicit Coercion

Most of the is.* functions we just saw have an as.* counterpart that can be used for explicit coercion.

as.logical(5.2)

## [1] TRUE

as.character(TRUE)

## [1] "TRUE"

as.integer(pi)

## [1] 3

as.numeric(FALSE)

## [1] 0

as.double("7.2")

## [1] 7.2

as.double("one")

## Warning: NAs introduced by coercion

## [1] NA
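A few more explicit conversions are worth knowing. as.integer() truncates toward zero rather than rounding. as.logical() recognizes only a fixed set of strings ("T", "TRUE", "true", "True" and their FALSE counterparts), and anything else becomes NA. suppressWarnings() is used below only to silence the expected coercion warning:

```r
as.integer(c(-2.7, 2.7))   # -2  2: truncation toward zero
round(c(-2.7, 2.7))        # -3  3: use round() if rounding is wanted

as.logical(c("TRUE", "true", "T", "yes"))   # TRUE TRUE TRUE NA

# as.double() on a non-numeric string gives NA plus a warning
suppressWarnings(as.double("one"))          # NA
```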

Missing Values

R uses NA to represent missing values in its data structures. What may not be obvious is that there is a different NA for each type.

typeof(NA)

## [1] "logical"

typeof(NA+1)

## [1] "double"

typeof(NA+1L)

## [1] "integer"
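R also provides typed missing-value constants. These let you write a missing value of a specific type directly, instead of producing it through arithmetic as above:

```r
typeof(NA_integer_)    # "integer"
typeof(NA_real_)       # "double"
typeof(NA_character_)  # "character"

# A plain NA inside c() is coerced along with everything else
typeof(c("a", NA))     # "character"
```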

Contagiousness of Missing Values

Because NAs represent missing values, it makes sense that any calculation using them should also be missing.

1 / NA

## [1] NA

NA * 5

## [1] NA

mean(c(1,2,3,NA))

## [1] NA

3^NA

## [1] NA

This makes sense, but it can be problematic in some cases, particularly for control flow:

1 == NA

## [1] NA

if (2 != NA)
  print("Here")

## Error in if (2 != NA) print("Here"): missing value where TRUE/FALSE needed
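There are two common ways to keep an if() condition from ever seeing NA. You can check for missingness first: && short-circuits, so the comparison is only evaluated when the value is not missing. Or you can wrap the condition in isTRUE(), which returns FALSE for anything other than a single TRUE. A sketch with an illustrative variable y:

```r
y <- NA

# Guard with is.na(): the comparison is skipped when y is missing
if (!is.na(y) && y != 2) print("Here")

# isTRUE() turns an NA condition into FALSE
isTRUE(y != 2)   # FALSE
if (isTRUE(y != 2)) print("Here") else print("y is missing or equal to 2")

# Some logical results do not depend on the missing value at all
TRUE | NA    # TRUE
FALSE & NA   # FALSE
```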

Testing for NA

To explicitly test whether a value is missing, it is necessary to use is.na (often along with any or all).

is.na(NA)

## [1] TRUE

is.na(1)

## [1] FALSE

is.na(c(1,2,3,NA))

## [1] FALSE FALSE FALSE  TRUE

any(is.na(c(1,2,3,NA)))

## [1] TRUE

all(is.na(c(1,2,3,NA)))

## [1] FALSE
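Base R offers a few related helpers:

* anyNA() is a shortcut for any(is.na(x)).
* sum(is.na(x)) counts the missing values, because TRUE counts as 1.
* Many summary functions, such as mean() and sum(), take an na.rm argument that drops missing values before computing.

A comparison such as x == NA cannot serve as a test, since it only returns NA. The sketch below uses made-up data:

```r
x <- c(1, 2, 3, NA)

anyNA(x)                # TRUE, equivalent to any(is.na(x))
sum(is.na(x))           # 1: number of missing values
mean(x, na.rm = TRUE)   # 2: NAs dropped before averaging
sum(x, na.rm = TRUE)    # 6
x == NA                 # NA NA NA NA: never a valid test
```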

Other Special (floating point) Values

NaN - Not a number

Inf - Positive infinity

-Inf - Negative infinity

pi / 0

## [1] Inf

0 / 0

## [1] NaN

1/0 + 1/0

## [1] Inf

1/0 - 1/0

## [1] NaN

NaN / NA

## [1] NaN

NaN * NA

## [1] NA
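NaN is treated as a kind of missing value, so is.na() is TRUE for it. The converse does not hold: is.nan() is FALSE for NA. When a computation mixes NaN and NA, R does not guarantee which of the two is returned, and the result can depend on the platform. The two outputs above should therefore not be relied upon.

```r
is.na(NaN)    # TRUE: NaN counts as missing
is.nan(NA)    # FALSE: NA is not NaN
is.nan(0/0)   # TRUE
```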

Testing for infinity and NaN

NaN and Inf behave differently from NA in tests. Inf compares normally (Inf == Inf is TRUE), while comparisons involving NaN return NA. In either case there are convenience functions for testing for them:

is.finite(-Inf)

## [1] FALSE

is.finite(1/0+1/0)

## [1] FALSE

is.nan(1/0-1/0)

## [1] TRUE
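is.infinite() is the counterpart of is.finite() for infinite values. Note that is.finite() is FALSE for missing values as well as for infinities, so it also screens out NA and NaN. A vectorised sketch:

```r
v <- c(1, Inf, -Inf, NaN, NA)

is.finite(v)     # TRUE FALSE FALSE FALSE FALSE
is.infinite(v)   # FALSE  TRUE  TRUE FALSE FALSE
is.nan(v)        # FALSE FALSE FALSE  TRUE FALSE
```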

Coercion for infinity and NaN

First, remember that Inf, -Inf, and NaN are of type double. However, their coercion behavior is not the same as for other double values.

as.integer(Inf)

## Warning: NAs introduced by coercion to integer range

## [1] NA

as.logical(Inf)

## [1] TRUE

as.logical(NaN)

## [1] NA

as.character(Inf)

## [1] "Inf"

as.character(NaN)

## [1] "NaN"
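The following quick check confirms that all three special values are doubles. It also shows that negative infinity keeps its sign when converted to a string or a logical:

```r
typeof(Inf)         # "double"
typeof(-Inf)        # "double"
typeof(NaN)         # "double"
as.character(-Inf)  # "-Inf"
as.logical(-Inf)    # TRUE: any non-zero number, infinite or not, is TRUE
```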

Exercise 1

Part 1

What is the type of the following vectors? Explain why they have that type.

c(1, NA+1L, "C")
c(1L / 0, NA)
c(1:3, 5)
c(3L, NaN+1L)
c(NA, TRUE)

Part 2

Considering only the four (common) data types, what is R’s implicit type conversion hierarchy (from highest priority to lowest priority)? Hint: think about the pairwise interactions between types.

Lists

Lists are generic vectors, in that they are 1d and can contain any combination of R objects.

list("A", c(TRUE,FALSE), (1:4)/2, function(x) x^2)

## [[1]]
## [1] "A"
## 
## [[2]]
## [1]  TRUE FALSE
## 
## [[3]]
## [1] 0.5 1.0 1.5 2.0
## 
## [[4]]
## function (x) 
## x^2

str( list("A", c(TRUE,FALSE), (1:4)/2, function(x) x^2) )

## List of 4
##  $ : chr "A"
##  $ : logi [1:2] TRUE FALSE
##  $ : num [1:4] 0.5 1 1.5 2
##  $ :function (x)  
##   ..- attr(*, "srcref")=Class 'srcref'  atomic [1:8] 1 40 1 54 40 54 1 1
##   .. .. ..- attr(*, "srcfile")=Classes 'srcfilecopy', 'srcfile' <environment: 0x7fe7cdd44120>
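A list is itself a vector (of type "list"), so the usual vector functions apply. In particular, length() counts top-level elements regardless of what each element contains. A small sketch:

```r
l <- list("A", c(TRUE, FALSE), (1:4)/2, function(x) x^2)

typeof(l)   # "list"
is.list(l)  # TRUE
length(l)   # 4: one per element, even though element 3 holds four numbers
```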

Recursive lists

Lists can even contain other lists, which means that, unlike atomic vectors, they need not be flat.

str( list(1, list(2, list(3))) )

## List of 2
##  $ : num 1
##  $ :List of 2
##   ..$ : num 2
##   ..$ :List of 1
##   .. ..$ : num 3
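Because the nesting is preserved, length() of a recursive list counts only the top level. Reaching the innermost value means descending one level at a time, for example with [[ ]]. A small sketch:

```r
r <- list(1, list(2, list(3)))

length(r)          # 2: only top-level elements are counted
r[[2]][[2]][[1]]   # 3: each [[ ]] steps one level deeper
```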

List Coercion

When atomic vectors are combined with a list, they are coerced to a list if needed, since a list is the more general type.

str( c(1:3,list(4,5,list(6,7))) )

## List of 6
##  $ : int 1
##  $ : int 2
##  $ : int 3
##  $ : num 4
##  $ : num 5
##  $ :List of 2
##   ..$ : num 6
##   ..$ : num 7

We can force a list back to an atomic vector using unlist.

unlist(list(1:3, list(4:5, 6)))

## [1] 1 2 3 4 5 6

unlist( list(1, list(2, list(3, "Hello"))) )

## [1] "1"     "2"     "3"     "Hello"

Note that this also has the effect of flattening the list (since atomic vectors must be flat).
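unlist() follows the same implicit coercion rules as c(), so the result takes the most general type among the list's leaves. Going the other way, as.list() turns each element of an atomic vector into its own list element:

```r
typeof(unlist(list(1L, TRUE)))   # "integer"
typeof(unlist(list(1L, 2.5)))    # "double"

l <- as.list(1:3)
length(l)    # 3
typeof(l)    # "list"
```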

Named lists

Because of their more complex structure, we often want to name the elements of a list (we can also do this with vectors). This can make reading and accessing the list more straightforward.

str(list(A = 1, B = list(C = 2, D = 3)))

## List of 2
##  $ A: num 1
##  $ B:List of 2
##   ..$ C: num 2
##   ..$ D: num 3

list("knock knock" = "who's there?")

## $`knock knock`
## [1] "who's there?"

names(list(ABC=1, DEF=list(H=2, I=3)))

## [1] "ABC" "DEF"
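Names can also be assigned after creation with the names() replacement form, and a named list element can then be retrieved with $. The same names() mechanism works for atomic vectors. The variable names below are illustrative:

```r
info <- list(1, "two")
names(info) <- c("id", "label")
info$label        # "two"

v <- c(a = 1, b = 2)
names(v)          # "a" "b"
```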

Exercise 2

Represent the following JSON data as a list in R.

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": 
  {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": 10021
  },
  "phoneNumber": 
  [
    {
      "type": "home",
      "number": "212 555-1239"
    },
    {
      "type": "fax",
      "number": "646 555-4567"
    }
  ]
}

Attributes

Attributes are arbitrary metadata that can be attached to objects in R. Some are special (e.g. class, comment, dim, dimnames, names, etc.) and change the way in which an object is treated by R.

An object's attributes form a named list (which may be empty). They can be accessed (get and set) individually with attr() and collectively with attributes().

(x = c(L=1,M=2,N=3))

## L M N 
## 1 2 3

attr(x,"names") = c("A","B","C")
x

## A B C 
## 1 2 3

names(x)

## [1] "A" "B" "C"

str(x)

##  Named num [1:3] 1 2 3
##  - attr(*, "names")= chr [1:3] "A" "B" "C"

attributes(x)

## $names
## [1] "A" "B" "C"

str(attributes(x))

## List of 1
##  $ names: chr [1:3] "A" "B" "C"
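Arbitrary, non-special attributes can be attached as well, and structure() sets them at creation time. In the sketch below, "units" is a made-up attribute with no special meaning to R. Setting an attribute to NULL removes it:

```r
y <- structure(1:3, units = "cm")   # "units" is a hypothetical attribute
attr(y, "units")     # "cm"

attr(y, "units") <- NULL   # remove the attribute
attributes(y)        # NULL: no attributes left
```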

Factors

Factor objects are how R stores data for categorical variables (variables with a fixed number of discrete values).

(x = factor(c("BS", "MS", "PhD", "MS")))

## [1] BS  MS  PhD MS 
## Levels: BS MS PhD

str(x)

##  Factor w/ 3 levels "BS","MS","PhD": 1 2 3 2

typeof(x)

## [1] "integer"

A factor is just an integer vector with two attributes: class and levels.

attributes(x)

## $levels
## [1] "BS"  "MS"  "PhD"
## 
## $class
## [1] "factor"
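Because a factor is stored as integer codes plus a levels attribute, converting one to a number returns the codes, not the labels. This matters when the labels themselves look numeric. Converting with as.character() first recovers the labels, as in this sketch with made-up data:

```r
f <- factor(c("10", "20", "10"))

levels(f)                      # "10" "20"
as.integer(f)                  # 1 2 1: the underlying codes
as.numeric(as.character(f))    # 10 20 10: the labels as numbers
```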

Exercise 3

Construct a factor variable (without using factor, as.factor, or related functions) that contains the weather forecast for the next 7 days.

There should be 4 levels - sun, clouds, rain, snow.
Find the weekly forecast from Weather Underground
Start with an integer vector and add the appropriate attributes.
What would you need to do if I decided that I’d prefer to have only three levels: sun/cloud, rain, and snow?

Acknowledgments

Above materials are derived in part from the following sources:

Hadley Wickham - Advanced R
R Language Definition

R data structures
