(Almost) Everything is a Vector

Types of vectors

The fundamental building blocks of data in R are vectors (collections of related values, objects, other data structures, etc.).


R has two fundamental vector classes:

  • Vectors (atomic vectors)

    • collections of values that are all of the same type (e.g. all logical values, all numbers, or all character strings).
  • Lists (generic vectors)

    • collections of any type of R object, even other lists (meaning they can have a hierarchical/tree-like structure).
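A minimal sketch of the difference between the two classes, using made-up values: c() coerces its inputs to one common type, while list() keeps each element's own type and allows nesting.

```r
# Atomic vector: mixed inputs are coerced to a single common type
v = c(1, "a", TRUE)
typeof(v)
## [1] "character"

# List: each element keeps its own type, and lists can contain lists
l = list(1, "a", TRUE, list(2, "b"))
element_types = sapply(l, typeof)
element_types
## [1] "double"    "character" "logical"   "list"
```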

Atomic Vectors

R has six atomic vector types:

typeof      mode        storage.mode
logical     logical     logical
double      numeric     double
integer     numeric     integer
character   character   character
complex     complex     complex
raw         raw         raw


For now we'll mainly worry about the first type (logical). We'll discuss the following three next time; the final two rarely come up.
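The three columns of the table can be reproduced directly. The sketch below uses one arbitrary example value per type; the name vals is just for illustration.

```r
# One example value per atomic type (the values themselves are arbitrary)
vals = list(TRUE, 1.5, 1L, "a", 1+2i, as.raw(1))

# Ask R for each of the three descriptions of every value
tab = data.frame(
  typeof       = sapply(vals, typeof),
  mode         = sapply(vals, mode),
  storage.mode = sapply(vals, storage.mode)
)
tab
```

Note that double and integer are only distinguished by typeof and storage.mode; mode reports both as numeric.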

Conditionals

Logical (boolean) operations

Operator    Operation       Vectorized?
x | y       or              Yes
x & y       and             Yes
!x          not             Yes
x || y      or              No
x && y      and             No
xor(x,y)    exclusive or    Yes

Vectorized?

x = c(TRUE,FALSE,TRUE)
y = c(FALSE,TRUE,TRUE)
x | y
## [1] TRUE TRUE TRUE
x || y
## [1] TRUE
x & y
## [1] FALSE FALSE  TRUE
x && y
## [1] FALSE
## (As of R 4.3.0, || and && signal an error when either argument has length > 1.)

Vectorized? (Length coercion)

x = c(TRUE,FALSE,TRUE)
y = c(TRUE)
z = c(FALSE,TRUE)
x | y
## [1] TRUE TRUE TRUE
y | z
## [1] TRUE TRUE
x | z
## Warning in x | z: longer object length is not
## a multiple of shorter object length
## [1] TRUE TRUE TRUE
x & y
## [1]  TRUE FALSE  TRUE
y & z
## [1] FALSE  TRUE
x & z
## Warning in x & z: longer object length is not
## a multiple of shorter object length
## [1] FALSE FALSE FALSE
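The length coercion shown above is R's recycling rule: the shorter vector is repeated until it matches the longer one. A small sketch that does the recycling by hand (z_recycled is an illustrative name) and checks that it matches the built-in behaviour:

```r
x = c(TRUE, FALSE, TRUE)
z = c(FALSE, TRUE)

# Recycle z by hand to the length of x
z_recycled = rep_len(z, length(x))
z_recycled
## [1] FALSE  TRUE FALSE

# Same result as x | z, just without the warning
same = identical(x | z_recycled, suppressWarnings(x | z))
same
## [1] TRUE
```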

Comparisons

Operator    Comparison                  Vectorized?
x < y       less than                   Yes
x > y       greater than                Yes
x <= y      less than or equal to       Yes
x >= y      greater than or equal to    Yes
x != y      not equal to                Yes
x == y      equal to                    Yes
x %in% y    is contained in             Yes (for x)

Comparisons

x = c("A","B","C")
z = c("A")
x == z
## [1]  TRUE FALSE FALSE
x != z
## [1] FALSE  TRUE  TRUE
x > z
## [1] FALSE  TRUE  TRUE
x %in% z
## [1]  TRUE FALSE FALSE
z %in% x
## [1] TRUE
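To make the "for x" part concrete: %in% returns one value per element of its left-hand side, whatever the length of the right-hand side. The sketch below, with made-up vectors, also compares it against the equivalent match() call.

```r
x = c("A", "B", "C")
y = c("C", "A")

# One answer per element of x: is it found anywhere in y?
in_y = x %in% y
in_y
## [1]  TRUE FALSE  TRUE

# The same test written with match(): position in y, or 0 if absent
via_match = match(x, y, nomatch = 0) > 0
identical(in_y, via_match)
## [1] TRUE
```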



Conditional Control Flow - if

Conditional execution of code blocks is achieved via if statements. Note that if statements are not vectorized.

x = c(3,4)

if (3 %in% x)
    print("Here!")
## [1] "Here!"
if (x >= 2)
    print("Now Here!")
## Warning in if (x >= 2) print("Now Here!"): the condition has length > 1 and
## only the first element will be used
## [1] "Now Here!"
## (As of R 4.2.0 a condition of length > 1 is an error rather than a warning.)
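Because if only accepts a single TRUE or FALSE, an element-wise decision needs a different tool. One option, sketched here with made-up values and labels, is the vectorized ifelse() function:

```r
x = c(3, 1)

# ifelse() evaluates the test for every element and picks a value for each
labels = ifelse(x >= 2, "big", "small")
labels
## [1] "big"   "small"
```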

Collapsing logical vectors

Two helper functions collapse a logical vector down to a single value: any and all.

x = c(3,4)
any(x >= 2)
## [1] TRUE
all(x >= 2)
## [1] TRUE



!any(x >= 2)
## [1] FALSE
if (any(x >= 2))
    print("Now There!")
## [1] "Now There!"
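Two edge cases are worth knowing before relying on any and all inside an if: empty vectors and missing values. A short sketch:

```r
empty = logical(0)

# Nothing is TRUE in an empty vector, and nothing is FALSE either
any(empty)
## [1] FALSE
all(empty)
## [1] TRUE

# A missing value only matters when it could change the answer
any(c(TRUE, NA))
## [1] TRUE
all(c(TRUE, NA))
## [1] NA
```

An NA result cannot be used as an if condition, so the na.rm = TRUE argument of any and all may be needed when the data can contain missing values.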

Nesting Conditionals - if, else if, and else

x = 3
if (x < 0) {
   print("Negative")
} else if (x > 0) {
   print("Positive")
} else {
   print("Zero")
}
## [1] "Positive"
x = 0
if (x < 0) {
   print("Negative")
} else if (x > 0) {
   print("Positive")
} else {
   print("Zero")
}
## [1] "Zero"

Loops

for loops

The simplest and most common type of loop in R: given a vector, iterate through its elements and evaluate the code block for each one.

for(x in 1:10)
{
  cat(x^2,"")
}
## 1 4 9 16 25 36 49 64 81 100
for(y in list(1:3, LETTERS[1:7], c(TRUE,FALSE)))
{
  cat(length(y),"")
}
## 3 7 2

Alternative loops - while

Repeat the code block for as long as the given condition is TRUE; the loop ends once it evaluates to FALSE.

i = 1
res = rep(NA,10)
while (i <= 10)
{
  res[i] = i^2
  i = i+1
}
res
##  [1]   1   4   9  16  25  36  49  64  81 100

Alternative loops - repeat

Repeat the code block indefinitely until a break is reached.

i = 1
res = rep(NA,10)
repeat
{
  res[i] = i^2
  i = i+1
  if (i > 10)
    break
}
res
##  [1]   1   4   9  16  25  36  49  64  81 100

Special keywords - break and next

These are special actions that only work inside of a loop

  • break - ends the current (inner-most) loop
  • next - ends the current iteration and moves on to the next one

for(i in 1:10)
{
    if (i %% 2 == 0)
        break
    cat(i,"")
}
## 1
for(i in 1:10)
{
    if (i %% 2 == 0)
        next
    cat(i,"")
}
## 1 3 5 7 9
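The "inner-most" part matters with nested loops. In the sketch below (the counter outer_done is only there for illustration), break leaves the j loop but the i loop carries on:

```r
outer_done = 0
for (i in 1:3)
{
  for (j in 1:3)
  {
    if (j == 2)
      break                      # leaves the inner (j) loop only
    cat(i, j, "\n")
  }
  outer_done = outer_done + 1    # still reached for every i
}
## 1 1
## 2 1
## 3 1
outer_done
## [1] 3
```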

Storing results

It is almost always better to create an object to store your results first, rather than growing the object as you go.

# Good
res = rep(NA,10)
for(x in 1:10)
{
  res[x] = x^2
}
res
##  [1]   1   4   9  16  25  36  49  64  81 100
# Bad
res = c()
for (x in 1:10)
{
  res = c(res,x^2)
}
res
##  [1]   1   4   9  16  25  36  49  64  81 100
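A side note on the preallocated version: rep(NA, 10) creates a logical vector that R converts to double on the first assignment. A sketch of preallocating with the intended type from the start, using numeric() and vector():

```r
# rep(NA, n) starts out logical and is converted on first assignment
res = rep(NA, 3)
typeof(res)
## [1] "logical"
res[1] = 1^2
typeof(res)
## [1] "double"

# Preallocating with the final type avoids that conversion
res_num = numeric(3)             # three zeros, type double
typeof(res_num)
## [1] "double"

res_list = vector("list", 3)     # three NULL slots, for non-atomic results
length(res_list)
## [1] 3
```

The NA version has one advantage: any slot the loop forgot to fill stays visibly NA rather than 0.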

Back to for loops

Often we want to loop across the indexes of an object rather than the elements themselves. Several functions help with this: the : operator, seq, seq_along, seq_len, etc.

l = list(1:3, LETTERS[1:7], c(TRUE,FALSE))
res = rep(NA, length(l))

for(x in seq_along(l))
{
  res[x] = length(l[[x]])
}

res
## [1] 3 7 2




1:length(l)
## [1] 1 2 3
seq_along(l)
## [1] 1 2 3
seq_len(length(l))
## [1] 1 2 3

Looping over element indices

Best Practice:

good = function(x)
{
  for(i in seq_along(x))
    cat(1,"")
}

Antipattern:

bad = function(x)
{
  for(i in 1:length(x))
    cat(1,"")
}
good(c(1,2,3))
## 1 1 1
good(c())



bad(c(1,2,3))
## 1 1 1
bad(c())
## 1 1
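The reason bad(c()) runs twice is visible from the index vectors themselves. When the length is 0, the : operator counts down from 1 to 0 instead of producing an empty sequence:

```r
x = c()
length(x)
## [1] 0

# 1:0 counts downwards, giving two indices for an empty object
idx_bad = 1:length(x)
idx_bad
## [1] 1 0

# seq_along() returns an empty vector, so the loop body never runs
idx_good = seq_along(x)
idx_good
## integer(0)
```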

Some lessons learned

  • Everything we've shown so far can also be done using
    • subsetting ([]) or
    • functional approaches (*apply or purrr)


  • There are almost always multiple possible approaches,
    • the best initial solution is the one you can get working the quickest
    • once something is working you can worry about making it faster / more efficient.
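As a rough illustration of the first point, here are loop-free versions of some earlier examples, using only base R (purrr is left out since it needs to be installed separately):

```r
# Squares of 1 to 10: arithmetic is already vectorized
squares = (1:10)^2
squares
##  [1]   1   4   9  16  25  36  49  64  81 100

# Odd values only: subsetting with a logical vector
x = 1:10
odds = x[x %% 2 != 0]
odds
## [1] 1 3 5 7 9

# Length of each list element: a functional approach with sapply
l = list(1:3, LETTERS[1:7], c(TRUE,FALSE))
lens = sapply(l, length)
lens
## [1] 3 7 2
```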


"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

- Donald Knuth, Structured Programming with go to Statements (1974)

Exercise 1

Below is the list of primes between 2 and 100:

2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 
43, 47, 53, 59, 61, 67, 71, 73, 79, 83, 89, 97

If you were given the vector x = c(3, 4, 12, 19, 23, 48, 50, 61, 63, 78), write out the R code necessary to print only the values of x that are not prime (without using subsetting or the %in% operator).

Your code should use nested loops to iterate through the vector of primes and x.

Error Checking

stop and stopifnot

Often we want to validate user input or function arguments. If these checks fail, we usually want to report the error and stop execution.

ok = FALSE
if (!ok)
  stop("Things are not ok.")
## Error in eval(expr, envir, enclos): Things are not ok.
stopifnot(ok)
## Error: ok is not TRUE

Note - an error (like the one generated by stop) will prevent an RMarkdown document from compiling unless error=TRUE is set for that code block.
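A minimal sketch of both tools inside a function. The function safe_sqrt and its checks are hypothetical and chosen only for illustration; tryCatch is used so the error message can be inspected without halting the script.

```r
# Hypothetical example function that validates its argument before working
safe_sqrt = function(x)
{
  stopifnot(is.numeric(x), length(x) == 1)   # terse checks, generic message
  if (x < 0)
    stop("x must be non-negative.")          # custom, more helpful message
  sqrt(x)
}

safe_sqrt(16)
## [1] 4

# Capture the error instead of stopping, and keep only its message
msg = tryCatch(safe_sqrt(-1), error = function(e) conditionMessage(e))
msg
## [1] "x must be non-negative."
```

stopifnot is convenient for quick checks, while stop lets you write a message aimed at the user.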

Style choices

do_stuff_v1 = function(x) {
  if (condition_one) {
    ##
    ## Do stuff
    ##
  } else if (condition_two) {
    ##
    ## Do other stuff
    ##
  } else if (condition_error) {
    stop("Condition error occurred")
  }
}


do_stuff_v2 = function(x) {
  if (condition_error) {
    stop("Condition error occurred")
  }

  if (condition_one) {
    ##
    ## Do stuff
    ##
  } else if (condition_two) {
    ##
    ## Do other stuff
    ##
  }
}
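A runnable instance of the second style, where the error check comes first and the normal cases follow. The function describe_sign and its conditions are made up for illustration and stand in for the placeholder conditions in the templates.

```r
# Hypothetical example in the v2 style: fail early, then handle normal cases
describe_sign = function(x)
{
  if (!is.numeric(x) || length(x) != 1) {
    stop("x must be a single number")
  }

  if (x < 0) {
    "Negative"
  } else if (x > 0) {
    "Positive"
  } else {
    "Zero"
  }
}

describe_sign(-2)
## [1] "Negative"
describe_sign(0)
## [1] "Zero"
```

Putting the check first keeps the main logic free of error handling, and a reader sees immediately which inputs are rejected.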

Acknowledgments
