Data structures & S3

class: center, middle, inverse, title-slide

# Data structures & S3
### Colin Rundel
### 2019-01-22

---

exclude: true

---
class: middle
count: false

# Attributes

---

## Attributes

Attributes are metadata that can be attached to objects in R. Some are special (e.g. `class`, `comment`, `dim`, `dimnames`, `names`) and change the way in which an object is treated by R.

Attributes are stored as a named list attached to an R object. They can be accessed (get and set) individually via `attr` and collectively via `attributes`.

.midi[

```r
(x = c(L=1,M=2,N=3))
```

```
## L M N 
## 1 2 3
```

```r
str(x)
```

```
##  Named num [1:3] 1 2 3
##  - attr(*, "names")= chr [1:3] "L" "M" "N"
```

```r
attributes(x)
```

```
## $names
## [1] "L" "M" "N"
```
]

---

```r
str(attributes(x))
```

```
## List of 1
##  $ names: chr [1:3] "L" "M" "N"
```

```r
attr(x,"names") = c("A","B","C")
x
```

```
## A B C 
## 1 2 3
```

```r
names(x)
```

```
## [1] "A" "B" "C"
```

```r
names(x) = c("Z","Y","X")
x
```

```
## Z Y X 
## 1 2 3
```
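
Attributes are not limited to the special ones. As a minimal sketch, the example below attaches a made-up `units` attribute (the name is an assumption chosen for illustration, R gives it no special meaning) using both `attr` and `structure`.

```r
# Attach a custom, non-special attribute after the object is created
v = c(1, 2, 3)
attr(v, "units") = "cm"   # "units" is a hypothetical attribute name
attributes(v)

# structure() sets the same attribute at creation time
w = structure(c(1, 2, 3), units = "cm")
identical(v, w)
```

Both routes should produce the same object, so `identical(v, w)` is expected to be `TRUE`.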

---

## Factors

Factor objects are how R represents categorical data (e.g. a variable with a fixed number of possible outcomes).

```r
(x = factor(c("BS", "MS", "PhD", "MS")))
```

```
## [1] BS  MS  PhD MS 
## Levels: BS MS PhD
```

```r
str(x)
```

```
##  Factor w/ 3 levels "BS","MS","PhD": 1 2 3 2
```

```r
typeof(x)
```

```
## [1] "integer"
```

---

A factor is just an integer vector with two attributes: `class`, set to `"factor"`, and `levels`, a character vector.

```r
attributes(x)
```

```
## $levels
## [1] "BS"  "MS"  "PhD"
## 
## $class
## [1] "factor"
```
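
One way to see this structure is to take a factor apart. The sketch below uses a separate variable `f` (a name chosen here for illustration) and recovers the integer codes and the labels they point to.

```r
f = factor(c("BS", "MS", "PhD", "MS"))

# Dropping the class leaves the integer codes plus the levels attribute
unclass(f)

# The codes on their own
as.integer(f)

# Each code is an index into the levels vector
levels(f)[as.integer(f)]
```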

---

## Exercise 1

Construct a factor variable (without using `factor`, `as.factor`, or related functions) that contains the weather forecast for Durham over the next 5 days.

<br/>

* There should be 5 levels - `sun`, `partial clouds`, `clouds`, `rain`, `snow`.

* Start with an *integer* vector and add the appropriate attributes.

---
class: middle
count: false

# Data Frames

---

## Data Frames

A data frame is how R handles heterogeneous tabular data (i.e. rows and columns) and is one of the most commonly used data structures in R.

At their core, R represents data frames as a list of equal-length vectors (usually atomic, but you can use lists as well).

```r
df = data.frame(x = 1:3, y = c("a", "b", "c"))
df
```

```
##   x y
## 1 1 a
## 2 2 b
## 3 3 c
```

```r
str(df)
```

```
## 'data.frame':	3 obs. of  2 variables:
##  $ x: int  1 2 3
##  $ y: Factor w/ 3 levels "a","b","c": 1 2 3
```
---

```r
typeof(df)
```

```
## [1] "list"
```

```r
attributes(df)
```

```
## $names
## [1] "x" "y"
## 
## $class
## [1] "data.frame"
## 
## $row.names
## [1] 1 2 3
```
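
Because a data frame is a list underneath, the usual list tools apply to it column by column. A small sketch, using a separate data frame `d` so the objects above are left untouched:

```r
d = data.frame(x = 1:3, y = c("a", "b", "c"))

is.list(d)          # a data frame is a list
length(d)           # length counts columns, not rows
d[["x"]]            # list-style extraction returns the column vector
sapply(d, length)   # every column has the same length
```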

---

## Roll your own data.frame

```r
df2 = list(x = 1:3, y = factor(c("a", "b", "c")))
```

.pull-left[

```r
attr(df2,"class") = "data.frame"
df2
```

```
## [1] x y
## <0 rows> (or 0-length row.names)
```
]

.pull-right[

```r
attr(df2,"row.names") = 1:3
df2
```

```
##   x y
## 1 1 a
## 2 2 b
## 3 3 c
```
]

```r
str(df2)
```

```
## 'data.frame':	3 obs. of  2 variables:
##  $ x: int  1 2 3
##  $ y: Factor w/ 3 levels "a","b","c": 1 2 3
```

```r
identical(df, df2)
```

```
## [1] TRUE
```

---

## Strings (Characters) vs Factors

By default (in R versions before 4.0.0), character vectors are converted into factors when they are included in a data frame.

Sometimes this is useful, usually it isn't -- either way it is important to know what type/class you are working with. This behavior can be changed using the `stringsAsFactors` argument to `data.frame` and related functions.

```r
df = data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
df
```

```
##   x y
## 1 1 a
## 2 2 b
## 3 3 c
```

```r
str(df)
```

```
## 'data.frame':	3 obs. of  2 variables:
##  $ x: int  1 2 3
##  $ y: chr  "a" "b" "c"
```
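
Since the default depends on the R version, a quick check of the column classes removes any doubt. The sketch below sets `stringsAsFactors` explicitly both ways (`d1` and `d2` are names chosen for illustration).

```r
d1 = data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = TRUE)
d2 = data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

# Inspect the class of every column
sapply(d1, class)
sapply(d2, class)
```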
---

## Some general advice ...

---

## Length Coercion

For data frames, if the lengths of the component vectors are not multiples of one another then there will be an error (in previous examples this only produced a warning).

```r
data.frame(x = 1:3, y = c("a"))
```

```
##   x y
## 1 1 a
## 2 2 a
## 3 3 a
```

```r
data.frame(x = 1:3, y = c("a","b"))
```

```
## Error in data.frame(x = 1:3, y = c("a", "b")): arguments imply differing number of rows: 3, 2
```

```r
data.frame(x = 1:3, y = character())
```

```
## Error in data.frame(x = 1:3, y = character()): arguments imply differing number of rows: 3, 0
```
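
The case in between can be sketched as well: a shorter vector whose length divides the longer one evenly is recycled. The `tryCatch` wrapper is only there so the failing call returns its message instead of stopping the script.

```r
# Length 2 divides length 4, so y is recycled to a, b, a, b
d = data.frame(x = 1:4, y = c("a", "b"))
d

# Length 2 does not divide length 3: capture the error message
res = tryCatch(
  data.frame(x = 1:3, y = c("a", "b")),
  error = function(e) conditionMessage(e)
)
res
```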
---

## Growing data frames

We can add rows or columns to a data frame using `rbind` and `cbind` respectively.

```r
df = data.frame(x = 1:3, y = c("a","b","c"))
cbind(df, z=TRUE)
```

```
##   x y    z
## 1 1 a TRUE
## 2 2 b TRUE
## 3 3 c TRUE
```

.pull-left[

```r
rbind(df, c(1,"a"))
```

```
##   x y
## 1 1 a
## 2 2 b
## 3 3 c
## 4 1 a
```
]

.pull-right[

```r
str( rbind(df, c(1,"a")) )
```

```
## 'data.frame':	4 obs. of  2 variables:
##  $ x: chr  "1" "2" "3" "1"
##  $ y: Factor w/ 3 levels "a","b","c": 1 2 3 1
```
]

---

.pull-left[

```r
rbind(df, list(1,"a"))
```

```
##   x y
## 1 1 a
## 2 2 b
## 3 3 c
## 4 1 a
```
]

.pull-right[

```r
str( rbind(df, list(1,"a")) )
```

```
## 'data.frame':	4 obs. of  2 variables:
##  $ x: num  1 2 3 1
##  $ y: Factor w/ 3 levels "a","b","c": 1 2 3 1
```
]

```r
df1 = data.frame(x = 1:3, y = c("a","b","c"))
df2 = data.frame(m = 3:1, n = c(TRUE,TRUE,FALSE))
```

```r
cbind(df1, df2)
```

```
##   x y m     n
## 1 1 a 3  TRUE
## 2 2 b 2  TRUE
## 3 3 c 1 FALSE
```

```r
rbind(df1, df2)
```

```
## Error in match.names(clabs, names(xi)): names do not match previous names
```
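
One way to avoid the type surprises above is to bind a one-row data frame with matching column names, so each new value already has the intended type. A minimal sketch (the names `d` and `d_new` are chosen for illustration, and `stringsAsFactors = FALSE` is set explicitly so the result does not depend on the R version):

```r
d = data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

# The new row is itself a data frame with the same names and types
d_new = rbind(d, data.frame(x = 4L, y = "d", stringsAsFactors = FALSE))
str(d_new)
```

Here `x` should stay an integer column and `y` a character column, since neither side of the `rbind` forces a coercion.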

---

## Matrices

A matrix is a two-dimensional equivalent of an atomic vector (i.e. all entries must share the same type).

```r
(m = matrix(c(1,2,3,4), ncol=2, nrow=2))
```

```
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
```

```r
attributes(m)
```

```
## $dim
## [1] 2 2
```

---

## Column major ordering

A matrix is an atomic vector with a `dim` attribute. Data is stored in column major order (fill the first column starting at row one, then the next column and so on).

.pull-left[

```r
cm = matrix(c(1,2,3,4), 
            ncol=2, nrow=2)

cm
```

```
##      [,1] [,2]
## [1,]    1    3
## [2,]    2    4
```

```r
c(cm)
```

```
## [1] 1 2 3 4
```
]

.pull-right[

```r
rm = matrix(c(1,2,3,4), 
            ncol=2, nrow=2, 
            byrow=TRUE)
rm
```

```
##      [,1] [,2]
## [1,]    1    2
## [2,]    3    4
```

```r
c(rm)
```

```
## [1] 1 3 2 4
```
]
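
Both facts can be checked directly. In this sketch a plain vector becomes a matrix just by setting `dim`, and the column major layout means element `[i, j]` sits at position `(j - 1) * nrow + i` of the underlying vector.

```r
v = 1:6
dim(v) = c(2, 3)   # adding a dim attribute turns the vector into a matrix
is.matrix(v)
v

# Column major layout: [i, j] maps to linear index (j - 1) * nrow + i
i = 1; j = 2
v[i, j] == v[(j - 1) * nrow(v) + i]
```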

---
class: middle
count: false

# S3 Objects

---

## What is S3?

<br/>

> S3 is R’s first and simplest OO system. It is the only OO system used in the base and stats packages, and it’s the most commonly used system in CRAN packages. S3 is informal and ad hoc, but it has a certain elegance in its minimalism: you can’t take away any part of it and still have a useful OO system.

--Hadley Wickham, Advanced R

.footnote[
* S3 should not be confused with R's other object oriented systems: S4, Reference classes, and R6*.
]

---

## `class`
.pull-left[

```r
class( 1 )
```

```
## [1] "numeric"
```

```r
class( "A" )
```

```
## [1] "character"
```

```r
class( NA )
```

```
## [1] "logical"
```

```r
class( TRUE )
```

```
## [1] "logical"
```
]

.pull-right[

```r
class( matrix(1,2,2) )
```

```
## [1] "matrix"
```

```r
class( factor(c("A","B")) )
```

```
## [1] "factor"
```

```r
class( data.frame(x=1:3) )
```

```
## [1] "data.frame"
```

```r
class( (function(x) x^2) )
```

```
## [1] "function"
```
]

---

## An example

.pull-left[

```r
print( c("A","B","A","C") )
```

```
## [1] "A" "B" "A" "C"
```

```r
print( factor(c("A","B","A","C")) )
```

```
## [1] A B A C
## Levels: A B C
```
]

.pull-right[

```r
print( data.frame(a=1:3, b=4:6) )
```

```
##   a b
## 1 1 4
## 2 2 5
## 3 3 6
```
]

<br/>

```r
print
```

```
## function (x, ...) 
## UseMethod("print")
## <bytecode: 0x7fa7735c2f18>
## <environment: namespace:base>
```

---

## Other examples

.pull-left[

```r
mean
```

```
## function (x, ...) 
## UseMethod("mean")
## <bytecode: 0x7fa771b6ba20>
## <environment: namespace:base>
```

```r
t.test
```

```
## function (x, ...) 
## UseMethod("t.test")
## <bytecode: 0x7fa7718d5b88>
## <environment: namespace:stats>
```
]

.pull-right[

```r
summary
```

```
## function (object, ...) 
## UseMethod("summary")
## <bytecode: 0x7fa7726bbd70>
## <environment: namespace:base>
```

```r
plot
```

```
## function (x, y, ...) 
## UseMethod("plot")
## <bytecode: 0x7fa7701608d8>
## <environment: namespace:graphics>
```
]

```r
sum
```

```
## function (..., na.rm = FALSE)  .Primitive("sum")
```

---

## What's going on?

S3 objects and their related functions work using a very simple dispatch mechanism: a generic function is created whose sole job is to call `UseMethod`, which then calls a class-specialized function named using the convention `generic.class`. If no method exists for the object's class, `generic.default` is used when it is defined.

We can see all of the specialized versions of the generic using the `methods` function.

```r
methods("plot")
```

```
##  [1] plot.acf*           plot.data.frame*    plot.decomposed.ts*
##  [4] plot.default        plot.dendrogram*    plot.density*      
##  [7] plot.ecdf           plot.factor*        plot.formula*      
## [10] plot.function       plot.hclust*        plot.histogram*    
## [13] plot.HoltWinters*   plot.isoreg*        plot.lm*           
## [16] plot.medpolish*     plot.mlm*           plot.ppr*          
## [19] plot.prcomp*        plot.princomp*      plot.profile.nls*  
## [22] plot.R6*            plot.raster*        plot.spec*         
## [25] plot.stepfun        plot.stl*           plot.table*        
## [28] plot.ts             plot.tskernel*      plot.TukeyHSD*     
## see '?methods' for accessing help and source code
```
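
The dispatch rule can be sketched with a toy generic. Everything named `describe` below, and the class `"special"`, is hypothetical and exists only for this illustration.

```r
# A toy generic: its only job is to call UseMethod
describe = function(x, ...) UseMethod("describe")

# Class-specific methods follow the generic.class naming convention
describe.factor     = function(x, ...) "a factor"
describe.data.frame = function(x, ...) "a data frame"
describe.default    = function(x, ...) "something else"

describe(factor("a"))   # uses describe.factor
describe(1:3)           # no integer method, so describe.default is used

# With several classes, dispatch walks the class vector left to right
obj = structure(list(), class = c("special", "data.frame"))
describe(obj)           # no describe.special, so describe.data.frame is used
```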

---

.small[

```r
methods("print")
```

```
##   [1] print.acf*                                        
##   [2] print.AES*                                        
##   [3] print.anova*                                      
##   [4] print.aov*                                        
##   [5] print.aovlist*                                    
##   [6] print.ar*                                         
##   [7] print.Arima*                                      
##   [8] print.arima0*                                     
##   [9] print.AsIs                                        
##  [10] print.aspell*                                     
##  [11] print.aspell_inspect_context*                     
##  [12] print.bibentry*                                   
##  [13] print.Bibtex*                                     
##  [14] print.boxx*                                       
##  [15] print.browseVignettes*                            
##  [16] print.by                                          
##  [17] print.bytes*                                      
##  [18] print.changedFiles*                               
##  [19] print.check_code_usage_in_package*                
##  [20] print.check_compiled_code*                        
##  [21] print.check_demo_index*                           
##  [22] print.check_depdef*                               
##  [23] print.check_details*                              
##  [24] print.check_details_changes*                      
##  [25] print.check_doi_db*                               
##  [26] print.check_dotInternal*                          
##  [27] print.check_make_vars*                            
##  [28] print.check_nonAPI_calls*                         
##  [29] print.check_package_code_assign_to_globalenv*     
##  [30] print.check_package_code_attach*                  
##  [31] print.check_package_code_data_into_globalenv*     
##  [32] print.check_package_code_startup_functions*       
##  [33] print.check_package_code_syntax*                  
##  [34] print.check_package_code_unload_functions*        
##  [35] print.check_package_compact_datasets*             
##  [36] print.check_package_CRAN_incoming*                
##  [37] print.check_package_datasets*                     
##  [38] print.check_package_depends*                      
##  [39] print.check_package_description*                  
##  [40] print.check_package_description_encoding*         
##  [41] print.check_package_license*                      
##  [42] print.check_packages_in_dir*                      
##  [43] print.check_packages_used*                        
##  [44] print.check_po_files*                             
##  [45] print.check_pragmas*                              
##  [46] print.check_Rd_contents*                          
##  [47] print.check_Rd_line_widths*                       
##  [48] print.check_Rd_metadata*                          
##  [49] print.check_Rd_xrefs*                             
##  [50] print.check_RegSym_calls*                         
##  [51] print.check_so_symbols*                           
##  [52] print.check_T_and_F*                              
##  [53] print.check_url_db*                               
##  [54] print.check_vignette_index*                       
##  [55] print.checkDocFiles*                              
##  [56] print.checkDocStyle*                              
##  [57] print.checkFF*                                    
##  [58] print.checkRd*                                    
##  [59] print.checkReplaceFuns*                           
##  [60] print.checkS3methods*                             
##  [61] print.checkTnF*                                   
##  [62] print.checkVignettes*                             
##  [63] print.citation*                                   
##  [64] print.cli_sitrep*                                 
##  [65] print.codoc*                                      
##  [66] print.codocClasses*                               
##  [67] print.codocData*                                  
##  [68] print.colonnade*                                  
##  [69] print.colorConverter*                             
##  [70] print.compactPDF*                                 
##  [71] print.condition                                   
##  [72] print.connection                                  
##  [73] print.CRAN_package_reverse_dependencies_and_views*
##  [74] print.crayon*                                     
##  [75] print.data.frame                                  
##  [76] print.Date                                        
##  [77] print.default                                     
##  [78] print.dendrogram*                                 
##  [79] print.density*                                    
##  [80] print.difftime                                    
##  [81] print.dist*                                       
##  [82] print.Dlist                                       
##  [83] print.DLLInfo                                     
##  [84] print.DLLInfoList                                 
##  [85] print.DLLRegisteredRoutines                       
##  [86] print.document_context*                           
##  [87] print.document_position*                          
##  [88] print.document_range*                             
##  [89] print.document_selection*                         
##  [90] print.dummy_coef*                                 
##  [91] print.dummy_coef_list*                            
##  [92] print.ecdf*                                       
##  [93] print.eigen                                       
##  [94] print.factanal*                                   
##  [95] print.factor                                      
##  [96] print.family*                                     
##  [97] print.fileSnapshot*                               
##  [98] print.findLineNumResult*                          
##  [99] print.formula*                                    
## [100] print.frame*                                      
## [101] print.fseq*                                       
## [102] print.ftable*                                     
## [103] print.function                                    
## [104] print.getAnywhere*                                
## [105] print.glm*                                        
## [106] print.hclust*                                     
## [107] print.help_files_with_topic*                      
## [108] print.hexmode                                     
## [109] print.HoltWinters*                                
## [110] print.hsearch*                                    
## [111] print.hsearch_db*                                 
## [112] print.htest*                                      
## [113] print.html*                                       
## [114] print.html_dependency*                            
## [115] print.infl*                                       
## [116] print.integrate*                                  
## [117] print.isoreg*                                     
## [118] print.kmeans*                                     
## [119] print.knitr_kable*                                
## [120] print.Latex*                                      
## [121] print.LaTeX*                                      
## [122] print.libraryIQR                                  
## [123] print.listof                                      
## [124] print.lm*                                         
## [125] print.loadings*                                   
## [126] print.loess*                                      
## [127] print.logLik*                                     
## [128] print.ls_str*                                     
## [129] print.medpolish*                                  
## [130] print.MethodsFunction*                            
## [131] print.mtable*                                     
## [132] print.NativeRoutineList                           
## [133] print.news_db*                                    
## [134] print.nls*                                        
## [135] print.noquote                                     
## [136] print.numeric_version                             
## [137] print.object_size*                                
## [138] print.octmode                                     
## [139] print.packageDescription*                         
## [140] print.packageInfo                                 
## [141] print.packageIQR*                                 
## [142] print.packageStatus*                              
## [143] print.pairwise.htest*                             
## [144] print.PDF_Array*                                  
## [145] print.PDF_Dictionary*                             
## [146] print.pdf_doc*                                    
## [147] print.pdf_fonts*                                  
## [148] print.PDF_Indirect_Reference*                     
## [149] print.pdf_info*                                   
## [150] print.PDF_Keyword*                                
## [151] print.PDF_Name*                                   
## [152] print.PDF_Stream*                                 
## [153] print.PDF_String*                                 
## [154] print.person*                                     
## [155] print.pillar*                                     
## [156] print.pillar_ornament*                            
## [157] print.pillar_shaft*                               
## [158] print.pillar_vertical*                            
## [159] print.POSIXct                                     
## [160] print.POSIXlt                                     
## [161] print.power.htest*                                
## [162] print.ppr*                                        
## [163] print.prcomp*                                     
## [164] print.princomp*                                   
## [165] print.proc_time                                   
## [166] print.promise*                                    
## [167] print.quosure*                                    
## [168] print.quosures*                                   
## [169] print.R6*                                         
## [170] print.R6ClassGenerator*                           
## [171] print.raster*                                     
## [172] print.Rcpp_stack_trace*                           
## [173] print.Rd*                                         
## [174] print.recordedplot*                               
## [175] print.restart                                     
## [176] print.RGBcolorConverter*                          
## [177] print.rif_shaft*                                  
## [178] print.rlang_box_done*                             
## [179] print.rlang_data_pronoun*                         
## [180] print.rlang_envs*                                 
## [181] print.rlang_error*                                
## [182] print.rlang_fake_data_pronoun*                    
## [183] print.rlang_lambda_function*                      
## [184] print.rlang_trace*                                
## [185] print.rlang_zap*                                  
## [186] print.rle                                         
## [187] print.roman*                                      
## [188] print.rule*                                       
## [189] print.sessionInfo*                                
## [190] print.shiny.tag*                                  
## [191] print.shiny.tag.list*                             
## [192] print.simple.list                                 
## [193] print.smooth.spline*                              
## [194] print.socket*                                     
## [195] print.spark*                                      
## [196] print.squeezed_colonnade*                         
## [197] print.srcfile                                     
## [198] print.srcref                                      
## [199] print.stepfun*                                    
## [200] print.stl*                                        
## [201] print.StructTS*                                   
## [202] print.subdir_tests*                               
## [203] print.summarize_CRAN_check_status*                
## [204] print.summary.aov*                                
## [205] print.summary.aovlist*                            
## [206] print.summary.ecdf*                               
## [207] print.summary.glm*                                
## [208] print.summary.lm*                                 
## [209] print.summary.loess*                              
## [210] print.summary.manova*                             
## [211] print.summary.nls*                                
## [212] print.summary.packageStatus*                      
## [213] print.summary.ppr*                                
## [214] print.summary.prcomp*                             
## [215] print.summary.princomp*                           
## [216] print.summary.table                               
## [217] print.summary.warnings                            
## [218] print.summaryDefault                              
## [219] print.table                                       
## [220] print.tables_aov*                                 
## [221] print.tbl*                                        
## [222] print.terms*                                      
## [223] print.tree*                                       
## [224] print.trunc_mat*                                  
## [225] print.ts*                                         
## [226] print.tskernel*                                   
## [227] print.TukeyHSD*                                   
## [228] print.tukeyline*                                  
## [229] print.tukeysmooth*                                
## [230] print.undoc*                                      
## [231] print.vignette*                                   
## [232] print.warnings                                    
## [233] print.x                                           
## [234] print.xfun_raw_string*                            
## [235] print.xfun_strict_list*                           
## [236] print.xgettext*                                   
## [237] print.xngettext*                                  
## [238] print.xtabs*                                      
## [239] print.y                                           
## see '?methods' for accessing help and source code
```
]

---

```r
print.data.frame
```

```
## function (x, ..., digits = NULL, quote = FALSE, right = TRUE, 
##     row.names = TRUE, max = NULL) 
## {
##     n <- length(row.names(x))
##     if (length(x) == 0L) {
##         cat(sprintf(ngettext(n, "data frame with 0 columns and %d row", 
##             "data frame with 0 columns and %d rows"), n), "\n", 
##             sep = "")
##     }
##     else if (n == 0L) {
##         print.default(names(x), quote = FALSE)
##         cat(gettext("<0 rows> (or 0-length row.names)\n"))
##     }
##     else {
##         if (is.null(max)) 
##             max <- getOption("max.print", 99999L)
##         if (!is.finite(max)) 
##             stop("invalid 'max' / getOption(\"max.print\"): ", 
##                 max)
##         omit <- (n0 <- max%/%length(x)) < n
##         m <- as.matrix(format.data.frame(if (omit) 
##             x[seq_len(n0), , drop = FALSE]
##         else x, digits = digits, na.encode = FALSE))
##         if (!isTRUE(row.names)) 
##             dimnames(m)[[1L]] <- if (isFALSE(row.names)) 
##                 rep.int("", if (omit) 
##                   n0
##                 else n)
##             else row.names
##         print(m, ..., quote = quote, right = right, max = max)
##         if (omit) 
##             cat(" [ reached 'max' / getOption(\"max.print\") -- omitted", 
##                 n - n0, "rows ]\n")
##     }
##     invisible(x)
## }
## <bytecode: 0x7fa7714a4008>
## <environment: namespace:base>
```
---

```r
print.matrix
```

```
## Error in eval(expr, envir, enclos): object 'print.matrix' not found
```

```r
print.default
```

```
## function (x, digits = NULL, quote = TRUE, na.print = NULL, print.gap = NULL, 
##     right = FALSE, max = NULL, useSource = TRUE, ...) 
## {
##     noOpt <- missing(digits) && missing(quote) && missing(na.print) && 
##         missing(print.gap) && missing(right) && missing(max) && 
##         missing(useSource) && missing(...)
##     .Internal(print.default(x, digits, quote, na.print, print.gap, 
##         right, max, useSource, noOpt))
## }
## <bytecode: 0x7fa772628320>
## <environment: namespace:base>
```

---

## The other way

If instead we have a class and want to know what specialized functions exist for that class, then we can again use the `methods` function - this time with the `class` argument.

```r
methods(class="data.frame")
```

```
##  [1] [             [[            [[<-          [<-           $            
##  [6] $<-           aggregate     anyDuplicated as_tibble     as.data.frame
## [11] as.list       as.matrix     by            cbind         coerce       
## [16] dim           dimnames      dimnames<-    droplevels    duplicated   
## [21] edit          format        formula       glimpse       head         
## [26] initialize    is_vector_s3  is.na         Math          merge        
## [31] na.exclude    na.omit       Ops           plot          print        
## [36] prompt        rbind         row.names     row.names<-   rowsum       
## [41] show          shuffle       slotsFromS3   split         split<-      
## [46] stack         str           subset        summary       Summary      
## [51] t             tail          transform     type_sum      type.convert 
## [56] unique        unstack       within       
## see '?methods' for accessing help and source code
```
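
To retrieve a single method for a given generic and class, `getS3method` from the utils package can be used. A short sketch; the class name `no_such_class` is made up to show the `optional` argument.

```r
# Fetch the summary method for data frames
m = getS3method("summary", "data.frame")
is.function(m)

# optional = TRUE returns NULL rather than an error when nothing is found
is.null(getS3method("print", "no_such_class", optional = TRUE))
```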

---
class: small

```r
`[.data.frame`
```

```
## function (x, i, j, drop = if (missing(i)) TRUE else length(cols) == 
##     1) 
## {
##     mdrop <- missing(drop)
##     Narg <- nargs() - !mdrop
##     has.j <- !missing(j)
##     if (!all(names(sys.call()) %in% c("", "drop")) && !isS4(x)) 
##         warning("named arguments other than 'drop' are discouraged")
##     if (Narg < 3L) {
##         if (!mdrop) 
##             warning("'drop' argument will be ignored")
##         if (missing(i)) 
##             return(x)
##         if (is.matrix(i)) 
##             return(as.matrix(x)[i])
##         nm <- names(x)
##         if (is.null(nm)) 
##             nm <- character()
##         if (!is.character(i) && anyNA(nm)) {
##             names(nm) <- names(x) <- seq_along(x)
##             y <- NextMethod("[")
##             cols <- names(y)
##             if (anyNA(cols)) 
##                 stop("undefined columns selected")
##             cols <- names(y) <- nm[cols]
##         }
##         else {
##             y <- NextMethod("[")
##             cols <- names(y)
##             if (!is.null(cols) && anyNA(cols)) 
##                 stop("undefined columns selected")
##         }
##         if (anyDuplicated(cols)) 
##             names(y) <- make.unique(cols)
##         attr(y, "row.names") <- .row_names_info(x, 0L)
##         attr(y, "class") <- oldClass(x)
##         return(y)
##     }
##     if (missing(i)) {
##         if (drop && !has.j && length(x) == 1L) 
##             return(.subset2(x, 1L))
##         nm <- names(x)
##         if (is.null(nm)) 
##             nm <- character()
##         if (has.j && !is.character(j) && anyNA(nm)) {
##             names(nm) <- names(x) <- seq_along(x)
##             y <- .subset(x, j)
##             cols <- names(y)
##             if (anyNA(cols)) 
##                 stop("undefined columns selected")
##             cols <- names(y) <- nm[cols]
##         }
##         else {
##             y <- if (has.j) 
##                 .subset(x, j)
##             else x
##             cols <- names(y)
##             if (anyNA(cols)) 
##                 stop("undefined columns selected")
##         }
##         if (drop && length(y) == 1L) 
##             return(.subset2(y, 1L))
##         if (anyDuplicated(cols)) 
##             names(y) <- make.unique(cols)
##         nrow <- .row_names_info(x, 2L)
##         if (drop && !mdrop && nrow == 1L) 
##             return(structure(y, class = NULL, row.names = NULL))
##         else {
##             attr(y, "class") <- oldClass(x)
##             attr(y, "row.names") <- .row_names_info(x, 0L)
##             return(y)
##         }
##     }
##     xx <- x
##     cols <- names(xx)
##     x <- vector("list", length(x))
##     x <- .Internal(copyDFattr(xx, x))
##     oldClass(x) <- attr(x, "row.names") <- NULL
##     if (has.j) {
##         nm <- names(x)
##         if (is.null(nm)) 
##             nm <- character()
##         if (!is.character(j) && anyNA(nm)) 
##             names(nm) <- names(x) <- seq_along(x)
##         x <- x[j]
##         cols <- names(x)
##         if (drop && length(x) == 1L) {
##             if (is.character(i)) {
##                 rows <- attr(xx, "row.names")
##                 i <- pmatch(i, rows, duplicates.ok = TRUE)
##             }
##             xj <- .subset2(.subset(xx, j), 1L)
##             return(if (length(dim(xj)) != 2L) xj[i] else xj[i, 
##                 , drop = FALSE])
##         }
##         if (anyNA(cols)) 
##             stop("undefined columns selected")
##         if (!is.null(names(nm))) 
##             cols <- names(x) <- nm[cols]
##         nxx <- structure(seq_along(xx), names = names(xx))
##         sxx <- match(nxx[j], seq_along(xx))
##     }
##     else sxx <- seq_along(x)
##     rows <- NULL
##     if (is.character(i)) {
##         rows <- attr(xx, "row.names")
##         i <- pmatch(i, rows, duplicates.ok = TRUE)
##     }
##     for (j in seq_along(x)) {
##         xj <- xx[[sxx[j]]]
##         x[[j]] <- if (length(dim(xj)) != 2L) 
##             xj[i]
##         else xj[i, , drop = FALSE]
##     }
##     if (drop) {
##         n <- length(x)
##         if (n == 1L) 
##             return(x[[1L]])
##         if (n > 1L) {
##             xj <- x[[1L]]
##             nrow <- if (length(dim(xj)) == 2L) 
##                 dim(xj)[1L]
##             else length(xj)
##             drop <- !mdrop && nrow == 1L
##         }
##         else drop <- FALSE
##     }
##     if (!drop) {
##         if (is.null(rows)) 
##             rows <- attr(xx, "row.names")
##         rows <- rows[i]
##         if ((ina <- anyNA(rows)) | (dup <- anyDuplicated(rows))) {
##             if (!dup && is.character(rows)) 
##                 dup <- "NA" %in% rows
##             if (ina) 
##                 rows[is.na(rows)] <- "NA"
##             if (dup) 
##                 rows <- make.unique(as.character(rows))
##         }
##         if (has.j && anyDuplicated(nm <- names(x))) 
##             names(x) <- make.unique(nm)
##         if (is.null(rows)) 
##             rows <- attr(xx, "row.names")[i]
##         attr(x, "row.names") <- rows
##         oldClass(x) <- oldClass(xx)
##     }
##     x
## }
## <bytecode: 0x7fa771cc8d38>
## <environment: namespace:base>
```

---

## Adding methods

.pull-left[

```r
x = structure(c(1,2,3), class="x")

x
```

```
## [1] 1 2 3
## attr(,"class")
## [1] "x"
```
]

.pull-right[

```r
y = structure(c(1,2,3), class="y")

y
```

```
## [1] 1 2 3
## attr(,"class")
## [1] "y"
```
]

<div>
.pull-left[

```r
print.x = function(x, ...) 
  print("Class x!")

x
```

```
## [1] "Class x!"
```
]

.pull-right[

```r
print.y = function(x, ...) 
  print("Class y!")

y
```

```
## [1] "Class y!"
```
]
</div>

<div>
.pull-left[

```r
class(x) = "y"
x
```

```
## [1] "Class y!"
```
]

.pull-right[

```r
class(y) = "x"
y
```

```
## [1] "Class x!"
```
]
</div>
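
---

The `print` generic is defined as `print(x, ...)`, and print methods conventionally return their argument invisibly so the object can still be assigned or piped. A minimal sketch of a method that follows both conventions; the class name `temp_c` and its output format are hypothetical and used only for illustration:

```r
# Hypothetical class "temp_c", not part of the examples above
temp = structure(c(21.5, 19, 23.2), class = "temp_c")

# Match the generic's signature, print(x, ...)
print.temp_c = function(x, ...) {
  # unclass() strips the class attribute so format() treats x as a plain numeric vector
  cat("Temperatures (C):", format(unclass(x)), "\n")
  # Return the object invisibly, as print methods conventionally do
  invisible(x)
}

temp
```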

---

## Defining a new S3 generic

```r
shuffle = function(x, ...) {
  UseMethod("shuffle")
}

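# Methods take the same arguments as the generic: x and ...
shuffle.default = function(x, ...) {
  n = length(x)
  x[sample(seq_len(n),n)]
}

shuffle.data.frame = function(x, ...) {
  n = nrow(x)
  # drop = FALSE keeps a one-column data frame from collapsing to a vector
  x[sample(seq_len(n),n), , drop = FALSE]
}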
```

.pull-left[

```r
shuffle( 1:10 )
```

```
##  [1]  2  1  7  4  5  8 10  9  6  3
```

```r
shuffle( letters[1:5] )
```

```
## [1] "e" "d" "c" "b" "a"
```
]

.pull-right[

```r
shuffle( 
  data.frame(a=1:4, b=5:8, c=9:12)
)
```

```
##   a b  c
## 1 1 5  9
## 2 2 6 10
## 3 3 7 11
## 4 4 8 12
```
]
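
---

A generic can be extended later by defining another method named `generic.class`. The sketch below restates the generic and a compact default so that it runs on its own, then adds a hypothetical `shuffle.matrix` method (an assumption for illustration, not part of the original slides) that permutes whole rows:

```r
shuffle = function(x, ...) UseMethod("shuffle")

# Fallback: permute the elements of any vector
shuffle.default = function(x, ...) x[sample.int(length(x))]

# Hypothetical extra method: permute the rows of a matrix,
# keeping each row's values together
shuffle.matrix = function(x, ...) x[sample.int(nrow(x)), , drop = FALSE]

m = matrix(1:6, nrow = 3)
shuffle(m)      # dispatches to shuffle.matrix
shuffle(1:6)    # no integer method defined, so falls through to shuffle.default
```

Without the matrix method, `shuffle(m)` would reach `shuffle.default` and permute the individual cells, returning a plain vector.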

---
class: middle
count: false

# Tibbles

---

## Modern data frames

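The tibble package, from Hadley Wickham and collaborators, provides a modern take on the data frame. Its documentation describes tibbles as *lazy* and *surly*: they do less (e.g. they do not change column names or types) and complain more (e.g. when a column does not exist).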

```r
library(tibble)
class(iris)
```

```
## [1] "data.frame"
```

```r
tbl_iris = as_tibble(iris)
class(tbl_iris)
```

```
## [1] "tbl_df"     "tbl"        "data.frame"
```

---

## Fancy printing

```r
tbl_iris
```

```
## # A tibble: 150 x 5
##    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
##  1          5.1         3.5          1.4         0.2 setosa 
##  2          4.9         3            1.4         0.2 setosa 
##  3          4.7         3.2          1.3         0.2 setosa 
##  4          4.6         3.1          1.5         0.2 setosa 
##  5          5           3.6          1.4         0.2 setosa 
##  6          5.4         3.9          1.7         0.4 setosa 
##  7          4.6         3.4          1.4         0.3 setosa 
##  8          5           3.4          1.5         0.2 setosa 
##  9          4.4         2.9          1.4         0.2 setosa 
## 10          4.9         3.1          1.5         0.1 setosa 
## # … with 140 more rows
```

---

## Fancier printing

```r
tibble(x = rnorm(10,sd=5), y = rnorm(10))
```

```
## # A tibble: 10 x 2
##         x       y
##     <dbl>   <dbl>
##  1 -7.27  -1.70  
##  2  5.30   0.0933
##  3 -4.64   0.419 
##  4 -3.14   0.556 
##  5 -0.305 -0.0337
##  6 -2.21   0.0310
##  7 -4.18  -0.350 
##  8  8.82  -0.204 
##  9  9.73  -0.961 
## 10 -1.84  -0.888
```

---

## Tibbles are lazy

```r
tbl_iris[1,]
```

```
## # A tibble: 1 x 5
##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
##          <dbl>       <dbl>        <dbl>       <dbl> <fct>  
## 1          5.1         3.5          1.4         0.2 setosa
```

.pull-left[

```r
tbl_iris[,"Species"]
```

```
## # A tibble: 150 x 1
##    Species
##    <fct>  
##  1 setosa 
##  2 setosa 
##  3 setosa 
##  4 setosa 
##  5 setosa 
##  6 setosa 
##  7 setosa 
##  8 setosa 
##  9 setosa 
## 10 setosa 
## # … with 140 more rows
```
]

.pull-right[

```r
tibble(
  x = 1:3, 
  y = c("A","B","C")
)
```

```
## # A tibble: 3 x 2
##       x y    
##   <int> <chr>
## 1     1 A    
## 2     2 B    
## 3     3 C
```
]
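
---

For comparison, a minimal base R sketch of the behaviour a tibble avoids: selecting a single column from a plain data frame simplifies the result to a vector unless `drop = FALSE` is given.

```r
# A plain data frame simplifies a single selected column to a vector
class(iris[, "Species"])

# drop = FALSE keeps the data frame structure,
# which is how the tibble subsetting above behaves by default
class(iris[, "Species", drop = FALSE])
```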

---

## Multiple classes

```r
d = tibble(
  x = 1:3, 
  y = c("A","B","C")
)

class(d)
```

```
## [1] "tbl_df"     "tbl"        "data.frame"
```

<br/>

```r
class(d) = rev(class(d))
class(d)
```

```
## [1] "data.frame" "tbl"        "tbl_df"
```

```r
d
```

```
##   x y
## 1 1 A
## 2 2 B
## 3 3 C
```

---

## Reverting a tbl

```r
d = tibble(
  x = 1:3, 
  y = c("A","B","C")
)

d
```

```
## # A tibble: 3 x 2
##       x y    
##   <int> <chr>
## 1     1 A    
## 2     2 B    
## 3     3 C
```

.pull-left[

```r
data.frame(d)
```

```
##   x y
## 1 1 A
## 2 2 B
## 3 3 C
```
]

.pull-right[

```r
class(d) = "data.frame"
d
```

```
##   x y
## 1 1 A
## 2 2 B
## 3 3 C
```
]

---

## Acknowledgments

Above materials are derived in part from the following sources:

* Hadley Wickham - [Advanced R](http://adv-r.had.co.nz/)
* [R Language Definition](http://stat.ethz.ch/R-manual/R-devel/doc/manual/R-lang.html)