class: center, middle, inverse, title-slide

# Data structures & S3
### Colin Rundel
### 2019-01-22

---
exclude: true

---
class: middle
count: false

# Attributes

---

## Attributes

Attributes are metadata that can be attached to objects in R. Some are special (e.g. `class`, `comment`, `dim`, `dimnames`, `names`, etc.) and change the way in which an object is treated by R.

Attributes are stored as a named list attached to an R object; they can be accessed (get and set) individually via `attr` and collectively via `attributes`.

.midi[

```r
(x = c(L=1,M=2,N=3))
```

```
## L M N 
## 1 2 3
```

```r
str(x)
```

```
##  Named num [1:3] 1 2 3
##  - attr(*, "names")= chr [1:3] "L" "M" "N"
```

```r
attributes(x)
```

```
## $names
## [1] "L" "M" "N"
```

]

---

```r
str(attributes(x))
```

```
## List of 1
##  $ names: chr [1:3] "L" "M" "N"
```

--

```r
attr(x,"names") = c("A","B","C")
x
```

```
## A B C 
## 1 2 3
```

--

```r
names(x)
```

```
## [1] "A" "B" "C"
```

```r
names(x) = c("Z","Y","X")
x
```

```
## Z Y X 
## 1 2 3
```

---

## Factors

Factor objects are how R represents categorical data (e.g. a variable with a fixed number of possible outcomes).

```r
(x = factor(c("BS", "MS", "PhD", "MS")))
```

```
## [1] BS  MS  PhD MS 
## Levels: BS MS PhD
```

--

```r
str(x)
```

```
##  Factor w/ 3 levels "BS","MS","PhD": 1 2 3 2
```

--

```r
typeof(x)
```

```
## [1] "integer"
```

---

## A factor is just an integer vector with two attributes: `class = "factor"` and `levels = ` a character vector.

```r
attributes(x)
```

```
## $levels
## [1] "BS"  "MS"  "PhD"
## 
## $class
## [1] "factor"
```

---

## Exercise 1

Construct a factor variable (without using `factor`, `as.factor`, or related functions) that contains the weather forecast for Durham over the next 5 days.

<br/>

<img src="imgs/darksky_forecast.png" width="80%" style="display: block; margin: auto;" />

* There should be 5 levels: `sun`, `partial clouds`, `clouds`, `rain`, `snow`.
* Start with an *integer* vector and add the appropriate attributes.

---
class: middle
count: false

# Data Frames

---

## Data Frames

A data frame is how R handles heterogeneous tabular data (i.e. rows and columns) and is one of the most commonly used data structures in R.

At their core, R represents data frames as lists of equal-length vectors (usually atomic, but you can use lists as well).

```r
df = data.frame(x = 1:3, y = c("a", "b", "c"))
df
```

```
##   x y
## 1 1 a
## 2 2 b
## 3 3 c
```

```r
str(df)
```

```
## 'data.frame': 3 obs. of 2 variables:
##  $ x: int 1 2 3
##  $ y: Factor w/ 3 levels "a","b","c": 1 2 3
```

---

```r
typeof(df)
```

```
## [1] "list"
```

```r
attributes(df)
```

```
## $names
## [1] "x" "y"
## 
## $class
## [1] "data.frame"
## 
## $row.names
## [1] 1 2 3
```

---

## Roll your own data.frame

```r
df2 = list(x = 1:3, y = factor(c("a", "b", "c")))
```

--

.pull-left[

```r
attr(df2,"class") = "data.frame"
df2
```

```
## [1] x y
## <0 rows> (or 0-length row.names)
```

]

--

.pull-right[

```r
attr(df2,"row.names") = 1:3
df2
```

```
##   x y
## 1 1 a
## 2 2 b
## 3 3 c
```

]

```r
str(df2)
```

```
## 'data.frame': 3 obs. of 2 variables:
##  $ x: int 1 2 3
##  $ y: Factor w/ 3 levels "a","b","c": 1 2 3
```

```r
identical(df, df2)
```

```
## [1] TRUE
```

---

## Strings (Characters) vs Factors

In R versions before 4.0.0, character vectors are converted into factors by default when they are included in a data frame. Sometimes this is useful, but usually it isn't; either way it is important to know what type/class you are working with.

This behavior can be changed using the `stringsAsFactors` argument to `data.frame` and related functions.

```r
df = data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)
df
```

```
##   x y
## 1 1 a
## 2 2 b
## 3 3 c
```

```r
str(df)
```

```
## 'data.frame': 3 obs. of 2 variables:
##  $ x: int 1 2 3
##  $ y: chr "a" "b" "c"
```

---

## Some general advice ...
<br/>
<br/>

<img src="imgs/stringsasfactors.jpg" align="center" width="650px"/>

---

## Length Coercion

For data frames, if the lengths of the component vectors are not multiples of one another, there will be an error (in the previous examples this only produced a warning).

```r
data.frame(x = 1:3, y = c("a"))
```

```
##   x y
## 1 1 a
## 2 2 a
## 3 3 a
```

```r
data.frame(x = 1:3, y = c("a","b"))
```

```
## Error in data.frame(x = 1:3, y = c("a", "b")): arguments imply differing number of rows: 3, 2
```

```r
data.frame(x = 1:3, y = character())
```

```
## Error in data.frame(x = 1:3, y = character()): arguments imply differing number of rows: 3, 0
```

---

## Growing data frames

We can add rows or columns to a data frame using `rbind` and `cbind` respectively.

```r
df = data.frame(x = 1:3, y = c("a","b","c"))
cbind(df, z=TRUE)
```

```
##   x y    z
## 1 1 a TRUE
## 2 2 b TRUE
## 3 3 c TRUE
```

.pull-left[

```r
rbind(df, c(1,"a"))
```

```
##   x y
## 1 1 a
## 2 2 b
## 3 3 c
## 4 1 a
```

]

.pull-right[

```r
str( rbind(df, c(1,"a")) )
```

```
## 'data.frame': 4 obs. of 2 variables:
##  $ x: chr "1" "2" "3" "1"
##  $ y: Factor w/ 3 levels "a","b","c": 1 2 3 1
```

]

---

.pull-left[

```r
rbind(df, list(1,"a"))
```

```
##   x y
## 1 1 a
## 2 2 b
## 3 3 c
## 4 1 a
```

]

.pull-right[

```r
str( rbind(df, list(1,"a")) )
```

```
## 'data.frame': 4 obs. of 2 variables:
##  $ x: num 1 2 3 1
##  $ y: Factor w/ 3 levels "a","b","c": 1 2 3 1
```

]

--

```r
df1 = data.frame(x = 1:3, y = c("a","b","c"))
df2 = data.frame(m = 3:1, n = c(TRUE,TRUE,FALSE))
```

```r
cbind(df1, df2)
```

```
##   x y m     n
## 1 1 a 3  TRUE
## 2 2 b 2  TRUE
## 3 3 c 1 FALSE
```

```r
rbind(df1, df2)
```

```
## Error in match.names(clabs, names(xi)): names do not match previous names
```

---

## Matrices

A matrix is a two-dimensional equivalent of an atomic vector (i.e. all entries must share the same type).
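The same idea can be sketched directly with attributes. This is a minimal illustration in base R; the variable names `v` and `mixed` are only illustrative.

```r
# Start from a plain atomic (double) vector ...
v = c(1, 2, 3, 4)

# ... and attach a dim attribute: 2 rows, 2 columns
attr(v, "dim") = c(2L, 2L)

is.matrix(v)                                             # TRUE
identical(v, matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2))  # TRUE

# Because a matrix is atomic, mixed inputs are coerced to one common type
mixed = matrix(c(1, "a", TRUE, 2), nrow = 2)
typeof(mixed)                                            # "character"
```

Setting `dim` by hand is rarely needed in practice; the point is only that a matrix carries no storage of its own beyond the underlying vector.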
```r (m = matrix(c(1,2,3,4), ncol=2, nrow=2)) ``` ``` ## [,1] [,2] ## [1,] 1 3 ## [2,] 2 4 ``` ```r attributes(m) ``` ``` ## $dim ## [1] 2 2 ``` --- ## Column major ordering A matrix is an atomic vector with a `dim` attribute. Data is stored in column major order (fill the first column starting at row one, then the next column and so on). .pull-left[ ```r cm = matrix(c(1,2,3,4), ncol=2, nrow=2) cm ``` ``` ## [,1] [,2] ## [1,] 1 3 ## [2,] 2 4 ``` ```r c(cm) ``` ``` ## [1] 1 2 3 4 ``` ] .pull-right[ ```r rm = matrix(c(1,2,3,4), ncol=2, nrow=2, byrow=TRUE) rm ``` ``` ## [,1] [,2] ## [1,] 1 2 ## [2,] 3 4 ``` ```r c(rm) ``` ``` ## [1] 1 3 2 4 ``` ] --- class: middle count: false # S3 Objects --- ## What is S3? <br/> > S3 is R’s first and simplest OO system. It is the only OO system used in the base and stats packages, and it’s the most commonly used system in CRAN packages. S3 is informal and ad hoc, but it has a certain elegance in its minimalism: you can’t take away any part of it and still have a useful OO system. --Hadley Wickham, Advanced R .footnote[ * S3 should not be confused with R's other object oriented systems: S4, Reference classes, and R6*. ] --- ## `class` .pull-left[ ```r class( 1 ) ``` ``` ## [1] "numeric" ``` ```r class( "A" ) ``` ``` ## [1] "character" ``` ```r class( NA ) ``` ``` ## [1] "logical" ``` ```r class( TRUE ) ``` ``` ## [1] "logical" ``` ] .pull-right[ ```r class( matrix(1,2,2) ) ``` ``` ## [1] "matrix" ``` ```r class( factor(c("A","B")) ) ``` ``` ## [1] "factor" ``` ```r class( data.frame(x=1:3) ) ``` ``` ## [1] "data.frame" ``` ```r class( (function(x) x^2) ) ``` ``` ## [1] "function" ``` ] --- ## An example .pull-left[ ```r print( c("A","B","A","C") ) ``` ``` ## [1] "A" "B" "A" "C" ``` ```r print( factor(c("A","B","A","C")) ) ``` ``` ## [1] A B A C ## Levels: A B C ``` ] .pull-right[ ```r print( data.frame(a=1:3, b=4:6) ) ``` ``` ## a b ## 1 1 4 ## 2 2 5 ## 3 3 6 ``` ] -- <br/> ```r print ``` ``` ## function (x, ...) 
## UseMethod("print") ## <bytecode: 0x7fa7735c2f18> ## <environment: namespace:base> ``` --- ## Other examples .pull-left[ ```r mean ``` ``` ## function (x, ...) ## UseMethod("mean") ## <bytecode: 0x7fa771b6ba20> ## <environment: namespace:base> ``` ```r t.test ``` ``` ## function (x, ...) ## UseMethod("t.test") ## <bytecode: 0x7fa7718d5b88> ## <environment: namespace:stats> ``` ] .pull-right[ ```r summary ``` ``` ## function (object, ...) ## UseMethod("summary") ## <bytecode: 0x7fa7726bbd70> ## <environment: namespace:base> ``` ```r plot ``` ``` ## function (x, y, ...) ## UseMethod("plot") ## <bytecode: 0x7fa7701608d8> ## <environment: namespace:graphics> ``` ] ```r sum ``` ``` ## function (..., na.rm = FALSE) .Primitive("sum") ``` --- ## What's going on? S3 objects and their related functions work using a very simple dispatch mechanism - a generic function is created whose sole job is to call the `UseMethod` function which then calls a class specialized function named using the convention: `generic.class`. We can see all of the specialized versions of the generic using the `methods` function. 
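As a minimal sketch of this mechanism, consider the hypothetical generic `describe` below. It is not part of base R; the generic, its methods, and the classes `child` and `parent` are assumptions made purely for illustration.

```r
# The generic: its only job is to call UseMethod()
describe = function(x, ...) {
  UseMethod("describe")
}

# Fallback, used when no class-specific method is found
describe.default = function(x, ...) {
  paste("an object of type", typeof(x))
}

# Class-specific method, located via the generic.class naming convention
describe.factor = function(x, ...) {
  paste("a factor with", nlevels(x), "levels")
}

# Method for a made-up class "parent"
describe.parent = function(x, ...) {
  "something that inherits from parent"
}

describe(factor(c("a", "b")))   # "a factor with 2 levels"
describe(1:3)                   # "an object of type integer"

# With several classes, UseMethod() tries them in order:
# there is no describe.child, so describe.parent is used
obj = structure(list(), class = c("child", "parent"))
describe(obj)                   # "something that inherits from parent"
```

Every method accepts `...` so that its signature stays compatible with the generic.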
```r methods("plot") ``` ``` ## [1] plot.acf* plot.data.frame* plot.decomposed.ts* ## [4] plot.default plot.dendrogram* plot.density* ## [7] plot.ecdf plot.factor* plot.formula* ## [10] plot.function plot.hclust* plot.histogram* ## [13] plot.HoltWinters* plot.isoreg* plot.lm* ## [16] plot.medpolish* plot.mlm* plot.ppr* ## [19] plot.prcomp* plot.princomp* plot.profile.nls* ## [22] plot.R6* plot.raster* plot.spec* ## [25] plot.stepfun plot.stl* plot.table* ## [28] plot.ts plot.tskernel* plot.TukeyHSD* ## see '?methods' for accessing help and source code ``` --- .small[ ```r methods("print") ``` ``` ## [1] print.acf* ## [2] print.AES* ## [3] print.anova* ## [4] print.aov* ## [5] print.aovlist* ## [6] print.ar* ## [7] print.Arima* ## [8] print.arima0* ## [9] print.AsIs ## [10] print.aspell* ## [11] print.aspell_inspect_context* ## [12] print.bibentry* ## [13] print.Bibtex* ## [14] print.boxx* ## [15] print.browseVignettes* ## [16] print.by ## [17] print.bytes* ## [18] print.changedFiles* ## [19] print.check_code_usage_in_package* ## [20] print.check_compiled_code* ## [21] print.check_demo_index* ## [22] print.check_depdef* ## [23] print.check_details* ## [24] print.check_details_changes* ## [25] print.check_doi_db* ## [26] print.check_dotInternal* ## [27] print.check_make_vars* ## [28] print.check_nonAPI_calls* ## [29] print.check_package_code_assign_to_globalenv* ## [30] print.check_package_code_attach* ## [31] print.check_package_code_data_into_globalenv* ## [32] print.check_package_code_startup_functions* ## [33] print.check_package_code_syntax* ## [34] print.check_package_code_unload_functions* ## [35] print.check_package_compact_datasets* ## [36] print.check_package_CRAN_incoming* ## [37] print.check_package_datasets* ## [38] print.check_package_depends* ## [39] print.check_package_description* ## [40] print.check_package_description_encoding* ## [41] print.check_package_license* ## [42] print.check_packages_in_dir* ## [43] print.check_packages_used* ## [44] 
print.check_po_files* ## [45] print.check_pragmas* ## [46] print.check_Rd_contents* ## [47] print.check_Rd_line_widths* ## [48] print.check_Rd_metadata* ## [49] print.check_Rd_xrefs* ## [50] print.check_RegSym_calls* ## [51] print.check_so_symbols* ## [52] print.check_T_and_F* ## [53] print.check_url_db* ## [54] print.check_vignette_index* ## [55] print.checkDocFiles* ## [56] print.checkDocStyle* ## [57] print.checkFF* ## [58] print.checkRd* ## [59] print.checkReplaceFuns* ## [60] print.checkS3methods* ## [61] print.checkTnF* ## [62] print.checkVignettes* ## [63] print.citation* ## [64] print.cli_sitrep* ## [65] print.codoc* ## [66] print.codocClasses* ## [67] print.codocData* ## [68] print.colonnade* ## [69] print.colorConverter* ## [70] print.compactPDF* ## [71] print.condition ## [72] print.connection ## [73] print.CRAN_package_reverse_dependencies_and_views* ## [74] print.crayon* ## [75] print.data.frame ## [76] print.Date ## [77] print.default ## [78] print.dendrogram* ## [79] print.density* ## [80] print.difftime ## [81] print.dist* ## [82] print.Dlist ## [83] print.DLLInfo ## [84] print.DLLInfoList ## [85] print.DLLRegisteredRoutines ## [86] print.document_context* ## [87] print.document_position* ## [88] print.document_range* ## [89] print.document_selection* ## [90] print.dummy_coef* ## [91] print.dummy_coef_list* ## [92] print.ecdf* ## [93] print.eigen ## [94] print.factanal* ## [95] print.factor ## [96] print.family* ## [97] print.fileSnapshot* ## [98] print.findLineNumResult* ## [99] print.formula* ## [100] print.frame* ## [101] print.fseq* ## [102] print.ftable* ## [103] print.function ## [104] print.getAnywhere* ## [105] print.glm* ## [106] print.hclust* ## [107] print.help_files_with_topic* ## [108] print.hexmode ## [109] print.HoltWinters* ## [110] print.hsearch* ## [111] print.hsearch_db* ## [112] print.htest* ## [113] print.html* ## [114] print.html_dependency* ## [115] print.infl* ## [116] print.integrate* ## [117] print.isoreg* ## [118] 
print.kmeans* ## [119] print.knitr_kable* ## [120] print.Latex* ## [121] print.LaTeX* ## [122] print.libraryIQR ## [123] print.listof ## [124] print.lm* ## [125] print.loadings* ## [126] print.loess* ## [127] print.logLik* ## [128] print.ls_str* ## [129] print.medpolish* ## [130] print.MethodsFunction* ## [131] print.mtable* ## [132] print.NativeRoutineList ## [133] print.news_db* ## [134] print.nls* ## [135] print.noquote ## [136] print.numeric_version ## [137] print.object_size* ## [138] print.octmode ## [139] print.packageDescription* ## [140] print.packageInfo ## [141] print.packageIQR* ## [142] print.packageStatus* ## [143] print.pairwise.htest* ## [144] print.PDF_Array* ## [145] print.PDF_Dictionary* ## [146] print.pdf_doc* ## [147] print.pdf_fonts* ## [148] print.PDF_Indirect_Reference* ## [149] print.pdf_info* ## [150] print.PDF_Keyword* ## [151] print.PDF_Name* ## [152] print.PDF_Stream* ## [153] print.PDF_String* ## [154] print.person* ## [155] print.pillar* ## [156] print.pillar_ornament* ## [157] print.pillar_shaft* ## [158] print.pillar_vertical* ## [159] print.POSIXct ## [160] print.POSIXlt ## [161] print.power.htest* ## [162] print.ppr* ## [163] print.prcomp* ## [164] print.princomp* ## [165] print.proc_time ## [166] print.promise* ## [167] print.quosure* ## [168] print.quosures* ## [169] print.R6* ## [170] print.R6ClassGenerator* ## [171] print.raster* ## [172] print.Rcpp_stack_trace* ## [173] print.Rd* ## [174] print.recordedplot* ## [175] print.restart ## [176] print.RGBcolorConverter* ## [177] print.rif_shaft* ## [178] print.rlang_box_done* ## [179] print.rlang_data_pronoun* ## [180] print.rlang_envs* ## [181] print.rlang_error* ## [182] print.rlang_fake_data_pronoun* ## [183] print.rlang_lambda_function* ## [184] print.rlang_trace* ## [185] print.rlang_zap* ## [186] print.rle ## [187] print.roman* ## [188] print.rule* ## [189] print.sessionInfo* ## [190] print.shiny.tag* ## [191] print.shiny.tag.list* ## [192] print.simple.list ## [193] 
print.smooth.spline* ## [194] print.socket* ## [195] print.spark* ## [196] print.squeezed_colonnade* ## [197] print.srcfile ## [198] print.srcref ## [199] print.stepfun* ## [200] print.stl* ## [201] print.StructTS* ## [202] print.subdir_tests* ## [203] print.summarize_CRAN_check_status* ## [204] print.summary.aov* ## [205] print.summary.aovlist* ## [206] print.summary.ecdf* ## [207] print.summary.glm* ## [208] print.summary.lm* ## [209] print.summary.loess* ## [210] print.summary.manova* ## [211] print.summary.nls* ## [212] print.summary.packageStatus* ## [213] print.summary.ppr* ## [214] print.summary.prcomp* ## [215] print.summary.princomp* ## [216] print.summary.table ## [217] print.summary.warnings ## [218] print.summaryDefault ## [219] print.table ## [220] print.tables_aov* ## [221] print.tbl* ## [222] print.terms* ## [223] print.tree* ## [224] print.trunc_mat* ## [225] print.ts* ## [226] print.tskernel* ## [227] print.TukeyHSD* ## [228] print.tukeyline* ## [229] print.tukeysmooth* ## [230] print.undoc* ## [231] print.vignette* ## [232] print.warnings ## [233] print.x ## [234] print.xfun_raw_string* ## [235] print.xfun_strict_list* ## [236] print.xgettext* ## [237] print.xngettext* ## [238] print.xtabs* ## [239] print.y ## see '?methods' for accessing help and source code ``` ] --- ```r print.data.frame ``` ``` ## function (x, ..., digits = NULL, quote = FALSE, right = TRUE, ## row.names = TRUE, max = NULL) ## { ## n <- length(row.names(x)) ## if (length(x) == 0L) { ## cat(sprintf(ngettext(n, "data frame with 0 columns and %d row", ## "data frame with 0 columns and %d rows"), n), "\n", ## sep = "") ## } ## else if (n == 0L) { ## print.default(names(x), quote = FALSE) ## cat(gettext("<0 rows> (or 0-length row.names)\n")) ## } ## else { ## if (is.null(max)) ## max <- getOption("max.print", 99999L) ## if (!is.finite(max)) ## stop("invalid 'max' / getOption(\"max.print\"): ", ## max) ## omit <- (n0 <- max%/%length(x)) < n ## m <- as.matrix(format.data.frame(if 
(omit) ## x[seq_len(n0), , drop = FALSE] ## else x, digits = digits, na.encode = FALSE)) ## if (!isTRUE(row.names)) ## dimnames(m)[[1L]] <- if (isFALSE(row.names)) ## rep.int("", if (omit) ## n0 ## else n) ## else row.names ## print(m, ..., quote = quote, right = right, max = max) ## if (omit) ## cat(" [ reached 'max' / getOption(\"max.print\") -- omitted", ## n - n0, "rows ]\n") ## } ## invisible(x) ## } ## <bytecode: 0x7fa7714a4008> ## <environment: namespace:base> ``` --- ```r print.matrix ``` ``` ## Error in eval(expr, envir, enclos): object 'print.matrix' not found ``` -- ```r print.default ``` ``` ## function (x, digits = NULL, quote = TRUE, na.print = NULL, print.gap = NULL, ## right = FALSE, max = NULL, useSource = TRUE, ...) ## { ## noOpt <- missing(digits) && missing(quote) && missing(na.print) && ## missing(print.gap) && missing(right) && missing(max) && ## missing(useSource) && missing(...) ## .Internal(print.default(x, digits, quote, na.print, print.gap, ## right, max, useSource, noOpt)) ## } ## <bytecode: 0x7fa772628320> ## <environment: namespace:base> ``` --- ## The other way If instead we have a class and want to know what specialized functions exist for that class, then we can again use the `methods` function - this time with the `class` argument. 
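When only a yes/no answer is needed, the lookup can also be done programmatically. The following is a small sketch using `methods()` and `getS3method()` from the utils package; the `factor` class and the variable names are chosen arbitrarily for illustration.

```r
# Everything registered for the "factor" class, as a plain character vector
factor_methods = as.character(methods(class = "factor"))
length(factor_methods) > 0

# getS3method() returns the method itself, or NULL when optional = TRUE
# and no method exists for that generic/class pair
has_print = !is.null(getS3method("print", "factor", optional = TRUE))
has_mean  = !is.null(getS3method("mean",  "factor", optional = TRUE))

has_print  # TRUE: print.factor exists in base R
has_mean   # FALSE: mean() on a factor falls through to mean.default
```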
```r methods(class="data.frame") ``` ``` ## [1] [ [[ [[<- [<- $ ## [6] $<- aggregate anyDuplicated as_tibble as.data.frame ## [11] as.list as.matrix by cbind coerce ## [16] dim dimnames dimnames<- droplevels duplicated ## [21] edit format formula glimpse head ## [26] initialize is_vector_s3 is.na Math merge ## [31] na.exclude na.omit Ops plot print ## [36] prompt rbind row.names row.names<- rowsum ## [41] show shuffle slotsFromS3 split split<- ## [46] stack str subset summary Summary ## [51] t tail transform type_sum type.convert ## [56] unique unstack within ## see '?methods' for accessing help and source code ``` --- class: small ```r `[.data.frame` ``` ``` ## function (x, i, j, drop = if (missing(i)) TRUE else length(cols) == ## 1) ## { ## mdrop <- missing(drop) ## Narg <- nargs() - !mdrop ## has.j <- !missing(j) ## if (!all(names(sys.call()) %in% c("", "drop")) && !isS4(x)) ## warning("named arguments other than 'drop' are discouraged") ## if (Narg < 3L) { ## if (!mdrop) ## warning("'drop' argument will be ignored") ## if (missing(i)) ## return(x) ## if (is.matrix(i)) ## return(as.matrix(x)[i]) ## nm <- names(x) ## if (is.null(nm)) ## nm <- character() ## if (!is.character(i) && anyNA(nm)) { ## names(nm) <- names(x) <- seq_along(x) ## y <- NextMethod("[") ## cols <- names(y) ## if (anyNA(cols)) ## stop("undefined columns selected") ## cols <- names(y) <- nm[cols] ## } ## else { ## y <- NextMethod("[") ## cols <- names(y) ## if (!is.null(cols) && anyNA(cols)) ## stop("undefined columns selected") ## } ## if (anyDuplicated(cols)) ## names(y) <- make.unique(cols) ## attr(y, "row.names") <- .row_names_info(x, 0L) ## attr(y, "class") <- oldClass(x) ## return(y) ## } ## if (missing(i)) { ## if (drop && !has.j && length(x) == 1L) ## return(.subset2(x, 1L)) ## nm <- names(x) ## if (is.null(nm)) ## nm <- character() ## if (has.j && !is.character(j) && anyNA(nm)) { ## names(nm) <- names(x) <- seq_along(x) ## y <- .subset(x, j) ## cols <- names(y) ## if (anyNA(cols)) ## 
stop("undefined columns selected") ## cols <- names(y) <- nm[cols] ## } ## else { ## y <- if (has.j) ## .subset(x, j) ## else x ## cols <- names(y) ## if (anyNA(cols)) ## stop("undefined columns selected") ## } ## if (drop && length(y) == 1L) ## return(.subset2(y, 1L)) ## if (anyDuplicated(cols)) ## names(y) <- make.unique(cols) ## nrow <- .row_names_info(x, 2L) ## if (drop && !mdrop && nrow == 1L) ## return(structure(y, class = NULL, row.names = NULL)) ## else { ## attr(y, "class") <- oldClass(x) ## attr(y, "row.names") <- .row_names_info(x, 0L) ## return(y) ## } ## } ## xx <- x ## cols <- names(xx) ## x <- vector("list", length(x)) ## x <- .Internal(copyDFattr(xx, x)) ## oldClass(x) <- attr(x, "row.names") <- NULL ## if (has.j) { ## nm <- names(x) ## if (is.null(nm)) ## nm <- character() ## if (!is.character(j) && anyNA(nm)) ## names(nm) <- names(x) <- seq_along(x) ## x <- x[j] ## cols <- names(x) ## if (drop && length(x) == 1L) { ## if (is.character(i)) { ## rows <- attr(xx, "row.names") ## i <- pmatch(i, rows, duplicates.ok = TRUE) ## } ## xj <- .subset2(.subset(xx, j), 1L) ## return(if (length(dim(xj)) != 2L) xj[i] else xj[i, ## , drop = FALSE]) ## } ## if (anyNA(cols)) ## stop("undefined columns selected") ## if (!is.null(names(nm))) ## cols <- names(x) <- nm[cols] ## nxx <- structure(seq_along(xx), names = names(xx)) ## sxx <- match(nxx[j], seq_along(xx)) ## } ## else sxx <- seq_along(x) ## rows <- NULL ## if (is.character(i)) { ## rows <- attr(xx, "row.names") ## i <- pmatch(i, rows, duplicates.ok = TRUE) ## } ## for (j in seq_along(x)) { ## xj <- xx[[sxx[j]]] ## x[[j]] <- if (length(dim(xj)) != 2L) ## xj[i] ## else xj[i, , drop = FALSE] ## } ## if (drop) { ## n <- length(x) ## if (n == 1L) ## return(x[[1L]]) ## if (n > 1L) { ## xj <- x[[1L]] ## nrow <- if (length(dim(xj)) == 2L) ## dim(xj)[1L] ## else length(xj) ## drop <- !mdrop && nrow == 1L ## } ## else drop <- FALSE ## } ## if (!drop) { ## if (is.null(rows)) ## rows <- attr(xx, "row.names") ## rows <- 
rows[i] ## if ((ina <- anyNA(rows)) | (dup <- anyDuplicated(rows))) { ## if (!dup && is.character(rows)) ## dup <- "NA" %in% rows ## if (ina) ## rows[is.na(rows)] <- "NA" ## if (dup) ## rows <- make.unique(as.character(rows)) ## } ## if (has.j && anyDuplicated(nm <- names(x))) ## names(x) <- make.unique(nm) ## if (is.null(rows)) ## rows <- attr(xx, "row.names")[i] ## attr(x, "row.names") <- rows ## oldClass(x) <- oldClass(xx) ## } ## x ## } ## <bytecode: 0x7fa771cc8d38> ## <environment: namespace:base> ``` --- ## Adding methods .pull-left[ ```r x = structure(c(1,2,3), class="x") x ``` ``` ## [1] 1 2 3 ## attr(,"class") ## [1] "x" ``` ] .pull-right[ ```r y = structure(c(1,2,3), class="y") y ``` ``` ## [1] 1 2 3 ## attr(,"class") ## [1] "y" ``` ] -- <div> .pull-left[ ```r print.x = function(x, ...) print("Class x!") x ``` ``` ## [1] "Class x!" ``` ] .pull-right[ ```r print.y = function(x, ...) print("Class y!") y ``` ``` ## [1] "Class y!" ``` ] </div> -- <div> .pull-left[ ```r class(x) = "y" x ``` ``` ## [1] "Class y!" ``` ] .pull-right[ ```r class(y) = "x" y ``` ``` ## [1] "Class x!" ``` ] </div> --- ## Defining a new S3 Generic ```r shuffle = function(x, ...) { UseMethod("shuffle") } shuffle.default = function(x, ...) { n = length(x) x[sample(seq_len(n),n)] } shuffle.data.frame = function(x, ...) { n = nrow(x) x[sample(seq_len(n),n),] } ``` -- .pull-left[ ```r shuffle( 1:10 ) ``` ``` ## [1] 2 1 7 4 5 8 10 9 6 3 ``` ```r shuffle( letters[1:5] ) ``` ``` ## [1] "e" "d" "c" "b" "a" ``` ] .pull-right[ ```r shuffle( data.frame(a=1:4, b=5:8, c=9:12) ) ``` ``` ## a b c ## 1 1 5 9 ## 2 2 6 10 ## 3 3 7 11 ## 4 4 8 12 ``` ] --- class: middle count: false # Tibbles --- ## Modern data frames Hadley Wickham has a package, tibble, that modifies data frames to be more modern or, as he calls them, surly and lazy. 
```r library(tibble) class(iris) ``` ``` ## [1] "data.frame" ``` ```r tbl_iris = as_tibble(iris) class(tbl_iris) ``` ``` ## [1] "tbl_df" "tbl" "data.frame" ``` --- ## Fancy Printing ```r tbl_iris ``` ``` ## # A tibble: 150 x 5 ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## <dbl> <dbl> <dbl> <dbl> <fct> ## 1 5.1 3.5 1.4 0.2 setosa ## 2 4.9 3 1.4 0.2 setosa ## 3 4.7 3.2 1.3 0.2 setosa ## 4 4.6 3.1 1.5 0.2 setosa ## 5 5 3.6 1.4 0.2 setosa ## 6 5.4 3.9 1.7 0.4 setosa ## 7 4.6 3.4 1.4 0.3 setosa ## 8 5 3.4 1.5 0.2 setosa ## 9 4.4 2.9 1.4 0.2 setosa ## 10 4.9 3.1 1.5 0.1 setosa ## # … with 140 more rows ``` --- ## Fancier printing ```r tibble(x = rnorm(10,sd=5), y = rnorm(10)) ``` ``` ## # A tibble: 10 x 2 ## x y ## <dbl> <dbl> ## 1 -7.27 -1.70 ## 2 5.30 0.0933 ## 3 -4.64 0.419 ## 4 -3.14 0.556 ## 5 -0.305 -0.0337 ## 6 -2.21 0.0310 ## 7 -4.18 -0.350 ## 8 8.82 -0.204 ## 9 9.73 -0.961 ## 10 -1.84 -0.888 ``` --- ## Tibbles are lazy ```r tbl_iris[1,] ``` ``` ## # A tibble: 1 x 5 ## Sepal.Length Sepal.Width Petal.Length Petal.Width Species ## <dbl> <dbl> <dbl> <dbl> <fct> ## 1 5.1 3.5 1.4 0.2 setosa ``` .pull-left[ ```r tbl_iris[,"Species"] ``` ``` ## # A tibble: 150 x 1 ## Species ## <fct> ## 1 setosa ## 2 setosa ## 3 setosa ## 4 setosa ## 5 setosa ## 6 setosa ## 7 setosa ## 8 setosa ## 9 setosa ## 10 setosa ## # … with 140 more rows ``` ] -- .pull-right[ ```r tibble( x = 1:3, y = c("A","B","C") ) ``` ``` ## # A tibble: 3 x 2 ## x y ## <int> <chr> ## 1 1 A ## 2 2 B ## 3 3 C ``` ] --- ## Multiple classes ```r d = tibble( x = 1:3, y = c("A","B","C") ) class(d) ``` ``` ## [1] "tbl_df" "tbl" "data.frame" ``` -- <br/> ```r class(d) = rev(class(d)) class(d) ``` ``` ## [1] "data.frame" "tbl" "tbl_df" ``` ```r d ``` ``` ## x y ## 1 1 A ## 2 2 B ## 3 3 C ``` --- ## Reverting a tbl ```r d = tibble( x = 1:3, y = c("A","B","C") ) d ``` ``` ## # A tibble: 3 x 2 ## x y ## <int> <chr> ## 1 1 A ## 2 2 B ## 3 3 C ``` .pull-left[ ```r data.frame(d) ``` ``` ## 
x y ## 1 1 A ## 2 2 B ## 3 3 C ``` ] .pull-right[ ```r class(d) = "data.frame" d ``` ``` ## x y ## 1 1 A ## 2 2 B ## 3 3 C ``` ] --- ## Acknowledgments Above materials are derived in part from the following sources: * Hadley Wickham - [Advanced R](http://adv-r.had.co.nz/) * [R Language Definition](http://stat.ethz.ch/R-manual/R-devel/doc/manual/R-lang.html)