class: center, middle, inverse, title-slide .title[ # Fundamentals of R: More Data Structures ] .subtitle[ ##
STA35A: Statistical Data Science 1 ] .author[ ### Xiao Hui Tai ] .date[ ### October 2, 2024 ] --- layout: true <!-- <div class="my-footer"> --> <!-- <span> --> <!-- <a href="https://datasciencebox.org" target="_blank">datasciencebox.org</a> --> <!-- </span> --> <!-- </div> --> --- ## Recap: Vectors and arrays -- - Vectors: - Additional attributes: Factors, dates - Vector arithmetic - Other functions on vectors - Comparison operators - Indexing operators - Named components (today) - Arrays - Vectors with dimension --- ## Today - Arrays - Matrices - Introduction to lists --- ## Named components You can give names to elements or components of vectors .small[ ``` r x <- c(7, 8, 10, 45) y <- c(-7, -8, -10, -45) (names(x) <- c("v1", "v2", "v3", "fred")) ``` ``` ## [1] "v1" "v2" "v3" "fred" ``` ``` r x[c("fred", "v1")] ``` ``` ## fred v1 ## 45 7 ``` ``` r x[c(4, 1)] ``` ``` ## fred v1 ## 45 7 ``` ] Note the labels are in what R prints; not actually part of the value --- `names(x)` is just another vector (of characters): ``` r names(y) <- names(x) sort(names(x)) ``` ``` ## [1] "fred" "v1" "v2" "v3" ``` ``` r which(names(x) == "fred") ``` ``` ## [1] 4 ``` --- ## Arrays - Many data structures in R are made by adding bells and whistles to vectors - **Arrays** are vectors with *dimensions* - For example a two-dimensional array: ``` r x <- c(7, 8, 10, 45) x.arr <- array(x, dim = c(2, 2)) x.arr ``` ``` ## [,1] [,2] ## [1,] 7 10 ## [2,] 8 45 ``` --- ## Arrays - Filled column-wise (by columns) - `dim` says how many rows and columns ``` r dim(x.arr) ``` ``` ## [1] 2 2 ``` --- ## Arrays with more than two dimensions - Arrays can have three dimensions ( `\(r \times c \times h\)`; think about stacking `\(r \times c\)` matrices) - Can also have `\(4, 5, \ldots n\)` dimensional arrays - `dim` is a length `\(n\)` vector - Says the size/number of indices of each component - e.g., a `\(4 \times 3\)` array has `dim` length 2, elements are 4 and 3 .small[ ``` r myArr <- array(1:12, dim = c(4, 3)) myArr ``` ``` ## [,1] [,2] [,3] ## [1,] 1 5 9 ## [2,] 2 6 10 ## [3,] 3 7 11 ## [4,] 4 8 12 ``` ``` r dim(myArr) ``` ``` ## [1] 4 3 ``` ] --- ## Arrays with more than two dimensions - A `\(4 \times 3 \times 2\)` array has `dim` length 3, elements are 4, 3 and 2 ``` r myArr <- array(1:24, dim = c(4, 3, 2)) myArr ``` ``` ## , , 1 ## ## [,1] [,2] [,3] ## [1,] 1 5 9 ## [2,] 2 6 10 ## [3,] 3 7 11 ## [4,] 4 8 12 ## ## , , 2 ## ## [,1] [,2] [,3] ## [1,] 13 17 21 ## [2,] 14 18 22 ## [3,] 15 19 23 ## [4,] 16 20 24 ``` --- ## Arrays with more than two dimensions - A `\(4 \times 3 \times 2\)` array has `dim` length 3, elements are 4, 3 and 2 ``` r dim(myArr) ``` ``` ## [1] 4 3 2 ``` Some other properties of the array: ``` r is.vector(myArr) ``` ``` ## [1] FALSE ``` ``` r is.array(myArr) ``` ``` ## [1] TRUE ``` --- ``` r typeof(myArr) ``` ``` ## [1] "integer" ``` ``` r str(myArr) ``` ``` ## int [1:4, 1:3, 1:2] 1 2 3 4 5 6 7 8 9 10 ... ``` ``` r attributes(myArr) ``` ``` ## $dim ## [1] 4 3 2 ``` `typeof()` returns the type of the _elements_ `str()` gives the **structure**: here, a numeric array, with three dimensions, size/indices, and then the actual numbers --- ## Arrays: Accessing and operating on arrays Can access a 2-D array either by pairs of indices or by the underlying vector: ``` r x <- c(7, 8, 10, 45) (x.arr <- array(x, dim = c(2, 2))) ``` ``` ## [,1] [,2] ## [1,] 7 10 ## [2,] 8 45 ``` ``` r x.arr[1, 2] ``` ``` ## [1] 10 ``` ``` r x.arr[3] ``` ``` ## [1] 10 ``` Remember that arrays are filled column-wise --- ## Arrays: Accessing and operating on arrays Changing array values: ``` r x.arr ``` ``` ## [,1] [,2] ## [1,] 7 10 ## [2,] 8 45 ``` ``` r x.arr[3] <- 0 x.arr ``` ``` ## [,1] [,2] ## [1,] 7 0 ## [2,] 8 45 ``` --- ## Arrays: Accessing and operating on arrays Omitting an index means "all of it": ``` r x.arr[c(1:2), 2] ``` ``` ## [1] 0 45 ``` ``` r x.arr[, 2] ``` ``` ## [1] 0 45 ``` --- ## Functions on arrays Using a **vector-style function** on an array structure will go down to the underlying vector, *unless* the function is set up to handle arrays specially: ``` r which(x.arr > 9) ``` ``` ## [1] 4 ``` Many functions _do_ preserve array structure: ``` r y.arr <- -x.arr y.arr + x.arr ``` ``` ## [,1] [,2] ## [1,] 0 0 ## [2,] 0 0 ``` --- Other functions specifically act on each row or column of the array separately: ``` r x.arr ``` ``` ## [,1] [,2] ## [1,] 7 0 ## [2,] 8 45 ``` ``` r rowSums(x.arr) ``` ``` ## [1] 7 53 ``` ``` r colSums(x.arr) ``` ``` ## [1] 15 45 ``` --- ## Matrices In R, a matrix is a specialization of a 2D array ``` r (myMat <- matrix(c(40, 1, 60, 3), nrow = 2)) ``` ``` ## [,1] [,2] ## [1,] 40 60 ## [2,] 1 3 ``` ``` r is.array(myMat) ``` ``` ## [1] TRUE ``` ``` r is.matrix(myMat) ``` ``` ## [1] TRUE ``` Other arguments to `matrix()`: - `ncol` - `byrow = TRUE` to fill by rows. --- ## Matrices - Element-wise operations with the usual arithmetic and comparison operators work ``` r myMat / 3 ``` ``` ## [,1] [,2] ## [1,] 13.3333333 20 ## [2,] 0.3333333 1 ``` Compare whole matrices with `identical()` or `all.equal()` ``` r identical(myMat, x.arr) ``` ``` ## [1] FALSE ``` --- ## Matrix multiplication Gets a special operator ``` r six.sevens <- matrix(rep(7, 6), ncol = 3) six.sevens ``` ``` ## [,1] [,2] [,3] ## [1,] 7 7 7 ## [2,] 7 7 7 ``` ``` r myMat %*% six.sevens # [2x2] * [2x3] ``` ``` ## [,1] [,2] [,3] ## [1,] 700 700 700 ## [2,] 28 28 28 ``` <small>What happens if you try `six.sevens %*% myMat`?</small> --- ## Multiplying matrices and vectors Numeric vectors can act like proper vectors: ``` r myVec <- c(10, 20) myMat %*% myVec ``` ``` ## [,1] ## [1,] 1600 ## [2,] 70 ``` ``` r myVec %*% myMat ``` ``` ## [,1] [,2] ## [1,] 420 660 ``` <small>R silently casts the vector as either a row or a column matrix (to make the matrix multiplication operation make sense)</small> --- ## Other matrix operators Transpose: ``` r t(myMat) ``` ``` ## [,1] [,2] ## [1,] 40 1 ## [2,] 60 3 ``` Determinant: ``` r det(myMat) ``` ``` ## [1] 60 ``` Diagonal: The `diag()` function can extract the diagonal entries of a matrix: ``` r diag(myMat) ``` ``` ## [1] 40 3 ``` --- ## Other matrix operators Inverting a matrix: ``` r solve(myMat) ``` ``` ## [,1] [,2] ## [1,] 0.05000000 -1.0000000 ## [2,] -0.01666667 0.6666667 ``` ``` r myMat %*% solve(myMat) ``` ``` ## [,1] [,2] ## [1,] 1 0 ## [2,] 0 1 ``` --- ## Doing the same thing to each row or column - We already saw this with arrays: - `rowSums()`, `colSums()` return the row and column sums - Also the mean: `rowMeans()`, `colMeans()` - Input is a matrix, output is a vector ``` r myMat; colMeans(myMat) ``` ``` ## [,1] [,2] ## [1,] 40 60 ## [2,] 1 3 ``` ``` ## [1] 20.5 31.5 ``` --- - `summary()`: vector-style summary of *column* ``` r myMat; summary(myMat) ``` ``` ## [,1] [,2] ## [1,] 40 60 ## [2,] 1 3 ``` ``` ## V1 V2 ## Min. : 1.00 Min. : 3.00 ## 1st Qu.:10.75 1st Qu.:17.25 ## Median :20.50 Median :31.50 ## Mean :20.50 Mean :31.50 ## 3rd Qu.:30.25 3rd Qu.:45.75 ## Max. :40.00 Max. :60.00 ``` --- `apply()`, takes 3 arguments, `X`, `MARGIN`, `FUN` - `X`: the array or matrix - `MARGIN`: 1 for rows and 2 for columns - `FUN`: name of the function to apply to each ``` r rowMeans(myMat) ``` ``` ## [1] 50 2 ``` ``` r apply(myMat, 1, mean) ``` ``` ## [1] 50 2 ``` <small>What would `apply(myMat, 1, max)` do?</small> --- ## Lists - Lists are a generic container - Sequence of values, _not_ necessarily all of the same type ``` r my.distribution <- list("exponential", 7, FALSE) my.distribution ``` ``` ## [[1]] ## [1] "exponential" ## ## [[2]] ## [1] 7 ## ## [[3]] ## [1] FALSE ``` - Most of what you can do with vectors you can also do with lists - This is an unnamed list --- ## Lists - Elements can be vectors of any type, or other data structures like matrices - This is a named list ``` r l <- list( x = 1:4, y = c("hi", "hello", "jello"), z = matrix(c(TRUE, FALSE, FALSE, FALSE), nrow = 2) ) l ``` ``` ## $x ## [1] 1 2 3 4 ## ## $y ## [1] "hi" "hello" "jello" ## ## $z ## [,1] [,2] ## [1,] TRUE FALSE ## [2,] FALSE FALSE ``` --- ## Lists Make an empty list to fill in later ``` r myList <- vector(mode = "list", length = 4) myList ``` ``` ## [[1]] ## NULL ## ## [[2]] ## NULL ## ## [[3]] ## NULL ## ## [[4]] ## NULL ``` --- ## Accessing pieces of lists Can use `[ ]` as with vectors or use `[[ ]]`, but only with a single index `[[ ]]` drops names and structures, `[ ]` does not ``` r l[1] ``` ``` ## $x ## [1] 1 2 3 4 ``` ``` r l[[1]] ``` ``` ## [1] 1 2 3 4 ``` <small>Does `l[[1:2]]` work?</small> --- ## Accessing pieces of lists Helpful illustration from R for Data Science (Chapter 20.5.3): .pull-left[ <img src="img/pepperShaker1.png" width="110%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="img/pepperShaker2.png" width="110%" style="display: block; margin: auto;" /> ] --- ## Working with lists .pull-left[ ``` r my.distribution ``` ``` ## [[1]] ## [1] "exponential" ## ## [[2]] ## [1] 7 ## ## [[3]] ## [1] FALSE ``` ] .pull-right[ ``` r is.character(my.distribution) ``` ``` ## [1] FALSE ``` ``` r is.character(my.distribution[[1]]) ``` ``` ## [1] TRUE ``` ``` r my.distribution[[2]]^2 ``` ``` ## [1] 49 ``` ] <small>What happens if you try `my.distribution[2]^2`?</small> <small>What happens if you try `[[ ]]` on a vector?</small> --- ## Filling in lists ``` r myList[[1]] <- 1:10 myList ``` ``` ## [[1]] ## [1] 1 2 3 4 5 6 7 8 9 10 ## ## [[2]] ## NULL ## ## [[3]] ## NULL ## ## [[4]] ## NULL ``` <small>What happens if you try `myList[1] <- 1:10`?</small> --- ## Expanding and contracting lists Add to lists with `c()` (also works with vectors): ``` r my.distribution <- c(my.distribution, 7) my.distribution ``` ``` ## [[1]] ## [1] "exponential" ## ## [[2]] ## [1] 7 ## ## [[3]] ## [1] FALSE ## ## [[4]] ## [1] 7 ``` --- Chop off the end of a list by setting the length to something smaller (also works with vectors): ``` r length(my.distribution) ``` ``` ## [1] 4 ``` ``` r length(my.distribution) <- 3 my.distribution ``` ``` ## [[1]] ## [1] "exponential" ## ## [[2]] ## [1] 7 ## ## [[3]] ## [1] FALSE ``` --- ## Summary -- - Arrays - Accessing array values - Operating on arrays - Array functions - Matrices - Matrix multiplication - Other matrix operators - Introduction to lists - Accessing pieces of lists - Working with lists