Fundamentals of R: More Data Structures

class: center, middle, inverse, title-slide

.title[
# Fundamentals of R: More Data Structures
]
.subtitle[
## STA35A: Statistical Data Science 1
]
.author[
### Xiao Hui Tai
]
.date[
### October 2, 2024
]

---

layout: true

---

## Recap: Vectors and arrays

--
- Vectors: 
  - Additional attributes: Factors, dates
  - Vector arithmetic
  - Other functions on vectors
    - Comparison operators 
    - Indexing operators
  - Named components (today)

- Arrays
  - Vectors with dimension

---
## Today
- Named components

- Arrays

- Matrices

- Introduction to lists

---

## Named components

You can give names to the elements (components) of a vector:

.small[

``` r
x <- c(7, 8, 10, 45)
y <- c(-7, -8, -10, -45)
(names(x) <- c("v1", "v2", "v3", "fred"))
```

```
## [1] "v1"   "v2"   "v3"   "fred"
```

``` r
x[c("fred", "v1")]
```

```
## fred   v1 
##   45    7
```

``` r
x[c(4, 1)]
```

```
## fred   v1 
##   45    7
```
]

Note that the labels appear in what R prints, but they are stored as an attribute of `x`, not as part of its values
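A minimal sketch of where the names actually live. It assumes a fresh `x`, with the names given directly inside `c()` (an alternative to assigning `names(x)` afterwards):

``` r
x <- c(v1 = 7, v2 = 8, v3 = 10, fred = 45)  # names given inside c()
attributes(x)   # $names: the labels are stored as an attribute
x * 2           # arithmetic acts on the values; the names come along
unname(x)       # the same values with the names attribute removed
```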

---
`names(x)` is just another vector (of characters):

``` r
names(y) <- names(x)
sort(names(x))
```

```
## [1] "fred" "v1"   "v2"   "v3"
```

``` r
which(names(x) == "fred")
```

```
## [1] 4
```
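Because `names(x)` is an ordinary character vector, it can drive logical indexing or be modified in place. A short sketch, again assuming a fresh `x`:

``` r
x <- c(v1 = 7, v2 = 8, v3 = 10, fred = 45)
x[names(x) != "fred"]   # every element except the one named "fred"
names(x)[4] <- "v4"     # rename a single element by assigning into names()
names(x)                # "v1" "v2" "v3" "v4"
```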

---
## Arrays

- Many data structures in R are made by adding bells and whistles to vectors

- **Arrays** are vectors with *dimensions*

- For example, a two-dimensional array:

``` r
x <- c(7, 8, 10, 45)
x.arr <- array(x, dim = c(2, 2))
x.arr
```

```
##      [,1] [,2]
## [1,]    7   10
## [2,]    8   45
```
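To check that an array really is just a vector plus a `dim` attribute, one can attach the attribute directly with `dim<-` and compare the result with `array()`. A sketch, using a hypothetical vector `v`:

``` r
v <- c(7, 8, 10, 45)
dim(v) <- c(2, 2)   # attach a dim attribute to the plain vector
identical(v, array(c(7, 8, 10, 45), dim = c(2, 2)))   # TRUE
```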

---
## Arrays

- Filled column-wise (by columns)

- `dim` says how many rows and columns

``` r
dim(x.arr)
```

```
## [1] 2 2
```
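Column-wise filling can be checked by flattening an array back to its underlying vector with `as.vector()`, while `nrow()` and `ncol()` read off the two entries of `dim`. A sketch with a hypothetical 2 x 3 array `a`:

``` r
a <- array(1:6, dim = c(2, 3))
a[1, 3]               # 5: the third column holds the values 5 and 6
as.vector(a)          # underlying vector, in column order: 1 2 3 4 5 6
c(nrow(a), ncol(a))   # 2 3, the same as dim(a)
```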
---

## Arrays with more than two dimensions
- Arrays can have three dimensions (`$r \times c \times h$`; think of stacking `$h$` matrices, each of size `$r \times c$`)

- Can also have `$4, 5, \ldots, n$`-dimensional arrays

- `dim` is a length-`$n$` vector
  - Gives the size (number of index values) of each dimension
  - e.g., a `$4 \times 3$` array has a `dim` of length 2, with elements 4 and 3

.small[

``` r
myArr <- array(1:12, dim = c(4, 3))
myArr
```

```
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12
```

``` r
dim(myArr)
```

```
## [1] 4 3
```
]
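The length of `dim` is the number of dimensions, and the product of its entries is the length of the underlying vector. A quick sketch:

``` r
myArr <- array(1:12, dim = c(4, 3))
length(dim(myArr))   # 2: the number of dimensions
prod(dim(myArr))     # 12: 4 * 3 entries in total
length(myArr)        # 12: the length of the underlying vector
```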

---

## Arrays with more than two dimensions
- A `$4 \times 3 \times 2$` array has a `dim` of length 3, with elements 4, 3, and 2

``` r
myArr <- array(1:24, dim = c(4, 3, 2))
myArr
```

```
## , , 1
## 
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12
## 
## , , 2
## 
##      [,1] [,2] [,3]
## [1,]   13   17   21
## [2,]   14   18   22
## [3,]   15   19   23
## [4,]   16   20   24
```
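Indexing a three-dimensional array takes one index per dimension, and leaving an index blank selects everything along that dimension. A sketch using the same `myArr`:

``` r
myArr <- array(1:24, dim = c(4, 3, 2))
myArr[2, 3, 1]      # 10: row 2, column 3 of the first 4 x 3 slice
myArr[, , 2]        # the whole second slice, a 4 x 3 matrix
dim(myArr[, , 2])   # 4 3
```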

---

## Arrays with more than two dimensions
- A `$4 \times 3 \times 2$` array has a `dim` of length 3, with elements 4, 3, and 2

``` r
dim(myArr)
```

```
## [1] 4 3 2
```

Some other properties of the array:

``` r
is.vector(myArr)
```

```
## [1] FALSE
```

``` r
is.array(myArr)
```

```
## [1] TRUE
```
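`is.vector()` returns `TRUE` only for a vector that has no attributes other than names, which is why the array fails the test. A sketch:

``` r
myArr <- array(1:24, dim = c(4, 3, 2))
is.vector(myArr)              # FALSE: the dim attribute rules it out
is.vector(as.vector(myArr))   # TRUE: as.vector() drops dim
is.vector(c(a = 1, b = 2))    # TRUE: names are the one attribute allowed
```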

---

``` r
typeof(myArr)
```

```
## [1] "integer"
```

``` r
str(myArr)
```

```
##  int [1:4, 1:3, 1:2] 1 2 3 4 5 6 7 8 9 10 ...
```

``` r
attributes(myArr)
```

```
## $dim
## [1] 4 3 2
```
`typeof()` returns the type of the _elements_

`str()` gives the **structure**: here, an integer (`int`) array with three dimensions, the index range of each dimension, and then the first few values
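`typeof()` describes the elements, while `class()` describes the container. A sketch comparing an integer array with a double one:

``` r
myArr <- array(1:24, dim = c(4, 3, 2))
typeof(myArr)                               # "integer": 1:24 is an integer vector
typeof(array(c(0.5, 1.5), dim = c(1, 2)))   # "double"
class(myArr)                                # "array"
```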

---

## Arrays: Accessing and operating on arrays

Can access a 2-D array either by pairs of indices or by the underlying vector:

``` r
x <- c(7, 8, 10, 45)
(x.arr <- array(x, dim = c(2, 2)))
```

```
##      [,1] [,2]
## [1,]    7   10
## [2,]    8   45
```

``` r
x.arr[1, 2]
```

```
## [1] 10
```

``` r
x.arr[3]
```

```
## [1] 10
```
Remember that arrays are filled column-wise, so position 3 of the underlying vector is row 1, column 2
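For an array with `r` rows, entry `[i, j]` sits at position `(j - 1) * r + i` of the underlying vector. A sketch checking this, with base R's `arrayInd()` going the other way (vector position to row and column):

``` r
x.arr <- array(c(7, 8, 10, 45), dim = c(2, 2))
i <- 1; j <- 2
x.arr[(j - 1) * nrow(x.arr) + i]   # 10, the same entry as x.arr[1, 2]
arrayInd(3, dim(x.arr))            # row 1, column 2
```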

---
## Arrays: Accessing and operating on arrays
Changing array values:

``` r
x.arr
```

```
##      [,1] [,2]
## [1,]    7   10
## [2,]    8   45
```

``` r
x.arr[3] <- 0
x.arr
```

```
##      [,1] [,2]
## [1,]    7    0
## [2,]    8   45
```
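Values can also be changed by (row, column) pair, and a whole row can be replaced in one assignment. A sketch, starting from the array as it stands after `x.arr[3] <- 0`:

``` r
x.arr <- array(c(7, 8, 0, 45), dim = c(2, 2))
x.arr[2, 1] <- 100      # change one entry by (row, column)
x.arr[1, ] <- c(1, 2)   # replace the whole first row at once
x.arr                   # row 1: 1 2; row 2: 100 45
```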

---
## Arrays: Accessing and operating on arrays
Omitting an index means "all of it":

``` r
x.arr[c(1:2), 2]
```

```
## [1]  0 45
```

``` r
x.arr[, 2]
```

```
## [1]  0 45
```
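Selecting a single column returns a plain vector, because R drops the dimension that now has length 1. Passing `drop = FALSE` keeps the array shape. A sketch:

``` r
x.arr <- array(c(7, 8, 0, 45), dim = c(2, 2))
x.arr[, 2]                       # a plain vector: no dim attribute
x.arr[, 2, drop = FALSE]         # still an array, with 2 rows and 1 column
dim(x.arr[, 2, drop = FALSE])    # 2 1
```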

---
## Functions on arrays
Using a **vector-style function** on an array structure will go down to the underlying vector, *unless* the function is set up to handle arrays specially:

``` r
which(x.arr > 9)
```

```
## [1] 4
```
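`which()` has an array-aware option: by default it reports positions in the underlying vector, but with `arr.ind = TRUE` it returns (row, column) pairs instead. A sketch:

``` r
x.arr <- array(c(7, 8, 0, 45), dim = c(2, 2))
which(x.arr > 9)                  # 4: position in the underlying vector
which(x.arr > 9, arr.ind = TRUE)  # the same element as row 2, column 2
```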

Many functions _do_ preserve array structure:

``` r
y.arr <- -x.arr
y.arr + x.arr
```

```
##      [,1] [,2]
## [1,]    0    0
## [2,]    0    0
```
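A rough rule of thumb: element-wise operations keep the shape, while summary functions such as `sum()` collapse the whole array to one number. A sketch:

``` r
x.arr <- array(c(7, 8, 0, 45), dim = c(2, 2))
dim(x.arr * 2)     # 2 2: element-wise arithmetic keeps the shape
dim(sqrt(x.arr))   # 2 2: so do element-wise math functions such as sqrt()
sum(x.arr)         # 60: a summary over every entry
```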

---
Other functions specifically act on each row or column of the array separately:

``` r
x.arr
```

```
##      [,1] [,2]
## [1,]    7    0
## [2,]    8   45
```

``` r
rowSums(x.arr)
```

```
## [1]  7 53
```

``` r
colSums(x.arr)
```

```
## [1] 15 45
```
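As a sanity check, the row sums and the column sums must both add up to the total of all entries. A sketch:

``` r
x.arr <- array(c(7, 8, 0, 45), dim = c(2, 2))
sum(rowSums(x.arr))   # 60
sum(colSums(x.arr))   # 60
sum(x.arr)            # 60: all three add up every entry
```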

---
## Matrices
In R, a matrix is a specialization of a 2D array

``` r
(myMat <- matrix(c(40, 1, 60, 3), nrow = 2))
```

```
##      [,1] [,2]
## [1,]   40   60
## [2,]    1    3
```

``` r
is.array(myMat)
```

```
## [1] TRUE
```

``` r
is.matrix(myMat)
```

```
## [1] TRUE
```

Other arguments to `matrix()`:
- `ncol`: number of columns (instead of, or as well as, `nrow`)
- `byrow = TRUE` to fill by rows instead of by columns
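A sketch of both arguments on the same four values; filling by rows gives the transpose of the default column-wise fill here:

``` r
matrix(c(40, 1, 60, 3), ncol = 2)                # same as nrow = 2 for 4 values
matrix(c(40, 1, 60, 3), nrow = 2, byrow = TRUE)  # row 1: 40 1; row 2: 60 3
```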

---
## Matrices
- Element-wise operations with the usual arithmetic and comparison operators work

``` r
myMat / 3
```

```
##            [,1] [,2]
## [1,] 13.3333333   20
## [2,]  0.3333333    1
```

`==` compares element by element; to compare whole matrices, use `identical()` (exact match) or `all.equal()` (equal up to a small numerical tolerance)

``` r
identical(myMat, x.arr)
```

```
## [1] FALSE
```
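The difference between `identical()` and `all.equal()` shows up with floating-point rounding. A sketch with two hypothetical 1 x 2 matrices `a` and `b`:

``` r
a <- matrix(c(0.1 + 0.2, 1), nrow = 1)
b <- matrix(c(0.3, 1), nrow = 1)
identical(a, b)          # FALSE: 0.1 + 0.2 is not exactly 0.3 in floating point
isTRUE(all.equal(a, b))  # TRUE: equal up to a small numerical tolerance
a == b                   # element-wise: FALSE TRUE
```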

---
## Matrix multiplication
Matrix multiplication gets its own operator, `%*%`:

``` r
six.sevens <- matrix(rep(7, 6), ncol = 3)
six.sevens
```

```
##      [,1] [,2] [,3]
## [1,]    7    7    7
## [2,]    7    7    7
```

``` r
myMat %*% six.sevens # [2x2] * [2x3]
```

```
##      [,1] [,2] [,3]
## [1,]  700  700  700
## [2,]   28   28   28
```

What happens if you try `six.sevens %*% myMat`?
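`%*%` needs the number of columns of the left matrix to equal the number of rows of the right one, and the result takes its size from the two outer dimensions. Note also that `*` is element-wise, not matrix multiplication. A sketch:

``` r
myMat <- matrix(c(40, 1, 60, 3), nrow = 2)
six.sevens <- matrix(rep(7, 6), ncol = 3)
ncol(myMat) == nrow(six.sevens)   # TRUE: the inner dimensions agree
dim(myMat %*% six.sevens)         # 2 3: the outer dimensions
myMat * myMat                     # element-wise product, NOT matrix multiplication
```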

---
## Multiplying matrices and vectors
A numeric vector can also be multiplied by a matrix with `%*%`, as in linear algebra:

``` r
myVec <- c(10, 20)
myMat %*% myVec
```

```
##      [,1]
## [1,] 1600
## [2,]   70
```

``` r
myVec %*% myMat
```

```
##      [,1] [,2]
## [1,]  420  660
```
R silently treats the vector as a column matrix (`myMat %*% myVec`) or a row matrix (`myVec %*% myMat`), whichever makes the multiplication conformable
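A sketch making the promotion explicit with `matrix()`; when both operands are vectors of the same length, `%*%` returns their inner product as a 1 x 1 matrix:

``` r
myMat <- matrix(c(40, 1, 60, 3), nrow = 2)
myVec <- c(10, 20)
all.equal(myMat %*% myVec, myMat %*% matrix(myVec, ncol = 1))  # TRUE: column matrix
all.equal(myVec %*% myMat, matrix(myVec, nrow = 1) %*% myMat)  # TRUE: row matrix
myVec %*% myVec   # 1 x 1 matrix holding 10^2 + 20^2 = 500
```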

---
## Other matrix operators

Transpose:

``` r
t(myMat)
```

```
##      [,1] [,2]
## [1,]   40    1
## [2,]   60    3
```

Determinant:

``` r
det(myMat)
```

```
## [1] 60
```

Diagonal (`diag()` extracts the diagonal entries of a matrix):

``` r
diag(myMat)
```

```
## [1] 40  3
```
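`diag()` also works in the other direction: given a single number it builds an identity matrix, and given a vector it builds a diagonal matrix. Summing the diagonal gives the trace. A sketch:

``` r
myMat <- matrix(c(40, 1, 60, 3), nrow = 2)
sum(diag(myMat))   # 43: the trace, 40 + 3
diag(2)            # 2 x 2 identity matrix
diag(c(5, 7))      # diagonal matrix with 5 and 7 on the diagonal
```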

---
## Other matrix operators
Inverting a matrix:

``` r
solve(myMat)
```

```
##             [,1]       [,2]
## [1,]  0.05000000 -1.0000000
## [2,] -0.01666667  0.6666667
```

``` r
myMat %*% solve(myMat)
```

```
##      [,1] [,2]
## [1,]    1    0
## [2,]    0    1
```
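With a second argument, `solve(a, b)` solves the linear system `a %*% x = b` directly. A sketch that undoes the earlier product `myMat %*% c(10, 20)`, and checks the inverse against `diag(2)` up to rounding:

``` r
myMat <- matrix(c(40, 1, 60, 3), nrow = 2)
b <- c(1600, 70)                            # the result of myMat %*% c(10, 20)
solve(myMat, b)                             # approximately 10 20
all.equal(myMat %*% solve(myMat), diag(2))  # TRUE, up to rounding error
```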

---
## Doing the same thing to each row or column
- We already saw this with arrays:
  - `rowSums()`, `colSums()` return the row and column sums
  - Also the mean: `rowMeans()`, `colMeans()`
  - Input is a matrix, output is a vector

``` r
myMat; colMeans(myMat)
```

```
##      [,1] [,2]
## [1,]   40   60
## [2,]    1    3
```

```
## [1] 20.5 31.5
```
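Each column mean is just the column sum divided by the number of rows. A quick sketch:

``` r
myMat <- matrix(c(40, 1, 60, 3), nrow = 2)
colSums(myMat) / nrow(myMat)   # 20.5 31.5, the same as colMeans(myMat)
```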

---
- `summary()`: vector-style summary of each *column*

``` r
myMat; summary(myMat)
```

```
##      [,1] [,2]
## [1,]   40   60
## [2,]    1    3
```

```
##        V1              V2       
##  Min.   : 1.00   Min.   : 3.00  
##  1st Qu.:10.75   1st Qu.:17.25  
##  Median :20.50   Median :31.50  
##  Mean   :20.50   Mean   :31.50  
##  3rd Qu.:30.25   3rd Qu.:45.75  
##  Max.   :40.00   Max.   :60.00
```
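Each `V1`, `V2` block matches `summary()` of that column on its own; flattening with `as.vector()` pools all entries into one summary instead. A sketch:

``` r
myMat <- matrix(c(40, 1, 60, 3), nrow = 2)
summary(myMat[, 1])         # the V1 block above, as a vector summary
summary(as.vector(myMat))   # all four entries pooled together
```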
---
`apply()` takes three arguments: `X`, `MARGIN`, `FUN`
- `X`: the array or matrix
- `MARGIN`: 1 for rows, 2 for columns
- `FUN`: the function to apply to each row or column

``` r
rowMeans(myMat)
```

```
## [1] 50  2
```

``` r
apply(myMat, 1, mean)
```

```
## [1] 50  2
```

What would `apply(myMat, 1, max)` do?
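`FUN` can also be a function written on the spot. A sketch computing the spread (max minus min) of each column, where the anonymous function is just an illustration:

``` r
myMat <- matrix(c(40, 1, 60, 3), nrow = 2)
apply(myMat, 2, mean)                                # 20.5 31.5, as colMeans(myMat)
apply(myMat, 2, function(col) max(col) - min(col))   # 39 57
```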
---
## Lists
- Lists are a generic container
- Sequence of values, _not_ necessarily all of the same type

``` r
my.distribution <- list("exponential", 7, FALSE)
my.distribution
```

```
## [[1]]
## [1] "exponential"
## 
## [[2]]
## [1] 7
## 
## [[3]]
## [1] FALSE
```

- Most of what you can do with vectors you can also do with lists
- This is an unnamed list
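The list keeps each element's own type, whereas combining the same values with `c()` coerces them all to character. A sketch:

``` r
my.distribution <- list("exponential", 7, FALSE)
sapply(my.distribution, class)   # "character" "numeric" "logical"
c("exponential", 7, FALSE)       # "exponential" "7" "FALSE": all character
length(my.distribution)          # 3
```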
---
## Lists
- Elements can be vectors of any type, or other data structures like matrices
- This is a named list

``` r
l <- list(
 x = 1:4,
 y = c("hi", "hello", "jello"),
 z = matrix(c(TRUE, FALSE, FALSE, FALSE), nrow = 2)
)
l
```

```
## $x
## [1] 1 2 3 4
## 
## $y
## [1] "hi"    "hello" "jello"
## 
## $z
##       [,1]  [,2]
## [1,]  TRUE FALSE
## [2,] FALSE FALSE
```
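Components of a named list can be pulled out by name, either with `[[ ]]` and a string or with the `$` operator (not covered above). A sketch:

``` r
l <- list(
  x = 1:4,
  y = c("hi", "hello", "jello"),
  z = matrix(c(TRUE, FALSE, FALSE, FALSE), nrow = 2)
)
names(l)      # "x" "y" "z"
l$y           # the component named y
l[["y"]][2]   # "hello": second element of component y
```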

---
## Lists
Make an empty list to fill in later

``` r
myList <- vector(mode = "list", length = 4)
myList
```

```
## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## 
## [[4]]
## NULL
```
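A sketch confirming that the slots exist but are empty, alongside `list()`, which creates a list of length 0:

``` r
myList <- vector(mode = "list", length = 4)
length(myList)                 # 4 slots already exist
all(sapply(myList, is.null))   # TRUE: every slot starts out as NULL
length(list())                 # 0: an empty list with no slots
```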
---
## Accessing pieces of lists
Can use `[ ]` as with vectors  
or use `[[ ]]` to extract a single element  
`[ ]` returns a (smaller) list, keeping names; `[[ ]]` returns the element itself, dropping the list wrapper and its name

``` r
l[1]
```

```
## $x
## [1] 1 2 3 4
```

``` r
l[[1]]
```

```
## [1] 1 2 3 4
```
Does `l[[1:2]]` work?
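A sketch contrasting the two results: `[ ]` keeps the list container (and can take several indices), while `[[ ]]` hands back the element inside:

``` r
l <- list(x = 1:4, y = c("hi", "hello", "jello"))
class(l[1])     # "list": still a list, with its name
class(l[[1]])   # "integer": the element itself
l[c(1, 2)]      # a list with two components
```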

---
## Accessing pieces of lists
Helpful illustration from R for Data Science (Chapter 20.5.3):

.pull-left[
<img src="img/pepperShaker1.png" width="110%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="img/pepperShaker2.png" width="110%" style="display: block; margin: auto;" />
]
---
## Working with lists 
.pull-left[

``` r
my.distribution
```

```
## [[1]]
## [1] "exponential"
## 
## [[2]]
## [1] 7
## 
## [[3]]
## [1] FALSE
```
]

.pull-right[

``` r
is.character(my.distribution)
```

```
## [1] FALSE
```

``` r
is.character(my.distribution[[1]])
```

```
## [1] TRUE
```

``` r
my.distribution[[2]]^2
```

```
## [1] 49
```
]

What happens if you try `my.distribution[2]^2`?
What happens if you try `[[ ]]` on a vector?

---
## Filling in lists

``` r
myList[[1]] <- 1:10
myList
```

```
## [[1]]
## [1] 1 2 3 4 5 6 7 8 9 10
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
## 
## [[4]]
## NULL
```
What happens if you try `myList[1] <- 1:10`?
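A common pattern is to fill the pre-allocated slots in a loop with `[[ ]]`. A sketch where slot `i` holds the hypothetical contents `1, ..., i`:

``` r
myList <- vector(mode = "list", length = 4)
for (i in seq_along(myList)) {
  myList[[i]] <- seq_len(i)   # slot i holds the vector 1, ..., i
}
myList[[3]]   # 1 2 3
```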

---
## Expanding and contracting lists
Add to lists with `c()` (also works with vectors):

``` r
my.distribution <- c(my.distribution, 7)
my.distribution
```

```
## [[1]]
## [1] "exponential"
## 
## [[2]]
## [1] 7
## 
## [[3]]
## [1] FALSE
## 
## [[4]]
## [1] 7
```
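Another way to add one element is to assign one position past the end. Also note that `c()` splits a vector into separate list elements unless it is wrapped in `list()`. A sketch:

``` r
my.distribution <- list("exponential", 7, FALSE)
my.distribution[[length(my.distribution) + 1]] <- 7   # assign one past the end
length(my.distribution)          # 4
length(c(list(1), 2:3))          # 3: c() splits 2:3 into two elements
length(c(list(1), list(2:3)))    # 2: wrapped in list(), 2:3 stays one element
```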

---
Chop off the end of a list by setting the length to something smaller (also works with vectors):

``` r
length(my.distribution)
```

```
## [1] 4
```

``` r
length(my.distribution) <- 3
my.distribution
```

```
## [[1]]
## [1] "exponential"
## 
## [[2]]
## [1] 7
## 
## [[3]]
## [1] FALSE
```
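Setting the length to something larger works too: a list is padded with `NULL`, an atomic vector with `NA`. A sketch with a hypothetical vector `v`:

``` r
my.distribution <- list("exponential", 7, FALSE)
length(my.distribution) <- 5    # grow the list
is.null(my.distribution[[5]])   # TRUE: new slots are NULL
v <- c(1, 2)
length(v) <- 4                  # grow an atomic vector
v                               # 1 2 NA NA
```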

---
## Summary
--

- Named components

- Arrays
  - Accessing array values
  - Operating on arrays
  - Array functions

- Matrices
  - Matrix multiplication
  - Other matrix operators

- Introduction to lists 
  - Accessing pieces of lists 
  - Working with lists