class: center, middle, inverse, title-slide

.title[
# Bayes' Theorem and Random Variables
]
.subtitle[
##
STA35A: Statistical Data Science 1
]
.author[
### Xiao Hui Tai
]
.date[
### November 1, 2024
]

---
layout: true

<!-- <div class="my-footer"> -->
<!-- <span> -->
<!-- <a href="https://datasciencebox.org" target="_blank">datasciencebox.org</a> -->
<!-- </span> -->
<!-- </div> -->

---

<style type="text/css">
.tiny .remark-code { font-size: 60%; }
.small .remark-code { font-size: 80%; }
.tiny { font-size: 60%; }
.small { font-size: 80%; }
</style>

## Today

- Bayes' Theorem
- Random variables
- Expectation and variance
- Discrete and continuous random variables

---

## Bayes' Theorem

- Often we know `\(P(B|A)\)` when we really want `\(P(A|B)\)`
- For example, a 40-year-old woman has a positive screening mammogram.
  - `\(A\)` = cancer
  - `\(B\)` = positive mammogram screening result.

--

- Using Bayes' Theorem is sometimes described as "updating our beliefs"
  - Without any information on the test result, the probability of cancer is `\(P(A)\)`. With the test result, we can calculate `\(P(A|B)\)`
- Want `\(P(A|B)\)` <!-- , which is the probability of having cancer given a positive screening result. -->

--

- Need `\(P(A)\)`, the prevalence of breast cancer among 40-year-old women.
- Properties of the screening tool:
  - `\(P(B | A)\)`
  - `\(P(B | A^c)\)`
  - `\(P(B)\)` <!-- (called **sensitivity**, the probability of a positive result given that a person has cancer) -->

---

## Bayes' Theorem

**Bayes' theorem** says that
`\(P(A \mid B) =\frac{P(A \cap B)}{P(B)} =\frac{P(B \mid A)P(A)}{P(B)} =\frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid A^c)P(A^c)}\)`.

- The last equality is by the law of total probability

<img src="img/bayes.webp" width="50%" style="display: block; margin: auto;" />

Credit: Matt Buck, Flickr, CC BY-SA 2.0

---

## Bayes' Theorem

`\(P(A \mid B) =\frac{P(A \cap B)}{P(B)} =\frac{P(B \mid A)P(A)}{P(B)} =\frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid A^c)P(A^c)}\)`

- `\(A\)` = cancer, `\(B\)` = positive screening result.
- Say `\(P(A) = .01\)`
- `\(P(B | A)= .85\)`
- `\(P(B|A^c) = .1\)` (probability of a positive screening result given that a person does not have cancer)

What is `\(P(A \mid B)\)`?

---

## Bayes' Theorem in Action

`\(P(A \mid B) =\frac{P(A \cap B)}{P(B)} =\frac{P(B \mid A)P(A)}{P(B)} =\frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid A^c)P(A^c)}\)`

- `\(P(A) = .01\)`
- `\(P(B | A) = .85\)`
- `\(P(B|A^c) = .1\)`
- Then `\(P(A \mid B) =\frac{.85*.01}{.85*.01 + .1*(1 - .01)} = 0.079\)`

---

## Behind the scenes: Hypothetical 10,000

Consider a hypothetical population of 10,000 40-year-old women.

.pull-left[
We have

1. The prevalence of breast cancer among 40-year-old women, `\(P(A)=.01\)`.
2. The sensitivity of a screening mammogram for diagnosing cancer, `\(P(B | A)=.85\)`.
3. The probability of a positive screening result given that a person does not have cancer, `\(P(B|A^c) = .1\)`.
]
.pull-right[

|  | Cancer (A) | No Cancer | Total |
|---|---:|---:|---:|
| Mammo + (B) |  |  |  |
| Mammo - |  |  |  |
| Total |  |  | 10000 |

]

---

## Behind the scenes: Hypothetical 10,000

.pull-left[
We have

1. **The prevalence of breast cancer among 40-year-old women, `\(P(A)=.01\)`.**
2. The sensitivity of a screening mammogram for diagnosing cancer, `\(P(B | A)=.85\)`.
3. The probability of a positive screening result given that a person does not have cancer, `\(P(B|A^c) = .1\)`.
]
.pull-right[

|  | Cancer (A) | No Cancer | Total |
|---|---:|---:|---:|
| Mammo + (B) |  |  |  |
| Mammo - |  |  |  |
| Total | 100 | 9900 | 10000 |

]

---

## Behind the scenes: Hypothetical 10,000

.pull-left[
We have

1. The prevalence of breast cancer among 40-year-old women, `\(P(A)=.01\)`.
2. **The sensitivity of a screening mammogram for diagnosing cancer, `\(P(B | A)=.85\)`.**
3. The probability of a positive screening result given that a person does not have cancer, `\(P(B|A^c) = .1\)`.
]
.pull-right[

|  | Cancer (A) | No Cancer | Total |
|---|---:|---:|---:|
| Mammo + (B) | 85 |  |  |
| Mammo - | 15 |  |  |
| Total | 100 | 9900 | 10000 |

]

---

## Behind the scenes: Hypothetical 10,000

.pull-left[
We have

1. The prevalence of breast cancer among 40-year-old women, `\(P(A)=.01\)`.
2. The sensitivity of a screening mammogram for diagnosing cancer, `\(P(B | A)=.85\)`.
3. **The probability of a positive screening result given that a person does not have cancer, `\(P(B|A^c) = .1\)`.**
]
.pull-right[

|  | Cancer (A) | No Cancer | Total |
|---|---:|---:|---:|
| Mammo + (B) | 85 | 990 |  |
| Mammo - | 15 | 8910 |  |
| Total | 100 | 9900 | 10000 |

]

---

## Behind the scenes: Hypothetical 10,000

.pull-left[
- Now fill in the row totals
- What is the conditional probability of cancer given a positive mammogram?
- This entire computation is equivalent to doing `\(P(A \mid B) =\frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid A^c)P(A^c)}\)`
]
.pull-right[

|  | Cancer (A) | No Cancer | Total |
|---|---:|---:|---:|
| Mammo + (B) | 85 | 990 | 1075 |
| Mammo - | 15 | 8910 | 8925 |
| Total | 100 | 9900 | 10000 |

]

---

## Exercise

Edison Research gathered exit poll results from several sources for the Wisconsin recall election of Scott Walker. They found that 53% of the respondents voted in favor of Scott Walker. Additionally, they estimated that of those who voted in favor of Scott Walker, 37% had a college degree, while 44% of those who voted against Scott Walker had a college degree. Suppose we randomly sampled a person who participated in the exit poll and found that he had a college degree. What is the probability that he voted in favor of Scott Walker?

---

## Random variables

- A **random variable** is a mapping from possible outcomes in a sample space to real numbers.
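
A minimal sketch of this mapping in Python (illustrative only; the fair-coin probabilities below are an assumed example, not part of the slides):

```python
# A random variable maps each outcome in the sample space to a number.
# Assumed example: a fair coin, with heads mapped to 1 and tails to 0.
sample_space = {"H": 0.5, "T": 0.5}  # outcome -> probability
X = {"H": 1, "T": 0}                 # random variable: outcome -> real number

# P(X = 1) sums the probabilities of all outcomes that X maps to 1
p_heads = sum(p for outcome, p in sample_space.items() if X[outcome] == 1)
```

Here `P(X = 1)` recovers `P(heads)`.
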
- Recall: the **sample space** is the set of all possible outcomes from a **random process**

<img src="img/coinToss.png" width="60%" style="display: block; margin: auto;" />

.small[
Source: https://medium.com/jun94-devpblog/prob-stats-1-random-variable-483c45242b3c
]

---

## Random variables

<img src="img/coinToss.png" width="60%" style="display: block; margin: auto;" />

.small[
Source: https://medium.com/jun94-devpblog/prob-stats-1-random-variable-483c45242b3c
]

- Let `\(X\)` be the random variable indicating whether a coin flip results in heads.
- Instead of saying `\(P(\texttt{heads})\)`, we say `\(P(X = 1)\)`
- This representation allows us to apply mathematical frameworks and get a better understanding of real-world phenomena

---

## Random variables

- Denoted by capital letters, e.g., `\(X\)`, `\(Y\)`, `\(Z\)`
- A realization or draw of the random variable is denoted by a lowercase letter, e.g., `\(x\)`, `\(y\)`, `\(z\)`

--

- Other examples of random variables:
  - Mass of classroom chairs
  - Ages of students at UC Davis

--

- **Discrete** random variables:
  - Each outcome has an associated probability `\(P(X = x_i)\)`, where `\(i = 1, ..., k\)` (the `\(k\)` outcomes are denoted by lowercase `\(x_1, ..., x_k\)`)
  - These probabilities are sometimes also written as `\(p_1, ..., p_k\)`

---

## Expectation

- Two books are assigned: a textbook and a study guide.
- 20% of students do not buy either book, 55% buy the textbook only, and 25% buy both books
- Say the class has 100 students. How many books should the bookstore expect to sell to the class?

--

- How many books should the bookstore expect to sell per student?
- Intuitively: `\(.2*0 + .55*1 + .25*2 = 1.05\)`

---

## Expectation

- Let `\(X=\)` number of books sold per student
- The three possible outcomes are `\(x_1\)` = 0 books, `\(x_2\)` = 1 book (1 textbook for each student), and `\(x_3\)` = 2 books (1 textbook and 1 study guide for each student)

i | 1 | 2 | 3
--|--|--|--
`\(x_i\)` | 0 | 1 | 2
`\(P(X = x_i)\)` | .2 | .55 | .25

$$
`\begin{aligned}
E(X) &= x_1 \times P(X = x_1) + x_2 \times P(X = x_2) + ... + x_k \times P(X = x_k)\\
&= \sum_{i=1}^k x_i P(X = x_i)
\end{aligned}`
$$

- Using this definition: `\(E(X) = 0*.2 + 1*.55 + 2*.25 = 1.05\)`.

---

## Expectation

- Say we are interested in the **amount of revenue** that the bookstore can expect to earn per student.
- The textbook costs $ 137 and the study guide costs $ 33.
- What modifications do we need?

Before:

- Let `\(X=\)` number of books sold per student
- The three possible outcomes are `\(x_1\)` = 0 books, `\(x_2\)` = 1 book (1 textbook for each student), and `\(x_3\)` = 2 books (1 textbook and 1 study guide for each student)

i | 1 | 2 | 3
--|--|--|--
`\(x_i\)` | 0 | 1 | 2
`\(P(X = x_i)\)` | .2 | .55 | .25

- `\(E(X) = 0*.2 + 1*.55 + 2*.25 = 1.05\)`.

---

## Expectation

- Let `\(X=\)` revenue from books sold per student
- The three possible outcomes are `\(x_1\)` = $ 0, `\(x_2\)` = $ 137 (1 textbook for each student), and `\(x_3\)` = $ 170 (1 textbook and 1 study guide for each student)

i | 1 | 2 | 3
--|--|--|--
`\(x_i\)` | 0 | 137 | 170
`\(P(X = x_i)\)` | .2 | .55 | .25

- Using this definition: `\(E(X) = 0*.2 + 137*.55 + 170*.25 = 117.85\)`.

---

## Expectation

- Denoted by `\(E(X)\)`, `\(\mu\)`, or `\(\mu_x\)`
- The **expected or average outcome** of `\(X\)`, where `\(X\)` is a random variable

--

- Given a probability distribution for a **discrete random variable**, we can calculate it using

$$
`\begin{aligned}
E(X) &= x_1 \times P(X = x_1) + x_2 \times P(X = x_2) + ...
+ x_k \times P(X = x_k)\\
&= \sum_{i=1}^k x_i P(X = x_i)
\end{aligned}`
$$

- Recall: this is a **population parameter**, a fixed quantity
- The sample version, the **sample statistic**, is the sample mean `\(\bar{X}\)`

---

## Properties of the expectation

- `\(E[c] = c\)`, where `\(c\)` is a constant
- `\(E[aX] = aE[X]\)`
- `\(E[aX + c] = aE[X] + c\)`
- To calculate `\(E(X^2)\)` (will be useful later), simply replace `\(x_i\)` in the sum by `\(x_i^2\)`, i.e.,

$$
`\begin{aligned}
E(X^2) &= x_1^2 \times P(X = x_1) + x_2^2 \times P(X = x_2) + ... + x_k^2 \times P(X = x_k)\\
&= \sum_{i=1}^k x_i^2 P(X = x_i)
\end{aligned}`
$$

- More generally, `\(E[g(X)] = \sum_{i=1}^k g(x_i) P(X = x_i)\)` (the law of the unconscious statistician)

---

## Variance

- Recall: we saw the **sample variance**, calculated for a data set
  - Take the squared deviations and average them
  - `\(s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + ... + (x_n - \bar{x})^2}{n - 1}\)`
- The **population variance** is often denoted by `\(\sigma^2\)`, `\(\sigma_x^2\)`, or `\(Var(X)\)`
- Given a probability distribution for a discrete random variable, we can calculate it using

.small[
$$
`\begin{aligned}
Var(X) &= E[(X-\mu)^2] \\
&= (x_1 - \mu)^2 \times P(X = x_1) + (x_2 - \mu)^2 \times P(X = x_2) + ... + (x_k - \mu)^2 \times P(X = x_k)\\
&= \sum_{i=1}^k (x_i - \mu)^2 P(X = x_i)
\end{aligned}`
$$
]

- Note: rather than summing over observations, these sums are over possible outcomes, weighted by their probabilities

---

## Variance

Another common way to write the variance is

$$
`\begin{aligned}
Var(X) &= E[(X-E(X))^2] \\
&= E[ (X^2 - 2XE(X) + [E(X)]^2) ] \\
&= E(X^2) - 2E(X)E(X) + [E(X)]^2 \\
&= E(X^2) - [E(X)]^2
\end{aligned}`
$$

- Recall: `\(E(X) = \sum_{i=1}^k x_i P(X = x_i)\)`
- To calculate `\(E(X^2)\)`, simply replace `\(x_i\)` in the sum above by `\(x_i^2\)`, i.e.,

$$
`\begin{aligned}
E(X^2) &= x_1^2 \times P(X = x_1) + x_2^2 \times P(X = x_2) + ...
+ x_k^2 \times P(X = x_k)\\
&= \sum_{i=1}^k x_i^2 P(X = x_i)
\end{aligned}`
$$

---

## Properties of the variance

- `\(Var[c] = 0\)`, where `\(c\)` is a constant
- `\(Var[aX] = a^2Var[X]\)`
- `\(Var[aX + c] = a^2Var[X]\)`

---

## Linear combinations of random variables

- Often we care not just about a single random variable, but about a combination of them
- E.g.,
  - The total revenue of our bookstore is a combination of books from different classes, not just our one statistics class
  - The total gain or loss in a stock portfolio is the sum of the gains and losses in its components
  - Total weekly commute time is a combination of the daily commutes

---

## Linear combinations of random variables

- Let `\(W\)` be the weekly commute time per student at UC Davis
  - `\(X_1\)` = commute time per student on Monday
  - `\(X_2\)` = commute time per student on Tuesday
  - ...
  - `\(X_5\)` = commute time per student on Friday
- `\(W = X_1 + X_2 + ... + X_5\)` is also a random variable

---

## Linear combinations of random variables

- A **linear combination** of two random variables, `\(X\)` and `\(Y\)`, is of the form
$$
aX + bY,
$$
where `\(a\)` and `\(b\)` are constants.
- `\(a\)` and `\(b\)` are also called coefficients
- In our example, `\(W = X_1 + X_2 + ... + X_5\)` is a linear combination with all coefficients equal to 1.

---

## Expectation of linear combinations of random variables

- The expectation of a linear combination of random variables is given by
$$
E(aX + bY) = aE(X) + bE(Y)
$$
- In our example, say `\(E(X_1) = ... = E(X_5) = 21\)` minutes.
- Then, `\(E(W) = 1*21 + 1*21 + 1*21 + 1*21 + 1*21 = 105\)` minutes.

---

## Variance of linear combinations of random variables

- The variance of a linear combination of **independent** random variables is given by
$$
Var(aX + bY) = a^2 Var(X) + b^2 Var(Y)
$$
- **Note: this is only true if `\(X\)` and `\(Y\)` are independent**.

--

- In our example, say `\(Var(X_1) = ... = Var(X_5) = 5\)` minutes².
- Commute times on each day of the week are independent.
- Then, `\(Var(W) = 1^2*5 + 1^2*5 + 1^2*5 + 1^2*5 + 1^2*5 = 25\)` minutes².

---

## Exercise

A portfolio's value increases by 18% during a financial boom and by 9% during normal times. It decreases by 12% during a recession. What is the expected return on this portfolio if each scenario is equally likely?

---

## Exercise

Suppose we have independent observations `\(X_1\)` and `\(X_2\)` from a distribution with mean `\(\mu\)` and standard deviation `\(\sigma\)`. What is the variance of the mean of the two values, `\(\frac{X_1 + X_2}{2}\)`?

--

Suppose we have 3 independent observations `\(X_1\)`, `\(X_2\)`, and `\(X_3\)` from a distribution with mean `\(\mu\)` and standard deviation `\(\sigma\)`. What is the variance of the mean of these 3 values, `\(\frac{X_1 + X_2 + X_3}{3}\)`?

--

Suppose we have `\(n\)` independent observations `\(X_1\)`, `\(X_2\)`, ..., `\(X_n\)` from a distribution with mean `\(\mu\)` and standard deviation `\(\sigma\)`. What is the variance of the mean of these `\(n\)` values, `\(\frac{X_1 + X_2 + ... + X_n}{n}\)`?

---

## Summary

--

- Bayes' Theorem
  - `\(P(A \mid B) =\frac{P(A \cap B)}{P(B)} =\frac{P(B \mid A)P(A)}{P(B)} =\frac{P(B \mid A)P(A)}{P(B \mid A)P(A) + P(B \mid A^c)P(A^c)}\)`
- Random variables
- Expectation and variance
- Discrete random variables
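
---

## Appendix: checking the calculations in code

The slide computations can be reproduced in a few lines of Python (a sketch; the numbers are the ones used in the slides):

```python
# Bayes' Theorem: P(A|B) = P(B|A)P(A) / (P(B|A)P(A) + P(B|A^c)P(A^c))
p_A, p_B_given_A, p_B_given_Ac = 0.01, 0.85, 0.1
posterior = (p_B_given_A * p_A) / (p_B_given_A * p_A + p_B_given_Ac * (1 - p_A))
# round(posterior, 3) gives 0.079, matching the hypothetical-10,000 table (85/1075)

# Expectation and variance for the bookstore example
xs = [0, 1, 2]          # number of books per student
ps = [0.2, 0.55, 0.25]  # P(X = x_i)
E_X = sum(x * p for x, p in zip(xs, ps))      # 0*.2 + 1*.55 + 2*.25 = 1.05
E_X2 = sum(x**2 * p for x, p in zip(xs, ps))  # 0*.2 + 1*.55 + 4*.25 = 1.55
Var_X = E_X2 - E_X**2                         # Var(X) = E(X^2) - [E(X)]^2
```
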