Instructions

Upload a PDF file, named with your UC Davis email ID and homework number (e.g., xtai_hw1.pdf), to Gradescope (accessible through Canvas). Give the commands that answer each question in their own code block; the output they produce will be embedded automatically in the rendered file. Support each answer with written statements as well as the code used.

All code used to produce your results must be shown in your PDF file (e.g., do not use echo = FALSE or include = FALSE as options anywhere). qmd/Rmd files do not need to be submitted, but may be requested by the TA and must be available when the assignment is submitted.

Students may choose to collaborate with each other on the homework, but must clearly indicate with whom they collaborated.

Please assign the pages with your answers to the corresponding questions when submitting your homework on Gradescope. Points will be taken off if you fail to do so.

Problem 1 (20 points)

Assume that tree heights follow a symmetric, bell-shaped distribution with mean 50 feet and variance 16.

  1. What common probability distribution do tree heights follow, and what are the parameters?

  2. A randomly selected tree has height 25 feet. What is the probability that another randomly selected tree is taller than this one?

  3. What is the probability that a tree's height is between 60 and 80 feet?
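
As a reminder of the mechanics only, the sketch below shows how tail and interval probabilities can be computed with pnorm(). The mean of 100 and variance of 25 are made-up values, not the ones in this problem, and the variable names are illustrative. It assumes a normal model and is not a worked solution.

```r
# Hypothetical example: X ~ Normal with mean 100 and variance 25.
# These numbers are made up and are not the ones used in Problem 1.
mu <- 100
sigma <- sqrt(25)  # pnorm() expects the standard deviation, not the variance

# P(X > 90): upper tail, via lower.tail = FALSE or as 1 - P(X <= 90)
p_upper <- pnorm(90, mean = mu, sd = sigma, lower.tail = FALSE)
p_upper_alt <- 1 - pnorm(90, mean = mu, sd = sigma)

# P(105 < X < 110): difference of two cumulative probabilities
p_between <- pnorm(110, mean = mu, sd = sigma) - pnorm(105, mean = mu, sd = sigma)

# Standardizing first gives the same probability on the N(0, 1) scale
z <- (90 - mu) / sigma
p_upper_std <- pnorm(z, lower.tail = FALSE)

c(p_upper = p_upper, p_between = p_between, p_upper_std = p_upper_std)
```

A common slip is passing the variance as the sd argument, so it may help to convert once and reuse the result.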

Problem 2 (25 points)

Recall that in class (lecture 10), we learned the following rules of thumb for symmetric, bell-shaped distributions: 68%, 95%, and 99.7% of the values lie within one, two, and three standard deviations of the mean, respectively. This is actually derived from the normal distribution.

  1. What are the relevant z-scores representing one, two, and three standard deviations from the mean?

  2. Use differences of two pnorm() values to derive each of the above three percentages.

  3. For the one standard deviation version, how can we derive the same probability as in part 2 using only a single call of pnorm() with a negative z-score?
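
The rule of thumb quoted at the start of this problem can also be checked empirically. The sketch below is a simulation check, not the pnorm() derivation asked for above. It assumes arbitrary, made-up values for the mean, standard deviation, seed, and number of draws.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical parameters; any mean and positive standard deviation would do
mu <- 10
sigma <- 3
x <- rnorm(1e5, mean = mu, sd = sigma)

# Proportion of simulated values within k standard deviations of the mean
within_k <- sapply(1:3, function(k) mean(abs(x - mu) <= k * sigma))
names(within_k) <- paste0("within_", 1:3, "_sd")
within_k
```

The simulated proportions should land near 68%, 95%, and 99.7%, with small differences due to sampling noise.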

Problem 3 (25 points)

Suppose \(X_1, X_2, ..., X_n\) are independent and identically distributed random variables, distributed \(N(\mu, \sigma^2)\). Let \(Y = \frac{\sum_{i = 1}^n X_i}{4n}\).

  1. What is the distribution of \(Y\)? Include information about the values of the population mean and variance.

  2. Now assume \(n\) is large. Use the Central Limit Theorem to get an approximate distribution for \(Y\), and derive its mean and variance.
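
One possible way to sanity-check a derived mean and variance is to simulate \(Y\) many times and compare the empirical summaries with your formulas. The sketch below assumes made-up values for \(n\), \(\mu\), and \(\sigma\); the helper name draw_y is hypothetical. It does not replace the derivation.

```r
set.seed(2)  # arbitrary seed, only for reproducibility

# Hypothetical values chosen for illustration; they do not come from the problem
n <- 25
mu <- 10
sigma <- 2

# One realization of Y = sum(X_i) / (4n) from a fresh sample of size n
draw_y <- function() sum(rnorm(n, mean = mu, sd = sigma)) / (4 * n)

# Repeat many times to approximate the distribution of Y
y <- replicate(10000, draw_y())

# Empirical mean and variance, to compare against the derived expressions
c(mean = mean(y), variance = var(y))
```

Plugging the same hypothetical \(n\), \(\mu\), and \(\sigma\) into your derived expressions should give values close to these empirical ones.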

Problem 4 (30 points)

Assume that the number of branches on a redwood tree follows a normal distribution with mean 150 and standard deviation 30. Let the random variable \(X_i\) denote the number of branches on the \(i\)th redwood tree, where \(i = 1, ..., n\). Then, \(X_i \sim N(150, 30^2)\).

  1. What is the probability of a tree having more than 180 branches? Calculate this in R using the original \(X_i \sim N(150, 30^2)\) distribution, and after standardizing to a standard normal distribution.

  2. Assume the samples are independent. What is the approximate sampling distribution of the sample mean, \(\overline{X}\)?

  3. Simulate 1000 draws, \(X_1\) to \(X_{1000}\), and calculate the sample mean.

  4. The following code repeats part 3 5000 times. Each row is one of the 5000 experiments, and each column is one of the 1000 observations in each sample. Calculate the 5000 sample means. What are the mean and standard deviation of these 5000 sample means? Are they close to what you would expect?

set.seed(0)
my5000samples <- t(replicate(5000, rnorm(1000, 150, 30)))
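
For a matrix laid out this way (one experiment per row), per-row means can be computed in a couple of ways. The sketch below uses a much smaller, made-up matrix of standard normal draws; the names small_samples, means_fast, and means_apply are illustrative only.

```r
set.seed(3)  # arbitrary seed, only for reproducibility

# Hypothetical small example: 4 experiments (rows), 6 standard normal draws each
small_samples <- t(replicate(4, rnorm(6, mean = 0, sd = 1)))
dim(small_samples)  # 4 rows, 6 columns

# One mean per row, i.e. one sample mean per experiment
means_fast <- rowMeans(small_samples)
means_apply <- apply(small_samples, 1, mean)  # same result, more general

all.equal(means_fast, means_apply)

# Summaries of the collection of sample means
mean(means_fast)
sd(means_fast)
```

rowMeans() is usually the more direct choice for plain means, while apply() generalizes to other row-wise summaries.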

Appendix

sessionInfo()
## R version 4.4.0 (2024-04-24)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sonoma 14.6.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.36     R6_2.5.1          fastmap_1.2.0     xfun_0.46        
##  [5] cachem_1.1.0      knitr_1.48        htmltools_0.5.8.1 rmarkdown_2.27   
##  [9] lifecycle_1.0.4   cli_3.6.2         sass_0.4.9        jquerylib_0.1.4  
## [13] compiler_4.4.0    rstudioapi_0.16.0 tools_4.4.0       evaluate_0.24.0  
## [17] bslib_0.8.0       yaml_2.3.10       rlang_1.1.4       jsonlite_1.8.8