Instructions

Upload a PDF file, named with your UC Davis email ID and the homework number (e.g., xtai_hw1.pdf), to Gradescope (accessible through Canvas). Give the commands for each question in its own code block; the output they produce will be embedded automatically in the knitted file. Support each answer with written statements as well as the code used.

All code used to produce your results must be shown in your PDF file (e.g., do not use echo = FALSE or include = FALSE as options anywhere). Rmd files do not need to be submitted, but may be requested by the TA and must be available when the assignment is submitted.
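As an illustration of the expected format, the body of one answer's code block might look like the sketch below. The quantity computed (a standard normal probability) is an arbitrary example unrelated to the problems, and the variable name p is hypothetical; pnorm() is part of base R.

```r
# Probability that a standard normal variable is at most 1.
# Both this code and its printed output should be visible in the knitted PDF.
p <- pnorm(1, mean = 0, sd = 1)
p
```

A sentence or two interpreting the printed value would then follow the code block.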

Students may choose to collaborate with each other on the homework, but must clearly indicate with whom they collaborated.

Problem 1 (20 points)

Assume that tree heights follow a symmetric, bell-shaped distribution with mean 100 feet and standard deviation 25 feet.

  1. What common probability distribution do tree heights follow, and what are the parameters?

  2. We find a tree with height 25 feet. What is the probability of a tree being taller than this tree?

  3. What is the probability that a tree's height is between 120 and 180 feet?

Problem 2 (25 points)

Recall that in class (lecture 10), we learned the following rules of thumb for symmetric, bell-shaped distributions: 68%, 95%, and 99.7% of the values lie within one, two, and three standard deviations of the mean, respectively. These percentages are derived from the normal distribution.
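In symbols, writing \(X\) for a normally distributed random variable with mean \(\mu\) and standard deviation \(\sigma\) (notation assumed here for illustration), the rule of thumb reads:

\[
P(\mu - k\sigma \le X \le \mu + k\sigma) \approx
\begin{cases}
0.68, & k = 1, \\
0.95, & k = 2, \\
0.997, & k = 3.
\end{cases}
\]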

  1. Use differences of two pnorm() values to derive each of the above three percentages.

  2. For the two-standard-deviation version, do the same as in part 1, this time using the symmetry of the normal distribution (i.e., you should use only a single call of pnorm()).

Problem 3 (25 points)

Suppose \(X_1, X_2, ..., X_n\) are independent \(N(\mu, \sigma^2)\) random variables. Let \(Y = \frac{\sum_{i = 1}^n X_i}{2n}\).

  1. What is the distribution of \(Y\)? Include information about the values of the population mean and variance.

  2. Now assume \(n\) is large. Use the Central Limit Theorem to get an approximate distribution for \(Y\), and derive its mean and variance.

Problem 4 (30 points)

Assume that the number of branches on a redwood tree follows a normal distribution with mean 150 and standard deviation 30. Let the random variable \(X_i\) denote the number of branches on the \(i\)th redwood tree, where \(i = 1, ..., n\). Then, \(X_i \sim N(150, 30^2)\).

  1. What is the probability of a tree having more than 180 branches? Calculate this in R twice: once using the original \(X_i \sim N(150, 30^2)\) distribution, and once after standardizing to a standard normal distribution.

  2. Assume the samples are independent. What is the approximate sampling distribution of the sample mean, \(\overline{X}\)?

  3. Simulate 1000 draws, \(X_1\) to \(X_{1000}\), and calculate the sample mean.

  4. Repeat part 3 5000 times and calculate the 5000 sample means. What are the mean and standard deviation of these 5000 sample means? Are they close to what you would expect?
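Because parts 3 and 4 rely on random draws, the numbers will change each time the file is knitted unless a seed is fixed. The sketch below shows the idea on an unrelated uniform example; the seed value 123 and the variable names are arbitrary assumptions, not part of the assignment.

```r
# Fixing the seed makes random draws reproducible across knits.
set.seed(123)        # 123 is an arbitrary choice
first <- runif(5)    # five draws from Uniform(0, 1)

set.seed(123)        # resetting the same seed ...
second <- runif(5)   # ... reproduces the same five draws

identical(first, second)
```

Calling set.seed() once near the top of the file is usually enough to make every later simulation reproducible.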

Appendix

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS  10.16
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.6.33   R6_2.5.1        jsonlite_1.7.1  magrittr_2.0.3 
##  [5] evaluate_0.16   stringi_1.7.8   cachem_1.0.5    rlang_1.1.1    
##  [9] cli_3.3.0       rstudioapi_0.13 jquerylib_0.1.4 bslib_0.5.1    
## [13] rmarkdown_2.24  tools_4.0.2     stringr_1.4.1   xfun_0.40      
## [17] yaml_2.2.1      fastmap_1.1.1   compiler_4.0.2  htmltools_0.5.6
## [21] knitr_1.40      sass_0.4.1