Homework 5

Instructions

Upload a PDF file, named with your UC Davis email ID and the homework number (e.g., xtai_hw1.pdf), to Gradescope (accessible through Canvas). Put the commands that answer each question in their own code block; the output they produce is embedded automatically in the output file. Support each answer with written statements as well as the code used.

All code used to produce your results must be shown in your PDF file (e.g., do not use echo = FALSE or include = FALSE as options anywhere). Rmd files do not need to be submitted, but may be requested by the TA and must be available when the assignment is submitted.

Students may choose to collaborate with each other on the homework, but must clearly indicate with whom they collaborated.

Please assign the pages with your answers to the corresponding questions when submitting your homework on Gradescope. Points will be taken off if you fail to do so.

Problem 1 (30 points)

Consider a population in which 80% of those who are college-educated are employed, and 60% of those who are not college-educated are employed. In this population, 55% of individuals are not college-educated.

(a) What is the probability of being employed? (You may handwrite your answer to this part, if you prefer. You can include an image in a code chunk using knitr::include_graphics("myImg.png").)
(b) If I pick five people at random from this population, what is the probability that none of those chosen is employed? (Hint: what random variable can we define? What distribution does this random variable follow?) Calculate the required probability by hand (you may use R as a calculator), then in R using a single function.
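
The two steps this problem asks for (combining conditional probabilities, then treating a count of successes as a random variable) can be sketched in R as follows. This is only an illustration: the group shares, conditional probabilities, and sample size below are made-up values, not the numbers from Problem 1, and the variable names are hypothetical. It assumes the draws are independent, which is reasonable when sampling a few individuals from a large population.

```r
# Illustrative values only (NOT the numbers from Problem 1).
p_group   <- c(A = 0.3, B = 0.7)  # P(group): share of the population in each group
p_success <- c(A = 0.5, B = 0.2)  # P(success | group)

# Law of total probability:
# P(success) = sum over groups of P(success | group) * P(group)
p_total <- sum(p_success * p_group)

# If n individuals are drawn independently, the number of successes
# follows a Binomial(n, p_total) distribution.
n <- 4

# P(no successes), using R as a calculator
p_none_by_hand <- (1 - p_total)^n

# The same probability from a single function call
p_none_dbinom <- dbinom(0, size = n, prob = p_total)

print(c(by_hand = p_none_by_hand, dbinom = p_none_dbinom))
```

The two printed values should agree, which is a quick check that the by-hand formula and the dbinom() call describe the same event.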

Problem 2 (45 points)

Assume that a college can admit at most 930 freshmen. Assume that it sends out 1500 acceptances and that each student comes to the college with probability .6, and that the students make decisions independently of one another.

(a) What is the probability that the college ends up with exactly the number of students it can accommodate?
(b) What is the probability that the college ends up with more students than it can accommodate?
(c) What are the (theoretical) mean and variance of the distribution that you used in (a) and (b)?
(d) In R, simulate the 1500 decisions that the accepted students make (i.e., create a binary vector of length 1500, indicating whether or not each student attended the college). How many students, out of 1500, attended the college in your simulation? This number represents a single draw from the distribution that you used in (a) and (b).
(e) In (d), what distribution did you use for each of the draws? What is (are) the parameter(s), and what are the theoretical mean and variance?
(f) (Continued from (e)) As the sample size grows, what value do you expect the sample mean to converge to, and why? Does your answer in (d) make sense? If we had a sample size of 10000 (instead of 1500), what value would we expect for the number of students, out of 10000, that attend?
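
As a rough guide to the R functions that apply to this kind of question, here is a minimal sketch for a binomial count built from independent yes/no decisions. The values of n, p, and k are illustrative and deliberately differ from those in Problem 2; the seed is arbitrary and the variable names are hypothetical.

```r
set.seed(1)  # arbitrary seed, only so the simulation is reproducible

# Illustrative values only (NOT the numbers from Problem 2).
n <- 20    # number of independent yes/no decisions
p <- 0.5   # probability that any one decision is "yes"
k <- 12    # threshold of interest

# X = number of "yes" decisions ~ Binomial(n, p)
p_exact <- dbinom(k, size = n, prob = p)                      # P(X = k)
p_more  <- pbinom(k, size = n, prob = p, lower.tail = FALSE)  # P(X > k)

# Theoretical mean and variance of Binomial(n, p)
binom_mean <- n * p
binom_var  <- n * p * (1 - p)

# Simulate the n individual decisions as Bernoulli(p) draws,
# i.e. Binomial draws with size = 1; the result is a 0/1 vector.
decisions <- rbinom(n, size = 1, prob = p)

total_yes   <- sum(decisions)   # one draw from Binomial(n, p)
sample_mean <- mean(decisions)  # sample proportion of "yes"

print(c(p_exact = p_exact, p_more = p_more))
print(c(total_yes = total_yes, sample_mean = sample_mean))
```

Note that lower.tail = FALSE returns the strict upper tail P(X > k), not P(X >= k). By the law of large numbers, sample_mean should settle near p as n increases, so rerunning the sketch with a larger n is one way to see the convergence asked about in the last part.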

Problem 3 (25 points)

Assume the number of accidents at a busy intersection is one per month on average, and follows a distribution commonly used to model rare events.

(a) What distribution does the number of accidents at that intersection in a year follow, and what is (are) the parameter(s)? What are the mean and variance?
(b) What is the probability that there are 10 or fewer accidents in a year? Show how you would work out the required probability by hand (you don’t have to compute the value), and compute it in R.
(c) Would it be considered unusual to have 3 or fewer accidents in a year?
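
For reference, a short sketch of how a Poisson calculation of this kind might look in R. The rate, number of periods, and cutoff are illustrative values, not those of Problem 3, and the variable names are hypothetical. The sketch assumes the counts in separate periods are independent, so that the total over several periods is again Poisson with the rates added. The 0.05 cutoff for "unusual" is a common convention assumed here, not something the problem specifies.

```r
# Illustrative values only (NOT the numbers from Problem 3).
rate_per_period <- 2   # average number of events per period
n_periods       <- 3   # number of periods being combined

# Assuming independent Poisson counts in each period, the total
# count is Poisson with the rates added together.
lambda <- rate_per_period * n_periods

# For a Poisson(lambda) variable, the mean and variance both equal lambda.
pois_mean <- lambda
pois_var  <- lambda

# P(X <= m) "by hand": add up the probability mass at 0, 1, ..., m
m <- 4
p_by_hand <- sum(dpois(0:m, lambda = lambda))

# The same cumulative probability from a single function call
p_ppois <- ppois(m, lambda = lambda)

# One common convention (an assumption here): call an outcome
# "unusual" if its probability is below 0.05.
is_unusual <- p_ppois < 0.05

print(c(by_hand = p_by_hand, ppois = p_ppois))
print(is_unusual)
```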

Appendix

sessionInfo()

## R version 4.4.0 (2024-04-24)
## Platform: x86_64-apple-darwin20
## Running under: macOS Sonoma 14.6.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] dplyr_1.1.4   ggplot2_3.5.1
## 
## loaded via a namespace (and not attached):
##  [1] vctrs_0.6.5       cli_3.6.2         knitr_1.48        rlang_1.1.4      
##  [5] xfun_0.46         generics_0.1.3    jsonlite_1.8.8    glue_1.7.0       
##  [9] colorspace_2.1-0  htmltools_0.5.8.1 sass_0.4.9        fansi_1.0.6      
## [13] scales_1.3.0      rmarkdown_2.27    grid_4.4.0        evaluate_0.24.0  
## [17] munsell_0.5.1     jquerylib_0.1.4   tibble_3.2.1      fastmap_1.2.0    
## [21] yaml_2.3.10       lifecycle_1.0.4   compiler_4.4.0    pkgconfig_2.0.3  
## [25] rstudioapi_0.16.0 digest_0.6.36     R6_2.5.1          tidyselect_1.2.1 
## [29] utf8_1.2.4        pillar_1.9.0      magrittr_2.0.3    bslib_0.8.0      
## [33] withr_3.0.0       tools_4.4.0       gtable_0.3.5      cachem_1.1.0