Midterm next Friday 11/15
These formulas will be provided:
Common probability distributions
Use rbinom() to get 5000 draws from the population
In R:
set.seed(0) # so results are reproducible
binomDraws <- rbinom(n = 5000, size = 3, prob = .2)
table(binomDraws)/5000
## binomDraws
##      0      1      2      3 
## 0.5246 0.3638 0.1040 0.0076
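As a rough check on the simulation above, the simulated proportions can be set next to the exact Binomial(3, .2) probabilities from dbinom(). This is a minimal sketch in base R; the names exactProbs, simProbs, and comparison are illustrative and not from the slides.

```r
# Exact Binomial(3, 0.2) probabilities for 0, 1, 2, 3 successes
exactProbs <- dbinom(x = 0:3, size = 3, prob = 0.2)

# Simulated proportions, using the same seed and settings as the slide
set.seed(0)
binomDraws <- rbinom(n = 5000, size = 3, prob = 0.2)
# factor() with explicit levels keeps a slot for every outcome, even one never drawn
simProbs <- as.numeric(table(factor(binomDraws, levels = 0:3))) / 5000

# Side-by-side comparison of exact and simulated values
comparison <- data.frame(x = 0:3, exact = exactProbs, simulated = simProbs)
print(comparison)
```

The exact probabilities are 0.512, 0.384, 0.096, and 0.008, so the simulated proportions should land close to these but will not match them exactly.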
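library(dplyr)   # for %>%, bind_cols(), bind_rows(), rename()
library(ggplot2) # for ggplot(), geom_bar(), geom_histogram(), labs()
data.frame(binomDraws) %>%
  ggplot(aes(x = binomDraws)) +
  geom_bar() +
  labs(x = "Number of Smokers", title = "5000 samples from Binomial(3, .2)")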
set.seed(0) # so results are reproducible
binomDraws100 <- rbinom(n = 5000, size = 100, prob = .2)
data.frame(binomDraws100) %>%
  ggplot(aes(x = binomDraws100)) +
  geom_bar() +
  labs(x = "Number of Smokers", title = "5000 samples from Binomial(100, .2)")
set.seed(0) # so results are reproducible
binomDraws500 <- rbinom(n = 5000, size = 500, prob = .2)
data.frame(binomDraws500) %>%
  ggplot(aes(x = binomDraws500)) +
  geom_bar() +
  labs(x = "Number of Smokers", title = "5000 samples from Binomial(500, .2)")
data.frame(binomDraws) %>%
  bind_cols(size = 3) %>%
  bind_rows(
    data.frame(binomDraws100) %>%
      rename(binomDraws = binomDraws100) %>%
      bind_cols(size = 100)
  ) %>%
  bind_rows(
    data.frame(binomDraws500) %>%
      rename(binomDraws = binomDraws500) %>%
      bind_cols(size = 500)
  ) %>%
  ggplot(aes(x = binomDraws, fill = as.factor(size))) +
  geom_histogram(binwidth = 1, position = "identity", alpha = .7) +
  labs(
    x = "Number of smokers",
    y = "Frequency",
    title = "5000 samples each from Binomial(3, .2), Binomial(100, .2), Binomial(500, .2)",
    fill = "Size"
  )
set.seed(0) # so results are reproducible
binomP.2 <- rbinom(n = 5000, size = 100, prob = .2)
binomP.5 <- rbinom(n = 5000, size = 100, prob = .5)
binomP.7 <- rbinom(n = 5000, size = 100, prob = .7)
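One way to see the effect of changing prob is to compare each sample's mean and standard deviation with the standard binomial values, size × prob and sqrt(size × prob × (1 − prob)). The sketch below repeats the draws so that it runs on its own; probs and summaryTable are illustrative names, not from the slides.

```r
set.seed(0) # so results are reproducible
binomP.2 <- rbinom(n = 5000, size = 100, prob = .2)
binomP.5 <- rbinom(n = 5000, size = 100, prob = .5)
binomP.7 <- rbinom(n = 5000, size = 100, prob = .7)

probs <- c(.2, .5, .7)

# Theoretical vs. sample mean and SD for each value of prob
summaryTable <- data.frame(
  prob = probs,
  theoreticalMean = 100 * probs,
  sampleMean = c(mean(binomP.2), mean(binomP.5), mean(binomP.7)),
  theoreticalSD = sqrt(100 * probs * (1 - probs)),
  sampleSD = c(sd(binomP.2), sd(binomP.5), sd(binomP.7))
)
print(summaryTable)
```

The center of the distribution moves with prob (20, 50, 70 expected successes), while the spread is largest at prob = .5.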
The Poisson distribution is useful for estimating the number of events in a large population over a unit of time.
It is named after the French mathematician Siméon Denis Poisson.
For example: the number of people having heart attacks in New York City every year.
Key ingredients
Fixed interval of time or space
Events happen with a known average rate, independently of time since the last event ("memoryless" property)
The parameter that defines a Poisson distributed random variable is the rate λ, where λ>0
Often used to model rare events
$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$, defined over non-negative integer values of x
No upper limit, i.e., x can take very large non-negative integer values
E(X)=λ
Var(X)=λ
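The properties above can be checked numerically from the probability function itself. The sketch below is one possible check for λ = 3: it sums over x = 0, ..., 200, where the cutoff of 200 is an assumption chosen so that the omitted tail is negligible. The names pmf, totalProb, meanX, and varX are illustrative.

```r
lambda <- 3
x <- 0:200 # truncation point is an assumption; the mass beyond it is negligible for lambda = 3

pmf <- dpois(x, lambda = lambda)

totalProb <- sum(pmf)            # probabilities should sum to (essentially) 1
meanX <- sum(x * pmf)            # E(X) = sum of x * P(X = x)
varX <- sum((x - meanX)^2 * pmf) # Var(X) = sum of (x - E(X))^2 * P(X = x)

print(c(totalProb = totalProb, mean = meanX, variance = varX))
```

Both the mean and the variance come out equal to λ, up to floating-point error.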
$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$ lets us calculate the probability of X taking a certain value
For x=2 and λ=3, we have
$P(X = 2) = \frac{3^2 e^{-3}}{2!} = \frac{9\,e^{-3}}{2 \cdot 1} = 0.2240418$
dpois(x = 2, lambda = 3)
## [1] 0.2240418
dpois(x = 10, lambda = 3)
## [1] 0.0008101512
dpois(x = 0:10, lambda = 3)
## [1] 0.0497870684 0.1493612051 0.2240418077 0.2240418077
## [5] 0.1680313557 0.1008188134 0.0504094067 0.0216040315
## [9] 0.0081015118 0.0027005039 0.0008101512
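To connect the dpois() output to the formula, the probability function can be written out by hand and compared with the built-in. This is a minimal sketch; poissonPmf is a hypothetical helper defined here for illustration, not a function from R or from the slides.

```r
# Hand-written Poisson probability function: lambda^x * e^(-lambda) / x!
# (poissonPmf is a hypothetical helper for illustration)
poissonPmf <- function(x, lambda) {
  lambda^x * exp(-lambda) / factorial(x)
}

manual <- poissonPmf(x = 2, lambda = 3)
builtIn <- dpois(x = 2, lambda = 3)
print(c(manual = manual, builtIn = builtIn))

# The two versions should also agree across a whole range of x values
print(all.equal(poissonPmf(x = 0:10, lambda = 3), dpois(x = 0:10, lambda = 3)))
```

For large x the hand-written version can overflow in factorial(), so dpois() remains the better choice in practice.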
data.frame(x = 0:10, y = dpois(0:10, lambda = 3)) %>%
  ggplot(aes(x = x, y = y)) +
  geom_bar(stat = "identity") +
  labs(title = "Probability distribution of Poisson(3)", y = "P(X = x)")
data.frame(x = 0:30, y = dpois(0:30, lambda = 3), lambda = 3) %>%
  bind_rows(data.frame(x = 0:30, y = dpois(0:30, lambda = 10), lambda = 10)) %>%
  bind_rows(data.frame(x = 0:30, y = dpois(0:30, lambda = 20), lambda = 20)) %>%
  ggplot(aes(x = x, y = y, fill = as.factor(lambda))) +
  geom_bar(stat = "identity", position = "identity", alpha = .5) +
  labs(
    title = "Probability distribution of \nPoisson(3), Poisson(10), Poisson(20)",
    y = "P(X = x)",
    fill = "Lambda"
  )
Simulate random draws using the rpois() function
rpois() has the arguments
n, the number of draws from the distribution
lambda, the mean
set.seed(0) # so results are reproducible
inputLambda <- 3
poissonDraws <- rpois(n = 100, lambda = inputLambda)
poissonDraws
##  [1] 5 2 2 3 5 2 5 6 4 3 1 2 1 4 2 4 3 4 8 2 4 6 2 4 1 2 2 0 2 5
## [31] 2 3 3 3 1 5 4 4 1 4 2 5 3 4 3 3 4 0 3 4 4 3 5 3 2 1 1 2 3 4
## [61] 2 5 2 3 2 4 2 3 4 1 5 2 5 2 2 3 5 5 2 4 6 3 4 2 2 4 2 4 1 2
## [91] 1 2 1 3 5 4 4 3 2 4
set.seed(0) # so results are reproducible
poissonL3 <- rpois(n = 5000, lambda = 3)
poissonL10 <- rpois(n = 5000, lambda = 10)
poissonL20 <- rpois(n = 5000, lambda = 20)
data.frame(poissonL3) %>%
  rename(outcome = poissonL3) %>%
  bind_cols(lambda = 3) %>%
  bind_rows(
    data.frame(poissonL10) %>%
      rename(outcome = poissonL10) %>%
      bind_cols(lambda = 10)
  ) %>%
  bind_rows(
    data.frame(poissonL20) %>%
      rename(outcome = poissonL20) %>%
      bind_cols(lambda = 20)
  ) %>%
  ggplot(aes(x = outcome, fill = as.factor(lambda))) +
  geom_histogram(binwidth = 1, position = "identity", alpha = .7) +
  labs(
    x = "Number of occurrences",
    y = "Frequency",
    title = "5000 samples each from \nPoisson(3), Poisson(10), Poisson(20)",
    fill = "Lambda"
  )
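Since a Poisson random variable has E(X) = Var(X) = λ, the simulated draws should show a sample mean and a sample variance both close to λ. The sketch below repeats the draws so it runs on its own; poissonSummary is an illustrative name, not from the slides.

```r
set.seed(0) # so results are reproducible
poissonL3 <- rpois(n = 5000, lambda = 3)
poissonL10 <- rpois(n = 5000, lambda = 10)
poissonL20 <- rpois(n = 5000, lambda = 20)

# Sample mean and sample variance for each lambda; both should be near lambda
poissonSummary <- data.frame(
  lambda = c(3, 10, 20),
  sampleMean = c(mean(poissonL3), mean(poissonL10), mean(poissonL20)),
  sampleVar = c(var(poissonL3), var(poissonL10), var(poissonL20))
)
print(poissonSummary)
```

This matches what the histograms suggest: as λ grows, the distribution shifts to the right and spreads out.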
An insurance agency determines that 70% of individuals do not exceed their deductible.
Suppose the insurance agency is considering a random sample of four individuals they insure. What is the probability that exactly one of them will exceed the deductible?
What is the probability that 3 of 8 randomly selected individuals will have exceeded the insurance deductible, i.e., that 5 of 8 will not exceed the deductible?
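One way to approach both questions, assuming the individuals are independent, is to treat "exceeds the deductible" as a success with probability 1 − 0.7 = 0.3 and use dbinom(). This is a sketch of that approach, not an official solution; the variable names are illustrative.

```r
pExceed <- 1 - 0.7 # probability that one individual exceeds the deductible

# Exactly 1 of 4 exceeds the deductible
pOneOfFour <- dbinom(x = 1, size = 4, prob = pExceed)

# Exactly 3 of 8 exceed the deductible
pThreeOfEight <- dbinom(x = 3, size = 8, prob = pExceed)

# Equivalent view: exactly 5 of 8 do NOT exceed it (success probability 0.7)
pFiveOfEightNot <- dbinom(x = 5, size = 8, prob = 0.7)

print(c(pOneOfFour = pOneOfFour,
        pThreeOfEight = pThreeOfEight,
        pFiveOfEightNot = pFiveOfEightNot))
```

By hand, the first answer is 4 × 0.3 × 0.7³ = 0.4116, and the last two values are the same probability (about 0.2541) counted from opposite sides.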
A very skilled court stenographer makes one typographical error (typo) per hour on average.
What probability distribution is most appropriate for calculating the probability of a given number of typos this stenographer makes in an hour?
What are the mean and the standard deviation of the number of typos this stenographer makes?
Would it be considered unusual if this stenographer made 4 or more typos in a given hour?
Calculate the probability that this stenographer makes at most 2 typos in a given hour.
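A possible way to work through these questions in R is to model the number of typos per hour as Poisson with λ = 1. The sketch below is not an official solution, and whether about 2% counts as "unusual" depends on the cutoff used. A 5% threshold is a common convention, but it is an assumption here. The variable names are illustrative.

```r
lambda <- 1 # one typo per hour on average

# Mean and standard deviation of a Poisson(lambda) random variable
meanTypos <- lambda
sdTypos <- sqrt(lambda)

# P(X >= 4) = 1 - P(X <= 3)
pAtLeast4 <- ppois(q = 3, lambda = lambda, lower.tail = FALSE)

# P(X <= 2), also computed by adding up P(X = 0), P(X = 1), P(X = 2)
pAtMost2 <- ppois(q = 2, lambda = lambda)
pAtMost2BySum <- sum(dpois(x = 0:2, lambda = lambda))

print(c(mean = meanTypos, sd = sdTypos,
        pAtLeast4 = pAtLeast4, pAtMost2 = pAtMost2))
```

Four typos is three standard deviations above the mean, and P(X ≥ 4) is roughly 0.019, so 4 or more typos in an hour would be unusual under a 5% cutoff.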
Common probability distributions: Poisson
Theoretical properties: probability mass function, parameters, mean and variance, effect of varying parameters
R functions, e.g.:
dpois() for densities, i.e., P(X=x)
ppois() for P(X≤x)
rpois() for random samples