
Poisson Distribution



STA35A: Statistical Data Science 1

Xiao Hui Tai

November 8, 2024

1 / 20

Announcements: Midterm 2

  • Midterm next Friday 11/15

    • Will cover material from after Midterm 1 until today's lecture
    • Monday: holiday
    • Wednesday: review (OR)
    • Thursday: no lab, XHT OH 12-1pm on Zoom
    • Same rules apply: closed-book, no computers or calculators, no make-up exams
  • These formulas will be provided:

    • Bayes' theorem: $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$.
    • Probability mass functions:
      • Binomial: $P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}$
      • Poisson: $P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$, $\lambda > 0$
2 / 20

Today

  • Common probability distributions

    • Poisson distribution
3 / 20

Random samples from binomial distribution

  • Use rbinom() to get 5000 draws from the population

  • In R:

    set.seed(0) # so results are reproducible
    binomDraws <- rbinom(n = 5000, size = 3, prob = .2)
    table(binomDraws)/5000
    ## binomDraws
    ## 0 1 2 3
    ## 0.5246 0.3638 0.1040 0.0076
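As a quick check, the simulated proportions can be compared with the exact Binomial(3, .2) probabilities from dbinom(). This is a minimal sketch; the names exactProbs and empiricalProbs are illustrative and not from the lecture.

```r
set.seed(0) # so results are reproducible
binomDraws <- rbinom(n = 5000, size = 3, prob = .2)

# Exact probabilities P(X = x) for x = 0, 1, 2, 3
exactProbs <- dbinom(x = 0:3, size = 3, prob = .2)

# Empirical proportions; factor() keeps all four outcomes even if one never occurs
empiricalProbs <- as.numeric(table(factor(binomDraws, levels = 0:3))) / 5000

# Side-by-side comparison of exact and simulated values
round(cbind(x = 0:3, exact = exactProbs, empirical = empiricalProbs), 4)
```

The exact values are 0.512, 0.384, 0.096 and 0.008, so the simulated proportions should land close to them without matching exactly.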
4 / 20

Frequency distribution for Binomial(3, .2)

library(tidyverse) # provides %>%, ggplot(), bind_rows(), rename()
data.frame(binomDraws) %>%
  ggplot(aes(x = binomDraws)) +
  geom_bar() +
  labs(x = "Number of Smokers",
       title = "5000 samples from Binomial(3, .2)")

5 / 20

Varying the number of Bernoulli trials: 100 trials

set.seed(0) # so results are reproducible
binomDraws100 <- rbinom(n = 5000, size = 100, prob = .2)
data.frame(binomDraws100) %>%
  ggplot(aes(x = binomDraws100)) +
  geom_bar() +
  labs(x = "Number of Smokers",
       title = "5000 samples from Binomial(100, .2)")

6 / 20

Varying the number of Bernoulli trials: 500 trials

set.seed(0) # so results are reproducible
binomDraws500 <- rbinom(n = 5000, size = 500, prob = .2)
data.frame(binomDraws500) %>%
  ggplot(aes(x = binomDraws500)) +
  geom_bar() +
  labs(x = "Number of Smokers",
       title = "5000 samples from Binomial(500, .2)")

7 / 20

Frequency distribution varying number of Bernoulli trials

data.frame(binomDraws) %>%
  bind_cols(size = 3) %>%
  bind_rows(
    data.frame(binomDraws100) %>%
      rename(binomDraws = binomDraws100) %>%
      bind_cols(size = 100)
  ) %>%
  bind_rows(
    data.frame(binomDraws500) %>%
      rename(binomDraws = binomDraws500) %>%
      bind_cols(size = 500)
  ) %>%
  ggplot(aes(x = binomDraws,
             fill = as.factor(size))) +
  geom_histogram(binwidth = 1, position = "identity", alpha = .7) +
  labs(
    x = "Number of smokers",
    y = "Frequency",
    title = "5000 samples each from Binomial(3, .2), Binomial(100, .2), Binomial(500, .2)",
    fill = "Size"
  )

8 / 20

Frequency distribution varying probability of success

set.seed(0) # so results are reproducible
binomP.2 <- rbinom(n = 5000, size = 100, prob = .2)
binomP.5 <- rbinom(n = 5000, size = 100, prob = .5)
binomP.7 <- rbinom(n = 5000, size = 100, prob = .7)
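One way to see the effect of the success probability without a plot is to compare sample means and variances with the standard Binomial(n, p) results, mean np and variance np(1 - p). This is a minimal sketch; probs, draws and summaryTable are illustrative names, not part of the lecture code.

```r
set.seed(0) # so results are reproducible
binomP.2 <- rbinom(n = 5000, size = 100, prob = .2)
binomP.5 <- rbinom(n = 5000, size = 100, prob = .5)
binomP.7 <- rbinom(n = 5000, size = 100, prob = .7)

probs <- c(.2, .5, .7)
draws <- list(binomP.2, binomP.5, binomP.7)

# Compare sample means and variances with the theoretical values
# n * p and n * p * (1 - p) for a Binomial(n = 100, p) random variable
summaryTable <- data.frame(
  prob       = probs,
  sampleMean = sapply(draws, mean),
  theoryMean = 100 * probs,
  sampleVar  = sapply(draws, var),
  theoryVar  = 100 * probs * (1 - probs)
)
summaryTable
```

The centre of the distribution moves with p, while the spread is largest at p = .5.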
9 / 20

Poisson distribution

  • Useful for estimating the number of events in a large population over a unit of time.

  • For example:

    • The number of people having heart attacks in New York City every year
    • The number of accidents occurring at an intersection per hour
    • The number of typos in every 100 pages of a book
  • It is named after French mathematician Siméon Denis Poisson

10 / 20

Poisson distribution

  • E.g.: Number of people having heart attacks in New York City every year

  • Key ingredients

    • Fixed interval of time or space

    • Events happen with a known average rate, independently of time since the last event ("memoryless" property)

      • One person having a heart attack does not change the probability of another person having one, and hence does not affect the timing of the next heart attack
  • The parameter that defines a Poisson distributed random variable is the rate λ, where λ>0

    • Rate = average number of occurrences per unit of time
  • Often used to model rare events
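The slides do not spell this out, but a standard way to connect the "rare events in a large population" idea to the earlier binomial slides is that a Binomial(n, p) with large n and small p is close to a Poisson with rate np. The sketch below illustrates this; the values n = 1000 and p = 0.003 are hypothetical choices made only for illustration.

```r
n <- 1000        # hypothetical population size (illustrative assumption)
p <- 0.003       # hypothetical small per-person event probability
lambda <- n * p  # matching Poisson rate, here 3

x <- 0:10
binomProbs <- dbinom(x, size = n, prob = p)
poisProbs  <- dpois(x, lambda = lambda)

# Largest pointwise gap between the two probability mass functions
maxGap <- max(abs(binomProbs - poisProbs))

round(cbind(x, binomial = binomProbs, poisson = poisProbs), 5)
maxGap
```

The two columns should agree closely, which is one reason the Poisson is a convenient model when only the average rate is known.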

11 / 20

Probability mass function, mean and variance

  • $P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$, defined over non-negative integer values of $x$

    • Recall: $n! = n(n-1)(n-2)\cdots(1)$.
  • No upper limit, i.e., x can take very large non-negative integer values

  • E(X)=λ

  • Var(X)=λ
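The mean and variance can be checked numerically by summing over the probability mass function. This is a minimal sketch for λ = 3; truncating the sum at x = 100 is an assumption that is harmless here because the remaining tail probability is negligible.

```r
lambda <- 3
x <- 0:100  # the upper tail beyond 100 is negligible for lambda = 3

pmf <- dpois(x, lambda = lambda)

totalProb <- sum(pmf)                  # should be very close to 1
meanX     <- sum(x * pmf)              # E(X)
varX      <- sum((x - meanX)^2 * pmf)  # Var(X)

c(total = totalProb, mean = meanX, variance = varX)
```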

12 / 20

Poisson probabilities

  • $P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}$ lets us calculate probabilities of taking a certain value

  • For $x = 2$ and $\lambda = 3$, we have

$$P(X = 2) = \frac{3^2 e^{-3}}{2!} = \frac{9\,e^{-3}}{2(1)} = 0.2240418$$

  • In R:
dpois(x = 2, lambda = 3)
## [1] 0.2240418
  • For large values of $x$, the probability is very small because the denominator $x!$ grows much faster than the numerator $\lambda^x$
dpois(x = 10, lambda = 3)
## [1] 0.0008101512
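The same probabilities can be reproduced by typing the formula directly. The helper poissonPmf() below is a hypothetical function written for illustration; in practice dpois() is the function to use.

```r
# Poisson probability mass function written out from the formula
poissonPmf <- function(x, lambda) {
  lambda^x * exp(-lambda) / factorial(x)
}

poissonPmf(x = 2, lambda = 3)   # matches dpois(x = 2, lambda = 3)
poissonPmf(x = 10, lambda = 3)  # matches dpois(x = 10, lambda = 3)
```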
13 / 20

Probability distribution

  • In the same manner, we can derive the entire probability distribution
dpois(x = 0:10, lambda = 3)
## [1] 0.0497870684 0.1493612051 0.2240418077 0.2240418077
## [5] 0.1680313557 0.1008188134 0.0504094067 0.0216040315
## [9] 0.0081015118 0.0027005039 0.0008101512
data.frame(x = 0:10, y = dpois(0:10, lambda = 3)) %>%
  ggplot(aes(x = x, y = y)) +
  geom_bar(stat = "identity") +
  labs(title = "Probability distribution of Poisson(3)",
       y = "P(X = x)")

14 / 20

Probability distribution varying lambda

data.frame(x = 0:30, y = dpois(0:30, lambda = 3), lambda = 3) %>%
  bind_rows(data.frame(x = 0:30, y = dpois(0:30, lambda = 10), lambda = 10)) %>%
  bind_rows(data.frame(x = 0:30, y = dpois(0:30, lambda = 20), lambda = 20)) %>%
  ggplot(aes(x = x, y = y, fill = as.factor(lambda))) +
  geom_bar(stat = "identity",
           position = "identity",
           alpha = .5) +
  labs(title = "Probability distribution of \nPoisson(3), Poisson(10), Poisson(20)",
       y = "P(X = x)",
       fill = "Lambda")

15 / 20

Sampling from Poisson distribution in R

  • Simulate random draws using the rpois() function

  • rpois() has the arguments

    • n, the number of draws from the distribution
    • lambda, the mean
set.seed(0) # so results are reproducible
inputLambda <- 3
poissonDraws <- rpois(n = 100, lambda = inputLambda)
poissonDraws
## [1] 5 2 2 3 5 2 5 6 4 3 1 2 1 4 2 4 3 4 8 2 4 6 2 4 1 2 2 0 2 5
## [31] 2 3 3 3 1 5 4 4 1 4 2 5 3 4 3 3 4 0 3 4 4 3 5 3 2 1 1 2 3 4
## [61] 2 5 2 3 2 4 2 3 4 1 5 2 5 2 2 3 5 5 2 4 6 3 4 2 2 4 2 4 1 2
## [91] 1 2 1 3 5 4 4 3 2 4
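Since E(X) = Var(X) = λ, the sample mean and sample variance of the draws should both be near 3. A minimal sketch of that check follows; sampleMean and sampleVar are illustrative names.

```r
set.seed(0) # so results are reproducible
inputLambda <- 3
poissonDraws <- rpois(n = 100, lambda = inputLambda)

# For a Poisson random variable, both the mean and the variance equal lambda,
# so the sample versions should both be near 3 (not exactly, with only 100 draws)
sampleMean <- mean(poissonDraws)
sampleVar  <- var(poissonDraws)

c(lambda = inputLambda, sampleMean = sampleMean, sampleVar = sampleVar)
```

Increasing n brings both quantities closer to λ.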
16 / 20

Frequency distribution varying lambda

set.seed(0) # so results are reproducible
poissonL3 <- rpois(n = 5000, lambda = 3)
poissonL10 <- rpois(n = 5000, lambda = 10)
poissonL20 <- rpois(n = 5000, lambda = 20)
data.frame(poissonL3) %>%
  rename(outcome = poissonL3) %>%
  bind_cols(lambda = 3) %>%
  bind_rows(
    data.frame(poissonL10) %>%
      rename(outcome = poissonL10) %>%
      bind_cols(lambda = 10)
  ) %>%
  bind_rows(
    data.frame(poissonL20) %>%
      rename(outcome = poissonL20) %>%
      bind_cols(lambda = 20)
  ) %>%
  ggplot(aes(x = outcome,
             fill = as.factor(lambda))) +
  geom_histogram(binwidth = 1, position = "identity", alpha = .7) +
  labs(
    x = "Number of occurrences",
    y = "Frequency",
    title = "5000 samples each from \nPoisson(3), Poisson(10), Poisson(20)",
    fill = "Lambda"
  )

17 / 20

Exercises

An insurance agency determines that 70% of individuals do not exceed their deductible.

  • Suppose the insurance agency is considering a random sample of four individuals they insure. What is the probability that exactly one of them will exceed the deductible?

  • What is the probability that 3 of 8 randomly selected individuals will have exceeded the insurance deductible, i.e., that 5 of 8 will not exceed the deductible?
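One way to check answers to these two questions in R is sketched below. It assumes individuals are independent, so that the number exceeding the deductible is binomial with success probability 1 - 0.70 = 0.30; the variable names are illustrative.

```r
pExceed <- 1 - 0.70  # probability that one individual exceeds the deductible

# Exactly 1 of 4 individuals exceeds the deductible
probOneOfFour <- dbinom(x = 1, size = 4, prob = pExceed)

# Exactly 3 of 8 individuals exceed the deductible
probThreeOfEight <- dbinom(x = 3, size = 8, prob = pExceed)

# Equivalent view: exactly 5 of 8 do not exceed it
probFiveOfEightNot <- dbinom(x = 5, size = 8, prob = 0.70)

c(probOneOfFour, probThreeOfEight, probFiveOfEightNot)
```

The first probability is 0.4116, and the last two agree at about 0.254 because they describe the same event.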

18 / 20

Exercises

A very skilled court stenographer makes one typographical error (typo) per hour on average.

  • What probability distribution is most appropriate for calculating the probability of a given number of typos this stenographer makes in an hour?

  • What are the mean and the standard deviation of the number of typos this stenographer makes?

  • Would it be considered unusual if this stenographer made 4 or more typos in a given hour?

  • Calculate the probability that this stenographer makes at most 2 typos in a given hour.
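A possible way to check the numerical parts of this exercise, assuming the number of typos per hour is modelled as Poisson with λ = 1, is sketched below. The variable names are illustrative.

```r
lambda <- 1  # one typo per hour on average

meanTypos <- lambda        # E(X) = lambda
sdTypos   <- sqrt(lambda)  # SD(X) = sqrt(Var(X)) = sqrt(lambda)

# P(X >= 4) = 1 - P(X <= 3)
probFourOrMore <- 1 - ppois(q = 3, lambda = lambda)

# P(X <= 2)
probAtMostTwo <- ppois(q = 2, lambda = lambda)

c(mean = meanTypos, sd = sdTypos,
  fourOrMore = probFourOrMore, atMostTwo = probAtMostTwo)
```

Under this model P(X ≥ 4) is about 0.019 and P(X ≤ 2) is about 0.920, so 4 or more typos in an hour would be fairly unusual.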

19 / 20

Summary

  • Common probability distributions: Poisson

    • Theoretical properties: probability mass function, parameters, mean and variance, effect of varying parameters

    • R functions, e.g.:

      • dpois() for $P(X = x)$
      • ppois() for $P(X \le x)$
      • rpois() for random samples
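The three functions can be seen side by side in a short sketch, here with λ = 3 as in the earlier slides. The variable names are illustrative.

```r
lambda <- 3

# dpois(): P(X = x)
pEqualTwo <- dpois(x = 2, lambda = lambda)

# ppois(): P(X <= x), which equals the sum of dpois() over 0, 1, ..., x
pAtMostTwo <- ppois(q = 2, lambda = lambda)
pAtMostTwoBySum <- sum(dpois(x = 0:2, lambda = lambda))

# rpois(): random draws, whose values are non-negative integers
set.seed(0) # so results are reproducible
draws <- rpois(n = 5, lambda = lambda)

c(pEqualTwo = pEqualTwo, pAtMostTwo = pAtMostTwo, pAtMostTwoBySum = pAtMostTwoBySum)
draws
```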
20 / 20
