Hypothesis tests for population mean and proportion
p-value approach
Critical value approach
Note that all computations are done assuming that H0 is true, i.e., to be precise, the decision rule is reject H0 if the p-value P(|Z|≥|z|∣H0)=P(Z≥|z| or Z≤−|z|∣H0)<α
The blue distribution is the distribution under the null hypothesis
P(|Z|≥|z|)=P(Z≥|z|)+P(Z≤−|z|) (shaded area)
The value of the test statistic is z (value on horizontal axis)
p-value = P(|Z|≥|z|) under H0 (for a two-sided test; more details coming). It is the probability of getting a result as extreme as what we got, if H0 were true.
Recall the decision rule: reject H0 if P(|Z|≥|z|)=P(Z≥|z| or Z≤−|z|)<α. Equivalently, the p-value can be interpreted as the smallest significance level at which we would reject H0.
Probability of getting data like ours or more extreme data if H0 were true
Common misinterpretation: "the p-value is the probability that H0 is true". The p-value is calculated assuming that H0 is true, so it cannot tell us how likely it is that this assumption is correct.
Decision rule: reject H0 if p-value <α
Setup: $X_i \sim N(\mu, 4^2)$
H0: μ=μ0=20
HA: μ≠20
Under H0, $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \sim N(0,1)$
Value of test statistic: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
Decision rule: reject H0 if P(|Z|≥|z|)=P(Z≥|z| or Z≤−|z|)<α
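Before repeating the experiment many times, it may help to see the rule applied to a single sample. The sketch below assumes the setup above (μ0 = 20, σ = 4); the helper name `twoSidedP`, the seed, and the sample size are illustrative choices, not part of the original slides.

```r
# Two-sided p-value for a z-test of H0: mu = mu0 when sigma is known.
# twoSidedP is a hypothetical helper name used only for this illustration.
twoSidedP <- function(xbar, mu0, sigma, n) {
  z <- (xbar - mu0) / (sigma / sqrt(n))   # value of the test statistic
  2 * pnorm(abs(z), lower.tail = FALSE)   # P(|Z| >= |z|) under H0
}

set.seed(1)                               # arbitrary seed
x <- rnorm(1000, mean = 20, sd = 4)       # one sample drawn with H0 true
pValue <- twoSidedP(mean(x), mu0 = 20, sigma = 4, n = length(x))
pValue < 0.05                             # TRUE means "reject H0" at alpha = .05
```

When the sample mean equals μ0 exactly, z = 0 and the p-value is 1, the least extreme result possible.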
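set.seed(0)
# 10,000 experiments (rows), each a sample of n = 1000 draws from N(20, 4^2), so H0 is true
myDraws <- t(replicate(10000, rnorm(1000, 20, 4)))
sampleMeans <- rowMeans(myDraws)
# z statistic and two-sided p-value for each experiment
testStat <- (sampleMeans - 20)/(4/sqrt(1000))
pValues <- 2*pnorm(abs(testStat), lower.tail = FALSE)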
Decision rule: reject if p-value <α
What proportion out of the 10,000 experiments should we expect to reject H0?
Given by significance level α
When α=.05, we expect to reject H0 in about 5% of the experiments, because the test is constructed so that P(reject H0 | H0 true) =α
sum(pValues < .05)
## [1] 486
mean(pValues < .05)
## [1] 0.0486
library(ggplot2)
ggplot(data.frame(pValues), aes(x = pValues)) +
  geom_histogram(binwidth = .05, boundary = 0) +
  labs(title = "10,000 p-values", x = "p-value", y = "Count")
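Under H0 the p-values of this test are uniformly distributed on [0, 1], which is why the rejection rate matches α for any choice of α, not only .05. A minimal sketch of that check follows; it re-runs a smaller version of the simulation (2,000 experiments of size 100, an assumption made here to keep it fast) so that it is self-contained.

```r
set.seed(0)
nExperiments <- 2000   # assumed smaller than the 10,000 used above
n <- 100               # assumed sample size for this sketch

# Sample mean of each experiment, drawn with H0 (mu = 20) true
sampleMeans <- replicate(nExperiments, mean(rnorm(n, 20, 4)))
testStat <- (sampleMeans - 20) / (4 / sqrt(n))
pValues <- 2 * pnorm(abs(testStat), lower.tail = FALSE)

# Empirical rejection rate at several significance levels
alphas <- c(0.01, 0.05, 0.10)
rejectionRates <- sapply(alphas, function(a) mean(pValues < a))
data.frame(alpha = alphas, rejectionRate = rejectionRates)
```

Each rejection rate should land close to its α, up to simulation noise.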
The rejection region is $|z| > z_{\alpha/2}$, that is, $|z| > 1.96$ when $\alpha = .05$. It is a portion of the horizontal axis.
The boundaries of the rejection region are called critical values.
The significance level is the probability over the rejection region, the red area: $P(|Z| > z_{\alpha/2}) = \alpha$
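The link between the critical value and the significance level can be checked numerically. This is a small sketch using only `qnorm` and `pnorm`; the variable names are illustrative.

```r
alpha <- 0.05
zCrit <- qnorm(1 - alpha / 2)   # critical value z_{alpha/2}

# Probability of the rejection region under N(0,1): both tails added together
tailProb <- pnorm(-zCrit) + pnorm(zCrit, lower.tail = FALSE)

c(criticalValue = zCrit, rejectionProbability = tailProb)
```

The two tails each carry α/2, so together they recover α.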
Say $X_i$ has mean $\mu$ and standard deviation $\sigma$. The test statistic we will use is $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$. By the CLT, $Z \approx N(0,1)$ when $n$ is large.
H0: μ=μ0
HA: μ≠μ0
Under H0, $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \approx N(0,1)$
Value of test statistic: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
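Decision rule: reject H0 if $|z| > z_{\alpha/2}$, or equivalently if the p-value $P(|Z| \ge |z| \mid H_0) < \alpha$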
Say $X_i$ has mean $\mu$ and standard deviation $\sigma$. The test statistic we will use is $Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$. $Z \approx N(0,1)$ when $n$ is large. (Note that $\sigma$ has been replaced by $S$.)
H0: μ=μ0
HA: μ≠μ0
Under H0, $Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \approx N(0,1)$ (note that $\sigma$ has been replaced by $S$)
Value of test statistic: $z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ (replace $\sigma$ by $s$)
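Decision rule: reject H0 if $|z| > z_{\alpha/2}$, or equivalently if the p-value $P(|Z| \ge |z| \mid H_0) < \alpha$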
Say $X_i \sim$ Bernoulli($p$). The test statistic we will use is $Z = \frac{\hat{P} - p}{\sqrt{\hat{P}(1-\hat{P})}/\sqrt{n}}$. $Z \approx N(0,1)$ when $n$ is large.
H0: p=p0
HA: p≠p0
Under H0, $Z = \frac{\hat{P} - p_0}{\sqrt{\hat{P}(1-\hat{P})}/\sqrt{n}} \approx N(0,1)$
Value of test statistic: $z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})}/\sqrt{n}}$
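Decision rule: reject H0 if $|z| > z_{\alpha/2}$, or equivalently if the p-value $P(|Z| \ge |z| \mid H_0) < \alpha$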
Say $X_i \sim$ Bernoulli($p$). The test statistic we will use is $Z = \frac{\hat{P} - p}{\sqrt{p(1-p)}/\sqrt{n}}$. By the CLT, $Z \approx N(0,1)$ when $n$ is large.
H0: p=p0
HA: p≠p0
Under H0, $Z = \frac{\hat{P} - p_0}{\sqrt{p_0(1-p_0)}/\sqrt{n}} \approx N(0,1)$ (note that $p$ is replaced by $p_0$)
Value of test statistic: $z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)}/\sqrt{n}}$
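Decision rule: reject H0 if $|z| > z_{\alpha/2}$, or equivalently if the p-value $P(|Z| \ge |z| \mid H_0) < \alpha$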
Assume that the heights of redwood trees in California follow a distribution with standard deviation 25 feet. Let the random variable Xi denote the height of the ith redwood tree.
We guess that the unknown population mean is 230, and would like to test this hypothesis against the alternative that μ≠230. We collect data on the heights of 300 randomly sampled redwood trees. Assume the samples are independent. We get a sample mean of 220. Construct a hypothesis test at a 5% significance level.
H0: μ=230
HA: μ≠230
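Test statistic: $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$. By the CLT, $Z \approx N(0,1)$ when $n$ is large.
Under H0, $Z = \frac{\bar{X} - 230}{25/\sqrt{300}} \approx N(0,1)$
Value of test statistic: $z = \frac{220 - 230}{25/\sqrt{300}} = -6.928203$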
The rejection region is |z|>1.96 when α=.05
|z|=6.928203>1.96. The test statistic is in the rejection region, so we reject H0 that μ=230. There is sufficient evidence at a 5% level to reject the null hypothesis that the mean height of a Californian redwood tree is 230 feet.
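The same critical value calculation can be reproduced in R. This is a sketch using the numbers from the example; the variable names are illustrative.

```r
# Redwood example, critical value approach (sigma known)
xbar  <- 220    # sample mean
mu0   <- 230    # hypothesized mean under H0
sigma <- 25     # known population standard deviation
n     <- 300    # sample size
alpha <- 0.05

z <- (xbar - mu0) / (sigma / sqrt(n))   # value of the test statistic
zCrit <- qnorm(1 - alpha / 2)           # critical value z_{alpha/2}
rejectH0 <- abs(z) > zCrit              # TRUE if z falls in the rejection region

c(z = z, zCrit = zCrit)
rejectH0
```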
(Same setup as the previous slide)
Value of test statistic: $z = \frac{220 - 230}{25/\sqrt{300}} = -6.928203$
The p-value is P(|Z|≥|z|), in this case P(|Z|≥6.928203)=P(Z≥6.928203 or Z≤−6.928203)
2*pnorm(-6.928203)
## [1] 4.262199e-12
The p-value is less than .05, so we reject H0 that μ=230. There is sufficient evidence at a 5% level to reject the null hypothesis that the mean height of a Californian redwood tree is 230 feet.
Assume that the heights of redwood trees in California follow a distribution with unknown mean and standard deviation. Let the random variable Xi denote the height of the ith redwood tree.
We guess that the unknown population mean is 230, and would like to test this hypothesis against the alternative that μ≠230. We collect data on the heights of 300 randomly sampled redwood trees. Assume the samples are independent. We get a sample mean of 220 and sample standard deviation of 24. Construct a hypothesis test at a 5% significance level.
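One way to carry out this test, as a sketch: since σ is unknown, replace it by the sample standard deviation s = 24 in the test statistic and rely on the large-sample normal approximation described earlier. The variable names are illustrative.

```r
# Redwood example with unknown sigma: use s in place of sigma
xbar  <- 220    # sample mean
mu0   <- 230    # hypothesized mean under H0
s     <- 24     # sample standard deviation
n     <- 300    # sample size
alpha <- 0.05

z <- (xbar - mu0) / (s / sqrt(n))              # value of the test statistic
pValue <- 2 * pnorm(abs(z), lower.tail = FALSE) # two-sided p-value

abs(z) > qnorm(1 - alpha / 2)   # critical value approach
pValue < alpha                  # p-value approach (same decision)
```

Here z is about −7.22, so both approaches reject H0 that μ=230 at the 5% level.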
We are interested in the population proportion of likely voters that approve of President Biden. We guess that this is .4 and would like to test this hypothesis against the alternative that it is different from .4. We conduct a random sample of 1500 likely voters, and the proportion among them that approve of President Biden is .3. Construct a hypothesis test at a 5% significance level to determine if our hypothesis is plausible.
Let $X_i$ be a binary random variable denoting whether or not the $i$th sampled voter approves of President Biden. Now, $X_i \sim$ Bernoulli($p$), and by the CLT, $Z = \frac{\hat{P} - p}{\sqrt{p(1-p)}/\sqrt{n}} \approx N(0,1)$ when $n$ is large.
H0: p=.4
HA: p≠.4
Under H0, $Z = \frac{\hat{P} - .4}{\sqrt{.4(1-.4)}/\sqrt{1500}} \approx N(0,1)$
Value of test statistic: $z = \frac{.3 - .4}{\sqrt{.4(1-.4)}/\sqrt{1500}} = -7.91$
The rejection region is |z|>1.96 when α=.05
|z|=7.91>1.96. The test statistic is in the rejection region, so we reject H0 that p=.4. There is sufficient evidence at a 5% level to reject the null hypothesis that the population proportion of likely voters that approve of President Biden is .4.
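For comparison, the sketch below computes the test statistic for this example with both standard errors introduced earlier: the one that plugs in p0 (used above) and the one that plugs in the sample proportion. The variable names are illustrative.

```r
# Approval example: proportion test with two choices of standard error
pHat <- 0.3     # sample proportion
p0   <- 0.4     # hypothesized proportion under H0
n    <- 1500    # sample size
alpha <- 0.05

zNull <- (pHat - p0) / (sqrt(p0 * (1 - p0)) / sqrt(n))       # standard error under H0
zEst  <- (pHat - p0) / (sqrt(pHat * (1 - pHat)) / sqrt(n))   # estimated standard error

pValueNull <- 2 * pnorm(abs(zNull), lower.tail = FALSE)      # two-sided p-value
c(zNull = zNull, zEst = zEst)
pValueNull < alpha
```

The two statistics differ (about −7.91 and −8.45), but both lie far inside the rejection region, so the decision is the same.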
Recall: The boundaries of the rejection region are called critical values
qnorm(.95) # alpha = .1 (5% in each tail)
## [1] 1.644854
qnorm(.975) # alpha = .05 (2.5% in each tail)
## [1] 1.959964
qnorm(.995) # alpha = .01 (.5% in each tail)
## [1] 2.575829
Decision rule: Reject H0 if $|z| > z_{\alpha/2}$
For two-sided z-tests:
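α | Critical value approach | p-value approach |
---|---|---|
.01 | $\lvert z\rvert > z_{\alpha/2} \approx 2.58$ | $P(\lvert Z\rvert \ge \lvert z\rvert \mid H_0) < .01$ |
.05 | $\lvert z\rvert > 1.96$ | $P(\lvert Z\rvert \ge \lvert z\rvert \mid H_0) < .05$ |
.1 | $\lvert z\rvert > 1.64$ | $P(\lvert Z\rvert \ge \lvert z\rvert \mid H_0) < .1$ |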
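The two columns of the table describe the same decision. A quick numerical check, sketched over a grid of hypothetical test statistic values (the grid itself is an assumption made for illustration):

```r
alphas <- c(0.01, 0.05, 0.10)
z <- seq(-4, 4, by = 0.01)   # hypothetical values of the test statistic

# For each alpha, compare the two decision rules at every grid point
agree <- sapply(alphas, function(a) {
  byCriticalValue <- abs(z) > qnorm(1 - a / 2)
  byPValue <- 2 * pnorm(abs(z), lower.tail = FALSE) < a
  all(byCriticalValue == byPValue)
})
data.frame(alpha = alphas, sameDecision = agree)
```

Every row should report TRUE, since |z| exceeds the critical value exactly when the two-sided p-value falls below α.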
Hypothesis tests for population mean and proportion
p-value approach: reject H0 if P(|Z|≥|z|∣H0)=P(Z≥|z| or Z≤−|z|∣H0)<α
Critical value approach: reject H0 if $|z| > z_{\alpha/2}$
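Test statistics (all approximately standard normal under H0):
Mean, σ known: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
Mean, σ unknown: $Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$
Proportion, estimated standard error: $Z = \frac{\hat{P} - p_0}{\sqrt{\hat{P}(1-\hat{P})}/\sqrt{n}}$
Proportion, standard error under H0: $Z = \frac{\hat{P} - p_0}{\sqrt{p_0(1-p_0)}/\sqrt{n}}$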