Introduction to hypothesis testing
Framework
Errors from hypothesis tests
Hypothesis test for population mean
Example: Honor Court
Suppose a case of suspected cheating is brought to a university Honor Court. There are two opposing claims.
Professor: The student did cheat on the exam.
Student: I did not cheat on the exam.
Honor Court assumes students are innocent until proven guilty
The professor must provide evidence to support their claim
Evidence: there were two different versions of the exam; the student on three separate problems used numbers from the other version of the exam
The Honor Court members agree that this evidence would be extremely unlikely if the student had not cheated. The professor's evidence is very strong: there is sufficient evidence to reject the student's claim that they did not cheat on the exam.
Steps in hypothesis testing:
Start with two opposing claims about the population (claim 1 and claim 2)
Choose a sampling strategy and collect data
Figure out how likely it is to see data like what we got, or more extreme results, if claim 1 is true
If our data would have been unlikely if claim 1 were true, then we reject claim 1. Otherwise, do not reject claim 1.
Note: we never "accept" claim 1. We can never "accept" claim 2 either. The test only tells us if we have sufficient evidence to reject claim 1. The outcomes are (1) reject claim 1, (2) fail to reject claim 1.
Claim 1: null hypothesis H0
Claim 2: alternative hypothesis, H1 or HA
In this example:
H0: Student did not cheat
HA: Student cheated
Gather data
Assess how likely we are to observe data like ours, or more extreme results, if H0 were true (p-value)
In the Honor Court example, if the student did not cheat, it is very unlikely that the student would have used numbers from the other version of the exam on three separate problems
A certain ultra-low dose oral contraceptive pill is supposed to contain 0.02 mg of estrogen
If the dose is higher, the user may risk side effects, and if the dose is lower, the user may get pregnant
Manufacturer wishes to check whether the mean concentration in a large shipment is the needed 0.02 mg or not
A random sample of n=500 pills is tested, and the sample mean concentration is 0.017 mg with a sample standard deviation of 0.008 mg
Is this sufficient evidence that the mean concentration is not 0.02 mg? What if our sample mean were 0.019 mg?
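One way to explore both questions numerically is a large-sample z approximation, with the sample standard deviation s plugged in for σ. That substitution and the helper name `two_sided_p` are assumptions made for illustration, not something the slides specify. The walkthrough that follows quotes a probability of 0.01; this sketch is not expected to reproduce that figure.

```r
# Two-sided p-value from a large-sample z approximation.
# Assumption: n is large enough that s can stand in for sigma.
two_sided_p <- function(xbar, mu0, s, n) {
  z <- (xbar - mu0) / (s / sqrt(n))   # standardized distance from mu0
  2 * pnorm(-abs(z))                  # area in both tails of N(0, 1)
}

# Values from the pill example: mu0 = 0.02, s = 0.008, n = 500.
p_017 <- two_sided_p(xbar = 0.017, mu0 = 0.02, s = 0.008, n = 500)
p_019 <- two_sided_p(xbar = 0.019, mu0 = 0.02, s = 0.008, n = 500)

print(c(p_017 = p_017, p_019 = p_019))
```

Under these assumptions both sample means give probabilities below 0.05, and a sample mean of 0.017 gives by far the smaller of the two.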
State the claims
H0: μ=0.02
HA: μ≠0.02
Strategy: Sample 500 pills at random and use a hypothesis test to evaluate whether they are consistent with a population with mean 0.02 mg estrogen
Data: 500 pills have a sample mean $\bar{x} = 0.017$ and sample standard deviation $s = 0.008$
Assess how likely we are to observe $\bar{x} = 0.017$, or more extreme results, if H0 were true
Conclusion: A probability of 0.01 is small, so data like ours would be unlikely if claim 1 were true. Reject claim 1.
"There is sufficient evidence to reject the null hypothesis that μ=0.02, i.e., that the population mean amount of estrogen is 0.02 mg."
Suppose the probability of getting a result like ours, or more extreme, was relatively large, say 0.20
We would fail to reject Claim 1, but we would not say that the evidence leads us to accept Claim 1.
Same as in the US judicial system
Defendants are "innocent until proven guilty"
Find someone "guilty" or "not guilty"
Do not say someone is "innocent"; we say there is insufficient evidence to say they are guilty
Hypothesis testing does not tell us the probability that Claim 1 is true.
Assumed claim 1 was true before we did our calculation
Calculated a probability about data like ours or more extreme than ours under that assumption
What are H0 and HA in each case?
Researchers would like to know whether a new intervention for informing children in developing countries of their HIV status is associated with different mental health quality of life.
Researchers would like to know if lead levels in the water from Flint exceed the EPA action level of 15 ppb.
The World Health Organization would like to know if the prevalence of the omicron variant this month is the same as last month.
Step 2 is to make a plan for data collection and analysis, get a random sample, and summarize the data.
Need to define a test statistic ($T$), which is a random variable that is computed from the data, e.g., a sample mean ($\bar{X}$)
Need to know the distribution of the test statistic ($T$) under the null hypothesis
Calculate the probability of "getting data like ours, or more extreme than ours," if H0 is actually true
From Step 2, we have the distribution of T
Step 3: compute the value of the test statistic ($t$) based on the data collected, and calculate the probability of getting a test statistic that is equally or more extreme than the one that we got, based on the distribution of the test statistic ($T$)
This is a conditional probability (conditional on H0 being true), called a p-value.
p-value: probability of getting a specific test statistic ($t$) based on the data, or one more extreme, if H0 were true
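The definition above can be checked by simulation. The sketch below draws many values of T from its null distribution, assumed here to be N(0, 1), and counts how often they are at least as extreme as an observed value. The observed value `t_obs = 2.1`, the seed, and the number of draws are hypothetical choices made for illustration.

```r
set.seed(1)                      # arbitrary seed, for reproducibility
t_obs   <- 2.1                   # hypothetical observed test statistic
n_draws <- 200000                # number of simulated statistics under H0

# Simulate the test statistic under H0 (assumed N(0, 1) here).
t_null <- rnorm(n_draws)

# Share of simulated statistics at least as extreme as t_obs (two-sided).
p_sim <- mean(abs(t_null) >= abs(t_obs))

# Exact two-sided tail area under N(0, 1), for comparison.
p_exact <- 2 * pnorm(-abs(t_obs))

print(c(simulated = p_sim, exact = p_exact))
```

The two numbers should agree closely, which is the sense in which a p-value is a probability computed under the assumption that H0 is true.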
Recall that the two possible outcomes are (1) Reject claim 1, and (2) fail to reject claim 1
Reject Claim 1 when the probability of seeing our data (or more extreme data) when Claim 1 is true is small
What qualifies as "small" depends on the significance level of the test
We defined the significance level, α, when discussing confidence intervals:
In a hypothesis test:
When the null hypothesis is true, if we repeat the experiment a large number of times, we would expect to make the wrong decision only α (e.g., 5%) of the time
Decision rule: reject H0 if p-value < α
p-value < α: "statistically significant"
p-value ≥ α: insufficient evidence to reject H0
A cat is on trial. Did it commit the crime? Evidence is presented as part of the trial by jury.
| | Truly Innocent | Truly Guilty |
|---|---|---|
| Jury: Not Guilty | ✓ | × |
| Jury: Guilty | × | ✓ |
Cat is innocent unless proven guilty.
Right decisions? Mistakes?
In a hypothesis testing framework: H0: Cat is innocent vs. HA: Cat is guilty
| | H0 true | HA true |
|---|---|---|
| Decision: Do Not Reject H0 | ✓ | × |
| Decision: Reject H0 | × | ✓ |
Suppose we wish to test that the population mean equals some value, say μ0.
Test of H0:μ=μ0
| | Truly μ=μ0 | Truly μ≠μ0 |
|---|---|---|
| Decision: Do Not Reject H0 | ✓ | × |
| Decision: Reject H0 | × | ✓ |
Type I error: rejecting H0 when it is really true
α is the maximum allowable Type I error rate
We specify α at the design stage of the study and use it in making decisions with hypothesis tests.
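The role of α as a cap on the Type I error rate can be illustrated by simulating many studies in which H0 is true and counting how often a test rejects. This is a sketch under assumed settings: normal data with mean 20 and standard deviation 2, n = 100, 10,000 repetitions, and a two-sided z-test with known σ. None of these values come from a real study.

```r
set.seed(42)                     # arbitrary seed, for reproducibility
alpha  <- 0.05
mu0    <- 20                     # H0 is true: data are generated with mean mu0
sigma  <- 2                      # assumed known population standard deviation
n      <- 100
n_sims <- 10000

# For each simulated study, compute the two-sided z-test p-value.
p_values <- replicate(n_sims, {
  x <- rnorm(n, mean = mu0, sd = sigma)
  z <- (mean(x) - mu0) / (sigma / sqrt(n))
  2 * pnorm(-abs(z))
})

# Proportion of studies that wrongly reject H0; should be close to alpha.
type1_rate <- mean(p_values < alpha)
print(type1_rate)
```

Changing `alpha` to another level, such as 0.01, should move the observed rejection rate along with it.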
Steps in hypothesis testing:
Start with two claims about the population, H0 and HA
Choose a sampling strategy, collect data, and summarize data, i.e., define test statistic and compute statistic from the data
Figure out how likely it is to see data like what we got, or more extreme results, if H0 is true, i.e., compute p-value
Draw conclusions, i.e., if our data would have been unlikely if H0 were true, then reject H0. Otherwise, do not reject H0.
Say $X_i$ has mean $\mu$ and variance 4.
$H_0$: $\mu = 20$
$H_A$: $\mu \neq 20$
Test statistic: By CLT, $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \approx N(0,1)$ when $n$ is large. Collect a sample with $n = 100$. Under $H_0$, $Z = \frac{\bar{X} - 20}{2/\sqrt{100}} \approx N(0,1)$. From the sample, we get $\bar{x} = 21$.
To get the value of the test statistic based on our data ($z$), simply substitute $\bar{x} = 21$ to get $z = \frac{21 - 20}{2/\sqrt{100}} = 5$.
Probability under $H_0$ of getting data like what we got, or more extreme, is $P(|Z| \geq |z|) = P(Z \geq 5 \text{ or } Z \leq -5)$.
2*pnorm(-5)
## [1] 5.733031e-07
2*pnorm(-5) is very small (on the order of $10^{-7}$).
Using a significance level of $\alpha = 0.05$, $P(|Z| \geq 5) < \alpha$, so reject $H_0$. At a 5% level, there is sufficient evidence to reject the null hypothesis that $\mu = 20$.
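Say $X_i$ has mean $\mu$ and standard deviation $\sigma$. The test statistic we will use is $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$. By CLT, $Z \approx N(0,1)$ when $n$ is large.
$H_0$: $\mu = \mu_0$
$H_A$: $\mu \neq \mu_0$
Under $H_0$, $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \approx N(0,1)$
Value of test statistic: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
Decision rule: reject $H_0$ if $P(|Z| \geq |z|) = P(Z \geq |z| \text{ or } Z \leq -|z|) < \alpha$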
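The general recipe above can be wrapped in a small helper. This is a minimal sketch that assumes σ is known and n is large enough for the CLT approximation; the function name `z_test` and its return structure are hypothetical, not part of base R.

```r
# Two-sided z-test for a population mean with known sigma.
# Returns the statistic, the p-value, and the decision at level alpha.
z_test <- function(xbar, mu0, sigma, n, alpha = 0.05) {
  z <- (xbar - mu0) / (sigma / sqrt(n))   # value of the test statistic
  p <- 2 * pnorm(-abs(z))                 # P(|Z| >= |z|) under H0
  list(z = z, p_value = p, reject = p < alpha)
}

# Numbers from the earlier example: variance 4 (sigma = 2), n = 100, xbar = 21.
res <- z_test(xbar = 21, mu0 = 20, sigma = 2, n = 100)
print(res)
```

Applied to the earlier example, the helper gives z = 5 and the same p-value as 2*pnorm(-5), so H0 is rejected at the 5% level.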
Hypothesis testing framework
Null and alternative hypotheses
Test statistics
p-values
Significance level
Errors from hypothesis tests
Hypothesis test for population mean