
Hypothesis Testing



STA35A: Statistical Data Science 1

Xiao Hui Tai

December 2, 2024


Today

  • Introduction to hypothesis testing

    • Framework

    • Errors from hypothesis tests

  • Hypothesis test for population mean



Hypothesis testing

Example: Honor Court

Suppose a case of suspected cheating is brought to a university Honor Court. There are two opposing claims.

  • Student: I did not cheat on the exam.
  • Professor: The student did cheat on the exam.

  • Honor Court assumes students are innocent until proven guilty

  • The professor must provide evidence to support their claim

  • Evidence: there were two different versions of the exam, and on three separate problems the student used numbers from the other version of the exam

The Honor Court members agree that this would be extremely unlikely if the student had not cheated. The professor's evidence is very strong: there is sufficient evidence to reject the student's claim that they did not cheat on the exam.


Hypothesis Testing Framework

Steps in hypothesis testing:

  • Start with two opposing claims about the population (claim 1 and claim 2)

  • Choose a sampling strategy and collect data

  • Figure out how likely it is to see data like what we got, or more extreme results, if claim 1 is true

  • If our data would have been unlikely if claim 1 were true, then we reject claim 1. Otherwise, do not reject claim 1.

Note: we never "accept" claim 1. We can never "accept" claim 2 either. The test only tells us if we have sufficient evidence to reject claim 1. The outcomes are (1) reject claim 1, (2) fail to reject claim 1.


Hypothesis Testing Framework

  • Claim 1: null hypothesis H0

  • Claim 2: alternative hypothesis, H1 or HA

  • In this example:
    H0: Student did not cheat
    HA: Student cheated

  • Gather data

  • Assess how likely we are to observe data, or more extreme results, if H0 were true (p-value)

  • In Honor Court example, if the student did not cheat, it is very unlikely that the student would have numbers from the other version of the exam in three separate problems



Example: Ultra Low Dose Contraceptives

  • A certain ultra-low dose oral contraceptive pill is supposed to contain 0.02 mg of estrogen

  • If the dose is higher, the user may risk side effects, and if the dose is lower, the user may get pregnant

  • Manufacturer wishes to check whether the mean concentration in a large shipment is the needed 0.02 mg or not

  • A random sample of n=500 pills is tested, and the sample mean concentration is 0.017 mg with a sample standard deviation of 0.008 mg

  • Is this sufficient evidence that the mean concentration is not 0.02 mg? What if our sample mean were 0.019 mg instead?


Example: Ultra Low Dose Contraceptives

  • State the claims

    • Claim 1: Shipment is consistent with a population mean of 0.02 mg estrogen. $H_0: \mu = 0.02$
    • Claim 2: Shipment is not consistent with a population mean of 0.02 mg estrogen. $H_A: \mu \neq 0.02$
  • Strategy: Sample 500 pills at random and use a hypothesis test to evaluate whether they are consistent with a population with mean 0.02 mg estrogen

  • Data: 500 pills have a sample mean $\bar{x} = 0.017$ and sample standard deviation $s = 0.008$

  • Assess how likely we are to observe $\bar{x} = 0.017$, or more extreme results, if H0 were true

    • Say the probability of getting a result like ours (or more extreme) is 0.01 if Claim 1 is true.
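The 0.01 above is a hypothetical value. As a sketch, the probability can be computed from the summary statistics with the large-sample Z-test developed later in these slides, assuming n = 500 is large enough to plug in the sample standard deviation s in place of σ. The helper name `two_sided_p` is hypothetical. The same code answers the earlier question about a sample mean of 0.019 mg.

```r
# Summary statistics from the slide
n   <- 500     # pills sampled
s   <- 0.008   # sample SD (mg); assumption: n is large enough to use s in place of sigma
mu0 <- 0.02    # mean concentration under H0 (mg)

# Hypothetical helper: two-sided p-value P(|Z| >= |z|) for an observed sample mean
two_sided_p <- function(xbar) {
  z <- (xbar - mu0) / (s / sqrt(n))  # observed value of the test statistic
  2 * pnorm(-abs(z))                 # area in both tails of N(0, 1)
}

p_017 <- two_sided_p(0.017)  # the observed sample mean
p_019 <- two_sided_p(0.019)  # the alternative sample mean asked about
c(p_017 = p_017, p_019 = p_019)
```

Under these assumptions the first z is about -8.4 and the second about -2.8, so both p-values fall below 0.05, though the one for 0.019 mg is far larger.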

Example: Ultra Low Dose Contraceptives

  • Conclusion: A probability of .01 is pretty unlikely. Reject claim 1.

  • "There is sufficient evidence to reject the null hypothesis that $\mu = 0.02$, i.e., that the population mean amount of estrogen is 0.02 mg."

    • i.e., the manufacturing procedure may not be consistent with one that produces pills at the required 0.02 mg dose
  • Suppose the probability of getting a result like ours, or more extreme, was relatively large, say 0.20

    • Fail to reject Claim 1

Two comments

  1. We would not say that evidence leads us to accept Claim 1.

    • Same as in the US judicial system

    • Defendants are "innocent until proven guilty"

    • Find someone "guilty" or "not guilty"

    • Do not say someone is "innocent"; we say there is insufficient evidence to say they are guilty

  2. Hypothesis testing does not tell us the probability that Claim 1 is true.

    • Assumed claim 1 was true before we did our calculation

    • Calculated a probability about data like ours or more extreme than ours under that assumption


Step 1: State claim 1 and claim 2

What are H0 and HA in each case?

  1. Researchers would like to know whether a new intervention for informing children in developing countries of their HIV status is associated with different mental health quality of life.

  2. Researchers would like to know if lead levels in the water from Flint exceed the EPA action level of 15 ppb.

  3. The World Health Organization would like to know if the prevalence of the omicron variant this month is the same as last month.


Step 2

Step 2 is to make a plan for data collection and analysis, get a random sample, and summarize the data.

  • Need to define a test statistic ($T$), which is a random variable computed from the data, e.g., a sample mean ($\bar{X}$)

  • Need to know the distribution of the test statistic ($T$) under the null hypothesis

    • Type of test depends on this distribution, e.g., if our test statistic can be approximated by a normal distribution, we will use a Z-test
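A simulation can make "the distribution of the test statistic under the null hypothesis" concrete. This sketch assumes a hypothetical skewed population with mean 20 and variance 4 (built from an exponential distribution) and n = 100, matching the population-mean example later in the slides. Under H0: μ = 20, the standardized sample mean should behave approximately like N(0, 1) by the CLT, which the sketch checks through the proportion of values with |Z| ≥ 1.96.

```r
set.seed(1)                  # for reproducibility
n     <- 100                 # sample size
mu0   <- 20                  # population mean under H0
sigma <- 2                   # population SD (variance 4)
reps  <- 10000               # number of simulated samples

# Hypothetical skewed population: mu0 + sigma * (Exp(1) - 1) has mean 20 and variance 4
z_null <- replicate(reps, {
  x <- mu0 + sigma * (rexp(n, rate = 1) - 1)
  (mean(x) - mu0) / (sigma / sqrt(n))  # test statistic computed from one sample
})

# Compare the simulated null distribution with N(0, 1):
# the proportion with |Z| >= 1.96 should be near 2 * pnorm(-1.96), about 0.05
tail_sim <- mean(abs(z_null) >= 1.96)
c(simulated = tail_sim, normal = 2 * pnorm(-1.96))
```

Calling `hist(z_null)` shows the roughly bell-shaped null distribution even though the individual observations are skewed.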

Step 3: Assess results

  • Calculate the probability of "getting data like ours, or more extreme than ours," if H0 is actually true

  • From Step 2, we have the distribution of T

  • Step 3: compute the value of the test statistic ($t$) based on the data collected, and calculate the probability of getting a test statistic as extreme as, or more extreme than, the one we got, based on the distribution of the test statistic ($T$)

  • This is a conditional probability (conditional on H0 being true), called a p-value.

  • p-value: the probability, if H0 were true, of getting a test statistic as extreme as the value ($t$) computed from the data, or more extreme
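For a two-sided test whose statistic is N(0, 1) under H0, the p-value is P(|Z| ≥ |t|). This sketch assumes a hypothetical observed value t = 2, computes the p-value exactly with `pnorm`, and also approximates it by drawing many test statistics from the null distribution and counting how often they are at least as extreme as t.

```r
set.seed(2)
t_obs <- 2                                  # hypothetical observed test statistic

# Exact: area in both tails of N(0, 1) beyond |t_obs|
p_exact <- 2 * pnorm(-abs(t_obs))

# Monte Carlo: proportion of null draws at least as extreme as t_obs
z_draws <- rnorm(1e6)
p_sim   <- mean(abs(z_draws) >= abs(t_obs))

c(exact = p_exact, simulated = p_sim)
```

The two numbers agree closely, which illustrates that a p-value is just a tail probability of the null distribution.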


Step 4: Draw conclusions

  • Recall that the two possible outcomes are (1) Reject claim 1, and (2) fail to reject claim 1

  • Reject Claim 1 when the probability of seeing our data (or more extreme data) when Claim 1 is true is small

  • What qualifies as "small" depends on the significance level of the test


Significance level

  • We defined the significance level, α, when discussing confidence intervals:

    • Confidence level = $100(1-\alpha)\%$, i.e., a 95% confidence interval will need $\alpha = 0.05$
    • P(CI contains true parameter) = $1-\alpha$.
  • In a hypothesis test:

    • Defines the tolerable Type I error: the probability of rejecting H0 when H0 is actually true
    • Statistical property that we need: P(reject H0 | H0 true) =α
  • When the null hypothesis is true, if we repeat the experiment a large number of times, we would expect to make the wrong decision only α (e.g., 5%) of the time
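The repeated-experiment interpretation can be checked by simulation. This sketch assumes normally distributed data with known σ = 2, n = 100, and a true H0: μ = 20. It repeats the experiment many times, applies the rule "reject H0 if p-value < α" from the Decision rule slide, and records the proportion of (wrong) rejections, i.e., the Type I error rate.

```r
set.seed(3)
alpha <- 0.05; n <- 100; mu0 <- 20; sigma <- 2
reps  <- 20000

# Each experiment: draw data with H0 true, then compute the two-sided p-value
p_values <- replicate(reps, {
  x <- rnorm(n, mean = mu0, sd = sigma)
  z <- (mean(x) - mu0) / (sigma / sqrt(n))
  2 * pnorm(-abs(z))
})

type1_rate <- mean(p_values < alpha)  # proportion of experiments that reject a true H0
type1_rate                            # should be close to alpha
```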


Decision rule

  • Decision rule: reject H0 if p-value <α

    • We will demonstrate (in the next class) that this produces the required property that P(reject H0 | H0 true) =α
  • p-value <α = "statistically significant"

  • p-value ≥ α: insufficient evidence to reject H0
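The decision rule is a simple comparison. This sketch uses a hypothetical helper `decide` and assumes α = 0.05 (the slides do not state α for the contraceptive example); it applies the rule to the two scenarios from that example, p = 0.01 and p = 0.20.

```r
# Hypothetical helper implementing the rule: reject H0 if p-value < alpha
decide <- function(p_value, alpha = 0.05) {
  ifelse(p_value < alpha, "reject H0", "fail to reject H0")  # works on vectors too
}

decide(c(0.01, 0.20))  # "reject H0" "fail to reject H0"
```

Because the comparison is strict, a p-value exactly equal to α leads to "fail to reject H0", matching the ≥ case above.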


Errors from hypothesis tests

A cat is on trial. Did it commit the crime? Evidence is presented as part of the trial by jury.


Cat is innocent unless proven guilty.

Right decisions? Mistakes?

                    Truly Innocent    Truly Guilty
Jury: Not Guilty    ✓                 ×
Jury: Guilty        ×                 ✓

In a hypothesis testing framework: H0: Cat is innocent vs. HA: Cat is guilty

                             H0 true    HA true
Decision: Do Not Reject H0   ✓          ×
Decision: Reject H0          ×          ✓

Errors from hypothesis tests

Suppose we wish to test that the population mean equals some value, say $\mu_0$.

Test of $H_0: \mu = \mu_0$

                             Truly $\mu = \mu_0$    Truly $\mu \neq \mu_0$
Decision: Do Not Reject H0   ✓                      ×
Decision: Reject H0          ×                      ✓
  • Type I error: rejecting H0 when it is really true

  • α is the maximum allowable Type I error rate

  • We specify α at the design stage of the study and use it in making decisions with hypothesis tests.


Recall: Hypothesis Testing Framework

Steps in hypothesis testing:

  • Start with two claims about the population, H0 and HA

  • Choose a sampling strategy, collect data, and summarize data, i.e., define test statistic and compute statistic from the data

  • Figure out how likely it is to see data like what we got, or more extreme results, if H0 is true, i.e., compute p-value

  • Draw conclusions, i.e., if our data would have been unlikely if H0 were true, then reject H0. Otherwise, do not reject H0.


Hypothesis Testing for the Population Mean

Say $X_i$ has mean $\mu$ and variance 4.

  • Step 1: Start with two claims about the population

$H_0: \mu = 20$
$H_A: \mu \neq 20$

  • Step 2: Choose a sampling strategy, collect data, and summarize data

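Test statistic: By the CLT, $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \overset{\text{approx}}{\sim} N(0,1)$ when $n$ is large. Collect a sample with $n = 100$. Under $H_0$ (and with $\sigma = \sqrt{4} = 2$), $Z = \frac{\bar{X} - 20}{2/\sqrt{100}} \overset{\text{approx}}{\sim} N(0,1)$. From the sample, we get $\bar{x} = 21$.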

  • Step 3: Figure out how likely it is to see data like what we got, or more extreme results, if H0 is true.

To get the value of the test statistic based on our data ($z$), simply substitute $\bar{x} = 21$ to get $z = \frac{21 - 20}{2/\sqrt{100}} = 5$.


Hypothesis Testing for the Population Mean

  • Step 3 (continued): Figure out how likely it is to see data like what we got, or more extreme results, if H0 is true.

Probability under $H_0$ of getting data like what we got, or more extreme, is $P(|Z| \geq |z|) = P(Z \geq 5 \text{ or } Z \leq -5)$.

2*pnorm(-5)
## [1] 5.733031e-07

2*pnorm(-5) is very small (on the order of $10^{-7}$).

  • Step 4: If our data would have been unlikely if H0 were true, then reject H0. Otherwise, do not reject H0

Using a significance level of $\alpha = 0.05$, $P(|Z| \geq 5) < \alpha$, so reject $H_0$. At the 5% level, there is sufficient evidence to reject the null hypothesis that $\mu = 20$.


Hypothesis Testing for the Population Mean

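Say $X_i$ has mean $\mu$ and standard deviation $\sigma$. The test statistic we will use is $Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$. By the CLT, $Z \overset{\text{approx}}{\sim} N(0,1)$ when $n$ is large.

$H_0: \mu = \mu_0$
$H_A: \mu \neq \mu_0$

Under $H_0$, $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \overset{\text{approx}}{\sim} N(0,1)$

Value of test statistic: $z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$

Decision rule: reject $H_0$ if $P(|Z| \geq |z|) = P(Z \geq |z| \text{ or } Z \leq -|z|) < \alpha$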
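The general recipe fits in a few lines. This is a minimal sketch of a large-sample two-sided Z-test with known σ; the function name `z_test` and its arguments are hypothetical (not a function from base R). The usage line reproduces the earlier example with σ = 2, n = 100, and $\bar{x} = 21$.

```r
# Hypothetical helper: large-sample two-sided Z-test for a population mean (sigma known)
z_test <- function(xbar, mu0, sigma, n, alpha = 0.05) {
  z <- (xbar - mu0) / (sigma / sqrt(n))  # value of the test statistic
  p <- 2 * pnorm(-abs(z))                # P(|Z| >= |z|) under H0
  list(z = z, p_value = p, reject = p < alpha)
}

# Usage with the earlier example: H0: mu = 20
res <- z_test(xbar = 21, mu0 = 20, sigma = 2, n = 100)
res$z        # 5
res$p_value  # about 5.7e-07, the same as 2*pnorm(-5)
res$reject   # TRUE
```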


Summary

  • Hypothesis testing framework

    • Null and alternative hypotheses

    • Test statistics

    • p-values

    • Significance level

  • Errors from hypothesis tests

    • Type I error
  • Hypothesis test for population mean
