When we have a quantitative variable and want to make inferences about the population mean, we use the methods in this module. Whether we're estimating a confidence interval or testing a hypothesis about a population mean, the process relies on understanding the sampling distribution of the sample mean and the t-distribution.
When we repeatedly sample from a population and calculate the sample mean (x̄) for each sample, those sample means follow a distribution. This sampling distribution has special properties:
If we take all possible samples of size n from a population:
1. The mean of all the sample means equals the population mean μ.
2. The standard deviation of the sample means equals σ / √n (the standard error).
3. The shape of the distribution of sample means is approximately normal when n is large.
The second and third points are guaranteed by the Central Limit Theorem (CLT). The CLT states:
Key concept: If we take samples of size n from any population (with finite mean and standard deviation), the sampling distribution of x̄ is approximately normal when n is large enough (typically n >= 30). If the population itself is normal, then the sampling distribution of x̄ is normal regardless of sample size.
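The CLT can be checked by simulation. The sketch below (population, sample size, and number of replications are all illustrative choices, not from the original data) draws many samples from a skewed exponential population and examines the sample means:

```r
# Simulate the CLT: draw many samples from a skewed (exponential) population
# and look at the distribution of their sample means.
set.seed(1)                 # for reproducibility
n <- 40                     # sample size (> 30, so the CLT applies)

# exponential(rate = 1) has population mean 1 and population sd 1
sample_means <- replicate(10000, mean(rexp(n, rate = 1)))

mean(sample_means)          # close to the population mean, 1
sd(sample_means)            # close to sigma / sqrt(n) = 1 / sqrt(40) ~ 0.158
```

Even though the exponential population is strongly right-skewed, a histogram of `sample_means` looks bell-shaped, and its spread matches σ / √n.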
The standard error (SE) measures how much sample means vary from sample to sample. It depends on two things:
1. The population standard deviation (σ): Larger σ means more variability in the data
2. The sample size (n): Larger samples give less variable sample means
The relationship is: SE = σ / √n
Notice that SE decreases as n increases. This is why larger samples give more precise estimates of the population mean.
In practice, we never know σ (the true population standard deviation), so we estimate it using the sample standard deviation s:
Estimated SE = s / √n
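As a quick numerical illustration of how the estimated SE shrinks with n (using a sample standard deviation of 2.87, matching the example later in these notes):

```r
# Estimated standard error SE = s / sqrt(n) for several sample sizes
s  <- 2.87
n  <- c(30, 135, 500)
se <- s / sqrt(n)
round(se, 3)   # 0.524 0.247 0.128
```

Quadrupling the sample size halves the standard error, since n enters through a square root.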
If we knew the true population standard deviation σ, we could use the z-distribution (standard normal). The test statistic would be:
z = (x̄ - μ₀) / (σ / √n)
However, in real life, we never know σ. We must estimate it using the sample standard deviation s. This introduces extra uncertainty.
When we substitute s for σ, the test statistic no longer follows the standard normal (z) distribution. Instead, it follows the t-distribution:
t = (x̄ - μ₀) / (s / √n)
The t-distribution has the following properties:
- It is symmetric and bell-shaped, centered at 0.
- It has heavier tails than the standard normal, reflecting the extra uncertainty from estimating σ with s.
- Its exact shape depends on the degrees of freedom; as df increases, it approaches the standard normal distribution.
The t-distribution is not a single distribution. Instead, it is a family of distributions, each determined by the degrees of freedom (df).
Key concept: For inference about a single population mean, df = n - 1. We lose one degree of freedom because the sample standard deviation s is computed from deviations around the sample mean x̄.
As df increases, the t-distribution has lighter tails and looks more like the standard normal. When df > 30, the t-distribution is very close to the normal distribution.
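This convergence is easy to see by comparing t critical values to the normal one:

```r
# 97.5th percentiles of t-distributions with increasing df,
# compared to the standard normal percentile
qt(0.975, df = c(5, 30, 134))   # 2.571 2.042 1.978
qnorm(0.975)                    # 1.960
```

With only 5 df the critical value is much larger than 1.96, but by df = 134 it is nearly indistinguishable from the normal value.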
A confidence interval for the population mean μ is:
x̄ ± t* × SE
Where:
- x̄ is the sample mean
- t* is the critical value from the t-distribution with df = n - 1
- SE = s / √n is the estimated standard error
For a 95% confidence interval, t* is the value such that 95% of the t-distribution lies between -t* and t*. This means 2.5% is in each tail. We use qt(0.975, df = n - 1) to find t*.
Suppose we have data on the daily sleep time (in hours) of a sample of adult male fixed Ragdoll cats: x̄ = 16.02 hours, s = 2.87 hours, n = 135.
We want a 95% confidence interval for the mean sleep time.
Step 1: Calculate the standard error.
SE = s / √n = 2.87 / √135 = 2.87 / 11.62 = 0.247 hours
Step 2: Find the critical value t*.
With df = 135 - 1 = 134, we look up qt(0.975, df = 134). This gives t* = 1.978.
Step 3: Calculate the margin of error.
Margin of error = t* × SE = 1.978 × 0.247 = 0.489 hours
Step 4: Calculate the confidence interval.
CI = 16.02 ± 0.489 = [15.531, 16.509] hours
Key concept: If we repeated our sampling procedure infinitely many times and calculated a 95% CI each time, approximately 95% of those intervals would contain the true population mean.
This does NOT mean:
- There is a 95% probability that the true mean μ lies in our particular interval.
- 95% of individual cats sleep between 15.531 and 16.509 hours.
It DOES mean:
- We used a procedure that captures the true mean in 95% of repeated samples, so we say we are 95% confident that the interval [15.531, 16.509] contains μ.
Calculating by hand:
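A minimal sketch of the by-hand calculation in R, using the summary statistics from the example:

```r
# 95% CI for the mean by hand, from the summary statistics in the example
xbar <- 16.02; s <- 2.87; n <- 135

se    <- s / sqrt(n)             # standard error, about 0.247
tstar <- qt(0.975, df = n - 1)   # critical value, about 1.978
moe   <- tstar * se              # margin of error, about 0.489
ci    <- c(xbar - moe, xbar + moe)
round(ci, 3)                     # 15.531 16.509
```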
Using the t.test() function (much simpler):
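t.test() needs the raw measurements, which are not reproduced in these notes. As a stand-in, the sketch below simulates a vector rescaled to the example's exact summary statistics (x̄ = 16.02, s = 2.87, n = 135); the resulting interval matches the by-hand calculation:

```r
# Simulated stand-in data with exactly the example's mean and sd
set.seed(42)
z <- rnorm(135)
sleep <- (z - mean(z)) / sd(z) * 2.87 + 16.02

# t.test() computes the CI (and a test) in one call
t.test(sleep, conf.level = 0.95)$conf.int   # 15.531 16.509
```

With the real data vector in place of `sleep`, the call is identical.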
Different confidence levels use different critical values:
- 90% CI: t* = qt(0.95, df = n - 1)
- 95% CI: t* = qt(0.975, df = n - 1)
- 99% CI: t* = qt(0.995, df = n - 1)
Higher confidence levels result in wider intervals.
A hypothesis test for a population mean starts with the null hypothesis H₀: μ = μ₀, paired with one of three alternatives:
- Hₐ: μ != μ₀ (two-sided test)
- Hₐ: μ > μ₀ (right-tailed test)
- Hₐ: μ < μ₀ (left-tailed test)
We also set a significance level, usually α = 0.05.
The test statistic for testing a single mean is:
t = (x̄ - μ₀) / SE = (x̄ - μ₀) / (s / √n)
with df = n - 1.
The test statistic measures how many standard errors the sample mean is from the null hypothesis value.
The p-value is the probability of observing a test statistic as extreme as (or more extreme than) the one we calculated, assuming the null hypothesis is true.
For a two-sided test (Hₐ: μ != μ₀):
p-value = 2 * P(t < -|t_obs|)
Using R: 2 * pt(-abs(t_obs), df = n - 1)
For a right-tailed test (Hₐ: μ > μ₀):
p-value = P(t > t_obs)
Using R: 1 - pt(t_obs, df = n - 1)
For a left-tailed test (Hₐ: μ < μ₀):
p-value = P(t < t_obs)
Using R: pt(t_obs, df = n - 1)
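The three formulas can be checked side by side for an illustrative observed statistic (the values of t_obs and df below are arbitrary choices for demonstration):

```r
# The three p-value formulas for an illustrative t_obs and df
t_obs <- 2.11; df <- 134

p_two   <- 2 * pt(-abs(t_obs), df)   # two-sided
p_right <- 1 - pt(t_obs, df)         # right-tailed
p_left  <- pt(t_obs, df)             # left-tailed

# By symmetry, the two-sided p-value is twice the smaller one-sided p-value
all.equal(p_two, 2 * min(p_right, p_left))   # TRUE
```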
Question: Is the mean sleep time of adult male fixed Ragdoll cats different from 17 hours?
H₀: μ = 17
Hₐ: μ != 17
α = 0.05
From our data: x̄ = 16.02, s = 2.87, n = 135, SE = 0.247
Test statistic:
t = (16.02 - 17) / 0.247 = -0.98 / 0.247 ≈ -3.97
P-value (two-sided):
p-value = 2 * P(t < -3.97) with df = 134
p-value = 2 * pt(-3.97, df = 134) ≈ 2 × 0.00006 = 0.00012
Conclusion:
Since p-value (0.00012) < α (0.05), we reject H0. We have strong evidence that the mean sleep time of adult male fixed Ragdoll cats is different from 17 hours.
R code:
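A sketch of the test by hand in R, from the summary statistics:

```r
# Two-sided test of H0: mu = 17, by hand from summary statistics
xbar <- 16.02; s <- 2.87; n <- 135; mu0 <- 17

se    <- s / sqrt(n)
t_obs <- (xbar - mu0) / se                 # about -3.97
p_val <- 2 * pt(-abs(t_obs), df = n - 1)   # about 0.0001

p_val < 0.05                               # TRUE -> reject H0
</imports>
```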
Or using the t.test() function:
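Since the raw measurements are not reproduced here, the sketch below uses a simulated stand-in vector rescaled to the example's summary statistics (x̄ = 16.02, s = 2.87, n = 135):

```r
# Simulated stand-in data matching the example's mean and sd
set.seed(42)
z <- rnorm(135)
sleep <- (z - mean(z)) / sd(z) * 2.87 + 16.02

# Two-sided test of H0: mu = 17
t.test(sleep, mu = 17, alternative = "two.sided")
```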
In t.test(), the mu argument specifies the null hypothesis value.
Question: Is the mean sleep time greater than 15.5 hours?
H₀: μ = 15.5
Hₐ: μ > 15.5
α = 0.05
Test statistic:
t = (16.02 - 15.5) / 0.247 = 0.52 / 0.247 = 2.11
P-value (right-tailed):
p-value = P(t > 2.11) = 1 - pt(2.11, df = 134) = 0.0184
Conclusion:
Since p-value (0.0184) < α (0.05), we reject H0. We have evidence that the mean sleep time is greater than 15.5 hours.
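The same one-sided test in t.test(), again using a simulated stand-in vector rescaled to the example's summary statistics:

```r
# Simulated stand-in data matching the example's mean and sd
set.seed(42)
z <- rnorm(135)
sleep <- (z - mean(z)) / sd(z) * 2.87 + 16.02

# Right-tailed test of H0: mu = 15.5 vs Ha: mu > 15.5
res <- t.test(sleep, mu = 15.5, alternative = "greater")
res$statistic   # about 2.11
res$p.value     # about 0.018
```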
An alternative to the p-value approach is the rejection region (or critical value) approach.
In this approach:
1. Calculate the critical value(s) from the t-distribution
2. Reject H0 if the test statistic falls in the rejection region
For a two-sided test with α = 0.05 and df = 134:
Reject H0 if t < qt(0.025, df = 134) or t > qt(0.975, df = 134)
R code:
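A sketch of the rejection-region check in R:

```r
# Critical values for a two-sided test with alpha = 0.05, df = 134
alpha <- 0.05; df <- 134
crit  <- qt(c(alpha / 2, 1 - alpha / 2), df)   # -1.978  1.978

t_obs <- -3.97
t_obs < crit[1] | t_obs > crit[2]              # TRUE -> reject H0
```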
Since our test statistic t = -3.97 is less than -1.978, we reject H0.
For a right-tailed test (Hₐ: μ > μ₀) with α = 0.05:
Reject H0 if t > qt(0.95, df = 134) = 1.656
For a left-tailed test (Hₐ: μ < μ₀) with α = 0.05:
Reject H0 if t < qt(0.05, df = 134) = -1.656
There is a direct relationship between a (1 - α) * 100% confidence interval and a hypothesis test with significance level α.
Key concept: For a two-sided hypothesis test with significance level α, we reject H₀: μ = μ₀ if and only if μ₀ lies outside the (1 - α) * 100% confidence interval.
We calculated a 95% CI for mean sleep time: [15.531, 16.509]
For the test H₀: μ = 17 vs Hₐ: μ != 17 with α = 0.05:
Since 17 is outside the 95% CI, we reject H0.
This matches our p-value result.
For the test H₀: μ = 16 vs Hₐ: μ != 16 with α = 0.05:
Since 16 is inside the 95% CI, we fail to reject H0.
This provides an intuitive way to understand hypothesis tests: if the hypothesized mean is outside the confidence interval, it's an implausible value for the true mean.
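The duality can be verified directly in R. The sketch below again uses a simulated stand-in vector rescaled to the example's summary statistics (x̄ = 16.02, s = 2.87, n = 135):

```r
# Simulated stand-in data matching the example's mean and sd
set.seed(42)
z <- rnorm(135)
sleep <- (z - mean(z)) / sd(z) * 2.87 + 16.02

ci <- t.test(sleep)$conf.int                 # about [15.531, 16.509]

# Reject H0: mu = mu0 at alpha = 0.05 exactly when mu0 is outside the CI
t.test(sleep, mu = 17)$p.value < 0.05        # TRUE:  17 is outside the CI
t.test(sleep, mu = 16)$p.value < 0.05        # FALSE: 16 is inside the CI
```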
Before conducting a t-test, we should verify certain conditions:
The variable should be quantitative (numerical), not categorical.
The data should come from a random sample of the population. If it doesn't, our inference may be biased.
One of the following should be true:
- The population distribution is normal, or
- The sample size is large (n >= 30), so the Central Limit Theorem applies.
The t-test is robust to moderate violations of normality when n is large.
For the cat sleep data: n = 135 >= 30, so this condition is satisfied even if the population distribution is not exactly normal.
R code to check normality with a histogram:
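A sketch of the histogram check, again using a simulated stand-in vector in place of the original measurements:

```r
# Simulated stand-in data matching the example's mean and sd
set.seed(42)
z <- rnorm(135)
sleep <- (z - mean(z)) / sd(z) * 2.87 + 16.02

# Histogram to eyeball the normality condition
hist(sleep,
     breaks = 15,
     main   = "Sleep time of adult male fixed Ragdoll cats",
     xlab   = "Hours of sleep per day")
```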
If the histogram is roughly symmetric and unimodal, the normality condition is reasonably satisfied.
Incorrect: "There is a 95% probability that the true mean is in the interval [15.531, 16.509]."
Correct: "If we repeated our sampling procedure many times and calculated a 95% CI each time, about 95% of those intervals would contain the true mean."
Once we compute an interval, the true mean either is or is not in it. The probability is either 0 or 1, not 0.95.
Incorrect: "The p-value is the probability that H0 is true."
Correct: "The p-value is the probability of observing data as extreme as (or more extreme than) what we observed, assuming H0 is true."
A small p-value suggests the data are incompatible with H0, but it does not directly tell us the probability that H0 is true.
Incorrect: "We fail to reject H0, so H0 is true."
Correct: "We fail to reject H0, so we don't have sufficient evidence to reject it. This doesn't mean H0 is true; it means the data don't provide strong evidence against it."
A small p-value indicates statistical significance (the effect is unlikely to be due to chance), but the effect might still be small in practical terms.
Example: Suppose we test H₀: μ = 16 hours vs Hₐ: μ != 16 hours, and our sample mean is 16.02 hours with p-value = 0.01.
We reject H0 (statistically significant), but the difference of 0.02 hours (about 1 minute) is negligible in practical terms.
A Type I error occurs when we reject H0 when it is actually true. The probability of a Type I error is α (the significance level).
A Type II error occurs when we fail to reject H0 when it is actually false. The probability of a Type II error is β.
We control α by setting it before the analysis, but β depends on the true mean, the sample size, and the variability of the data. Larger sample sizes reduce β.
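R's built-in power.t.test() makes this trade-off concrete. The sketch below asks: with the example's sample size and variability, how likely are we to detect a true mean 0.5 hours away from μ₀? (The 0.5-hour effect size is an illustrative choice, not from the original data.)

```r
# Power and Type II error for a one-sample t-test:
# n = 135, sd = 2.87, true mean 0.5 hours away from mu0, alpha = 0.05
pw <- power.t.test(n = 135, delta = 0.5, sd = 2.87,
                   sig.level = 0.05, type = "one.sample")
pw$power        # about 0.5
1 - pw$power    # beta, the Type II error rate
```

Increasing n (or the true effect size delta) raises the power and therefore lowers β.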
To conduct inference about a single population mean:
1. Check the conditions (quantitative data, random sample, normality or n >= 30)
2. Calculate the sample mean, sample standard deviation, and standard error
3. For a confidence interval, use x̄ ± t* × SE
4. For a hypothesis test, calculate t = (x̄ - μ₀) / SE and find the p-value
5. Interpret results carefully, keeping practical significance in mind
6. Remember that confidence intervals and p-values are tools for inference, not statements about the true parameters