We almost never have access to the entire population -- we work with a sample. A sample mean x̄ gives us our best guess at the true population mean μ, but how precise is that guess?
A confidence interval answers this by giving a range of plausible values rather than a single point: x̄ ± t* × (s/sqrt(n)).
Interpreting a 95% CI: If we repeated our study many times and built a 95% CI each time, 95% of those intervals would contain the true μ. The specific interval we built either does or doesn't contain μ -- we can't know which, but we're 95% confident.
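The "95% of intervals contain μ" interpretation can be checked by simulation. The population parameters, sample size, and seed below are all made up for illustration:

```r
# Draw many samples from a population with a known mu, build a 95% t-based
# CI from each, and count how often the interval covers the true mean.
set.seed(240)
mu <- 68; sigma <- 3; n <- 25; reps <- 2000
covered <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)
  ci <- t.test(x)$conf.int          # default conf.level = 0.95
  ci[1] <= mu && mu <= ci[2]        # did this interval catch mu?
})
mean(covered)                       # should be close to 0.95
```

Any individual interval either covers μ or it doesn't; only the long-run proportion is 95%.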
When sigma is unknown -- which is always in practice -- we estimate it with the sample standard deviation s. This introduces additional uncertainty that the normal distribution doesn't account for. The t-distribution has heavier tails to compensate.
The t-distribution has a parameter called degrees of freedom (df = n - 1). As n gets larger, the t-distribution approaches the normal distribution.
The degrees of freedom determine how heavy the tails are:
Practical example:
For df = 5 (small sample, n = 6), the critical value is further from 0 than for df = 30 (n = 31). This wider interval reflects our uncertainty when working with small samples.
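You can see the effect of df directly by comparing critical values for a 95% interval:

```r
# t critical values shrink toward the z critical value as df grows
qt(0.975, df = 5)    # ~2.571 (n = 6: heavy tails, wide interval)
qt(0.975, df = 30)   # ~2.042 (n = 31: much closer to z)
qnorm(0.975)         # ~1.960 (the normal/z critical value)
```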
Rule of thumb: For large n (> 30), the difference between t and z is negligible. For small n, the heavier tails matter -- they make intervals wider, reflecting genuine uncertainty.
Reading t.test output: The 95% CI is (67.35, 69.45). Our best estimate is 68.4 inches, and we're 95% confident the true mean height is between 67.35 and 69.45 inches.
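A sketch of where those numbers come from in R. The `heights` vector here is hypothetical (it will not reproduce the exact CI quoted above); the point is which pieces of the `t.test()` output to read:

```r
# Hypothetical sample of heights in inches
heights <- c(66.9, 70.1, 68.3, 67.5, 69.8, 68.0, 66.2, 69.4, 68.9, 67.7)
result <- t.test(heights)
result$estimate    # sample mean x-bar (the "best estimate")
result$conf.int    # the 95% CI: (lower, upper)
```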
A hypothesis test formally evaluates a specific claim about a population parameter.
Step 1: State hypotheses -- the null H0: μ = μ0 and an alternative, e.g. Ha: μ ≠ μ0 (two-sided).
Step 2: Compute the test statistic -- t = (x̄ - μ0)/(s/sqrt(n)), with df = n - 1.
Step 3: Find the p-value -- the probability of seeing a result as extreme as ours, assuming H0 is true.
Step 4: Decision -- if p-value < alpha (usually 0.05), reject H0.
Since p-value (0.004) < alpha (0.05), we reject H0: μ = 70. The data provide significant evidence that the true mean height is not 70 inches.
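The four steps can be worked through by hand in R. The summary statistics (x̄, s, n) below are hypothetical; μ0 = 70 matches the test in the text:

```r
# Step 1: H0: mu = 70 vs. Ha: mu != 70 (two-sided)
xbar <- 68.4; s <- 2.1; n <- 36; mu0 <- 70

# Step 2: test statistic
t_stat <- (xbar - mu0) / (s / sqrt(n))

# Step 3: two-sided p-value from the t distribution with n - 1 df
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# Step 4: decision at alpha = 0.05
p_value < 0.05    # TRUE here, so reject H0
```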
A two-sided test checks if a parameter differs in either direction from the null value: Ha: μ ≠ μ0.
A one-sided test checks if a parameter is specifically larger or smaller: Ha: μ > μ0 or Ha: μ < μ0.
Use two-sided when: a difference in either direction matters, or you have no prior reason to expect a particular direction (the safer default).
Use one-sided when: only one direction is of practical interest, and that direction was specified before looking at the data.
Key point: A one-sided p-value is half the corresponding two-sided p-value (when the test statistic points in the expected direction).
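The halving relationship is easy to verify numerically. The test statistic and df below are arbitrary, with the statistic pointing in the expected (lower-tail) direction:

```r
t_stat <- -2.2; df <- 24
p_one <- pt(t_stat, df)               # one-sided (lower-tail) p-value
p_two <- 2 * pt(-abs(t_stat), df)     # two-sided p-value
p_two / p_one                         # exactly 2
```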
No test is perfect -- we can make two kinds of mistakes:
| | H0 is True | H0 is False |
|---|---|---|
| Reject H0 | Type I Error (rate = alpha) | Correct! (Power) |
| Don't Reject H0 | Correct! | Type II Error (rate = beta) |
The tradeoff: Making alpha smaller (e.g., 0.01 instead of 0.05) reduces Type I errors but increases Type II errors. In high-stakes settings (like medical testing), you choose alpha carefully based on the cost of each error type.
Power = 1 - beta = the probability of correctly detecting a real effect. Power increases with larger samples, bigger true effects, and higher alpha.
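Base R's `power.t.test()` lets you see these relationships directly. All the numbers below (per-group n, effect sizes) are illustrative:

```r
# Baseline: two-sample t-test, n = 20 per group, effect delta = 0.5 SDs
power.t.test(n = 20, delta = 0.5, sd = 1, sig.level = 0.05)$power

# Larger sample -> higher power
power.t.test(n = 50, delta = 0.5, sd = 1, sig.level = 0.05)$power

# Bigger true effect -> higher power
power.t.test(n = 20, delta = 1.0, sd = 1, sig.level = 0.05)$power
```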
To build a CI, find the critical z-values using qnorm():
For a 90% CI: alpha = 0.10, so alpha/2 = 0.05 and the upper critical value is qnorm(0.95).
Pattern: For a 100(1 - alpha)% CI, use qnorm(1 - alpha/2) for the upper critical value.
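Applying the pattern to the common confidence levels:

```r
qnorm(0.95)    # ~1.645 for a 90% CI (alpha = 0.10, alpha/2 = 0.05)
qnorm(0.975)   # ~1.960 for a 95% CI (alpha = 0.05)
qnorm(0.995)   # ~2.576 for a 99% CI (alpha = 0.01)
```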
Scenario: Lady Bristol claims she can taste whether tea or milk was added first. She tastes 8 cups and guesses correctly on 6. Can we conclude she has ability, or is she just guessing?
P-value: P(X >= 6), where X ~ Binomial(8, 0.5), assuming H0 (she is just guessing, so each cup is correct with probability 0.5) is true:
Interpretation: Even by random chance, she has a 14.5% probability of guessing >= 6 correct. This is not unusual, so we do NOT reject H0. The data don't provide strong evidence she has tasting ability.
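The 14.5% tail probability comes from the binomial distribution:

```r
# P(X >= 6) for X ~ Binomial(8, 0.5): upper tail starting at 6
1 - pbinom(5, size = 8, prob = 0.5)     # 0.1445 = 37/256

# Equivalently, sum the individual probabilities
sum(dbinom(6:8, size = 8, prob = 0.5))
```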
The p-value is the probability of observing a result as extreme as (or more extreme than) what we got, assuming the null hypothesis is true.
| Situation | Test Statistic | R Function |
|---|---|---|
| sigma known | z = (x̄ - μ₀)/(sigma/sqrt(n)) | pnorm() |
| sigma unknown, any n | t = (x̄ - μ₀)/(s/sqrt(n)), df = n - 1 | t.test() |
In STAT 240, sigma is almost never known -- we use t-tests.
Compute the t* value for confidence intervals:
For a 95% CI with df=n-1, use qt(0.975, df) (the upper 2.5% tail).
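Putting `qt()` to work, a 95% CI can be built by hand. The summary statistics below are hypothetical:

```r
xbar <- 68.4; s <- 2.1; n <- 36     # made-up sample statistics
t_star <- qt(0.975, df = n - 1)     # upper 2.5% critical value
se <- s / sqrt(n)                   # standard error of the mean
c(xbar - t_star * se, xbar + t_star * se)   # (lower, upper)
```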
The width of a confidence interval is:
width = 2 × t* × (s/sqrt(n))
To get a narrower CI: increase the sample size n, accept a lower confidence level (smaller t*), or reduce the variability s (e.g., with more precise measurement).
Key exam concept: A 99% CI is always WIDER than a 95% CI for the same data. Higher confidence = wider interval.
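The width comparison follows directly from the critical values. For an arbitrary df:

```r
n <- 20
qt(0.975, df = n - 1)   # t* for a 95% CI
qt(0.995, df = n - 1)   # t* for a 99% CI: larger, so the interval is wider
```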
A result can be statistically significant (small p-value, reject H0) but still lack practical significance (the effect is too small to matter in real life).
Example: Suppose you test if a new teaching method increases exam scores on a very large sample, and find a statistically significant result (a tiny p-value) with a mean increase of only 0.2 points.
But a 0.2-point difference is meaningless in practice. The effect size is negligible even though it's statistically significant.
Effect size measures the magnitude of a real difference. A common measure is Cohen's d, the standardized mean difference, with conventional benchmarks:
- d = 0.2: small effect
- d = 0.5: medium effect
- d = 0.8: large effect
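A sketch of computing Cohen's d with a pooled standard deviation. The helper function and both score vectors are hypothetical:

```r
# Cohen's d: mean difference divided by the pooled SD
cohens_d <- function(x, y) {
  nx <- length(x); ny <- length(y)
  sp <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / sp
}

new_method <- c(78, 82, 75, 80, 85, 79, 81, 77)   # made-up exam scores
old_method <- c(74, 79, 72, 76, 81, 75, 78, 73)
cohens_d(new_method, old_method)
```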
When reporting results, always include both:
1. p-value (is there a real effect?)
2. Effect size (how big is it?)
Important: A large sample can detect tiny effects, making them statistically significant even when practically irrelevant. A small effect size is a red flag that practical importance may be limited.
A 95% confidence interval and a two-sided test at alpha = 0.05 always agree: μ₀ lies outside the 95% CI exactly when the two-sided p-value is below 0.05 (reject H0), and inside it exactly when p >= 0.05 (don't reject).
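The duality can be checked with `t.test()` directly. The sample below is hypothetical:

```r
x <- c(66.9, 70.1, 68.3, 67.5, 69.8, 68.0, 66.2, 69.4, 68.9, 67.7)
res <- t.test(x, mu = 70)                      # two-sided test of H0: mu = 70
outside <- 70 < res$conf.int[1] || res$conf.int[2] < 70
outside == (res$p.value < 0.05)                # TRUE either way
```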
Correct: "If H0 were true, there is a p% chance of observing a result as extreme as ours (or more so)."
WRONG (common mistakes):