MODULE 1110 QUESTIONS

Confidence Intervals & t-Tests


Confidence Intervals & Hypothesis Testing

The Problem of Inference

We almost never have access to the entire population -- we work with a sample. A sample mean x̄ gives us our best guess at the true population mean μ, but how precise is that guess?

A confidence interval answers this by giving a range of plausible values rather than a single point:

x̄ ± t* × (s / sqrt(n))
  • x̄ -- sample mean (our point estimate)
  • t* -- critical value from the t-distribution (depends on confidence level and sample size)
  • s/sqrt(n) -- standard error (measures how much x̄ varies across samples)

Interpreting a 95% CI: If we repeated our study many times and built a 95% CI each time, 95% of those intervals would contain the true μ. The specific interval we built either does or doesn't contain μ -- we can't know which, but we're 95% confident.
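This repeated-sampling interpretation can be checked by simulation. A minimal sketch (the population mean, standard deviation, and sample size below are assumed values for illustration, not course data): draw many samples, build a 95% CI from each, and count how often the interval captures the true μ.

R
# Simulation sketch: coverage of 95% CIs (illustrative values)
set.seed(240)
mu <- 68; sigma <- 2.5; n <- 25; reps <- 10000
covered <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)
  ci <- t.test(x)$conf.int        # 95% CI by default
  ci[1] <= mu && mu <= ci[2]      # did this interval capture the true mean?
})
mean(covered)  # close to 0.95

The long-run proportion of intervals containing μ approaches 0.95 -- exactly the "95%" in "95% confident."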

Why the t-Distribution (Not Normal)?

When sigma is unknown -- which is nearly always the case in practice -- we estimate it with the sample standard deviation s. This introduces additional uncertainty that the normal distribution doesn't account for. The t-distribution has heavier tails to compensate.

The t-distribution has a parameter called degrees of freedom (df = n - 1). As n gets larger, the t-distribution approaches the normal distribution.

How Degrees of Freedom Change the Shape

The degrees of freedom determine how heavy the tails are:

  • df = 1: Very heavy tails (much heavier than normal) -- critical values are far from 0
  • df = 10: Getting closer to normal -- tails still noticeably heavier
  • df = infinity: Exactly equals the standard normal distribution

Practical example:

R
qt(0.975, df = 5)   # 2.571 (wider interval needed)
qt(0.975, df = 30)  # 2.042 (closer to normal)
qnorm(0.975)        # 1.960 (standard normal for reference)

For df = 5 (small sample, n = 6), the critical value is further from 0 than for df = 30 (n = 31). This wider interval reflects our uncertainty when working with small samples.

Rule of thumb: For large n (> 30), the difference between t and z is negligible. For small n, the heavier tails matter -- they make intervals wider, reflecting genuine uncertainty.

Confidence Intervals in R

R
# Height data for 25 students
heights <- c(65, 67, 70, 68, 72, 64, 69, 71, 66, 68,
             70, 73, 65, 67, 69, 68, 72, 71, 66, 64,
             68, 70, 67, 69, 71)

t.test(heights)

Reading t.test output: The 95% CI is (67.35, 69.45). Our best estimate is 68.4 inches, and we're 95% confident the true mean height is between 67.35 and 69.45 inches.
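As a sanity check, the interval can be rebuilt by hand from the CI formula above; the numbers below should match the t.test() output.

R
# Recomputing the t.test CI by hand to see where the numbers come from
xbar  <- mean(heights)                        # 68.4
se    <- sd(heights) / sqrt(length(heights))  # standard error s/sqrt(n)
tstar <- qt(0.975, df = length(heights) - 1)  # critical value, df = 24
xbar + c(-1, 1) * tstar * se                  # (67.35, 69.45)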

Hypothesis Testing

A hypothesis test formally evaluates a specific claim about a population parameter.

Step 1: State hypotheses

  • H0 (null): μ = μ₀ -- the "no effect" / status quo claim
  • Ha (alternative): μ != μ₀, or μ > μ₀, or μ < μ₀

Step 2: Compute the test statistic

t = (x̄ - μ₀) / (s / sqrt(n))

Step 3: Find the p-value -- the probability of seeing a result as extreme as ours, assuming H0 is true.

Step 4: Decision -- if p-value < alpha (usually 0.05), reject H0.

R
# Test if mean height equals 70 inches
t.test(heights, mu = 70)

Since p-value (0.004) < alpha (0.05), we reject H₀: μ = 70. The data provide significant evidence that the true mean height is not 70 inches.
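The four steps can also be carried out by hand; this sketch recomputes the test statistic and two-sided p-value that t.test() reports.

R
# The p-value by hand: same four steps as above
t_stat <- (mean(heights) - 70) / (sd(heights) / sqrt(length(heights)))
p_val  <- 2 * pt(-abs(t_stat), df = length(heights) - 1)  # two-sided
t_stat  # about -3.14
p_val   # about 0.004, matching t.test(heights, mu = 70)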

One-Sided vs Two-Sided Tests

A two-sided test checks if a parameter differs in either direction from the null value:

  • H₀: μ = μ₀
  • Hₐ: μ != μ₀
  • Default in most software

A one-sided test checks if a parameter is specifically larger or smaller:

  • Right-tailed: Hₐ: μ > μ₀ (looking for evidence the mean is greater)
  • Left-tailed: Hₐ: μ < μ₀ (looking for evidence the mean is less)

When to Use One-Sided vs Two-Sided

Use two-sided when:

  • You have no prior directional hypothesis
  • A difference in either direction matters equally
  • This is the safer, more conservative default

Use one-sided when:

  • Theory or context suggests a specific direction
  • You only care about one direction
  • Example: Does a new drug improve (not worsen) a condition?

One-Sided Tests in R

R
# Two-sided test (default)
t.test(heights, mu = 70)
# Ha: mu != 70

# One-sided test: left tail (testing if mu < 70)
t.test(heights, mu = 70, alternative = "less")

# One-sided test: right tail (testing if mu > 70)
t.test(heights, mu = 70, alternative = "greater")

Key point: A one-sided p-value is half the corresponding two-sided p-value (when the test statistic points in the expected direction).
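This halving can be checked directly on the height data (x̄ = 68.4 is below 70, so "less" is the direction the test statistic points):

R
# Checking the halving relationship on the height data
two  <- t.test(heights, mu = 70)$p.value                        # two-sided
less <- t.test(heights, mu = 70, alternative = "less")$p.value  # matching direction
all.equal(less, two / 2)  # TRUE: one-sided p is half the two-sided p here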

Type I and Type II Errors

No test is perfect -- we can make two kinds of mistakes:

                 H0 is True                   H0 is False
Reject H0        Type I Error (rate = alpha)  Correct! (Power)
Don't Reject     Correct!                     Type II Error (rate = beta)

The tradeoff: Making alpha smaller (e.g., 0.01 instead of 0.05) reduces Type I errors but increases Type II errors. In high-stakes settings (like medical testing), you choose alpha carefully based on the cost of each error type.

Power = 1 - beta = the probability of correctly detecting a real effect. Power increases with larger samples, bigger true effects, and higher alpha.
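Base R's power.t.test() makes these tradeoffs concrete. In the sketch below, the effect size (delta) and standard deviation are assumed values for illustration:

R
# Power sketch with power.t.test() (delta and sd are assumed values)
power.t.test(n = 25, delta = 1, sd = 2.5, sig.level = 0.05,
             type = "one.sample")        # power to detect a 1-unit shift
power.t.test(power = 0.80, delta = 1, sd = 2.5, sig.level = 0.05,
             type = "one.sample")$n      # sample size needed for 80% power

Leaving exactly one of n and power unspecified tells power.t.test() which quantity to solve for.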

Constructing Confidence Intervals with qnorm()

To build a CI, find the critical z-values using qnorm():

R
# For 95% CI: alpha = 0.05, so alpha/2 = 0.025
qnorm(0.025)  # lower tail
qnorm(0.975)  # upper tail

For 90% CI: alpha/2 = 0.05

R
qnorm(0.05)
qnorm(0.95)

Pattern: For a 100(1 - alpha)% CI, use qnorm(1 - alpha/2) for the upper critical value.

The Lady Tasting Tea: A Hypothesis Test Example

Scenario: Lady Bristol claims she can taste whether tea or milk was added first. She tastes 8 cups and guesses correctly on 6. Can we conclude she has ability, or is she just guessing?

  • H₀: p = 0.5 (guessing randomly)
  • Hₐ: p > 0.5 (has ability) -- one-sided, since only better-than-chance guessing counts as ability
  • X ~ Binomial(8, 0.5), and she got X = 6

P-value: P(X >= 6) assuming H0 is true:

R
1 - pbinom(5, size = 8, prob = 0.5)

Interpretation: Even by random chance, she has a 14.5% probability of guessing >= 6 correct. This is not unusual, so we do NOT reject H0. The data don't provide strong evidence she has tasting ability.
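The same test can be run with binom.test(); its one-sided ("greater") p-value reproduces the pbinom() calculation:

R
# Exact binomial test: 6 correct out of 8, H0: p = 0.5
binom.test(6, 8, p = 0.5, alternative = "greater")  # p-value about 0.145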

P-Value Definition

The p-value is the probability of observing a result as extreme as (or more extreme than) what we got, assuming the null hypothesis is true.

  • Small p-value (< alpha) -> Result is surprising under H0 -> Reject H0
  • Large p-value (>= alpha) -> Result is consistent with H0 -> Fail to reject

When to Use z vs t

Situation              Test Statistic                        R Function
sigma known            z = (x̄ - μ₀)/(sigma/sqrt(n))          pnorm()
sigma unknown, any n   t = (x̄ - μ₀)/(s/sqrt(n)), df = n-1    t.test()

In STAT 240, sigma is almost never known -- we use t-tests.

t Critical Values from qt()

Compute the t* value for confidence intervals:

R
qt(0.975, df = 24)  # 95% CI, n = 25
qt(0.995, df = 24)  # 99% CI, n = 25
qt(0.95, df = 29)   # 90% CI, n = 30

For a 95% CI with df=n-1, use qt(0.975, df) (the upper 2.5% tail).

CI Width

The width of a confidence interval is:

width = 2 × t* × (s/sqrt(n))

To get a narrower CI:

  • Increase sample size n (reduces s/sqrt(n))
  • Lower confidence level (smaller t*)
  • Reduce variability s

Key exam concept: A 99% CI is always WIDER than a 95% CI for the same data. Higher confidence = wider interval.
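Computing both widths for the height data makes the comparison concrete:

R
# 99% vs 95% CI widths for the same data: higher confidence, wider interval
se <- sd(heights) / sqrt(length(heights))
2 * qt(0.975, df = 24) * se   # 95% width, about 2.10
2 * qt(0.995, df = 24) * se   # 99% width, about 2.85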

Practical vs Statistical Significance

A result can be statistically significant (small p-value, reject H0) but still lack practical significance (the effect is too small to matter in real life).

Example: Suppose you test if a new teaching method increases exam scores, and find:

  • Old method: mean = 75.0
  • New method: mean = 75.2
  • The difference is statistically significant (p = 0.04)

But a 0.2-point difference is meaningless in practice. The effect size is negligible even though it's statistically significant.

Effect size measures the magnitude of a real difference. Common effect size measures:

  • Cohen's d = (mean1 - mean2) / pooled_SD
      - d = 0.2: small effect
      - d = 0.5: medium effect
      - d = 0.8: large effect

When reporting results, always include both:

1. p-value (is there a real effect?)
2. Effect size (how big is it?)

Important: A large sample can detect tiny effects, making them statistically significant even when practically irrelevant. A small effect size is a red flag that practical importance may be limited.
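A sketch of Cohen's d on simulated score data (all numbers below are assumed for illustration, echoing the teaching-method example above):

R
# Cohen's d sketch on simulated exam scores (illustrative values)
set.seed(1)
old <- rnorm(500, mean = 75.0, sd = 8)
new <- rnorm(500, mean = 75.2, sd = 8)
pooled_sd <- sqrt(((length(old) - 1) * var(old) + (length(new) - 1) * var(new)) /
                  (length(old) + length(new) - 2))
d <- (mean(new) - mean(old)) / pooled_sd
d  # tiny effect (well below 0.2), even if a large n makes the p-value small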

CI and Hypothesis Test Equivalence

A 95% confidence interval and a two-sided test at alpha=0.05 always agree:

  • If μ₀ is inside the 95% CI -> fail to reject H₀: μ = μ₀ at alpha=0.05
  • If μ₀ is outside the 95% CI -> reject H₀: μ = μ₀ at alpha=0.05
R
# 95% CI is (67.35, 69.45)
# H0: mu = 70 -> 70 is outside -> reject H0
# H0: mu = 68 -> 68 is inside -> fail to reject H0

Correct p-value Interpretation

Correct: "If H0 were true, there is a p% chance of observing a result as extreme as ours (or more so)."

WRONG (common mistakes):

  • "The probability that H0 is true is p" -- NO, p is not the probability the null is true
  • "The result happened by chance with probability p" -- NO, p assumes the null is true, not that chance caused the result