We almost never have access to the entire population -- we work with a sample. A sample mean x̄ gives us our best guess at the true population mean μ, but how precise is that guess?
A confidence interval answers this by giving a range of plausible values rather than a single point: x̄ ± t* × (s/sqrt(n)).
Interpreting a 95% CI: If we repeated our study many times and built a 95% CI each time, 95% of those intervals would contain the true μ. The specific interval we built either does or doesn't contain μ -- we can't know which, but we're 95% confident.
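The "95% of intervals contain μ" interpretation can be checked by simulation. The population parameters, sample size, and seed below are all made up for illustration:

```r
# Draw many samples from a population with a known mu, build a 95% t-based
# CI from each, and count how often the interval covers the true mean.
set.seed(240)
mu <- 68; sigma <- 3; n <- 25; reps <- 2000
covered <- replicate(reps, {
  x <- rnorm(n, mean = mu, sd = sigma)
  ci <- t.test(x)$conf.int          # default conf.level = 0.95
  ci[1] <= mu && mu <= ci[2]        # did this interval catch mu?
})
mean(covered)                       # should be close to 0.95
```

Any individual interval either covers μ or it doesn't; only the long-run proportion is 95%.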
When sigma is unknown -- which is always in practice -- we estimate it with the sample standard deviation s. This introduces additional uncertainty that the normal distribution doesn't account for. The t-distribution has heavier tails to compensate.
The t-distribution has a parameter called degrees of freedom (df = n - 1). As n gets larger, the t-distribution approaches the normal distribution.
The degrees of freedom determine how heavy the tails are:
Practical example:
For df = 5 (small sample, n = 6), the critical value is further from 0 than for df = 30 (n = 31). This wider interval reflects our uncertainty when working with small samples.
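You can see the effect of df directly by comparing critical values for a 95% interval:

```r
# t critical values shrink toward the z critical value as df grows
qt(0.975, df = 5)    # ~2.571 (n = 6: heavy tails, wide interval)
qt(0.975, df = 30)   # ~2.042 (n = 31: much closer to z)
qnorm(0.975)         # ~1.960 (the normal/z critical value)
```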
Rule of thumb: For large n (> 30), the difference between t and z is negligible. For small n, the heavier tails matter -- they make intervals wider, reflecting genuine uncertainty.
Reading t.test output: The 95% CI is (67.35, 69.45). Our best estimate is 68.4 inches, and we're 95% confident the true mean height is between 67.35 and 69.45 inches.
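A sketch of where those numbers come from in R. The `heights` vector here is hypothetical (it will not reproduce the exact CI quoted above); the point is which pieces of the `t.test()` output to read:

```r
# Hypothetical sample of heights in inches
heights <- c(66.9, 70.1, 68.3, 67.5, 69.8, 68.0, 66.2, 69.4, 68.9, 67.7)
result <- t.test(heights)
result$estimate    # sample mean x-bar (the "best estimate")
result$conf.int    # the 95% CI: (lower, upper)
```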
A hypothesis test formally evaluates a specific claim about a population parameter.
Step 1: State hypotheses -- the null H0: μ = μ0 and an alternative, e.g. Ha: μ ≠ μ0 (two-sided).
Step 2: Compute the test statistic -- t = (x̄ - μ0)/(s/sqrt(n)), with df = n - 1.
Step 3: Find the p-value -- the probability of seeing a result as extreme as ours, assuming H0 is true.
Step 4: Decision -- if p-value < alpha (usually 0.05), reject H0.
Since p-value (0.004) < alpha (0.05), we reject H0: μ = 70. The data provide significant evidence that the true mean height is not 70 inches.
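The four steps can be worked through by hand in R. The summary statistics (x̄, s, n) below are hypothetical; μ0 = 70 matches the test in the text:

```r
# Step 1: H0: mu = 70 vs. Ha: mu != 70 (two-sided)
xbar <- 68.4; s <- 2.1; n <- 36; mu0 <- 70

# Step 2: test statistic
t_stat <- (xbar - mu0) / (s / sqrt(n))

# Step 3: two-sided p-value from the t distribution with n - 1 df
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# Step 4: decision at alpha = 0.05
p_value < 0.05    # TRUE here, so reject H0
```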
A two-sided test checks if a parameter differs in either direction from the null value: Ha: μ ≠ μ0.
A one-sided test checks if a parameter is specifically larger or smaller: Ha: μ > μ0 or Ha: μ < μ0.
Use two-sided when: a difference in either direction matters, or you have no prior reason to expect a particular direction (the safer default).
Use one-sided when: only one direction is of practical interest, and that direction was specified before looking at the data.
Key point: A one-sided p-value is half the corresponding two-sided p-value (when the test statistic points in the expected direction).
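The halving relationship is easy to verify numerically. The test statistic and df below are arbitrary, with the statistic pointing in the expected (lower-tail) direction:

```r
t_stat <- -2.2; df <- 24
p_one <- pt(t_stat, df)               # one-sided (lower-tail) p-value
p_two <- 2 * pt(-abs(t_stat), df)     # two-sided p-value
p_two / p_one                         # exactly 2
```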
No test is perfect -- we can make two kinds of mistakes:
| | H0 is True | H0 is False |
|---|---|---|
| Reject H0 | Type I Error (rate = alpha) | Correct! (Power) |
| Don't Reject H0 | Correct! | Type II Error (rate = beta) |
The tradeoff: Making alpha smaller (e.g., 0.01 instead of 0.05) reduces Type I errors but increases Type II errors. In high-stakes settings (like medical testing), you choose alpha carefully based on the cost of each error type.
Power = 1 - beta = the probability of correctly detecting a real effect. Power increases with larger samples, bigger true effects, and higher alpha.
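Base R's `power.t.test()` lets you see these relationships directly. All the numbers below (per-group n, effect sizes) are illustrative:

```r
# Baseline: two-sample t-test, n = 20 per group, effect delta = 0.5 SDs
power.t.test(n = 20, delta = 0.5, sd = 1, sig.level = 0.05)$power

# Larger sample -> higher power
power.t.test(n = 50, delta = 0.5, sd = 1, sig.level = 0.05)$power

# Bigger true effect -> higher power
power.t.test(n = 20, delta = 1.0, sd = 1, sig.level = 0.05)$power
```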
To build a CI, find the critical z-values using qnorm():
For a 90% CI: alpha = 0.10, so alpha/2 = 0.05 and the upper critical value is qnorm(0.95).
Pattern: For a 100(1 - alpha)% CI, use qnorm(1 - alpha/2) for the upper critical value.
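Applying the pattern to the common confidence levels:

```r
qnorm(0.95)    # ~1.645 for a 90% CI (alpha = 0.10, alpha/2 = 0.05)
qnorm(0.975)   # ~1.960 for a 95% CI (alpha = 0.05)
qnorm(0.995)   # ~2.576 for a 99% CI (alpha = 0.01)
```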
Scenario: Lady Bristol claims she can taste whether tea or milk was added first. She tastes 8 cups and guesses correctly on 6. Can we conclude she has ability, or is she just guessing?
P-value: P(X >= 6), where X ~ Binomial(8, 0.5), assuming H0 (she is just guessing, so each cup is correct with probability 0.5) is true:
Interpretation: Even by random chance, she has a 14.5% probability of guessing >= 6 correct. This is not unusual, so we do NOT reject H0. The data don't provide strong evidence she has tasting ability.
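The 14.5% tail probability comes from the binomial distribution:

```r
# P(X >= 6) for X ~ Binomial(8, 0.5): upper tail starting at 6
1 - pbinom(5, size = 8, prob = 0.5)     # 0.1445 = 37/256

# Equivalently, sum the individual probabilities
sum(dbinom(6:8, size = 8, prob = 0.5))
```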
The p-value is the probability of observing a result as extreme as (or more extreme than) what we got, assuming the null hypothesis is true.
| Situation | Test Statistic | R Function |
|---|---|---|
| sigma known | z = (x̄ - μ₀)/(sigma/sqrt(n)) | pnorm() |
| sigma unknown, any n | t = (x̄ - μ₀)/(s/sqrt(n)), df = n - 1 | t.test() |
In STAT 240, sigma is almost never known -- we use t-tests.
Compute the t* value for confidence intervals:
For a 95% CI with df=n-1, use qt(0.975, df) (the upper 2.5% tail).
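Putting `qt()` to work, a 95% CI can be built by hand. The summary statistics below are hypothetical:

```r
xbar <- 68.4; s <- 2.1; n <- 36     # made-up sample statistics
t_star <- qt(0.975, df = n - 1)     # upper 2.5% critical value
se <- s / sqrt(n)                   # standard error of the mean
c(xbar - t_star * se, xbar + t_star * se)   # (lower, upper)
```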
The width of a confidence interval is:
width = 2 × t* × (s/sqrt(n))
To get a narrower CI: increase the sample size n, accept a lower confidence level (smaller t*), or reduce the variability s (e.g., with more precise measurement).
Key exam concept: A 99% CI is always WIDER than a 95% CI for the same data. Higher confidence = wider interval.
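The width comparison follows directly from the critical values. For an arbitrary df:

```r
n <- 20
qt(0.975, df = n - 1)   # t* for a 95% CI
qt(0.995, df = n - 1)   # t* for a 99% CI: larger, so the interval is wider
```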
A result can be statistically significant (small p-value, reject H0) but still lack practical significance (the effect is too small to matter in real life).
Example: Suppose you test if a new teaching method increases exam scores on a very large sample, and find a statistically significant result (a tiny p-value) with a mean increase of only 0.2 points.
But a 0.2-point difference is meaningless in practice. The effect size is negligible even though it's statistically significant.
Effect size measures the magnitude of a real difference. A common measure is Cohen's d, the standardized mean difference, with conventional benchmarks:
- d = 0.2: small effect
- d = 0.5: medium effect
- d = 0.8: large effect
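A sketch of computing Cohen's d with a pooled standard deviation. The helper function and both score vectors are hypothetical:

```r
# Cohen's d: mean difference divided by the pooled SD
cohens_d <- function(x, y) {
  nx <- length(x); ny <- length(y)
  sp <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / sp
}

new_method <- c(78, 82, 75, 80, 85, 79, 81, 77)   # made-up exam scores
old_method <- c(74, 79, 72, 76, 81, 75, 78, 73)
cohens_d(new_method, old_method)
```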
When reporting results, always include both:
1. p-value (is there a real effect?)
2. Effect size (how big is it?)
Important: A large sample can detect tiny effects, making them statistically significant even when practically irrelevant. A small effect size is a red flag that practical importance may be limited.
A 95% confidence interval and a two-sided test at alpha = 0.05 always agree: μ₀ lies outside the 95% CI exactly when the two-sided p-value is below 0.05 (reject H0), and inside it exactly when p >= 0.05 (don't reject).
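The duality can be checked with `t.test()` directly. The sample below is hypothetical:

```r
x <- c(66.9, 70.1, 68.3, 67.5, 69.8, 68.0, 66.2, 69.4, 68.9, 67.7)
res <- t.test(x, mu = 70)                      # two-sided test of H0: mu = 70
outside <- 70 < res$conf.int[1] || res$conf.int[2] < 70
outside == (res$p.value < 0.05)                # TRUE either way
```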
Correct: "If H0 were true, there is a p% chance of observing a result as extreme as ours (or more so)."
WRONG (common mistakes):