Two-Sample and Paired Inference
Introduction: Comparing Two Groups
Often in statistics, we want to compare the means of two different groups. The key distinction is whether the samples are independent or paired.
Independent samples occur when:
- Two completely different groups of subjects are measured
- Observations in one group are unrelated to observations in the other
- Example: comparing sleep times between cats that are fixed vs intact
Paired samples occur when:
- The same subjects are measured twice
- Subjects are matched on relevant characteristics
- Example: height measured at age 13 and age 14 for the same individuals
This distinction is critical because it changes how we analyze the data.
Equal Variance Two-Sample t-Test
When comparing two independent samples, we want to test whether the population means are equal.
Null hypothesis: H₀: μ₁ = μ₂, or equivalently, μ₁ - μ₂ = 0
Alternative hypothesis: Hₐ: μ₁ ≠ μ₂ (two-tailed), or μ₁ > μ₂ (one-tailed), or μ₁ < μ₂ (one-tailed)
The Pooled Standard Deviation
When we assume equal variances in the two populations, we pool the sample variances to get a better estimate:
Key concept: The pooled variance is a weighted average of the two sample variances, weighted by their respective degrees of freedom; the pooled SD is its square root.
sₚ = √( ((n₁ - 1)s₁² + (n₂ - 1)s₂²) / (n₁ + n₂ - 2) )
The numerator combines the squared deviations from both groups. The denominator, n₁ + n₂ - 2, is the total degrees of freedom available from both samples.
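The weighted-average claim is easy to verify numerically. A minimal sketch, using the same summary statistics as the cat example later in this section:

```r
# Pooled variance as a weighted average of the two sample variances,
# with weights equal to each sample's degrees of freedom (n - 1)
s1 <- 2.3; s2 <- 2.1
n1 <- 25;  n2 <- 23
pooled_var <- ((n1 - 1)*s1^2 + (n2 - 1)*s2^2) / (n1 + n2 - 2)
weighted   <- weighted.mean(c(s1^2, s2^2), w = c(n1 - 1, n2 - 1))
all.equal(pooled_var, weighted)   # TRUE
sqrt(pooled_var)                  # the pooled SD
```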
Standard Error and Test Statistic
The standard error of the difference in means is:
SE = sₚ √( 1/n₁ + 1/n₂ )
The test statistic follows a t-distribution with df = n₁ + n₂ - 2:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{SE}$$
Confidence Interval
A confidence interval for the difference in means (μ₁ - μ₂) is:
$$(\bar{x}_1 - \bar{x}_2) \pm t^* \cdot SE$$
where t* is the critical value from the t-distribution with df = n₁ + n₂ - 2.
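As a quick sketch, the critical value comes from qt(); for a 90% interval we need the 95th percentile, since 5% of the probability sits in each tail:

```r
# Critical value t* for a 90% CI with n1 = 25, n2 = 23, so df = 46
qt(0.95, df = 46)
```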
Example: Cat Sleep Times
Let's compare sleep times between fixed and intact male ragdoll cats.
# Sample data
xbar1 <- 12.5  # mean sleep time for fixed cats (hours)
xbar2 <- 11.8  # mean sleep time for intact cats (hours)
s1 <- 2.3      # SD for fixed cats
s2 <- 2.1      # SD for intact cats
n1 <- 25       # sample size for fixed cats
n2 <- 23       # sample size for intact cats

# Pooled SD
s_p <- sqrt(((n1 - 1)*s1^2 + (n2 - 1)*s2^2) / (n1 + n2 - 2))
cat("Pooled SD:", s_p, "\n")

# Point estimate and SE
pt_est <- xbar1 - xbar2
se <- s_p * sqrt(1/n1 + 1/n2)
cat("Point estimate of difference:", pt_est, "\n")
cat("Standard error:", se, "\n")

# 90% confidence interval
cv <- qt(0.95, df = n1 + n2 - 2)
ci_lower <- pt_est - cv * se
ci_upper <- pt_est + cv * se
cat("90% CI:", ci_lower, "to", ci_upper, "\n")

# Hypothesis test
test_stat <- (xbar1 - xbar2 - 0) / se
p_value <- 2 * pt(abs(test_stat), df = n1 + n2 - 2, lower.tail = FALSE)
cat("Test statistic:", test_stat, "\n")
cat("Two-tailed p-value:", p_value, "\n")
Pooled SD: 2.2066
Point estimate of difference: 0.7
Standard error: 0.6375
90% CI: -0.3702 to 1.7702
Test statistic: 1.098
Two-tailed p-value: 0.2779
Using t.test() in R
R makes this easy with the t.test() function:
# Assuming df1 and df2 are data frames with Sleep_time_hours column
t.test(df1$Sleep_time_hours, df2$Sleep_time_hours,
       mu = 0, conf.level = 0.90, var.equal = TRUE)
        Two Sample t-test

data:  df1$Sleep_time_hours and df2$Sleep_time_hours
t = 1.098, df = 46, p-value = 0.2779
alternative hypothesis: true difference in means is not equal to 0
90 percent confidence interval:
 -0.3702  1.7702
sample estimates:
mean of x mean of y
     12.5      11.8
Why Pooling Works
Pooling is valid when we assume the populations have equal variances. The key insight is that we're combining information from both samples to estimate a common population standard deviation.
Key concept: Pooling gives us more information (higher degrees of freedom) and thus more power to detect differences, IF the equal variance assumption is reasonable.
The weights in the pooled SD formula, (n₁ - 1) and (n₂ - 1), reflect how much information each sample contributes. Larger samples get more weight because their variances are more stable estimates.
However, if the variances truly are different, pooling can be misleading. This is where Welch's t-test comes in.
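A small simulation sketch illustrates the problem; the sample sizes, SDs, and seed here are arbitrary choices for illustration. Both groups have the same mean, so every rejection is a Type I error:

```r
# With unequal variances AND unequal sample sizes, the pooled test's
# Type I error rate can drift far from the nominal 5%; Welch's stays close.
set.seed(1)
reps <- 5000
p_pooled <- p_welch <- numeric(reps)
for (i in seq_len(reps)) {
  x <- rnorm(10, mean = 0, sd = 4)   # small sample, large SD
  y <- rnorm(40, mean = 0, sd = 1)   # large sample, small SD
  p_pooled[i] <- t.test(x, y, var.equal = TRUE)$p.value
  p_welch[i]  <- t.test(x, y, var.equal = FALSE)$p.value
}
mean(p_pooled < 0.05)   # well above 0.05
mean(p_welch < 0.05)    # close to 0.05
```

The direction of the distortion depends on which group is larger: here the big, low-variance group dominates the pooled SD, which understates the true standard error and inflates the test statistic.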
Welch's Unequal Variance t-Test
When sample standard deviations are substantially different, or when we are unsure whether the variances are equal, Welch's t-test is safer. The key differences:
1. Do NOT pool the standard deviations
2. Compute the standard error directly from the two sample variances:
SE = √( s₁²/n₁ + s₂²/n₂ )
3. Use the Welch-Satterthwaite degrees of freedom (more complex, typically reported by software)
Welch Degrees of Freedom
The Welch-Satterthwaite formula for degrees of freedom is:
$$df_W = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{s_1^4}{n_1^2(n_1-1)} + \frac{s_2^4}{n_2^2(n_2-1)}}$$
This looks complex, but the interpretation is straightforward: it reduces the degrees of freedom when variances are unequal, reflecting the loss of information from having to estimate two different population standard deviations.
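One useful fact worth checking numerically: df_W always lies between min(n₁, n₂) - 1 and the pooled df, n₁ + n₂ - 2. A sketch using the same values as the example below:

```r
# The Welch-Satterthwaite df is bounded by min(n1, n2) - 1 and n1 + n2 - 2
s1 <- 3.2; s2 <- 1.5
n1 <- 25;  n2 <- 23
df_w <- (s1^2/n1 + s2^2/n2)^2 /
        (s1^4/(n1^2*(n1 - 1)) + s2^4/(n2^2*(n2 - 1)))
df_w              # falls between the two bounds below
min(n1, n2) - 1   # 22
n1 + n2 - 2       # 46
```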
R Implementation
In R, var.equal=FALSE (the default) uses Welch's method:
# Welch's t-test with unequal variances
xbar1 <- 12.5
xbar2 <- 11.8
s1 <- 3.2   # larger SD for group 1
s2 <- 1.5   # smaller SD for group 2
n1 <- 25
n2 <- 23

# Manual calculation
se <- sqrt(s1^2/n1 + s2^2/n2)
pt_est <- xbar1 - xbar2

# Welch degrees of freedom
w_numer <- (s1^2/n1 + s2^2/n2)^2
w_denom <- s1^4/(n1^2*(n1-1)) + s2^4/(n2^2*(n2-1))
df_welch <- w_numer / w_denom

cat("Welch SE:", se, "\n")
cat("Welch DF:", df_welch, "\n")

# 95% CI
cv <- qt(0.975, df = df_welch)
ci_lower <- pt_est - cv * se
ci_upper <- pt_est + cv * se
cat("95% CI:", ci_lower, "to", ci_upper, "\n")
Welch SE: 0.7123
Welch DF: 34.675
95% CI: -0.7466 to 2.1466
Using t.test() directly:
t.test(df1$Sleep_time_hours, df2$Sleep_time_hours,
       conf.level = 0.95, var.equal = FALSE)
        Welch Two Sample t-test

data:  df1$Sleep_time_hours and df2$Sleep_time_hours
t = 0.98268, df = 34.675, p-value = 0.3326
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -0.7466  2.1466
sample estimates:
mean of x mean of y
     12.5      11.8
Which Test Should You Use?
Guidance for choosing between equal-variance and Welch's t-tests:
Use Equal Variance t-test when:
- Sample standard deviations are similar (rule of thumb: ratio < 1.5)
- Sample sizes are similar
- You have strong prior knowledge that population variances are equal
Use Welch's t-test when:
- Sample standard deviations differ noticeably
- Sample sizes are very different
- You're unsure about variance equality
- As a general default choice (Welch is safer and controls Type I error better)
Key concept: Welch's t-test is more conservative and loses very little power even when the variances actually are equal. Most statisticians recommend Welch as the default choice unless there is good reason to assume equal variances.
R's default is var.equal=FALSE (Welch), which reflects modern statistical practice.
Paired t-Test
When data is paired (same subjects measured twice, or matched subjects), we have a different situation. The key is that measurements are not independent across groups.
When Data is Paired
- Before and after measurements on the same subject
- Measurements on matched subjects (twins, spouse pairs, etc.)
- Repeat measurements under different conditions
The Paired Analysis Approach
The genius of paired testing is that we convert a two-sample problem into a one-sample problem:
1. Compute the differences: d_i = x_i1 - x_i2 for each pair
2. Treat the differences as a single sample
3. Test whether the mean difference is zero
This is a one-sample t-test on the differences, with df = n - 1 (where n is the number of pairs).
Why Pairing Matters: A Critical Example
Consider height growth from age 13 to age 14 in 5 individuals:
# Age 13 and 14 heights (in cm) for 5 individuals
thirteen <- c(44.1, 59.0, 65.9, 58.7, 49.3)
fourteen <- c(46.3, 60.5, 68.2, 59.4, 50.6)

# Differences (proper paired analysis)
growth <- fourteen - thirteen
print(growth)

# One-sample t-test on differences
n <- length(growth)
xbar <- mean(growth)
s <- sd(growth)
test_stat <- (xbar - 0) / (s / sqrt(n))
p_value <- 2 * pt(abs(test_stat), df = n - 1, lower.tail = FALSE)

cat("Mean growth:", xbar, "cm\n")
cat("SD of growth:", s, "cm\n")
cat("Test statistic:", test_stat, "\n")
cat("p-value (paired test):", p_value, "\n")
[1] 2.2 1.5 2.3 0.7 1.3
Mean growth: 1.6 cm
SD of growth: 0.6633 cm
Test statistic: 5.3936
p-value (paired test): 0.005717
Now compare to an INCORRECT analysis that ignores pairing:
# WRONG: treating as independent samples
t.test(fourteen, thirteen, var.equal = TRUE)
        Two Sample t-test

data:  fourteen and thirteen
t = 0.29263, df = 8, p-value = 0.7773
alternative hypothesis: true difference in means is not equal to 0
sample estimates:
mean of x mean of y
     57.0      55.4
Compare results:
Key concept: The paired test gives t = 5.394, p = 0.0057 (highly significant). The unpaired test gives t = 0.293, p = 0.777 (not even close to significant). This dramatic difference shows why pairing is crucial. When data is paired and we fail to pair in the analysis, we throw away important information and lose power to detect real effects.
The paired analysis is much more powerful because it controls for individual differences in height. By looking at changes within individuals, we reduce noise.
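This noise reduction can be made concrete with the variance identity var(x - y) = var(x) + var(y) - 2 cov(x, y): when the two measurements are strongly positively correlated, the covariance term cancels most of the variance. A quick check with the height data:

```r
# Pairing works because the two measurements are strongly correlated,
# so the differences have far less variance than either measurement
thirteen <- c(44.1, 59.0, 65.9, 58.7, 49.3)
fourteen <- c(46.3, 60.5, 68.2, 59.4, 50.6)
cor(fourteen, thirteen)    # very close to 1
var(fourteen - thirteen)   # small
var(fourteen) + var(thirteen) - 2*cov(fourteen, thirteen)  # same value
```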
Using t.test() for Paired Data
# Correct paired analysis
t.test(growth, alternative = "greater")
t.test(fourteen, thirteen, paired = TRUE, alternative = "greater")
        One Sample t-test

data:  growth
t = 5.3936, df = 4, p-value = 0.002858
alternative hypothesis: true mean is greater than 0

        Paired t-test

data:  fourteen and thirteen
t = 5.3936, df = 4, p-value = 0.002858
alternative hypothesis: true difference in means is greater than 0
Both produce identical results. The key difference in the function call: paired = TRUE tells R to compute differences first.
Confidence Interval for Difference in Means
Interpretation
A 95% confidence interval for the difference in population means (μ₁ - μ₂) tells us:
Key concept: If we repeated the sampling procedure many times and computed a confidence interval each time, approximately 95% of those intervals would contain the true population difference.
Practical interpretation:
- If the CI includes 0, the difference is not significant at the 0.05 level
- If the CI is entirely positive, group 1 has a significantly higher mean
- If the CI is entirely negative, group 1 has a significantly lower mean
- The width of the CI reflects precision (narrower = more precise)
Examples from Previous Sections
For the equal variance test: 90% CI: [-0.3702, 1.7702]
Interpretation: We're 90% confident the true difference in mean sleep times is between -0.37 and 1.77 hours. Because this interval includes 0, we do not have convincing evidence that fixed and intact cats differ.
For the Welch test with unequal variances: 95% CI: [-0.7466, 2.1466]
Interpretation: This wide interval reflects the larger SD in group 1 and the higher confidence level. It includes 0, so we don't have strong evidence of a difference.
Paired Data CI
For the height growth data, a 95% CI for mean growth:
xbar <- mean(growth)
s <- sd(growth)
n <- length(growth)
se <- s / sqrt(n)
cv <- qt(0.975, df = n - 1)
ci_lower <- xbar - cv * se
ci_upper <- xbar + cv * se
cat("95% CI for mean growth:", ci_lower, "to", ci_upper, "\n")
95% CI for mean growth: 0.7764 to 2.4236
We're 95% confident the true mean height growth from age 13 to 14 is between 0.78 and 2.42 cm.
Connecting Confidence Intervals to Hypothesis Tests
There's a beautiful connection between confidence intervals and hypothesis tests.
The Relationship
For a two-tailed hypothesis test with significance level α:
- If the (1 - α) confidence interval includes 0, we fail to reject H₀
- If the (1 - α) confidence interval does NOT include 0, we reject H₀
Example
Looking back at our paired growth test:
- We got a 95% CI: [0.7764, 2.4236]
- This CI does NOT include 0
- Therefore, we reject H₀: μ = 0 at the 0.05 level
- This matches our two-tailed p-value of 0.0057 (< 0.05)
Key concept: The confidence interval and hypothesis test are two views of the same underlying question. The CI tells us not just whether a difference exists, but also the range of plausible values.
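The duality is easy to check in R by pulling both pieces out of the same t.test() result, here using the growth differences from earlier:

```r
# The 95% CI excludes 0 exactly when the two-sided p-value is below 0.05
growth <- c(2.2, 1.5, 2.3, 0.7, 1.3)
res <- t.test(growth, mu = 0, conf.level = 0.95)
res$p.value < 0.05                          # TRUE
res$conf.int[1] > 0 || res$conf.int[2] < 0  # TRUE: 0 is outside the CI
```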
R Code Summary: t.test() Parameters
Independent Samples - Equal Variances
t.test(group1, group2,
       mu = 0,              # null hypothesis difference
       conf.level = 0.95,   # confidence level
       var.equal = TRUE)    # assume equal variances
Independent Samples - Welch (Unequal Variances)
t.test(group1, group2,
       mu = 0,
       conf.level = 0.95,
       var.equal = FALSE)   # do NOT assume equal variances (default)
Paired Samples
# Method 1: Test on differences
t.test(differences,
       mu = 0,
       conf.level = 0.95)

# Method 2: Specify pairing directly
t.test(group1, group2,
       paired = TRUE,
       mu = 0,
       conf.level = 0.95)
One-Tailed Tests
# Test if group1 mean > group2 mean
t.test(group1, group2,
       alternative = "greater")   # or "less" for opposite

# Test if paired differences > 0
t.test(differences,
       alternative = "greater")
Summary Table
| Scenario | df | SE Formula | Assumption | R Code |
|---|---|---|---|---|
| Independent, equal var | n₁ + n₂ - 2 | sₚ√(1/n₁ + 1/n₂) | σ₁ = σ₂ | var.equal = TRUE |
| Independent, unequal var | Welch-Satterthwaite | √(s₁²/n₁ + s₂²/n₂) | None (safer) | var.equal = FALSE (default) |
| Paired | n - 1 | s_d/√n | Differences approximately normal | paired = TRUE |
Key Takeaways
1. Always distinguish between independent and paired data structures
2. When data is paired, compute differences and treat as one-sample problem
3. Welch's t-test is safer as a default for independent samples
4. Equal variance t-test assumes (and requires) similar population variances
5. Confidence intervals and hypothesis tests tell complementary stories
6. The df and SE change based on the test choice
7. Failing to recognize and properly analyze paired data can lead to missing real effects