MODULE 0810 QUESTIONS

Binomial Distribution



When to Use the Binomial

The binomial distribution models the number of successes in a fixed number of Bernoulli trials — independent experiments that can only end in success or failure.

Four conditions must ALL hold:

1. Fixed n — you know how many trials before you start

2. Independence — each trial's outcome doesn't affect others

3. Constant p — same probability of success on every trial

4. Binary — only two possible outcomes per trial

Classic examples: Number of heads in 10 coin flips. Number of patients who recover out of 20 given a treatment. Number of correct guesses on a 10-question T/F test.

The PMF Formula

For X ~ Binomial(n, p), the probability of exactly k successes:

P(X = k) = C(n,k) p^k (1-p)^(n-k)

The C(n,k) term counts the number of ways to arrange k successes among n trials. The p^k and (1-p)^(n-k) give the probability of any specific arrangement.

Example: P(exactly 3 heads in 5 flips) = C(5,3) * 0.5^3 * 0.5^2 = 10 * 0.125 * 0.25 = 0.3125.
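The worked example can be checked in base R by comparing the hand computation against the built-in PMF function (a quick sketch, using only base R):

```r
# Hand computation: C(5,3) * 0.5^3 * (1 - 0.5)^2
by_hand <- choose(5, 3) * 0.5^3 * (1 - 0.5)^(5 - 3)

# Built-in PMF: P(X = 3) for X ~ Binomial(5, 0.5)
built_in <- dbinom(3, size = 5, prob = 0.5)

by_hand   # 0.3125
built_in  # 0.3125
```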

R Functions

In R, you'll never compute the PMF formula by hand — use these functions:

R
# P(X = 3) when X ~ Binomial(10, 0.4)
dbinom(3, size = 10, prob = 0.4)

# P(X <= 5) — cumulative probability
pbinom(5, size = 10, prob = 0.4)

# P(X >= 6) — upper tail: 1 - P(X <= 5)
1 - pbinom(5, size = 10, prob = 0.4)

# P(X >= 6) directly with lower.tail = FALSE
pbinom(5, size = 10, prob = 0.4, lower.tail = FALSE)

dbinom vs pbinom: d = density (exact probability at one value). p = probability (cumulative, all values up to and including k). For "at least", remember that P(X >= k) = 1 - P(X <= k-1), not 1 - P(X <= k).
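One way to internalize the d/p relationship: the cumulative probability is just the sum of the point probabilities. A small sanity check for the Binomial(10, 0.4) example above:

```r
# P(X <= 5) as a sum of exact probabilities P(X = 0), ..., P(X = 5)
sum_of_d   <- sum(dbinom(0:5, size = 10, prob = 0.4))
cumulative <- pbinom(5, size = 10, prob = 0.4)
all.equal(sum_of_d, cumulative)  # TRUE

# "At least 6": both forms give the same answer
upper_tail <- 1 - pbinom(5, size = 10, prob = 0.4)
sum_6_to_10 <- sum(dbinom(6:10, size = 10, prob = 0.4))
all.equal(upper_tail, sum_6_to_10)  # TRUE
```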

Mean and Variance

For X ~ Binomial(n, p):

  • Mean: E[X] = np
  • Variance: Var(X) = np(1-p)
  • Standard deviation: SD(X) = sqrt(np(1-p))
R
n <- 20
p <- 0.3
mean_X <- n * p
var_X <- n * p * (1 - p)
sd_X <- sqrt(var_X)

Intuition for the mean: If you flip a coin 20 times (p = 0.5), you expect 10 heads. If p = 0.3, you expect 20 * 0.3 = 6 successes. The mean is just n times the probability of a single success.
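The shortcut E[X] = np can also be verified from the definition of expectation, E[X] = sum of k * P(X = k) over all possible k. A base R sketch for the n = 20, p = 0.3 example:

```r
n <- 20
p <- 0.3
k <- 0:n

# Expectation computed from the definition
e_x <- sum(k * dbinom(k, size = n, prob = p))

e_x    # 6
n * p  # 6, matching the shortcut formula
```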

Visualizing the Binomial

R
library(tibble)
library(dplyr)
library(ggplot2)

tibble(k = 0:15) %>%
  mutate(prob = dbinom(k, size = 15, prob = 0.4)) %>%
  ggplot(aes(x = k, y = prob)) +
  geom_col(fill = "steelblue") +
  labs(title = "Binomial(15, 0.4)", x = "Number of successes k", y = "P(X = k)")

The distribution is symmetric only when p = 0.5. For p < 0.5 it's right-skewed; for p > 0.5 it's left-skewed.
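The skew claim can be checked numerically: Binomial(n, p) is the mirror image of Binomial(n, 1 - p), since dbinom(k, n, p) = dbinom(n - k, n, 1 - p). A quick base R check:

```r
n <- 15
k <- 0:n

p_low  <- dbinom(k, size = n, prob = 0.3)  # right-skewed
p_high <- dbinom(k, size = n, prob = 0.7)  # left-skewed

# Reversing one distribution gives exactly the other
all.equal(p_low, rev(p_high))  # TRUE
```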

Combinations and Factorials

How many ways can you arrange k successes in n trials?

C(n,k) = n! / (k! * (n-k)!)

In R:

R
choose(5, 3)  # C(5,3) = 10 ways to choose 3 items from 5
factorial(5)  # 5! = 120

The Four Binomial Functions

dbinom(k, n, p) — exact probability P(X = k):

R
dbinom(5, size = 10, prob = 0.4) # P(X = 5) for Binom(10, 0.4)

pbinom(k, n, p) — cumulative probability P(X <= k):

R
pbinom(5, size = 10, prob = 0.4) # P(X <= 5)

qbinom(q, n, p) — quantile (inverse CDF). Find the smallest x where P(X <= x) >= q:

R
qbinom(0.7, size = 10, prob = 0.5) # returns 6, since P(X <= 5)=0.623 < 0.7 but P(X <= 6)=0.828 >= 0.7

rbinom(n_samples, size, prob) — simulate random samples:

R
rbinom(5, size = 10, prob = 0.5) # generate 5 random Binomial(10, 0.5) values

Quick reference: d = density (exact), p = probability (cumulative), q = quantile, r = random sample.

BINS Mnemonic for Binomial

The four required conditions for Binomial(n, p):

  • Binary — each trial has exactly two outcomes (success/failure, yes/no)
  • Independent — outcome of one trial doesn't affect others
  • Number fixed — n is determined before the experiment
  • Same p — probability of success is identical for every trial

Common violation: Sampling without replacement from a small population violates Independence. Rule of thumb: If population size >= 20n, independence is approximately satisfied.
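To see why the rule of thumb works, compare the exact without-replacement distribution (hypergeometric, dhyper() in R) against the binomial approximation. The population sizes below are illustrative, not from the original module:

```r
# Draw n = 5 from populations that are 40% "successes"
n_draws <- 5
ks <- 0:n_draws

# Small population, N = 20: 8 successes, 12 failures
small_pop <- dhyper(ks, m = 8, n = 12, k = n_draws)
# Large population, N = 1000 >= 20 * n_draws: 400 successes, 600 failures
large_pop <- dhyper(ks, m = 400, n = 600, k = n_draws)
# Binomial approximation with p = 0.4
binom <- dbinom(ks, size = n_draws, prob = 0.4)

max(abs(small_pop - binom))  # noticeably larger error...
max(abs(large_pop - binom))  # ...than for the large population
```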

P(X >= k) vs P(X > k) — Critical Distinction

For a DISCRETE distribution these are NOT the same:

  • P(X >= k) = "at least k" = 1 - P(X <= k-1) → use 1 - pbinom(k-1, n, p)
  • P(X > k) = "more than k" = 1 - P(X <= k) → use 1 - pbinom(k, n, p)

Example: X ~ Binomial(10, 0.4)

R
# P(X >= 4) — at least 4 successes
1 - pbinom(3, 10, 0.4)

R
# P(X > 4) — more than 4 (i.e., at least 5)
1 - pbinom(4, 10, 0.4)

This is the #1 source of exam errors. "At least 4" means >= 4, so subtract P(X <= 3), NOT P(X <= 4).

Simulation with rbinom()

You can simulate binomial experiments using rbinom():

R
# Simulate 100,000 experiments: n=10 trials, p=0.4
sims <- rbinom(100000, size = 10, prob = 0.4)
mean(sims) # should be close to n*p = 4

The simulated mean (approximately 4) confirms our formula E[X] = np = 10 * 0.4 = 4. Simulation is a powerful way to verify theoretical results.
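The same simulation can check the variance formula Var(X) = np(1-p) = 10 * 0.4 * 0.6 = 2.4 (a sketch; the seed is chosen arbitrarily and the tolerance is loose to allow for Monte Carlo noise):

```r
set.seed(42)  # arbitrary seed, for reproducibility only
sims <- rbinom(100000, size = 10, prob = 0.4)

var(sims)  # should be close to n * p * (1 - p) = 2.4
sd(sims)   # should be close to sqrt(2.4)
```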