MODULE 0710 QUESTIONS

Probability Foundations

What is Probability?

Probability is a number between 0 and 1 that measures how likely an event is to occur. P(A) = 0 means A never happens; P(A) = 1 means A always happens.

Before we can compute probabilities, we need to define:

  • Sample space (S) — the set of ALL possible outcomes
  • Event — any subset of the sample space

For a fair six-sided die: S = {1, 2, 3, 4, 5, 6}. The event "roll an even number" = {2, 4, 6}.

Equally likely outcomes: When all outcomes are equally likely, P(A) = (# outcomes in A) / (# outcomes in S). Rolling an even number: P = 3/6 = 0.5.
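The counting formula can be checked directly in R; a minimal sketch for the die example above:

```r
# Sample space for a fair six-sided die
S <- 1:6
A <- c(2, 4, 6)                 # event: roll an even number
p_A <- length(A) / length(S)    # (# outcomes in A) / (# outcomes in S)
p_A                             # 0.5
```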

The Complement Rule

Every event A has a complement A' ("not A"). Together they cover everything:

P(A) + P(A') = 1

P(A') = 1 - P(A)

Why this matters: Often it's easier to find P(not A) and subtract from 1. "P(at least one success in 10 trials)" is hard directly, but "P(zero successes)" is easy with the binomial — then subtract.
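The "at least one" trick is easy to sketch in R with the built-in binomial functions; the per-trial success probability p = 0.3 below is an illustrative assumption, not from the text:

```r
# Complement trick: P(at least one success in 10 independent trials)
# Assumes an illustrative success probability p = 0.3 per trial
p <- 0.3
p_zero <- dbinom(0, size = 10, prob = p)   # P(zero successes): the easy piece
p_at_least_one <- 1 - p_zero               # complement rule
p_at_least_one
```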

Union and Intersection

  • Intersection (A intersect B): "A AND B both occur"
  • Union (A union B): "A OR B (or both) occur"

The Addition Rule connects them:

P(A union B) = P(A) + P(B) - P(A intersect B)

We subtract the intersection to avoid double-counting it.

If A and B are mutually exclusive (can't both happen), then P(A intersect B) = 0 and the formula simplifies:

P(A union B) = P(A) + P(B)

Example: A standard deck of cards. P(King) = 4/52. P(Heart) = 13/52. P(King of Hearts) = 1/52. So P(King OR Heart) = 4/52 + 13/52 - 1/52 = 16/52.
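The card example translates line by line into R:

```r
# Addition rule on a standard 52-card deck
p_king  <- 4 / 52
p_heart <- 13 / 52
p_both  <- 1 / 52                       # King of Hearts
p_union <- p_king + p_heart - p_both    # subtract the overlap once
p_union                                 # 16/52, about 0.308
```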

Conditional Probability

P(A | B) means "the probability of A, given that B has already occurred." You're restricting your attention to the world where B happened:

P(A|B) = P(A intersect B) / P(B)

This is different from P(A) unless A and B are independent.

Intuition: If it's cloudy (B), the probability of rain (A) is higher than the unconditional P(rain). Knowing that B happened updates your belief about A.
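A concrete check of the formula, using the fair die from earlier:

```r
# Conditional probability on a fair die:
# A = "roll a 2", B = "roll an even number"
p_A_and_B <- 1 / 6              # only the outcome 2 is in both A and B
p_B       <- 3 / 6              # B = {2, 4, 6}
p_A_given_B <- p_A_and_B / p_B  # restrict attention to the world where B happened
p_A_given_B                     # 1/3, versus the unconditional P(A) = 1/6
```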

Independence

Two events are independent if knowing one occurred tells you nothing about the other:

P(A|B) = P(A), which (for P(B) > 0) is equivalent to P(A intersect B) = P(A) × P(B)

The multiplication rule for independent events is used constantly in probability:

  • Flipping two fair coins: P(both heads) = P(H) × P(H) = 0.5 × 0.5 = 0.25
  • Assuming independence when it doesn't hold is one of the most common errors in statistics

Mutually exclusive ≠ Independent. This confuses many students. If A and B are mutually exclusive with positive probability, they CANNOT be independent, because knowing A occurred means B definitely didn't — that's information.
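Both points can be sketched with the examples already used above (two coin flips, and two disjoint die outcomes):

```r
# Independent events: two fair coin flips
p_both_heads <- 0.5 * 0.5   # multiplication rule: 0.25

# Mutually exclusive events on ONE die roll: A = {1}, B = {2}
p_A <- 1 / 6
p_B <- 1 / 6
p_A_and_B <- 0              # they can't both happen on the same roll
# Independence would require P(A intersect B) = P(A) * P(B) = 1/36,
# but it is 0, so A and B are dependent: knowing A rules out B
p_A * p_B                   # 1/36, not equal to 0
```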

Bayes' Theorem

Bayes' theorem reverses conditional probability. You know P(B|A) but want P(A|B):

P(A|B) = [ P(B|A) × P(A) ] / P(B)

Example: A test for a disease is 99% accurate. The disease affects 1% of the population. If you test positive, what's the probability you actually have the disease?

  • P(positive | disease) = 0.99
  • P(disease) = 0.01
  • P(positive) = P(pos|disease)P(disease) + P(pos|no disease)P(no disease) = 0.99 × 0.01 + 0.01 × 0.99 = 0.0198
  • P(disease | positive) = (0.99 × 0.01) / 0.0198 = 0.5

The surprising result: Even a 99% accurate test gives only ~50% probability of actually having the disease when it's rare. This is why Bayes matters — intuition fails here.
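The whole calculation fits in a few lines of R (here "99% accurate" is read, as in the example, to mean both a 99% true-positive rate and a 1% false-positive rate):

```r
# Bayes' theorem for the rare-disease example
p_disease <- 0.01
p_pos_given_disease    <- 0.99   # sensitivity
p_pos_given_no_disease <- 0.01   # false-positive rate

# Law of total probability gives the denominator P(positive)
p_pos <- p_pos_given_disease * p_disease +
         p_pos_given_no_disease * (1 - p_disease)

p_disease_given_pos <- p_pos_given_disease * p_disease / p_pos
p_pos                  # 0.0198
p_disease_given_pos    # 0.5
```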

Random Variables and Probability Distributions

A random variable X is a function that assigns a number to each outcome in a sample space. For example:

  • X = result of rolling a die (values 1-6)
  • X = number of heads in 3 coin flips (values 0-3)
  • X = height of a randomly chosen student (values 58-80 inches)

A probability distribution specifies the probabilities of all possible values. For a discrete random variable, we list:

  • Support: all possible values
  • Probability for each value: P(X = x)
  • Constraint: all probabilities sum to 1

Expected Value E[X]

The expected value is the long-run average — the mean of the distribution:

E[X] = sum of x * P(X = x)

In R:

R
vals <- 0:10   # support for Apgar scores
probs <- c(0.001, 0.006, 0.007, 0.008, 0.012, 0.02, 0.038,
           0.099, 0.319, 0.437, 0.053)
E_X <- sum(vals * probs)
E_X

Variance and Standard Deviation

Variance measures spread around the mean:

Var(X) = sum of (x - mu)^2 * P(X = x)

In R:

R
mu <- E_X
Var_X <- sum((vals - mu)^2 * probs)
SD_X <- sqrt(Var_X)
Var_X
SD_X

Visualizing a Discrete Distribution

R
library(tibble)
library(ggplot2)

apgar_data <- tibble(
  score = 0:10,
  prob = c(0.001, 0.006, 0.007, 0.008, 0.012, 0.02, 0.038,
           0.099, 0.319, 0.437, 0.053)
)

ggplot(apgar_data, aes(x = score, y = prob)) +
  geom_col(fill = "steelblue", alpha = 0.8) +
  labs(title = "Apgar Score Distribution", x = "Score", y = "Probability")

Why this matters: The shape of the distribution tells you about typical and unlikely outcomes. High concentration around 8-9 means most newborns have very good Apgar scores.

Population vs Sample — Definitions

Key distinction for all inference:

  • Population: entire group of interest; parameters (mu, sigma, p) describe it — fixed but unknown
  • Sample: subset we observe; statistics (x-bar, s, p-hat) describe it — vary from sample to sample
  • Inference: using sample statistics to estimate population parameters
  Quantity             Population Parameter   Sample Statistic
  Mean                 mu                     x-bar
  Standard Deviation   sigma                  s
  Proportion           p                      p-hat

Every hypothesis test and confidence interval is about using the sample statistic to learn about the population parameter.

Discrete vs Continuous Random Variables

Discrete RV: takes countable values (0, 1, 2, 3, ...). P(X = k) can be nonzero. Examples: number of successes, count of defects.

Continuous RV: takes any value in an interval. P(X = exactly 3.14159...) = 0. Probability is always computed as an area. Examples: height, time, weight.

Critical rule: For discrete (like Binomial): P(X >= k) ≠ P(X > k). For continuous (like Normal): P(X >= k) = P(X > k) since P(X=k)=0.
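The rule is easy to verify with R's built-in distribution functions, using a Binomial(10, 0.5) and a standard Normal:

```r
# Discrete: Binomial(n = 10, p = 0.5). P(X >= 3) and P(X > 3) differ.
p_ge_3 <- 1 - pbinom(2, size = 10, prob = 0.5)   # P(X >= 3)
p_gt_3 <- 1 - pbinom(3, size = 10, prob = 0.5)   # P(X > 3)
p_ge_3 - p_gt_3                                  # the gap is P(X = 3)
dbinom(3, size = 10, prob = 0.5)                 # same value

# Continuous: standard Normal. P(X >= 1) = P(X > 1), since P(X = 1) = 0.
1 - pnorm(1)
```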

Law of Total Probability

If A and A' partition the sample space:

P(B) = P(B|A)P(A) + P(B|A')P(A')

This formula lets you compute the total probability of B by conditioning on whether A occurs. Used extensively in Bayes' theorem and diagnostic testing problems.
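A sketch of the formula with the cloudy/rain intuition from earlier; the three probabilities below are hypothetical numbers chosen for illustration, not from the text:

```r
# Law of total probability, conditioning on cloudy vs clear.
# Hypothetical inputs: P(rain | cloudy) = 0.4, P(rain | clear) = 0.05,
# P(cloudy) = 0.3
p_cloudy <- 0.3
p_rain <- 0.4 * p_cloudy + 0.05 * (1 - p_cloudy)
p_rain    # 0.155
```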

Preview: Cumulative Probabilities

In the next module, we'll work extensively with P(X <= k) — the probability that a random variable is at most k. This is called the cumulative distribution function (CDF).

In R: pbinom(k, size, prob) gives P(X <= k) for a binomial random variable. We'll use this to answer questions like "what's the probability of getting at most 3 successes?"
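As a quick preview, the CDF call and the equivalent sum over the probability mass function agree:

```r
# P(X <= 3) for X ~ Binomial(size = 10, prob = 0.5)
pbinom(3, size = 10, prob = 0.5)            # about 0.172

# Same probability by summing P(X = k) for k = 0, ..., 3
sum(dbinom(0:3, size = 10, prob = 0.5))
```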