Probability is a number between 0 and 1 that measures how likely an event is to occur. P(A) = 0 means A never happens; P(A) = 1 means A always happens.
Before we can compute probabilities, we need to define:
- The sample space S: the set of all possible outcomes.
- An event A: any subset of the sample space, i.e., the outcomes we care about.
For a fair six-sided die: S = {1, 2, 3, 4, 5, 6}. The event "roll an even number" = {2, 4, 6}.
Equally likely outcomes: When all outcomes are equally likely, P(A) = (# outcomes in A) / (# outcomes in S). Rolling an even number: P = 3/6 = 0.5.
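A quick check of this in R, using the die example (a minimal sketch):

```r
S <- 1:6                 # sample space for a fair six-sided die
A <- c(2, 4, 6)          # event: roll an even number
length(A) / length(S)    # 3/6 = 0.5
```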
Every event A has a complement A' ("not A"). Together they cover everything:
P(A) + P(A') = 1
P(A') = 1 - P(A)
Why this matters: Often it's easier to find P(not A) and subtract from 1. "P(at least one success in 10 trials)" is hard directly, but "P(zero successes)" is easy with the binomial — then subtract.
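As a sketch of the complement trick in R, assuming for illustration 10 independent trials each with success probability 0.3:

```r
n <- 10; p <- 0.3
p_zero <- dbinom(0, size = n, prob = p)  # P(zero successes), a single binomial term
1 - p_zero                               # P(at least one success), by the complement rule
```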
The Addition Rule connects the probability of "A or B" (union) to the probability of "A and B" (intersection):
P(A union B) = P(A) + P(B) - P(A intersect B)
We subtract the intersection to avoid double-counting it.
If A and B are mutually exclusive (can't both happen), then P(A intersect B) = 0 and the formula simplifies:
P(A union B) = P(A) + P(B)
Example: A standard deck of cards. P(King) = 4/52. P(Heart) = 13/52. P(King of Hearts) = 1/52. So P(King OR Heart) = 4/52 + 13/52 - 1/52 = 16/52.
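A Monte Carlo sanity check of this in R (an illustrative simulation; ranks are coded 1-13 with 13 = King, suits 1-4 with 1 = Hearts):

```r
set.seed(1)
n <- 1e5
rank <- sample(1:13, n, replace = TRUE)   # rank of each simulated card
suit <- sample(1:4,  n, replace = TRUE)   # suit of each simulated card
mean(rank == 13 | suit == 1)              # close to 16/52 ≈ 0.308
```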
P(A | B) means "the probability of A, given that B has already occurred." You're restricting your attention to the world where B happened:
P(A|B) = P(A intersect B) / P(B)
This is different from P(A) unless A and B are independent.
Intuition: If it's cloudy (B), the probability of rain (A) is higher than the unconditional P(rain). Knowing B happened updates your belief about A.
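Worked example with the die from earlier: P(roll a 2 | roll is even) = P({2}) / P({2, 4, 6}) = (1/6) / (3/6) = 1/3, which is higher than the unconditional P(roll a 2) = 1/6, because conditioning on "even" rules out half the sample space.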
Two events are independent if knowing one occurred tells you nothing about the other:
P(A|B) = P(A) (assuming P(B) > 0), which is equivalent to P(A intersect B) = P(A) × P(B)
The multiplication rule for independent events is used constantly in probability:
P(A intersect B) = P(A) × P(B)
Mutually exclusive ≠ Independent. This confuses many students. If A and B are mutually exclusive and each has positive probability, they CANNOT be independent, because knowing A occurred means B definitely didn't, and that is information.
Bayes' theorem reverses conditional probability. You know P(B|A) but want P(A|B):
P(A|B) = [ P(B|A) × P(A) ] / P(B)
Example: A test for a disease is 99% accurate, meaning it correctly classifies 99% of people who have the disease and 99% of people who don't (sensitivity = specificity = 99%). The disease affects 1% of the population. If you test positive, what's the probability you actually have the disease?
The surprising result: Even a 99% accurate test gives only ~50% probability of actually having the disease when it's rare. This is why Bayes matters — intuition fails here.
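A minimal sketch of the arithmetic in R, assuming "99% accurate" means both sensitivity and specificity are 0.99:

```r
prior <- 0.01                                      # P(disease), 1% prevalence
sens  <- 0.99                                      # P(positive | disease)
spec  <- 0.99                                      # P(negative | no disease)
p_pos <- sens * prior + (1 - spec) * (1 - prior)   # total probability of a positive test
sens * prior / p_pos                               # Bayes: P(disease | positive) = 0.5
```

Note that the denominator is exactly the law of total probability covered later in this module.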
A random variable X is a function that assigns a number to each outcome in a sample space. For example, X could be the number showing on a die roll, or the Apgar score assigned to a newborn (used in the example below).
A probability distribution specifies the probabilities of all possible values. For a discrete random variable, we list each possible value x together with its probability P(X = x); the probabilities must be nonnegative and sum to 1.
The expected value is the long-run average — the mean of the distribution:
E[X] = sum of x * P(X = x)
In R (a minimal sketch; the Apgar-score probabilities below are illustrative placeholders, not real data):
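```r
x <- 0:10                                   # possible Apgar scores
p <- c(0.001, 0.001, 0.002, 0.004, 0.008,   # illustrative probabilities only,
       0.014, 0.030, 0.090, 0.320, 0.380,   # chosen to concentrate around 8-9;
       0.150)                               # not real Apgar data
sum(p)                                      # sanity check: must equal 1
mu <- sum(x * p)                            # E[X]
mu
```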
Variance measures spread around the mean:
Var(X) = sum of (x - mu)^2 * P(X = x)
In R, continuing the sketch above with the same x and p:
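```r
mu     <- sum(x * p)              # the mean computed above
sigma2 <- sum((x - mu)^2 * p)     # Var(X)
sigma2
sqrt(sigma2)                      # standard deviation
```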
Why this matters: The shape of the distribution tells you about typical and unlikely outcomes. High concentration around 8-9 means most newborns have very good Apgar scores.
Key distinction for all inference:
| Quantity | Population Parameter | Sample Statistic |
|---|---|---|
| Mean | mu | x-bar |
| Standard Deviation | sigma | s |
| Proportion | p | p-hat |
Every hypothesis test and confidence interval is about using the sample statistic to learn about the population parameter.
Discrete RV: takes countable values (0, 1, 2, 3, ...). P(X = k) can be nonzero. Examples: number of successes, count of defects.
Continuous RV: takes any value in an interval. P(X = exactly 3.14159...) = 0. Probability is always computed as an area. Examples: height, time, weight.
Critical rule: For discrete random variables (like the Binomial), P(X >= k) ≠ P(X > k) in general, because the two differ by P(X = k). For continuous random variables (like the Normal), P(X >= k) = P(X > k), since P(X = k) = 0.
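A sketch of this in R, taking X ~ Binomial(10, 0.5) and a standard normal for illustration:

```r
# Discrete: P(X >= 3) and P(X > 3) differ by P(X = 3)
1 - pbinom(2, size = 10, prob = 0.5)   # P(X >= 3)
1 - pbinom(3, size = 10, prob = 0.5)   # P(X > 3)
dbinom(3, size = 10, prob = 0.5)       # the gap, P(X = 3)

# Continuous: the two are equal because P(X = k) = 0
1 - pnorm(1.5)                         # P(X >= 1.5) = P(X > 1.5) for a standard normal
```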
If A and A' partition the sample space:
P(B) = P(B|A)P(A) + P(B|A')P(A')
This formula lets you compute the total probability of B by conditioning on whether A occurs. Used extensively in Bayes' theorem and diagnostic testing problems.
In the next module, we'll work extensively with P(X <= k) — the probability that a random variable is at most k. This is called the cumulative distribution function (CDF).
In R: pbinom(k, size, prob) gives P(X <= k) for a binomial random variable. We'll use this to answer questions like "what's the probability of getting at most 3 successes?"
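For example, with 10 trials and success probability 0.5 (illustrative values):

```r
pbinom(3, size = 10, prob = 0.5)   # P(X <= 3) for X ~ Binomial(10, 0.5), ≈ 0.172
```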