Comparison operators (==, !=, <, >, <=, >=) compare two values and return TRUE or FALSE. You'll use these constantly inside filter() and if statements:
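As a minimal sketch of these comparisons, the values `x` and `y` below are made up for illustration and are not from the course data:

```r
# Each comparison returns a single logical value.
x <- 5
y <- 8

eq  <- x == y   # equal to                 -> FALSE
neq <- x != y   # not equal to             -> TRUE
lt  <- x < y    # less than                -> TRUE
gte <- x >= y   # greater than or equal to -> FALSE

print(c(eq, neq, lt, gte))
```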
Common mistake: = assigns a value, == checks equality. Writing filter(species = "Adelie") will give an error; you need filter(species == "Adelie").
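To see what == actually produces, here is a base-R sketch of the same row selection that filter(species == "Adelie") performs. The data frame `penguins_small` is a hypothetical stand-in, and the bracket subsetting is used only so the example runs without loading any packages:

```r
# Hypothetical mini data set standing in for the penguins data.
penguins_small <- data.frame(
  species     = c("Adelie", "Gentoo", "Adelie", "Chinstrap"),
  body_mass_g = c(3750, 5000, 3800, 3700)
)

# == compares every element and returns a logical vector.
is_adelie <- penguins_small$species == "Adelie"
print(is_adelie)

# Keep only the rows where the comparison is TRUE.
adelie_rows <- penguins_small[is_adelie, ]
print(nrow(adelie_rows))
```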
You can combine conditions with & (AND) and | (OR):
AND requires both conditions to be true. OR requires at least one.
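A short sketch of both operators working element by element; the vectors `age` and `is_student` are invented for the example:

```r
age        <- c(15, 22, 37, 64)
is_student <- c(TRUE, TRUE, FALSE, FALSE)

# AND: TRUE only where both conditions hold.
both <- age > 18 & is_student     # FALSE TRUE FALSE FALSE

# OR: TRUE where at least one condition holds.
either <- age > 18 | is_student   # TRUE TRUE TRUE TRUE

print(both)
print(either)
```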
NA means a value is absent. It's not zero and not an empty string; it's genuinely unknown. R is strict about this: almost any arithmetic or comparison involving NA returns NA:
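A few illustrative cases, stored in variables so the results can be inspected; the numbers are arbitrary:

```r
plus_one <- NA + 1             # NA: unknown plus one is still unknown
bigger   <- NA > 3             # NA: can't tell whether it exceeds 3
same     <- NA == NA           # NA: two unknowns may or may not match
avg      <- mean(c(1, NA, 3))  # NA: one missing value makes the mean unknown

print(c(plus_one, bigger, same, avg))
```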
Why does `NA == NA` return NA? Because you don't know what the missing value is. If person A's age is unknown and person B's age is unknown, you can't say they're equal — they might be different ages.
The correct way to test for NA is always is.na():
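A minimal comparison of the two approaches, using made-up values:

```r
x <- NA

wrong <- x == NA    # NA, so an if statement cannot use it
right <- is.na(x)   # TRUE

ages <- c(21, NA, 35)
missing_ages <- is.na(ages)   # FALSE TRUE FALSE

print(wrong)
print(right)
print(missing_ages)
```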
A vector is R's most fundamental data structure — an ordered sequence of values of the same type. Create one with c():
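For instance, here is one vector of each basic type; the names and values are invented:

```r
scores        <- c(88, 92, 75)           # numeric
student_names <- c("Ana", "Ben", "Cy")   # character
passed        <- c(TRUE, TRUE, FALSE)    # logical

print(length(scores))        # 3
print(class(scores))         # "numeric"
print(class(student_names))  # "character"
print(class(passed))         # "logical"
```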
Type coercion happens silently when you mix types. R converts everything to the most flexible type (character > numeric > logical):
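A quick sketch of what silent coercion looks like, with arbitrary values; note that R gives no warning in either case:

```r
# Logical mixed with numeric: TRUE becomes 1, FALSE becomes 0.
mixed_num <- c(1, TRUE, FALSE)
print(mixed_num)          # 1 1 0
print(class(mixed_num))   # "numeric"

# Anything mixed with character becomes character.
mixed_chr <- c(1, "a", TRUE)
print(mixed_chr)          # "1" "a" "TRUE"
print(class(mixed_chr))   # "character"
```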
One of R's most useful features: math and comparisons apply to every element:
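As an illustration with a made-up `scores` vector, each operation below is applied to all four elements at once:

```r
scores <- c(72, 85, 91, 60)

curved     <- scores + 5     # 77 90 96 65
as_percent <- scores / 100   # 0.72 0.85 0.91 0.60
above_80   <- scores > 80    # FALSE TRUE TRUE FALSE

print(curved)
print(as_percent)
print(above_80)
```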
Key insight: sum() treats TRUE as 1 and FALSE as 0. So sum(scores > 80) is a clean way to count how many scores are above 80 — no loops needed.
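A small sketch of this counting trick; the scores are invented. The same idea with mean() gives a proportion rather than a count:

```r
scores <- c(72, 85, 91, 60, 88)

n_above    <- sum(scores > 80)    # counts the TRUEs: 3
prop_above <- mean(scores > 80)   # share of TRUEs: 0.6

print(n_above)
print(prop_above)
```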
A dataframe organizes multiple vectors as columns in a table — like a spreadsheet in R:
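Here is a minimal sketch using base R's data.frame() so that it runs without loading any packages; tibble() from the tidyverse takes the same name = vector arguments. The table `students` and its contents are made up:

```r
# Each argument is one column; all columns must have the same length.
students <- data.frame(
  name   = c("Ana", "Ben", "Cy"),
  score  = c(88, 75, 61),
  passed = c(TRUE, TRUE, FALSE)
)

print(students)
print(dim(students))   # 3 rows, 3 columns
```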
Access a column with $, or use [row, col] indexing:
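The sketch below shows both styles on a small made-up data frame, built with base data.frame() so it is self-contained:

```r
students <- data.frame(
  name  = c("Ana", "Ben", "Cy"),
  score = c(88, 75, 61)
)

all_scores <- students$score        # whole column as a vector: 88 75 61
bens_name  <- students[2, "name"]   # row 2, column "name"
first_row  <- students[1, ]         # row 1, every column
second_col <- students[, 2]         # every row, column 2

print(all_scores)
print(bens_name)
print(first_row)
```

One difference worth knowing: with a tibble, `students[, 2]` stays a one-column tibble instead of dropping to a plain vector, so `$` is the usual way to pull a column out as a vector.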
tibble vs data.frame: tibble() is the tidyverse version of a data frame. It prints more nicely, never converts strings to factors (base data.frame() did this by default before R 4.0), and gives more helpful error messages. In STAT 240, we always use tibbles.
R supports the standard arithmetic operators:
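A quick reference sketch with arbitrary operands, including the two operators for integer division and remainder:

```r
add_res  <- 7 + 2     # 9
sub_res  <- 7 - 2     # 5
mul_res  <- 7 * 2     # 14
div_res  <- 7 / 2     # 3.5
pow_res  <- 7 ^ 2     # 49
mod_res  <- 7 %% 2    # 1  (remainder)
idiv_res <- 7 %/% 2   # 3  (integer division)

print(c(add_res, sub_res, mul_res, div_res, pow_res, mod_res, idiv_res))
```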
Check if a value exists in a set:
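This is done with the %in% operator. A minimal sketch, using an invented `species` vector:

```r
species <- c("Adelie", "Gentoo", "Chinstrap", "Adelie")

# Is one value present anywhere in the vector?
has_gentoo <- "Gentoo" %in% species                       # TRUE

# For each element, is it one of the listed values?
in_subset <- species %in% c("Adelie", "Chinstrap")        # TRUE FALSE TRUE TRUE

print(has_gentoo)
print(in_subset)
```

The second form is handy inside filter(), where it replaces a chain of == comparisons joined by |.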
The colon : creates a sequence:
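A brief sketch with arbitrary endpoints. It also shows a common precedence trap: the colon binds more tightly than + and -, so parentheses matter.

```r
one_to_five <- 1:5    # 1 2 3 4 5
countdown   <- 10:7   # 10 9 8 7 (counts down when the start is larger)

n <- 3
shifted  <- 1:n - 1     # (1:n) - 1, giving 0 1 2
intended <- 1:(n - 1)   # 1 2

print(one_to_five)
print(countdown)
print(shifted)
print(intended)
```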
When you mix different types in a single vector, R coerces everything to the most flexible type:
logical → numeric → character
Each arrow means "gets coerced to". Examples:
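One sketch per arrow, with arbitrary values. The explicit as.*() calls show the same conversions that c() applies automatically:

```r
# logical -> numeric
lgl_to_num <- as.numeric(c(TRUE, FALSE))   # 1 0
mix_a      <- c(TRUE, 2)                   # 1 2

# numeric -> character
num_to_chr <- as.character(3.5)            # "3.5"
mix_b      <- c(2, "x")                    # "2" "x"

# logical -> character (when a string is present)
mix_c      <- c(TRUE, "x")                 # "TRUE" "x"

print(mix_a)
print(mix_b)
print(mix_c)
```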
Important: NA is special — it exists in all types (logical, numeric, character). All three are just displayed as NA.
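A short sketch of this, using R's typed NA constants. A plain NA is logical, but it takes on the type of whatever vector it joins:

```r
print(class(NA))              # "logical"
print(class(NA_real_))        # "numeric"
print(class(NA_character_))   # "character"

# Inside a vector, NA adopts the vector's type.
num_with_na <- c(1, NA)
chr_with_na <- c("a", NA)

print(class(num_with_na))     # "numeric"
print(class(chr_with_na))     # "character"
```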
A factor stores categorical data, such as species, color, or treatment group. Under the hood, it's an integer vector with a label (called a level) for each category:
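A minimal sketch with invented species labels, showing the levels and the integer codes behind them:

```r
species <- factor(c("Gentoo", "Adelie", "Gentoo", "Chinstrap"))

# Levels default to alphabetical order.
print(levels(species))       # "Adelie" "Chinstrap" "Gentoo"

# Each element is stored as the position of its level.
print(as.integer(species))   # 3 1 3 2
```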
Factors matter in ggplot() (the order of the levels determines the order of bars and axis categories) and in regression models (they're encoded as dummy variables).
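Because level order drives plot order, it helps to set the levels explicitly when alphabetical order is not what you want. A sketch, assuming a made-up `size` variable:

```r
size <- factor(
  c("medium", "small", "large", "small"),
  levels = c("small", "medium", "large")   # desired display order
)

print(levels(size))       # "small" "medium" "large"
print(as.integer(size))   # 2 1 3 1
print(table(size))        # counts appear in level order
```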
R uses 1-based indexing (not 0-based like Python). Access elements with square brackets:
Negative indexing (x[-2]) means "all EXCEPT index 2".
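The common indexing patterns, sketched on an arbitrary vector `x`:

```r
x <- c(10, 20, 30, 40)

first     <- x[1]         # 10 (the first element is index 1)
middle    <- x[2:3]       # 20 30
picked    <- x[c(1, 4)]   # 10 40
without_2 <- x[-2]        # 10 30 40 (drops index 2)
large     <- x[x > 15]    # 20 30 40 (logical indexing)

print(first)
print(middle)
print(picked)
print(without_2)
print(large)
```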
The is.na() function tests for missing values.
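Applied to a whole vector, is.na() returns one TRUE or FALSE per element, which makes it easy to count, locate, or drop missing values. The `scores` vector here is an assumed example:

```r
scores <- c(88, NA, 75, NA, 61)

is_missing <- is.na(scores)         # FALSE TRUE FALSE TRUE FALSE
n_missing  <- sum(is_missing)       # 2
where      <- which(is_missing)     # positions 2 and 4
observed   <- scores[!is_missing]   # 88 75 61

print(n_missing)
print(where)
print(observed)
```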
There are two different approaches to handling NAs:
Option 1: Replace NAs with a value first, then compute
Option 2: Keep NAs, but tell the function to skip them
Key difference: Option 1 counts each NA as a 0, so the NAs stay in the denominator and pull the mean down. Option 2 drops the NAs and averages only the remaining values: (88 + 75 + 61) / 3 ≈ 74.67.
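Both options can be sketched side by side. The vector below is an assumption: it contains the three non-missing values from the text plus two NAs, and the base-R ifelse() call stands in for whichever replacement function you prefer:

```r
scores <- c(88, NA, 75, NA, 61)

# Option 1: replace NAs with 0, then compute.
filled      <- ifelse(is.na(scores), 0, scores)   # 88 0 75 0 61
mean_filled <- mean(filled)                       # 224 / 5 = 44.8

# Option 2: keep NAs, but tell mean() to skip them.
mean_skipped <- mean(scores, na.rm = TRUE)        # 224 / 3

print(mean_filled)
print(mean_skipped)
```

With two NAs, the gap between the two answers is large, which is why the choice between the options should be deliberate.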
Most summary functions, such as sum(), mean(), max(), and sd(), accept na.rm = TRUE to ignore missing values.