MODULE 0211 QUESTIONS

Data Types & Structures

Logical Operators

Comparison operators compare two values and return a logical value, TRUE or FALSE. You'll use these constantly inside filter() and if statements:

R
5 > 3 # greater than
5 < 3 # less than
5 >= 5 # greater than or equal
5 == 5 # equality (note the double equals!)
5 != 3 # not equal

Common mistake: = assigns a value, == checks equality. Writing filter(species = "Adelie") will give an error; you need filter(species == "Adelie").

You can combine conditions with & (AND) and | (OR):

R
(5 > 3) & (2 < 1) # TRUE AND FALSE = FALSE
(5 > 3) | (2 < 1) # TRUE OR FALSE = TRUE

AND requires both conditions to be true. OR requires at least one.
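
The same operators work element by element on a vector of values, which is how they are typically used when filtering. A minimal sketch, assuming a made-up `ages` vector that is not part of the course data:

```r
# Hypothetical ages, used only for illustration
ages <- c(12, 25, 40, 67, 18)

# AND: TRUE only where both conditions hold
working_age <- ages >= 18 & ages < 65
working_age        # FALSE TRUE TRUE FALSE TRUE

# OR: TRUE where at least one condition holds
child_or_senior <- ages < 18 | ages >= 65
child_or_senior    # TRUE FALSE FALSE TRUE FALSE

# Logical subsetting keeps the elements marked TRUE
ages[working_age]  # 25 40 18
```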

Missing Values: NA

NA means a value is absent: it's not zero and not an empty string, it's genuinely unknown. R is strict about this, and almost any arithmetic or comparison involving NA returns NA:

R
NA + 5 # NA
NA == NA # even this is NA, not TRUE!

Why does `NA == NA` return NA? Because you don't know what the missing value is. If person A's age is unknown and person B's age is unknown, you can't say they're equal — they might be different ages.

The correct way to test for NA is always is.na():

R
x <- NA
is.na(x) # TRUE

Vectors

A vector is R's most fundamental data structure — an ordered sequence of values of the same type. Create one with c():

R
scores <- c(88, 92, 75, 95, 61)
names <- c("Alice", "Bob", "Carol")
scores[2] # 92; R is 1-indexed, so the first element is [1]
length(scores) # 5

Type coercion happens silently when you mix types. R converts everything to the most flexible type (character > numeric > logical):

R
c(2, TRUE, "banana") # all become character
c(2, TRUE) # TRUE becomes 1

Operations Work Element-Wise

One of R's most useful features: math and comparisons apply to every element:

R
scores - 70 # subtract 70 from each
scores > 80 # TRUE/FALSE for each
sum(scores > 80) # count how many are above 80
mean(scores)

Key insight: sum() treats TRUE as 1 and FALSE as 0. So sum(scores > 80) is a clean way to count how many scores are above 80 — no loops needed.
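
The same trick extends to proportions: since TRUE counts as 1 and FALSE as 0, the mean of a logical vector is the fraction of TRUE values. A short sketch using the `scores` vector from above (the name `above_80` is introduced here for illustration):

```r
scores <- c(88, 92, 75, 95, 61)

# Logical vector: TRUE where the score is above 80
above_80 <- scores > 80

sum(above_80)    # 3   (how many scores are above 80)
mean(above_80)   # 0.6 (what fraction of scores are above 80)
```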

Dataframes and Tibbles

A dataframe organizes multiple vectors as columns in a table — like a spreadsheet in R:

R
library(tidyverse)
students <- tibble(
  name = c("Alice", "Bob", "Carol"),
  score = c(88, 92, 75),
  pass = c(TRUE, TRUE, FALSE)
)
students

Access a column with $, or use [row, col] indexing:

R
students$score # extract as vector
students[1, ] # entire first row
students[, 2] # entire second column (a tibble returns this as a one-column tibble)

tibble vs data.frame: tibble() is the tidyverse version of a dataframe. It prints more nicely, never converts strings to factors by default, and gives more helpful error messages. In STAT 240, we always use tibbles.

Mathematical Operators

R supports the standard arithmetic operators:

R
5 + 3 # addition
5 - 3 # subtraction
5 * 3 # multiplication
5 / 3 # division
2 ^ 10 # exponentiation (2 to the power of 10)
2 ** 10 # also exponentiation (equivalent to ^)

The %in% Operator

Check if a value exists in a set:

R
prime_numbers <- c(2, 3, 5, 7, 11, 13)
2 %in% prime_numbers # TRUE
4 %in% prime_numbers # FALSE
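
%in% also accepts a vector on the left-hand side and returns one TRUE/FALSE per element, which makes it handy for subsetting. A small sketch, where `candidates` is a made-up vector for illustration:

```r
prime_numbers <- c(2, 3, 5, 7, 11, 13)

# Hypothetical values to look up
candidates <- c(2, 4, 7, 9)

# One result per element of candidates
is_prime <- candidates %in% prime_numbers
is_prime               # TRUE FALSE TRUE FALSE

candidates[is_prime]   # 2 7
sum(is_prime)          # 2
```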

Numeric Shortcuts

The colon : creates a sequence:

R
1:10 # 1 2 3 4 5 6 7 8 9 10
5:1 # 5 4 3 2 1
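
The colon always steps by 1 (or -1). When a different step size is needed, base R's seq() is one option. It is not covered above, so treat this as a supplementary sketch rather than part of the module's required material:

```r
# Colon: steps of 1
one_to_five <- 1:5                    # 1 2 3 4 5

# seq() with an explicit step size
quarters <- seq(0, 1, by = 0.25)      # 0.00 0.25 0.50 0.75 1.00
countdown <- seq(10, 1, by = -3)      # 10 7 4 1

# seq_len(n) gives 1, 2, ..., n
first_three <- seq_len(3)             # 1 2 3
```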

Common Vector Functions

R
scores <- c(88, 92, 75, 95, 61)
min(scores)
max(scores)
mean(scores)
median(scores)
sum(scores)
log(scores) # natural logarithm

Exploring Dataframes

R
head(students) # first 6 rows
glimpse(students) # compact overview
colnames(students) # column names
dim(students) # dimensions (rows, cols)
nrow(students) # number of rows
ncol(students) # number of columns

Subsetting with [row, col]

R
students[1, ] # first row, all columns
students[, 2] # all rows, second column
students$name # column by name
students[1, 3] # first row, third column
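
The row position in [row, col] can also be a logical vector, which keeps only the rows where a condition is TRUE. The sketch below rebuilds `students` as a base data.frame so it runs without any packages; that substitution is an assumption made for the example, and the same indexing works on a tibble:

```r
# Base data.frame stand-in for the students tibble (assumption for this example)
students <- data.frame(
  name  = c("Alice", "Bob", "Carol"),
  score = c(88, 92, 75),
  pass  = c(TRUE, TRUE, FALSE)
)

# Keep rows where score is above 80, all columns
high <- students[students$score > 80, ]
nrow(high)     # 2
high$score     # 88 92

# Keep rows where pass is TRUE, and only two named columns
passed <- students[students$pass, c("name", "score")]
ncol(passed)   # 2
```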

Coercion Hierarchy

When you mix different types in a single vector, R coerces everything to the most flexible type:

logical → numeric → character

Each arrow means "gets coerced to". Examples:

R
c(TRUE, 1, "a") # all become character
c(TRUE, 1) # TRUE becomes 1
c(TRUE, FALSE, NA) # stays logical

Important: NA is special — it exists in all types (logical, numeric, character). All three are just displayed as NA.
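
One way to see this is to check the type with class(). A plain NA is logical, but it takes on the type of the vector it sits in; base R also provides typed constants such as NA_real_ and NA_character_. A minimal sketch (the variable names are introduced here for illustration):

```r
class(NA)              # "logical"
class(NA_real_)        # "numeric"
class(NA_character_)   # "character"

# NA adopts the type of the vector around it
num_with_na <- c(1, NA)
chr_with_na <- c("a", NA)
class(num_with_na)     # "numeric"
class(chr_with_na)     # "character"

# is.na() detects all of them
all_missing <- is.na(c(NA, NA_real_, NA_character_))
all_missing            # TRUE TRUE TRUE
```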

Factors

A factor stores categorical data, like species, color, or treatment group. Under the hood, it's an integer vector with category labels (levels):

R
sizes <- factor(c("small", "large", "medium", "large", "small"))
levels(sizes) # "large" "medium" "small" (alphabetical by default)
nlevels(sizes) # 3
table(sizes) # count per level

Factors matter in ggplot() (they determine the order of bars and axes) and in regression models (they're encoded as dummy variables).
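
Because the default level order is alphabetical, a plot of `sizes` would show large, medium, small. One way to get a meaningful order is to pass the levels explicitly. A short sketch, where `sizes_ordered` is a name introduced for this example:

```r
sizes <- factor(c("small", "large", "medium", "large", "small"))

# Integer codes under the default alphabetical levels
# (large = 1, medium = 2, small = 3)
as.integer(sizes)            # 3 1 2 1 3

# Same data, with levels in a chosen order
sizes_ordered <- factor(sizes, levels = c("small", "medium", "large"))
levels(sizes_ordered)        # "small" "medium" "large"
as.integer(sizes_ordered)    # 1 3 2 3 1

# Counts now follow the new level order
table(sizes_ordered)
```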

Accessing and Modifying Vectors

R uses 1-based indexing (not 0-based like Python). Access elements with square brackets:

R
x <- c(10, 20, 30, 40, 50)
x[3] # 30: single element at position 3
x[c(1, 3, 5)] # 10 30 50: multiple positions
x[-2] # all except index 2: 10 30 40 50
x[x > 25] # logical subsetting: 30 40 50
x[2] <- 99 # replace element 2
x # 10 99 30 40 50

Negative indexing (x[-2]) means "all EXCEPT index 2".
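
Negative indexing and replacement combine with the other forms shown above. A brief sketch with the same starting vector (the result names are introduced here for illustration):

```r
x <- c(10, 20, 30, 40, 50)

# Drop several positions at once
dropped_two <- x[-c(1, 2)]       # 30 40 50

# Drop the last element, whatever the length
without_last <- x[-length(x)]    # 10 20 30 40

# Replace every element that meets a condition
x[x > 25] <- 0
x                                # 10 20 0 0 0
```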

Checking and Replacing NAs

The is.na() function tests for missing values.
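
Applied to a vector, is.na() returns one TRUE/FALSE per element, so it can be used to count, locate, or drop missing values. A minimal sketch (the names `missing`, `n_missing`, `where_missing`, and `observed` are introduced here for illustration):

```r
scores <- c(88, NA, 75, NA, 61)

missing <- is.na(scores)
missing                          # FALSE TRUE FALSE TRUE FALSE

n_missing <- sum(missing)        # 2 (how many are missing)
where_missing <- which(missing)  # 2 4 (positions of the NAs)
observed <- scores[!missing]     # 88 75 61 (non-missing values only)

# Comparing with == does not work: every result is NA
scores == NA
```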

There are two different approaches to handling NAs:

Option 1: Replace NAs with a value first, then compute

R
scores <- c(88, NA, 75, NA, 61)
scores[is.na(scores)] <- 0 # NAs become 0
mean(scores) # mean of c(88, 0, 75, 0, 61)

Option 2: Keep NAs, but tell the function to skip them

R
scores <- c(88, NA, 75, NA, 61)
mean(scores, na.rm = TRUE) # ignores the two NAs

Key difference: Option 1 counts each NA as a 0, so the zeros stay in both the sum and the denominator: (88 + 0 + 75 + 0 + 61) / 5 = 44.8. Option 2 computes the mean of only the non-NA values: (88 + 75 + 61) / 3 = 74.67.

Many summary functions, including sum(), mean(), median(), min(), and max(), accept na.rm = TRUE to ignore missing values.
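
Putting the two options side by side shows how far apart the answers can be. This sketch copies the vector before replacing NAs so the original stays intact; the names `zero_filled`, `mean_zero_filled`, and `mean_skipped` are introduced here for illustration:

```r
scores <- c(88, NA, 75, NA, 61)

# Option 1: replace NAs with 0 on a copy, then compute
zero_filled <- scores
zero_filled[is.na(zero_filled)] <- 0
mean_zero_filled <- mean(zero_filled)        # 224 / 5 = 44.8

# Option 2: keep the NAs and skip them
mean_skipped <- mean(scores, na.rm = TRUE)   # 224 / 3, about 74.67

# Without either option, the NA propagates
mean(scores)                                 # NA
```

Which option is appropriate depends on what a missing score means in context: a 0 is a claim about the value, while na.rm = TRUE simply leaves it out.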