MODULE 0210 QUESTIONS

Data Types & Structures

Logical Operators

Logical operators compare two values and return TRUE or FALSE. You'll use these constantly inside filter() and if statements:

5 > 3    # greater than
5 < 3    # less than
5 >= 5   # greater than or equal
5 == 5   # equality (note the double equals!)
5 != 3   # not equal

Common mistake: = assigns a value, == checks equality. Writing filter(species = "Adelie") will give an error; you need filter(species == "Adelie").

You can combine conditions with & (AND) and | (OR):

(5 > 3) & (2 < 1)   # TRUE AND FALSE = FALSE
(5 > 3) | (2 < 1)   # TRUE OR FALSE = TRUE

AND requires both conditions to be true. OR requires at least one.
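As a small sketch of combining conditions, the snippet below checks whether a value lies inside or outside a range. The names `x`, `in_range`, and `out_of_range` are illustrative assumptions, not part of the module:

```r
x <- 7  # hypothetical value chosen for illustration

in_range     <- (x > 5) & (x < 10)   # both conditions hold, so TRUE
out_of_range <- (x < 5) | (x > 10)   # neither condition holds, so FALSE

in_range
out_of_range
```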

Missing Values: NA

NA means a value is absent. It is not zero and not an empty string; it is genuinely unknown. R is strict about this, and almost any arithmetic or comparison involving NA returns NA:

NA + 5     # NA
NA == NA   # even this is NA, not TRUE!

Why does `NA == NA` return NA? Because you don't know what the missing value is. If person A's age is unknown and person B's age is unknown, you can't say they're equal — they might be different ages.

The correct way to test for NA is always is.na():

x <- NA
is.na(x)   # TRUE
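The unknown-ages analogy can be written out directly. This is only a sketch; `age_a`, `age_b`, and `same_age` are hypothetical names used for illustration:

```r
age_a <- NA   # person A's age is unknown
age_b <- NA   # person B's age is unknown

same_age <- age_a == age_b   # comparing two unknowns gives NA, not TRUE
same_age

is.na(same_age)   # TRUE: the comparison itself is missing
is.na(age_a)      # TRUE: the reliable way to detect a missing value
is.na(5)          # FALSE: 5 is a known value
```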

Vectors

A vector is R's most fundamental data structure — an ordered sequence of values of the same type. Create one with c():

scores <- c(88, 92, 75, 95, 61)
names <- c("Alice", "Bob", "Carol")
scores[2]        # 92; R is 1-indexed, so the first element is scores[1]
length(scores)   # 5

Type coercion happens silently when you mix types. R converts everything to the most flexible type (character > numeric > logical):

c(2, TRUE, "banana")   # all become character
c(2, TRUE)             # TRUE becomes 1
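One way to confirm what coercion did is to ask R for the resulting type with class(). A minimal sketch, where `mixed` and `numbers` are illustrative names:

```r
mixed   <- c(2, TRUE, "banana")
numbers <- c(2, TRUE)

class(mixed)     # "character"
mixed            # "2" "TRUE" "banana"

class(numbers)   # "numeric"
numbers          # 2 1
```

Note that the values in `mixed` are now text, so arithmetic on them no longer works.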

Operations Work Element-Wise

One of R's most useful features: math and comparisons apply to every element:

scores - 70        # subtract 70 from each
scores > 80        # TRUE/FALSE for each
sum(scores > 80)   # count how many are above 80
mean(scores)

Key insight: sum() treats TRUE as 1 and FALSE as 0. So sum(scores > 80) is a clean way to count how many scores are above 80 — no loops needed.
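The same trick extends from counts to proportions: since TRUE counts as 1 and FALSE as 0, mean() of a logical vector gives the fraction of TRUE values. A short sketch using the scores from above (`n_above` and `prop_above` are illustrative names):

```r
scores <- c(88, 92, 75, 95, 61)

above <- scores > 80          # TRUE TRUE FALSE TRUE FALSE
n_above    <- sum(above)      # 3 scores are above 80
prop_above <- mean(above)     # 3 out of 5, i.e. 0.6

n_above
prop_above
```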

Dataframes and Tibbles

A dataframe organizes multiple vectors as columns in a table — like a spreadsheet in R:

library(tidyverse)

students <- tibble(
  name  = c("Alice", "Bob", "Carol"),
  score = c(88, 92, 75),
  pass  = c(TRUE, TRUE, FALSE)
)

students

Access a column with $, or use [row, col] indexing:

students$score   # extract as vector
students[1, ]    # entire first row
students[, 2]    # entire second column

tibble vs data.frame: a tibble is the tidyverse version of a dataframe, created with tibble(). It prints more nicely, never converts strings to factors, and gives more helpful error messages. In STAT 240, we always use tibbles.

Mathematical Operators

R supports the standard arithmetic operators:

5 + 3     # addition
5 - 3     # subtraction
5 * 3     # multiplication
5 / 3     # division
2 ^ 10    # exponentiation (2 to the power of 10)
2 ** 10   # also exponentiation (equivalent to ^)

The %in% Operator

Check if a value exists in a set:

prime_numbers <- c(2, 3, 5, 7, 11, 13)
2 %in% prime_numbers   # TRUE
4 %in% prime_numbers   # FALSE
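%in% also accepts a whole vector on the left-hand side and checks each element in turn, which makes it handy for subsetting. A sketch, where `candidates` is a made-up vector for illustration:

```r
prime_numbers <- c(2, 3, 5, 7, 11, 13)
candidates    <- c(2, 4, 9, 11)   # hypothetical values to look up

is_prime <- candidates %in% prime_numbers   # TRUE FALSE FALSE TRUE
is_prime

candidates[is_prime]   # keep only the matches: 2 11
```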

Numeric Shortcuts

The colon : creates a sequence:

1:10   # 1 2 3 4 5 6 7 8 9 10
5:1    # 5 4 3 2 1
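Sequences made with the colon are ordinary vectors, so they can be passed to functions or used to pick a range of positions. A brief sketch with illustrative names:

```r
ascending <- 1:10

length(ascending)   # 10
sum(ascending)      # 55

# A sequence inside square brackets selects a run of positions
scores <- c(88, 92, 75, 95, 61)
middle <- scores[2:4]   # 92 75 95
middle
```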

Common Vector Functions

scores <- c(88, 92, 75, 95, 61)
min(scores)
max(scores)
mean(scores)
median(scores)
sum(scores)
log(scores)   # natural logarithm

Exploring Dataframes

head(students)       # first 6 rows
glimpse(students)    # compact overview
colnames(students)   # column names
dim(students)        # dimensions (rows, cols)
nrow(students)       # number of rows
ncol(students)       # number of columns

Subsetting with [row, col]

students[1, ]    # first row, all columns
students[, 2]    # all rows, second column
students$name    # column by name
students[1, 3]   # first row, third column

Coercion Hierarchy

When you mix different types in a single vector, R coerces everything to the most flexible type:

logical → numeric → character

Each arrow means "gets coerced to". Examples:

c(TRUE, 1, "a")       # all become character
c(TRUE, 1)            # TRUE becomes 1
c(TRUE, FALSE, NA)    # stays logical

Important: NA is special — it exists in all types (logical, numeric, character). All three are just displayed as NA.
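To see that NA really does exist in each type, typeof() can be used to inspect it. This is a sketch only; note that typeof() reports numeric values as "double":

```r
typeof(NA)              # "logical": a bare NA is logical by default
typeof(NA_real_)        # "double": the numeric NA
typeof(NA_character_)   # "character": the character NA

# Inside a vector, NA takes the type of the other elements
typeof(c(1.5, NA))      # "double"
typeof(c("a", NA))      # "character"

is.na(c("a", NA))       # FALSE TRUE: is.na() works for every type
```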

Factors

A factor stores categorical data, such as species, color, or treatment group. Under the hood, it's an integer vector with a label for each category (its levels):

sizes <- factor(c("small","large","medium","large","small"))
levels(sizes)    # "large" "medium" "small" (alphabetical by default)
nlevels(sizes)   # 3
table(sizes)     # count per level

Factors matter in ggplot() (they determine the order of bars and axes) and in regression models (they're encoded as dummy variables).
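Because the default level order is alphabetical, bars and tables would be ordered large, medium, small. One way to get a natural order is the levels argument of factor(). A minimal sketch, with `sizes_ordered` as an illustrative name:

```r
sizes_ordered <- factor(
  c("small", "large", "medium", "large", "small"),
  levels = c("small", "medium", "large")   # set the order explicitly
)

levels(sizes_ordered)       # "small" "medium" "large"
as.integer(sizes_ordered)   # 1 3 2 3 1: the integer codes under the hood
table(sizes_ordered)        # counts now appear in small, medium, large order
```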

Accessing and Modifying Vectors

R uses 1-based indexing (not 0-based like Python). Access elements with square brackets:

x <- c(10, 20, 30, 40, 50)
x[3]           # 30: single element at position 3
x[c(1,3,5)]    # 10 30 50: multiple positions
x[-2]          # all except index 2: 10 30 40 50
x[x > 25]      # logical subsetting: 30 40 50
x[2] <- 99     # replace element 2

Negative indexing (x[-2]) means "all EXCEPT index 2".
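Keep in mind that assignment changes the vector in place, so later subsetting sees the new value. A short sketch showing the effect of the replacement, plus negative indexing with several positions:

```r
x <- c(10, 20, 30, 40, 50)

x[2] <- 99     # replace element 2
x              # 10 99 30 40 50

x[x > 25]      # 99 30 40 50: the new value now passes the condition
x[-c(1, 2)]    # drop positions 1 and 2: 30 40 50
```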

Checking and Replacing NAs

The is.na() function tests for missing values.

There are two different approaches to handling NAs:

Option 1: Replace NAs with a value first, then compute

scores <- c(88, NA, 75, NA, 61)
scores[is.na(scores)] <- 0   # NAs become 0
mean(scores)                 # mean of c(88, 0, 75, 0, 61)

Option 2: Keep NAs, but tell the function to skip them

scores <- c(88, NA, 75, NA, 61)
mean(scores, na.rm = TRUE)   # ignores the two NAs

Key difference: Option 1 treats each NA as a score of 0, so the zeros stay in the denominator: (88 + 0 + 75 + 0 + 61) / 5 = 44.8. Option 2 computes the mean of only the non-NA values: (88 + 75 + 61) / 3 = 74.67.

Most summary functions, including sum(), mean(), median(), min(), and max(), accept na.rm = TRUE to ignore missing values.
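The two options can be compared side by side. In this sketch, `filled` is an illustrative copy so that the original vector keeps its NAs:

```r
scores <- c(88, NA, 75, NA, 61)

sum(is.na(scores))           # 2 missing values
mean(scores)                 # NA: without na.rm, the NA propagates

# Option 1: replace NAs with 0 in a copy, then compute
filled <- scores
filled[is.na(filled)] <- 0
mean(filled)                 # 44.8

# Option 2: keep the NAs and skip them
mean(scores, na.rm = TRUE)   # about 74.67
```

Which option is appropriate depends on what a missing score means: a test never taken might reasonably count as 0, while a score that was simply not recorded usually should not.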