Introduction to R & RStudio

What is R?

R is a programming language built specifically for statistics and data analysis. Unlike Python, which is a general-purpose language, R was designed from the ground up for working with data, and core features such as its vector math and its plotting system reflect that purpose.

Loading Packages

R has a huge ecosystem of add-on packages. The tidyverse bundle — which includes ggplot2, dplyr, tidyr, and more — is used throughout this course.

R
install.packages("tidyverse") # run once to download
library(tidyverse) # run every session to load

Note: You only need to run install.packages() once. Run library() at the top of every R script or R Markdown file.

RStudio is the IDE (integrated development environment) we use to write and run R code. Our actual files are R Markdown (.Rmd) documents — a combination of explanatory text and runnable code chunks.

Why R? R is one of the standard languages in statistics and data science. The tidyverse ecosystem (ggplot2, dplyr, tidyr) gives you powerful, readable tools for data manipulation and visualization that often take more code in other languages.

Variables and Assignment

A variable is a named container for a value. In R, the assignment operator is <-:

R
x <- 4
my_name <- "Miranda"
is_raining <- TRUE

Notice that R doesn't print anything when you assign. To see the value, just type the variable name:

R
x # prints 4
x + 10 # prints 14 but does NOT change x
x <- x + 1 # reassigns x to 5
x # prints 5

Important: x + 10 shows 14 but x is still 4. You need <- to actually change a variable. This is a common source of confusion.

Data Types

Every object in R has a class that determines how R treats it. The three you'll use constantly:

  • Numeric — any number (42, 3.14, -7.5)
  • Character — text, always wrapped in quotes ("hello", "TRUE")
  • Logical — exactly TRUE or FALSE (no quotes)
R
class(42) # "numeric"
class("hello") # "character"
class(TRUE) # "logical"

The type matters because operations only work with compatible types. Adding a number to a character string causes an error, not an automatic conversion.
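
The type mismatch described above can be sketched as follows. This is a minimal illustration, assuming a fresh R session; the names `count_text` and `result` are made up for the example, and `tryCatch()` is used only so the error message is captured instead of stopping the script.

```r
# Hypothetical value for illustration
count_text <- "5"      # character, even though it looks like a number
class(count_text)      # "character"

# Adding a number to a character string raises an error;
# tryCatch() captures the message so the script keeps running
result <- tryCatch(
  count_text + 1,
  error = function(e) conditionMessage(e)
)
result                 # the captured error message

# Converting explicitly makes the arithmetic work
as.numeric(count_text) + 1   # 6
```

The explicit `as.numeric()` call is the design point: R leaves the conversion decision to you rather than guessing.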

Useful Built-in Functions

Functions in R take inputs (arguments) inside parentheses and return a result:

R
sqrt(16) # 4
abs(-7) # 7
nchar("statistics") # 10
toupper("hello r") # "HELLO R"
seq(from = 0, to = 10, by = 2) # 0 2 4 6 8 10

Reading function documentation: If you're not sure what a function does, type ?function_name in the R console. For example, ?seq shows all the arguments seq() accepts.
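
As a small sketch of what the help page for seq() describes, the calls below show arguments matched by name and by position, plus the `length.out` argument. The specific numbers are arbitrary choices for illustration.

```r
# Arguments can be matched by name or by position
seq(from = 0, to = 10, by = 2)   # 0 2 4 6 8 10
seq(0, 10, 2)                    # same result, matched by position

# length.out asks for a number of evenly spaced values instead of a step
seq(from = 0, to = 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00

# args() prints a function's argument list without opening the help page
args(nchar)
```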

Special Values

R has three special values worth knowing:

R
0 / 0 # NaN: not a number, the result of undefined math
1 / 0 # Inf: positive infinity
NA # NA: missing data, the absence of a value

NA is especially important in statistics: real datasets almost always have missing values, and R's treatment of NA is deliberate. Most operations involving NA return NA unless you explicitly tell R to ignore the missing values (for example, with na.rm = TRUE).
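
A minimal sketch of how NA propagates and how to opt out of that behavior. The `heights` vector is hypothetical data invented for the example.

```r
# Hypothetical measurements with one missing value
heights <- c(160, 172, NA, 181)

mean(heights)                 # NA: the missing value propagates
mean(heights, na.rm = TRUE)   # 171: NA is dropped before averaging

is.na(heights)                # FALSE FALSE  TRUE FALSE
sum(is.na(heights))           # 1 missing value

# NaN and Inf have their own tests
is.nan(0 / 0)                 # TRUE
is.infinite(1 / 0)            # TRUE
```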

R Markdown & Code Chunks

R Markdown (.Rmd) files blend text and executable R code. Code goes inside code chunks. Each chunk opens with ```{r} on its own line and closes with ``` on its own line. The contents of a chunk look like this:

R
# This is a code chunk
x <- 4
x + 1

Inside a chunk, lines starting with # are comments. When you knit the document (Ctrl+Shift+K), R runs all chunks and outputs a polished HTML or PDF report.

R Markdown workflow: Write explanatory text → insert code chunks → knit → finished report. Many data scientists document their work this way.

Case Sensitivity & Working Directory

R is case-sensitive: x and X are different variables. Likewise, TRUE and true are not the same: R recognizes only the uppercase constants TRUE, FALSE, and NA.

R
x <- 5
X <- 10
x # prints 5
X # prints 10

To see your current working directory, use getwd():

R
getwd()

More Useful Functions

paste() for combining strings

R
paste("Hello", "world") # "Hello world"
paste("Name:", "Alice") # "Name: Alice"
paste("x =", 42) # "x = 42"
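
By default paste() puts a single space between its pieces. The sketch below shows a few related options that are part of base R (`sep`, `paste0()`, and `collapse`); the strings themselves are arbitrary examples.

```r
# sep controls what goes between the pieces (default is a single space)
paste("2024", "01", "15", sep = "-")       # "2024-01-15"

# paste0() joins with no separator at all
paste0("file_", 3, ".csv")                 # "file_3.csv"

# collapse merges a whole vector into one string
paste(c("a", "b", "c"), collapse = ", ")   # "a, b, c"
```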

Vectors with c()

The c() function combines values into a vector (a sequence of items of the same type):

R
x <- c(3, 7, 2, 9, 1) # numeric vector
y <- c("a", "b", "c") # character vector
z <- c(TRUE, FALSE, TRUE) # logical vector

Vectors are the fundamental data structure in R. Almost everything is a vector; even a single number such as 42 is a vector of length 1.
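
A short sketch of what working with vectors looks like in practice, including what happens when the "same type" rule is tested. The names `scores` and `mixed` are hypothetical.

```r
scores <- c(3, 7, 2, 9, 1)   # hypothetical numeric vector

# Arithmetic is applied element by element
scores * 2        # 6 14  4 18  2
length(scores)    # 5

# A single number is itself a vector of length 1
length(42)        # 1

# Mixing types forces everything to one common type
mixed <- c(1, "a", TRUE)
class(mixed)      # "character"
mixed             # "1" "a" "TRUE"
```

Because c() silently converts mixed input to a single type, it is worth checking class() when a vector does not behave as expected.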

Variable Naming Rules

R variable names must follow these rules:

  • Must start with a letter, or with a dot (.) that is not followed by a digit
  • Can contain letters, digits, underscores (_), and dots (.)
  • Cannot start with a digit or an underscore (2ndvar is invalid)
  • Cannot be a reserved word, such as TRUE, FALSE, NULL, NA, Inf, NaN, if, else, or function
R
my_var2 <- 10 # valid: starts with a letter
.hidden <- 5 # valid: starts with a dot
MyResult <- 3.14 # valid: mixed case works too

# 2ndvar <- 1 # INVALID: starts with a digit
# TRUE <- 1 # INVALID: reserved word

Exam tip: .hidden_val is a valid name. 2ndvar is not. TRUE is always a reserved keyword.
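
One way to test these rules without triggering a syntax error is base R's make.names(), which rewrites any string that is not a syntactically valid name. The sketch below assumes that comparing its output to the input is an acceptable validity check; the `candidates` vector and `is_valid` name are made up for illustration.

```r
# Hypothetical candidate names to check
candidates <- c("my_var2", ".hidden", "2ndvar", "TRUE", ".2way")

# make.names() rewrites anything that is not a syntactically valid name,
# so a name that comes back unchanged was already valid
is_valid <- make.names(candidates) == candidates
names(is_valid) <- candidates
is_valid
# my_var2 and .hidden are TRUE; 2ndvar, TRUE, and .2way are FALSE
```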

sum() with multiple arguments

R
sum(1:10) # sum of 1, 2, 3, ... 10, which is 55
sum(c(3, 7, 2)) # sum of one vector: 12
sum(3, 7, 2) # multiple arguments are added together: 12

The %% Modulo Operator

The modulo operator %% returns the remainder after division:

R
10 %% 3 # remainder of 10 ÷ 3, which is 1
7 %% 2 # check if odd (remainder 1)

Common use: x %% 2 == 1 checks if x is odd.
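
Because %% works element by element, the odd/even check extends to whole vectors. A minimal sketch, with `values` as a hypothetical input:

```r
values <- 1:10   # hypothetical input

values %% 2 == 1            # TRUE at the odd numbers, FALSE at the even ones
values[values %% 2 == 1]    # 1 3 5 7 9
values[values %% 2 == 0]    # 2 4 6 8 10

# %/% is integer division, the companion to %%
10 %/% 3    # 3
10 %% 3     # 1
```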

Type Conversion Functions

R can convert between types using as.numeric(), as.character(), and as.logical():

R
as.numeric("3.14") # Convert string to number: 3.14
as.character(42) # Convert number to string: "42"
as.numeric(TRUE) # Convert logical to number: 1

When conversion fails, R returns NA with a warning:

R
as.numeric("hello")
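
When converting a whole vector, a failed conversion affects only the entries that cannot be parsed. The sketch below is one possible way to find them, assuming you are comfortable silencing the warning for a single call; `raw` and `converted` are hypothetical names.

```r
raw <- c("3.14", "42", "hello")   # hypothetical input read as text

# suppressWarnings() silences the coercion warning for this one call
converted <- suppressWarnings(as.numeric(raw))
converted               # 3.14 42.00    NA

# is.na() then shows which entries failed to convert
raw[is.na(converted)]   # "hello"

# as.logical() also returns NA for text it does not recognize
as.logical(c("TRUE", "yes"))   # TRUE   NA
```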

Built-in R Vectors

R comes with several pre-loaded vectors of useful values:

R
letters # lowercase a–z
LETTERS # uppercase A–Z
month.name # "January" through "December"
letters[10] # "j": R vectors are 1-indexed
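
The same square-bracket indexing works with ranges and with several positions at once. A brief sketch using only base R's built-in constants:

```r
letters[1:3]              # "a" "b" "c"
LETTERS[c(1, 26)]         # "A" "Z"
month.name[12]            # "December"
month.abb[1:3]            # "Jan" "Feb" "Mar"
length(letters)           # 26

# which() finds the position of a value
which(letters == "j")     # 10
```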

rep() — Replicate Values

The rep() function repeats values to create longer vectors:

R
rep(0, 5) # [1] 0 0 0 0 0
rep(c(1, 2), 3) # [1] 1 2 1 2 1 2
rep(c("A", "B"), each = 2) # [1] "A" "A" "B" "B"

The each= argument repeats each element before moving to the next, while the default repeats the entire vector.
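
The difference between the two modes is easiest to see side by side. The sketch below also shows `times` written out by name and combined with `each`, both of which are standard rep() arguments.

```r
rep(c("A", "B"), times = 3)            # "A" "B" "A" "B" "A" "B"
rep(c("A", "B"), each = 2)             # "A" "A" "B" "B"

# each is applied first, then the whole result is repeated
rep(c("A", "B"), each = 2, times = 2)  # "A" "A" "B" "B" "A" "A" "B" "B"

# A vector for times gives a separate count per element
rep(c("A", "B"), times = c(1, 3))      # "A" "B" "B" "B"
```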