R is a programming language built specifically for statistics and data analysis. Unlike Python, which is a general-purpose language, R was designed from the ground up for working with data — every feature, from its vector math to its plotting system, reflects that purpose.
R has a huge ecosystem of add-on packages. The tidyverse bundle — which includes ggplot2, dplyr, tidyr, and more — is used throughout this course.
Note: You only need to run install.packages() once per R installation. Run library() at the top of every R script or R Markdown file, because packages must be loaded again in each new R session.
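A minimal sketch of that install-once, load-every-time pattern. The requireNamespace() guard is an addition for illustration so the snippet runs even before tidyverse is installed; in a course script you would normally just call library(tidyverse).

```r
# Run once per R installation (kept commented out so it does not reinstall
# the package every time the script runs):
# install.packages("tidyverse")

# Run at the top of every script or R Markdown file:
if (requireNamespace("tidyverse", quietly = TRUE)) {
  library(tidyverse)
} else {
  message("tidyverse is not installed yet; run install.packages(\"tidyverse\") first.")
}
```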
RStudio is the IDE (integrated development environment) we use to write and run R code. Our actual files are R Markdown (.Rmd) documents — a combination of explanatory text and runnable code chunks.
Why R? R is one of the standard languages in statistics and data science. The tidyverse ecosystem (ggplot2, dplyr, tidyr) gives you powerful, readable tools for data manipulation and visualization that often take more code in other languages.
A variable is a named container for a value. In R, the assignment operator is <-:
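A minimal sketch of assignment, using the value 4 that this section refers to later. The name y is only there for illustration.

```r
x <- 4        # store the value 4 under the name x
y <- x * 2    # the right-hand side is evaluated first, then stored in y
```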
Notice that R doesn't print anything when you assign. To see the value, just type the variable name:
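A short sketch, with x set to 4 to match the example values used in this section:

```r
x <- 4     # assignment prints nothing
x          # typing the name prints the value: [1] 4
print(x)   # print() does the same thing explicitly
```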
Important: x + 10 shows 14 but x is still 4. You need <- to actually change a variable. This is a common source of confusion.
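The difference between computing a result and storing it can be sketched like this:

```r
x <- 4
x + 10        # prints [1] 14, but the result is not stored anywhere
x             # still [1] 4
x <- x + 10   # reassign to actually change x
x             # now [1] 14
```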
Every object in R has a class that determines how R treats it. The three you'll use constantly:
- numeric: numbers (42, 3.14, -7.5)
- character: text in quotes ("hello", "TRUE")
- logical: TRUE or FALSE (no quotes)

The type matters because operations only work with compatible types. Adding a number to a character string causes an error, not an automatic conversion.
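A quick way to check these types is class(). The sketch below also shows the error from adding a number to a string. tryCatch() is used only so the script keeps running and the message can be inspected; the name result is illustrative.

```r
class(42)        # "numeric"
class("hello")   # "character"
class("TRUE")    # "character": the quotes make it text, not a logical
class(TRUE)      # "logical"

# Mixing incompatible types is an error, not a silent conversion.
result <- tryCatch(
  1 + "2",
  error = function(e) conditionMessage(e)
)
result   # the error text (in an English locale:
         # "non-numeric argument to binary operator")
```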
Functions in R take inputs (arguments) inside parentheses and return a result:
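A few illustrative calls to base R functions, chosen here as examples; they show positional arguments, a named argument, and a function that accepts several values at once.

```r
sqrt(16)                     # one argument: [1] 4
round(3.14159, digits = 2)   # a named argument: [1] 3.14
seq(1, 10, by = 2)           # [1] 1 3 5 7 9
sum(1, 2, 3)                 # several arguments at once: [1] 6
```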
Reading function documentation: If you're not sure what a function does, type ?function_name in the R console. For example, ?seq shows all the arguments seq() accepts.
R has three special numeric values worth knowing:
NA is especially important in statistics because real datasets almost always have missing values. R's treatment of NA is deliberate: most operations involving NA return NA unless you explicitly tell R to ignore missing values, for example with na.rm = TRUE.
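A brief sketch of how special values arise and how NA propagates. It assumes the three values in question are NA (missing), NaN (not a number), and Inf (infinity); the scores vector is made-up data.

```r
1 / 0     # Inf
0 / 0     # NaN
NA + 1    # NA

scores <- c(90, NA, 70)       # hypothetical data with one missing value
mean(scores)                  # NA: the missing value propagates
mean(scores, na.rm = TRUE)    # 80: NA is dropped before averaging
is.na(scores)                 # FALSE  TRUE FALSE
```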
R Markdown (.Rmd) files blend text and executable R code. Code goes inside code chunks delimited by triple backticks:
Inside a chunk, lines starting with # are comments. When you knit the document (Ctrl+Shift+K), R runs all chunks and outputs a polished HTML or PDF report.
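As an illustration, the body of a chunk is ordinary R code. The sketch below shows only what would go between the opening fence line (three backticks followed by {r}) and the closing fence line; the heights vector is made-up data.

```r
# This line is a comment: R skips it when the chunk runs.
heights <- c(160, 172, 181)   # hypothetical heights in cm
mean(heights)                 # the result appears below the chunk after knitting
```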
R Markdown workflow: Write explanatory text → insert code chunks → knit → instant report. Many working data scientists document their analyses this way.
R is case-sensitive: x and X are different variables. TRUE and true are not the same (R only recognizes TRUE, FALSE, NA).
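A small sketch of case sensitivity in practice; the values 1 and 2 are arbitrary.

```r
x <- 1
X <- 2
x == X           # FALSE: x and X are two separate variables

isTRUE(TRUE)     # TRUE
exists("true")   # FALSE in a fresh session: lowercase true is not defined
```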
To see your current working directory, use getwd():
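For example (the variable name wd is just for illustration):

```r
wd <- getwd()   # the working directory, returned as a character string
wd              # print it; the exact path depends on your machine
```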
paste() for combining strings

The c() function combines values into a vector (a sequence of items of the same type):
Vectors are the fundamental data structure in R — almost everything is a vector.
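A minimal sketch of building and using vectors. The scores values are made up; the last line shows that paste() also works element by element on a vector.

```r
scores <- c(88, 92, 79)    # a numeric vector (hypothetical values)
length(scores)             # 3
scores * 2                 # 176 184 158: arithmetic applies to every element
length(5)                  # 1: a single number is a vector of length one
c(1, "a", TRUE)            # "1" "a" "TRUE": mixed inputs are coerced to one type
paste("Score:", scores)    # "Score: 88" "Score: 92" "Score: 79"
```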
R variable names must follow these rules:
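- Must start with a letter or a dot (.)
- Can contain letters, numbers, underscores (_), and dots (.)
- Cannot start with a number (2ndvar is invalid)
- Cannot be a reserved word: TRUE, FALSE, NULL, NA, Inf, NaN

Exam tip: .hidden_val is a valid name. 2ndvar is not. TRUE is always a reserved keyword.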
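One way to check a candidate name without triggering a syntax error is sketched below. is_valid_name() is a hypothetical helper, not a base R function. It relies on make.names(), which rewrites invalid names into valid ones, so an unchanged result indicates the name was already valid.

```r
# Hypothetical helper: TRUE when make.names() leaves the name unchanged.
is_valid_name <- function(name) make.names(name) == name

is_valid_name(".hidden_val")   # TRUE
is_valid_name("my_var.2")      # TRUE
is_valid_name("2ndvar")        # FALSE: starts with a digit
is_valid_name("TRUE")          # FALSE: reserved word
```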
sum() with multiple arguments

%% Modulo Operator

The modulo operator %% returns the remainder after division:
Common use: x %% 2 == 1 checks if x is odd.
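A few worked cases, with x set to 7 purely as an example:

```r
7 %% 2     # 1
10 %% 5    # 0
17 %% 5    # 2

x <- 7
x %% 2 == 1   # TRUE: x is odd
x %% 2 == 0   # FALSE: x is not even
```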
R can convert between types using as.numeric(), as.character(), and as.logical():
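Some illustrative conversions; the input values are arbitrary examples.

```r
as.numeric("3.14")    # 3.14
as.character(42)      # "42"
as.logical("TRUE")    # TRUE
as.logical(0)         # FALSE
as.numeric(TRUE)      # 1
```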
When conversion fails, R returns NA with a warning:
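A sketch of a failed conversion, using made-up strings. The names value and quiet are illustrative; suppressWarnings() is shown as one option for cases where NA is the outcome you expect.

```r
value <- as.numeric("abc")   # Warning: NAs introduced by coercion
value                        # NA
is.na(value)                 # TRUE

# suppressWarnings() hides the warning; the result is still NA.
quiet <- suppressWarnings(as.numeric("12abc"))
quiet                        # NA
```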
R comes with several pre-loaded vectors of useful values:
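The sketch below assumes the vectors meant here are base R's built-in constants letters, LETTERS, month.name, and month.abb, along with the constant pi.

```r
letters[1:5]       # "a" "b" "c" "d" "e"
LETTERS[24:26]     # "X" "Y" "Z"
month.name[1:2]    # "January" "February"
month.abb[12]      # "Dec"
pi                 # 3.141593
```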
The rep() function repeats values to create longer vectors:
The each= argument repeats each element in place before moving to the next, whereas times= (also the second positional argument) repeats the entire vector.
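The two behaviors side by side, using small example vectors:

```r
rep(c(1, 2), times = 3)             # 1 2 1 2 1 2
rep(c(1, 2), each = 3)              # 1 1 1 2 2 2
rep("a", 4)                         # "a" "a" "a" "a"
rep(c(1, 2), each = 2, times = 2)   # 1 1 2 2 1 1 2 2
```

When both arguments are given, each= is applied first and the result is then repeated times= times.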