ggplot2 is built on a concept called the grammar of graphics: the idea that every plot can be described by a small set of components assembled together. Once you understand the grammar, you can build almost any chart by combining the same few pieces.
The three required components:
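- Data: the data frame that holds your variables
- Aesthetic mappings (aes()): which columns map to which visual properties
- Geoms (geom_*()): the shapes that represent the data, such as points, lines, or bars

Why ggplot2? Most plotting tools require you to specify low-level details (this pixel, that color). ggplot2 lets you think in terms of your data ("map species to color") and handles the rendering automatically.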
Every ggplot starts the same way and grows with +:
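A minimal sketch of that starting template. It assumes the mpg dataset that ships with ggplot2, because the original example's data is not shown here. Data and aesthetics go in ggplot(), and each geom is added with +.

```r
library(ggplot2)

# Template: ggplot(<data>, aes(<mappings>)) + <geom>()
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()   # one point per row of mpg

p   # printing the plot object draws it
```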
Add layers to make it richer:
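For example, the sketch below stacks a smoothed trend layer and readable labels on the same base. It uses mpg as stand-in data, and the smoothing method and label text are illustrative choices, not from the original.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +                                    # layer 1: raw observations
  geom_smooth(method = "loess", formula = y ~ x) +  # layer 2: smoothed trend
  labs(
    title = "Highway mileage vs. engine size",
    x = "Engine displacement (L)",
    y = "Highway miles per gallon"
  )                                                 # labels are not a layer

p
```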
The difference between variable and constant aesthetics trips up almost everyone at first.
Variable aesthetic — inside aes(), maps a data column to a visual property. Each unique value gets a different appearance:
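A sketch using mpg as stand-in data. Mapping color to the class column gives each vehicle class its own color and adds a legend.

```r
library(ggplot2)

# color = class is inside aes(): the color comes from the data
p <- ggplot(mpg, aes(x = displ, y = hwy, color = class)) +
  geom_point()

p
```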
Constant aesthetic — outside aes(), applies the same value to every element:
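The constant version of the same plot is sketched below, again with mpg as stand-in data. The color and size values are arbitrary.

```r
library(ggplot2)

# color and size are outside aes(): every point gets the same value
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "steelblue", size = 2)

p
```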
The rule: If the value comes from your data, put it inside aes(). If it's a fixed visual setting, put it outside aes() directly in the geom.
Common mistake — putting a fixed color inside aes():
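A sketch of the mistake and its fix, with mpg as stand-in data. Inside aes(), "blue" is treated as a one-level data value. ggplot2 therefore colors the points with its default palette instead of blue, and it adds a legend with a single "blue" entry.

```r
library(ggplot2)

# Wrong: "blue" is mapped like a data column holding one constant value
wrong <- ggplot(mpg, aes(x = displ, y = hwy, color = "blue")) +
  geom_point()

# Right: a fixed setting goes in the geom, outside aes()
right <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "blue")
```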
Global aesthetics go in ggplot(aes(...)) and are inherited by all geom layers:
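A sketch with mpg as stand-in data; method = "lm" is an illustrative choice. Because color = drv is set globally, both the points and the fitted lines are split by drive type, so you get one line per group.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +    # global mapping
  geom_point() +                                           # inherits color = drv
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)  # also inherits: one line per drv

p
```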
Local aesthetics go inside a specific geom and only apply there:
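The same plot with the color mapping moved into geom_point() is sketched below (mpg as stand-in data). Now only the points are colored, and the smoother fits a single line through all of the data.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = drv)) +                           # local: only points are colored
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)  # no color mapping: one overall line

p
```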
| Goal | Geom | Notes |
|---|---|---|
| Distribution of continuous var | geom_histogram() | Use binwidth to control bin size |
| Smooth density curve | geom_density() | Better for comparing groups |
| Count of categories | geom_bar() | Counts rows automatically |
| You have y values already | geom_col() | You supply both x and y |
| Relationship between two vars | geom_point() | Add geom_smooth() for trend |
| Change over time | geom_line() | Connects points in order of x (geom_path() keeps data order) |
| Reference line | geom_hline() / geom_vline() | Constant line on plot |
geom_bar vs geom_col: This is a frequent exam question. geom_bar() counts rows for you — you only provide x. geom_col() plots y values you already have — you provide both x and y.
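A sketch contrasting the two with mpg. geom_bar() receives the raw rows. geom_col() receives a small summary table of precomputed counts, built here with base R's table(), which is one of several ways to do it. Both draw the same bars.

```r
library(ggplot2)

# geom_bar(): supply only x; ggplot2 counts the rows in each class
p_bar <- ggplot(mpg, aes(x = class)) +
  geom_bar()

# geom_col(): the heights already exist, so supply both x and y
class_counts <- as.data.frame(table(class = mpg$class))  # columns: class, Freq
p_col <- ggplot(class_counts, aes(x = class, y = Freq)) +
  geom_col()
```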
facet_wrap() splits your plot into separate panels, one for each value of a variable. When groups overlap, this is often easier to read than telling them apart by color alone:
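A sketch with mpg as stand-in data, giving one panel per vehicle class.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(vars(class))   # one panel per class; facet_wrap(~class) is equivalent

p
```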
When to facet: When you have 3+ groups and they overlap so much that a single plot is unreadable. Facets trade space for clarity.
For continuous variables, plot the distribution with a histogram:
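A sketch using highway mileage from mpg. The binwidth of 2 is an arbitrary choice. Only x is supplied, because the histogram's stat counts the rows that fall in each bin.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = hwy)) +   # only x: rows are counted per bin
  geom_histogram(binwidth = 2)    # each bar spans 2 mpg

p
```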
Control bin edges with boundary:
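A sketch with mpg. With binwidth = 5 and boundary = 0, every bin edge falls on a multiple of 5, rather than wherever the default alignment would put it.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 5, boundary = 0)   # bin edges at multiples of 5

p
```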
Overlay a smooth density curve:
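A sketch of the overlay, with colors and binwidth chosen only for illustration. The histogram is rescaled with after_stat(density) so that both layers share the density scale. Without that step, the curve would sit near zero beneath bars measured in raw counts.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = hwy)) +
  # put the histogram on the density scale so it matches the curve
  geom_histogram(aes(y = after_stat(density)), binwidth = 2, fill = "grey80") +
  geom_density(color = "firebrick")

p
```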
Use geom_smooth() to add a trend line:
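A sketch with mpg as stand-in data. method = "lm" is an illustrative choice; without it, geom_smooth() chooses a default smoother based on the size of the data. The second plot shows se = FALSE, which keeps the line but drops the band.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)   # straight-line fit plus a confidence band

p_no_band <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)   # line only
```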
Control transparency with alpha (0 = fully transparent, 1 = fully opaque):
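A sketch with mpg. The alpha value 0.3 is arbitrary. Because points are partly transparent, areas where many points overlap show up darker.

```r
library(ggplot2)

# alpha = 0.3: each point is 30% opaque, so overplotted regions look darker
p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(alpha = 0.3, size = 3)

p
```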
Control point shape with shape:
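A sketch showing shape both ways, with mpg as stand-in data. Shape 21 is a circle with a separate outline (color) and interior (fill). Shapes 21 through 25 all work this way.

```r
library(ggplot2)

# Constant shape, outside aes(): outline and interior set separately
p_const <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(shape = 21, color = "black", fill = "gold")

# Mapped shape, inside aes(): one shape per drive type
p_mapped <- ggplot(mpg, aes(x = displ, y = hwy, shape = drv)) +
  geom_point()
```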
For plots with bars or filled shapes:
- color colors the border or outline
- fill colors the interior

Control plot colors and appearance with scale_*() functions and theme():
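The sketch below ties these together, using mpg as stand-in data; the hex colors are arbitrary.

- fill maps drive type to the bar interiors.
- color gives every bar the same black outline.
- scale_fill_manual() picks the fill colors.
- theme_minimal(), followed by a theme() tweak, controls the non-data appearance.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = class, fill = drv)) +   # fill is data-driven
  geom_bar(color = "black") +                    # outline is constant
  scale_fill_manual(values = c("4" = "#1b9e77", f = "#d95f02", r = "#7570b3")) +
  theme_minimal() +                              # complete theme first...
  theme(legend.position = "bottom")              # ...then individual tweaks

p
```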
Common complete themes:
- theme_minimal(): clean, minimal background
- theme_classic(): axis lines on x and y, no gridlines
- theme_bw(): white background with gridlines

By default, character variables are ordered alphabetically on a discrete axis (factors keep their existing level order). Use fct_reorder() from the forcats package to reorder by another variable:
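A sketch with mpg. It assumes the forcats package is installed, which it is as part of the tidyverse. The boxes are ordered by median highway mileage rather than alphabetically, and the commented line shows the mean-based variant.

```r
library(ggplot2)
library(forcats)  # assumed installed; provides fct_reorder()

# Order vehicle classes by median hwy (fct_reorder's default summary)
p <- ggplot(mpg, aes(x = fct_reorder(class, hwy), y = hwy)) +
  geom_boxplot() +
  labs(x = "Vehicle class (ordered by median highway mpg)")

# Same idea, ordered by mean instead of median:
# aes(x = fct_reorder(class, hwy, .fun = mean), y = hwy)

p
```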
fct_reorder(x, y) orders the levels of x by the median of y (the default). Pass .fun = mean to sort by mean instead. This is essential for ranked bar charts and ordered boxplots.
Watch out for these:
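- geom_bar() counts rows; geom_col() uses your y values. Know which you need.
- Fixed visual settings (like color = "blue") go OUTSIDE aes(). Data-driven aesthetics go INSIDE.
- geom_smooth() adds a confidence band (the standard-error ribbon) by default. Use se = FALSE to remove it.
- Both facet_wrap(vars(col)) and facet_wrap(~col) work in ggplot2. The vars() style is preferred in tidyverse code, but ~col is not deprecated and won't cause errors.
- Missing values: ggplot2 drops rows with NA in the aesthetics a geom needs and warns that it did so. Pass na.rm = TRUE to the geom to suppress the warning.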
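A sketch of the missing-value pitfall, using a tiny made-up data frame (df is hypothetical). Both plots draw the same two complete points. Only the version without na.rm = TRUE emits a "Removed ... rows" warning when it is drawn.

```r
library(ggplot2)

# Hypothetical data: rows 3 and 4 each contain a missing value
df <- data.frame(x = c(1, 2, NA, 4), y = c(2, 4, 6, NA))

noisy <- ggplot(df, aes(x = x, y = y)) +
  geom_point()                # warns about removed rows when drawn

quiet <- ggplot(df, aes(x = x, y = y)) +
  geom_point(na.rm = TRUE)    # drops the same rows silently
```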