What will this code output? sum(baseball$position == infield)
sum(baseball$position == infield)
Which lines of code correctly return the average BAT_AVG for the baseball dataset, ignoring NA values? Select ALL that apply.
Complete the argument to ignore NAs when calculating mean PIT_AVG:
mean(baseball$PIT_AVG, _____ = TRUE)
A student wants to plot the number of players at each position as a bar chart. They already have the data (baseball has a 'position' column with one row per player). Which geom should they use?
In this code, which are VARIABLE aesthetics (mapped to data)? (Select all that apply)
ggplot(mendota, aes(x = year, y = duration)) +
  geom_point(aes(color = century, shape = century), size = 2)
Fill in the correct geom to add a horizontal reference line at the mean duration:
ggplot(mendota, aes(x = year, y = duration)) +
  geom_point() +
  _____(yintercept = mean(mendota$duration), color = "red")
The following code produces an error on geom_smooth(). What is the fix?
states %>%
  ggplot() +
  geom_point(aes(x = Income, y = Illiteracy)) +
  geom_smooth()
Which filter() conditions correctly find non-pitchers with more than 70 double plays AND fewer than 10 errors? Select ALL that apply.
Complete the join to return only players from baseball who have NOT previously won an award (from past_awardees, joined on id):
_____(baseball, past_awardees, by = "id")
Given the children height/weight data in LONG format (columns: name, measurement, value), complete case_when to convert height to cm and weight to kg:
data %>%
  mutate(metric = case_when(
    measurement == "height" ~ value * _____,
    measurement == "weight" ~ value * _____
  ))
After running pivot_wider(data, names_from = measurement, values_from = value) on the children long-format data, which statements are TRUE? Select ALL that apply.
You want to join the 'Major' and 'Language' tables so that ONLY the 15 students present in BOTH tables are in the result. Which join is correct?
Fill in the correct pivot function to go from WIDE format (one column per language) to LONG format (one row per student-language pair):
Language %>%
  select(Name, English:Spanish) %>%
  _____(English:Spanish, names_to = "Language", values_to = "Fluent")
Which scenarios should use geom_bar() (rather than geom_col(), geom_histogram(), or another geom)? Select ALL that apply.
What is the class() of the following vector? c(1, TRUE, "banana")
x <- c(1, TRUE, "banana")
Consider: rent <- 1200; utilities_adj <- utilities + '10'. Assuming utilities is not defined, what happens when this code runs?
What does class(TRUE) return?
Which of these is an ILLEGAL variable name in R?
Run these lines: x <- 5; x + 10. What is the value of x after both lines execute?
What is the result of 0 / 0 in R?
What does the $ operator do in R?
What does nchar('statistics') return?
Given x <- NA, which of the following lines of code will return NA? Select ALL that apply.
# Assume starwars is loaded
x <- NA
(A) x + 5
(B) is.na(x)
(C) mean(c(1, NA, 3))
(D) x == NA
Complete the code to combine 'Hello' and 'World' into a single string with a space between them:
Which of the following are VALID variable names in R? Select ALL that apply.
Given the starwars dataset (87 rows, 14 columns), what does starwars[1,] return?
What does starwars[,4] return?
What does starwars$hair_color return?
What does sum(c(TRUE, FALSE, TRUE, TRUE)) return?
What does c(2, TRUE, 'banana') produce in R?
Which code correctly checks if nothing is an NA value, where nothing <- NA?
What is the result of (height > 72) | (height + 6 < 70) when height = 60?
Given x <- c(10, 20, NA, 40), which of the following expressions evaluate to TRUE? (Select all that apply)
x <- c(10, 20, NA, 40)
# Which of these return TRUE (not NA)?
Fill in the correct function to check if a value is missing. Using == will NOT work:
Given x <- c(2, TRUE, "banana"), which statements are TRUE? Select ALL that apply.
You have the noble gases data with columns Gas and num_isotopes. You want bars showing the number of isotopes for each gas. Which geom should you use?
An airport wants to count the number of flights per carrier from a raw flights dataset. Which geom is best?
In ggplot(mtcars, aes(y = mpg)) + geom_boxplot(aes(fill = as.factor(cyl))), where is fill — global or local? Variable or constant?
Consider this code: ggplot() + geom_point(aes(x = Illiteracy, y = Income), size = 3). What does size = 3 do?
Which code adds a horizontal red line at the mean duration of the mendota dataset?
A zoologist wants a scatter plot of body length vs height with color varying by species, all points semi-transparent. What does alpha control?
The code ggplot() + geom_point(aes(x = Income, y = Illiteracy)) + geom_smooth() returns an error. What is the fix?
In the following code: ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g, color = species)) + geom_point(size = 3, alpha = 0.7) Which of the following are VARIABLE aesthetics (mapped to data)? (Select all that apply)
ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
  geom_point(size = 3, alpha = 0.7)
We want to color the INTERIOR of bars by a variable. Fill in the aesthetic name (not color, which colors the border):
Which of the following statements about geom_bar() and geom_col() are TRUE? Select ALL that apply.
You want to find pitchers with at least one shutout game (PIT_SHO > 0). Which filter condition is correct?
You want non-pitcher players with more than 70 double plays AND fewer than 10 errors. Which filter is correct?
You want to add a column metric_value where height (inches) is converted to cm (x 2.54) and weight (lbs) to kg (x 0.45). Which case_when is correct?
What does birthwt %>% group_by(smoke) %>% summarize(n = n()) compute?
What is the difference between summarize() and mutate() when used after group_by()?
What does slice_max(body_mass_g, n = 3) return?
What does drop_na() (no arguments) do to a dataframe?
The baseball dataset has a position column ('infield', 'pitcher', 'outfield'). Which filter() inputs correctly find pitchers with at least one shutout (PIT_SHO > 0)? Select ALL that apply.
Fill in the dplyr function that collapses grouped rows into summary statistics:
Which of the following dplyr operations keep ALL original rows of the dataframe? Select ALL that apply.
A hiring agency wants to interview students whose language fluency data IS available. The Major table has 20 students; Language has 50 students; 15 are in both. Which join gives only the 15 common students?
You want all players from the baseball dataset who have NOT previously won an award (stored in past_awardees). Which join do you use?
The long-format children dataset has 6 rows (3 kids x height/weight). After pivot_wider(names_from = measurement, values_from = value), what is true about wide_data?
The Language dataset has columns: Name, Mom_speaks, Dad_speaks, English, Chinese, French, Arabic, Spanish. You want only Name and the language columns (English through Spanish) in long format. Which code is correct?
Major has column Student_Name; Language has column Name. How do you join on these different key names?
What does left_join(A, B) do when A has a row with no match in B?
After pivot_wider on the children data (6 rows, 3 columns: name, measurement, value), which is TRUE about the resulting wide_data?
The 'students' table has 100 rows and 'grades' has 80 rows (only students who submitted work), joined on student_id. Which statements are TRUE? Select ALL that apply.
The 'Major' table uses 'Student_Name' but the 'Language' table uses 'Name'. Complete the join_by() to match them:
inner_join(Major, Language,
           by = join_by(Student_Name == _____))
Given a wide dataset with columns Name, English, Chinese, French, Arabic, Spanish, which code correctly pivots to long format with columns Name, Language, Fluent? Select ALL that apply.