Study this module with spaced repetition. Wrong answers come back weighted heavier.
Final Exam Review
Q1
What does filter(df, !is.na(score)) do?
Q2
Which ggplot2 geom creates a histogram?
Q3
What does group_by(dept) %>% summarize(avg = mean(salary)) return?
Q4
pivot_longer() converts data from ______ format to ______ format.
Q5
X ~ Binomial(10, 0.3). What R code gives P(X >= 4)?
Q6
X ~ N(100, 225) (mean=100, sd=15). What R code gives P(X < 85)?
Q7
A population has mean 60 and sd 10. For samples of n=25, the sampling distribution of x-bar has:
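One way to check an answer to this kind of question is a short R sketch. It assumes the usual result that x-bar is centered at the population mean with standard error sigma / sqrt(n); the variable names are illustrative only.

```r
# Population parameters and sample size from the question
mu <- 60
sigma <- 10
n <- 25

# Center and spread of the sampling distribution of x-bar
xbar_mean <- mu
xbar_se <- sigma / sqrt(n)

print(c(mean = xbar_mean, se = xbar_se))
```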
Q8
What does a p-value of 0.003 mean?
Q9
For a single proportion CI, which formula gives the standard error?
Q10
A one-sample t-test gives t = 2.5 with df = 15. How do you get the two-sided p-value in R?
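A possible way to verify a two-sided p-value in R is to double the tail area beyond |t|. This is a sketch using the base function `pt()`; the names `t_stat` and `deg_free` are placeholders.

```r
# Test statistic and degrees of freedom from the question
t_stat <- 2.5
deg_free <- 15

# Two-sided p-value: twice the upper-tail area beyond |t|
p_two_sided <- 2 * pt(abs(t_stat), df = deg_free, lower.tail = FALSE)

print(p_two_sided)
```

Using `abs()` keeps the same line valid when the test statistic is negative.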
Q11
You want to test if two independent groups have different means, and their SDs are very different (s1=4, s2=18). Which test?
Q12
In the regression model y-hat = 12 + 0.5*x, what does the slope 0.5 mean?
Q13
summary(lm(y ~ x)) shows the slope estimate as 2.4 with p-value = 0.002. What null hypothesis was tested?
Q14
Which condition is required for the normal approximation to p-hat to be valid?
Q15
An inner_join(A, B, by='id') returns:
Q16 FILL IN
In R, qt(0.975, df = 19) gives the t* critical value for a ___% confidence interval with n = 20.
Q17 FILL IN
The pooled standard deviation formula is: s_p = sqrt(((n1-1)s1^2 + (n2-1)s2^2) / ___). Fill in the denominator.
Q18 SELECT ALL
Which of the following are correct interpretations of a 95% confidence interval (42, 58) for mu? Select ALL that apply.
Q19 SELECT ALL
Which conditions must hold for the normal approximation to p-hat to be valid? Select ALL that apply.
Q20 FILL IN
For regression inference, the degrees of freedom for testing the slope is n minus ___.
Q21
Consider: rent <- 1200; utilities_adj <- utilities + '10'. Assuming utilities is not defined, what happens when this code runs?
Q22
What does class(TRUE) return?
Q23
Which of these is an ILLEGAL variable name in R?
Q24
Given the starwars dataset (87 rows, 14 columns), what does starwars[1,] return?
Q25
What does starwars[,4] return?
Q26
What does starwars$hair_color return?
Q27
You have the noble gases data with columns Gas and num_isotopes. You want bars showing the number of isotopes for each gas. Which geom should you use?
Q28
An airport wants to count the number of flights per carrier from a raw flights dataset. Which geom is best?
Q29
In ggplot(mtcars, aes(y = mpg)) + geom_boxplot(aes(fill = as.factor(cyl))), is the fill aesthetic global or local? Is it mapped to a variable or set to a constant?
Q30
You want to find pitchers with at least one shutout game (PIT_SHO > 0). Which filter condition is correct?
Q31
You want non-pitcher players with more than 70 double plays AND fewer than 10 errors. Which filter is correct?
Q32
You want to add a column metric_value where height (inches) is converted to cm (x 2.54) and weight (lbs) to kg (x 0.45). Which case_when is correct?
Q33
A hiring agency wants to interview students whose language fluency data IS available. The Major table has 20 students; Language has 50 students; 15 are in both. Which join gives only the 15 common students?
Q34
You want all players from the baseball dataset who have NOT previously won an award (stored in past_awardees). Which join do you use?
Q35
The long-format children dataset has 6 rows (3 kids x height/weight). After pivot_wider(names_from = measurement, values_from = value), what is true about wide_data?
Q36
If P(A) = 0.3, what is P(A')?
Q37
P(A) = 0.4, P(B) = 0.5, P(A and B) = 0.2. What is P(A or B)?
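The general addition rule, P(A or B) = P(A) + P(B) - P(A and B), can be checked numerically. A minimal sketch with illustrative variable names:

```r
# Probabilities given in the question
p_a <- 0.4
p_b <- 0.5
p_a_and_b <- 0.2

# General addition rule: subtract the overlap so it is not counted twice
p_a_or_b <- p_a + p_b - p_a_and_b

print(p_a_or_b)
```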
Q38
A and B are independent, P(A)=0.3, P(B)=0.4. What is P(A and B)?
Q39
What does dbinom(5, 10, 0.5) compute?
Q40
X ~ Binomial(15, 0.4). What is E[X]?
Q41
How would you compute P(X >= 3) for X ~ Binomial(8, 0.25)?
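An upper-tail binomial probability can be computed two ways and the results compared. The sketch below uses the complement rule with `pbinom()` and a direct sum with `dbinom()`; variable names are illustrative.

```r
# Binomial parameters from the question
n <- 8
p <- 0.25

# Complement rule: P(X >= 3) = 1 - P(X <= 2)
upper_tail <- 1 - pbinom(2, size = n, prob = p)

# Direct sum of P(X = 3), ..., P(X = 8) as a cross-check
direct_sum <- sum(dbinom(3:n, size = n, prob = p))

print(c(complement = upper_tail, direct = direct_sum))
```

Note the off-by-one: because `pbinom(q, ...)` returns P(X <= q), the complement of "at least 3" uses q = 2, not 3.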
Q42
For X ~ N(50, 9) (mean=50, variance=9), what is the standard deviation?
Q43
What does pnorm(1.96) (default mean=0, sd=1) approximately return?
Q44
Heights ~ N(68, 9) (mean=68, variance=9). About what % of people are between 65 and 71 inches?
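A quick numerical check, assuming the second parameter in N(68, 9) is the variance (so sd = 3), as in the other normal-distribution questions in this review. `pnorm()` takes the standard deviation, not the variance.

```r
mu <- 68
sigma <- sqrt(9)  # assumption: the 9 in N(68, 9) is the variance

# P(65 < X < 71) as a difference of two cumulative probabilities
prob_between <- pnorm(71, mean = mu, sd = sigma) - pnorm(65, mean = mu, sd = sigma)

print(round(100 * prob_between, 1))
```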
Q45
A 95% CI for mu is (12.3, 18.7). Which interpretation is correct?
Q46
What happens to CI width as sample size n increases?
Q47
You get a p-value of 0.03 with alpha = 0.05. What is your conclusion?
Q48
What is a Type I error?
Q49
In R, t.test(x, mu = 10) tests:
Q50
Why do tests about a mean usually use the t-distribution instead of the normal distribution?
Q51
In a two-group proportion study, H0: p1 = p2. For the hypothesis test SE, should you use pooled or individual proportions?
Q52
What is the SE of p-hat when p-hat = 0.6 and n = 100?
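A sketch for checking this by hand, assuming the usual standard error formula for a single proportion, sqrt(p-hat * (1 - p-hat) / n). Variable names are illustrative.

```r
# Sample proportion and sample size from the question
p_hat <- 0.6
n <- 100

# Standard error of a single sample proportion
se_p_hat <- sqrt(p_hat * (1 - p_hat) / n)

print(round(se_p_hat, 4))
```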
Q53
For a 95% CI for a proportion, which z* value should you use?
Q54
For two proportions, if the 95% CI for (p1 - p2) is (-0.05, 0.15), what can you conclude?
Q55
When comparing two proportions, why use the pooled proportion in the hypothesis test but individual proportions for the CI?
Q56
For the normal approximation to be valid for proportions, which condition must hold?
Q57
A researcher selects a random sample of 50 students and measures their study time per week. The sample mean is 12.5 hours and the sample standard deviation is 3.2 hours. Which distribution should be used to construct a confidence interval for the population mean?
Q58
For a sample of n = 100 from a population, the sample mean is 45 and the sample standard deviation is 8. The 95% confidence interval for the population mean is closest to which of the following? (Assume t* is approximately 1.984 for df = 99)
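The interval can be reproduced in R as a check. This sketch assumes the standard one-sample t interval, x-bar ± t* · s / sqrt(n), and computes t* with `qt()` instead of using the rounded value given in the question.

```r
# Summary statistics from the question
xbar <- 45
s <- 8
n <- 100

# Critical value for 95% confidence with df = n - 1
t_star <- qt(0.975, df = n - 1)

# Margin of error and interval endpoints
margin <- t_star * s / sqrt(n)
ci <- c(lower = xbar - margin, upper = xbar + margin)

print(round(ci, 2))
```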
Q59
What does the standard error (SE) measure?
Q60
For a hypothesis test with H0: mu = 100 and Ha: mu not equal to 100, the computed test statistic is t = 2.5 with df = 40. The p-value should be calculated as:
Q61
You calculate a 95% confidence interval for a population mean: [22.3, 28.7]. If you test H0: mu = 30 vs Ha: mu != 30 at the 0.05 significance level using the same data, what should you conclude?
Q62
When we use the t-distribution instead of the z-distribution for inference about a mean, we do so because:
Q63
A study reports a p-value of 0.03 for a hypothesis test with alpha = 0.05. Which statement is correct?
Q64
What are the degrees of freedom (df) for a one-sample t-test with n = 85 observations?
Q65
In which situation should you use a paired t-test rather than an independent samples t-test?
Q66
In the pooled standard deviation formula s_p = sqrt(((n1-1)s1^2 + (n2-1)s2^2) / (n1+n2-2)), what role do the terms (n1-1) and (n2-1) play?
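The formula in the question translates directly into R. The helper `pooled_sd()` below is hypothetical (it is not a base R function), and the inputs in the example call are made up for illustration.

```r
# Hypothetical helper implementing the pooled SD formula from the question
pooled_sd <- function(s1, n1, s2, n2) {
  sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
}

# Illustrative (made-up) inputs: the larger sample pulls s_p toward its SD
sp <- pooled_sd(s1 = 4, n1 = 10, s2 = 6, n2 = 20)
print(sp)
```

When both sample SDs are equal, the function returns that common value regardless of the sample sizes, which is a quick sanity check on the formula.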
Q67
Which statement is correct about Welch's t-test compared to the standard equal-variance t-test?
Q68
What are the degrees of freedom for an equal-variance two-sample t-test with n1=30 and n2=28?
Q69
When conducting a paired t-test, what is the first calculation you must perform on your data?
Q70
A 95% confidence interval for the difference in means is calculated as [1.2, 4.8]. What can you conclude about the null hypothesis that mu1 = mu2 at the 0.05 significance level?
Q71
In R's t.test() function, what does the parameter var.equal=TRUE versus var.equal=FALSE specify?
Q72
When setting up a hypothesis test to compare two population means, which statement describes a correct null and alternative hypothesis pair?
Q73
A correlation of r = -0.8 between two variables means:
Q74
If R-squared = 0.81 for a regression model, what does this tell us?
Q75
In a regression model predicting height (inches) from age (months), the slope is estimated as b1 = 0.35. How should we interpret this?
Q76
You create a residual plot for your regression model and observe a curved pattern (upward then downward). Which assumption is violated?
Q77
When conducting a hypothesis test for the slope in regression with n = 25 observations, what are the degrees of freedom?
Q78
Why is a prediction interval (PI) always wider than a confidence interval (CI) for the same x value?
Q79
A regression model has b0 = 5 and b1 = 3. What is the predicted value when x = 4?
Q80
In a hypothesis test for the slope, H0: beta1 = 0 means: