2023 Boston Marathon - Understanding variability using ANOVA

ANOVA
Interactions
Multiple comparisons
Using ANOVA to explore factors associated with finish time for runners in the 2023 Boston Marathon
Author
Affiliation

Ivan Ramler

St. Lawrence University

Published

March 5, 2024

Module

Please note that these material have not yet completed the required pedagogical and industry peer-reviews to become a published module on the SCORE Network. However, instructors are still welcome to use these materials if they are so inclined.

Introduction

For this activity, you will be exploring the result times from female and male runners that finished the 2023 Boston Marathon.

In particular, you will examine both visualizations and Analysis of Variance (ANOVA) modules of result times to help understand the variation in finish times of participants.

Investigating these trends is useful for several reasons. Mainly, exploring these trends can help to deepen our understanding of how different factors, such as gender and age, impact marathon performances. Analyzing the distribution of finish times can also provide insights into the competitive landscape of the marathon.

This activity would be suitable for an in-class example or broken in several examples.

By the end of the activity, you will be able to:

  1. Analyze distributions using histograms and boxplots
  2. Understand how different factors in an ANOVA model can be used to explain variability in a response variable.
  3. Determine when it is appropriate to include interaction terms in a two-way ANOVA model.
  4. Use multi-comparison procedures (such as Tukey’s Honest Significant Differences) to determine statistically discernible differences in group means.

The provided worksheet and sample solutions assume students have access to, and familiarity with, R and the tidyverse suite of packages. However, these can be easily modified to use any statistical software that allows them to fit a two-way ANOVA model with interactions as well as perform Tukey’s HSD.

Data

The data set contains 26598 rows and 15 columns. Each row represents a runner who completed the Boston Marathon in 2023

Download data: boston_marathon_2023.csv

Variable Descriptions
Variable Description
age_group age group of the runner
place_overall finishing place of the runner out of all runners
place_gender finishing place of runner among the same gender
place_division finishing place of runner among runners of the same gender and age group
name name of runner
gender gender of runner
team team the runner is affiliated with
bib_number bib number of runner
half_time half marathon time of runner
finish_net finishing time timed from when they cross the starting gate
finish_gun finishing time of runner timed from when the starter gun is fired
age_group age group of the runner
half_time_sec half marathon time in seconds
finish_net_sec net finish in seconds
finish_gun_sec gun finish in seconds
finish_net_minutes net finish in minutes

Data Source

Boston Athletic Association

Materials

In conclusion, the Boston Marathon Times worksheet provides valuable learning opportunities for students in several key areas. It allows them to understand reasons by variability might exist and to discover multimodal distributions can occur simply due to excluding an important explanatory variable that otherwise confounds the analysis. The calculation of z-scores or other similar measurement of relative location enables students to compare and contrast the remarkable achievements of the top female and male finishers, shedding light on their talent in their respective fields. Overall, this worksheet allows students to critically analyze the 2023 marathon result data and draw meaningful conclusions about the extraordinary performances of athletes in the race.