WNBA Data Cleaning Module
Module
Please note that these materials have not yet completed the required pedagogical and industry peer reviews to become a published module on the SCORE Network. However, instructors are still welcome to use these materials if they are so inclined.
Introduction
The WNBA worksheet introduces the idea of predicting playoff teams from their records earlier in the season. Teams with better records halfway through the season are generally expected to have a higher chance of making the playoffs, but is this always the case? How do you know that the data you are using are valid for making this prediction? This worksheet guides you through various data cleaning steps so that you can answer these questions.
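As a rough illustration of the idea, the sketch below computes each team's win fraction over the first half of its games. It is written in Python using only the standard library (the worksheets themselves use R). The rows are made-up placeholders that mimic a few of the columns described in the Data section, not real WNBA results, and the helper name `first_half_win_pct` is hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (team_display_name, game_date, team_winner).
# These are illustrative placeholders, not real WNBA results.
games = [
    ("Team A", "2022-05-06", True),
    ("Team A", "2022-05-08", True),
    ("Team A", "2022-05-10", False),
    ("Team A", "2022-05-13", True),
    ("Team B", "2022-05-06", False),
    ("Team B", "2022-05-08", False),
    ("Team B", "2022-05-10", True),
    ("Team B", "2022-05-13", True),
]

def first_half_win_pct(rows):
    """Return {team: win fraction over the first half of that team's games}."""
    by_team = defaultdict(list)
    for team, date, won in rows:
        by_team[team].append((date, won))
    result = {}
    for team, team_games in by_team.items():
        team_games.sort()  # ISO-formatted dates sort chronologically as strings
        half = team_games[: len(team_games) // 2]
        if half:  # skip teams with too few games to have a first half
            result[team] = sum(won for _, won in half) / len(half)
    return result

mid_season = first_half_win_pct(games)
print(mid_season)
```

Comparing these mid-season fractions with which teams actually reached the playoffs is the kind of question the worksheet explores. The comparison is only meaningful once the data have been cleaned.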
Data
The wnba_data data set contains 8920 rows and 9 columns. Each row represents a game played by a WNBA team in one of the 2003 to 2022 regular seasons. Thus, each game is associated with two rows: one for each team. The columns are as follows:
Data: Variable Descriptions
Variable | Description |
---|---|
game_id | game id number |
season | season number |
season_type | binary indicator; 2 if regular season game, 3 if playoff game |
game_date | date of the game |
team_id | team id number |
team_display_name | full team name (name and city) |
team_winner | Boolean; True if the team won the game |
opponent_team_id | id number of the opponent |
team_home_away | where the game was played; either “home” or “away” |
Download data: wnba_data.csv
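Because each game should appear in two rows, one simple sanity check after loading the file is to count rows per game_id. The following is a minimal sketch in Python using only the standard library (the worksheets themselves use R). The inline sample is a hypothetical stand-in for wnba_data.csv with placeholder values, and storing team_winner as the text "True"/"False" is an assumption about the file.

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the column layout described above; values are placeholders.
# With the real file, read from open("wnba_data.csv", newline="") instead.
sample = """game_id,season,season_type,game_date,team_id,team_display_name,team_winner,opponent_team_id,team_home_away
1001,2022,2,2022-05-06,1,Team A,True,2,home
1001,2022,2,2022-05-06,2,Team B,False,1,away
1002,2022,2,2022-05-08,2,Team B,True,3,home
"""

rows = list(csv.DictReader(io.StringIO(sample)))
for row in rows:
    # Assumption: wins are stored as the text "True"/"False" (any letter case).
    row["team_winner"] = row["team_winner"].strip().lower() == "true"

# Each game should appear exactly twice: once per team.
rows_per_game = Counter(row["game_id"] for row in rows)
incomplete_games = sorted(g for g, n in rows_per_game.items() if n != 2)
print(incomplete_games)
```

In this sample, game 1002 has only one row, so it is flagged. A game flagged this way in the real data would deserve a closer look before any win percentages are computed.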
Data Source
Gilani S, Hutchinson G (2022). wehoop: Access Women’s Basketball Play by Play Data. R package version 1.5.0, https://CRAN.R-project.org/package=wehoop.
Materials
We offer worksheets (and their solutions) in Quarto (using R) format.