Can Midseason Records Predict WNBA Playoff Teams?

Data Cleaning
Missing Data
Frequency Tables
Clean WNBA team-game data and investigate whether teams ranked in the top eight midway through the season tend to make the playoffs.
Authors
Affiliation

Kristen Varin

St. Lawrence University

Ivan Ramler

St. Lawrence University

Published

May 27, 2026

Module

Please note that these materials have not yet completed the required pedagogical and industry peer reviews to become a published module on the SCORE Network. However, instructors are still welcome to use these materials if they are so inclined.

Introduction

The WNBA worksheet introduces the idea of using team records from earlier in the season to predict which teams will make the playoffs. Most of the time, we would expect teams with better records halfway through the season to have a higher chance of making the playoffs, but is this always the case? Just as importantly, how do we know whether the data we are using are complete and reliable enough for this analysis? By completing this worksheet, you will work through several data cleaning steps involving WNBA team-game data and then use the cleaned data to investigate the relationship between midseason performance and playoff outcomes.

The Women’s National Basketball Association (WNBA) is a professional basketball league in the United States that was founded in 1996. It was established by the NBA as a counterpart to promote women’s basketball at the professional level. The league’s inaugural season began in 1997 with eight teams.

Throughout its history, the WNBA has been a pioneering force in women’s sports, providing a platform for talented athletes to showcase their skills and inspire fans globally. The league has expanded and contracted over the years, typically consisting of 12 or 13 teams. As of 2026, the league consists of 15 teams with plans to expand further. The table below shows the size of the league since its inception.

WNBA expansion and contraction
Season(s) No. of teams
1997 8
1998 10
1999 12
2000–2002 16
2003 14
2004–2005 13
2006 14
2007 13
2008 14
2009 13
2010–2024 12
2025 13
2026 15

The WNBA’s playoff structure typically features the top eight teams from the regular season standings advancing to the postseason. The playoffs are organized into single-elimination rounds, culminating in the WNBA Finals, where the last two teams standing compete in a best-of-five series to determine the league champion.

Beyond its competitive play, the WNBA has been a leader in promoting social justice initiatives and advocating for equality both on and off the court. It continues to grow in popularity and influence, contributing significantly to the growth of women’s basketball worldwide.

For additional background about the WNBA, please watch the following video:

The video below provides a brief overview of the WNBA league structure and playoff format.

This activity is designed to take approximately 50–75 minutes of class time, depending on the background of the students, or it can be assigned as an outside-of-class activity.

By the end of this activity, you will be able to:

  • Use various dplyr, tidyr, and lubridate package functions to clean a data set for further use.

  • Identify data quality issues, such as inconsistent team names and missing game records.

  • Explain how missing or incomplete data can affect an analysis.

  • Create and interpret a two-way table that uses custom-built variables.

Technology requirement:

  • R version: The activity handout requires knowledge of Quarto and the following tidyverse packages: dplyr, tidyr, lubridate, and ggplot2.

  • Python version: The activity handout requires knowledge of Jupyter notebooks and the following packages: pandas, numpy, plotnine, and statsmodels.

Data

The wnba_data data set contains 8920 rows and 9 columns. Each row represents a game played by a WNBA team in one of the 2003 to 2022 regular seasons. Thus, each game is associated with two rows: one for each team. The columns are as follows:

Data: Variable Descriptions
Variable Description
game_id game id number
season season number
season_type binary indicator; 2 if regular season game; 3 if playoff game
game_date date of the game
team_id team id number
team_display_name full team name (name and city)
team_winner Boolean; True if the team won the game
opponent_team_id id number of the opponent
team_home_away Where the game was played; either “home” or “away”

Download data: wnba_data.csv
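The structure described above (one row per team per game, so two rows per game) can be checked right after loading the file. The following is a minimal sketch in Python with pandas, not the worksheet's own code. The four rows are made up for illustration and are not real results; with the real file, the in-memory text would be replaced by the path to wnba_data.csv.

```python
import io
import pandas as pd

# Made-up rows in the documented 9-column layout (NOT real WNBA results).
csv_text = """game_id,season,season_type,game_date,team_id,team_display_name,team_winner,opponent_team_id,team_home_away
1001,2019,2,2019-05-24,5,Indiana Fever,True,9,away
1001,2019,2,2019-05-24,9,New York Liberty,False,5,home
1002,2019,2,2019-05-25,8,Minnesota Lynx,True,3,home
1002,2019,2,2019-05-25,3,Dallas Wings,False,8,away
"""

# With the real file, replace io.StringIO(csv_text) with "wnba_data.csv".
wnba = pd.read_csv(io.StringIO(csv_text), parse_dates=["game_date"])

# Each game should appear exactly twice: once for each participating team.
rows_per_game = wnba.groupby("game_id").size()
games_not_paired = rows_per_game[rows_per_game != 2]

print(wnba.dtypes)
print("Games without exactly two rows:", len(games_not_paired))
```

Any game_id that does not appear exactly twice is a first hint that a team's record for that game may be missing or duplicated.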

Data Source

Gilani S, Hutchinson G (2022). wehoop: Access Women’s Basketball Play by Play Data. R package version 1.5.0, https://CRAN.R-project.org/package=wehoop.

Materials

We offer worksheets (and their solutions) in Quarto (using R) and Jupyter Notebook (using Python) formats.

R versions

Class handout - Quarto

Class handout - Quarto - with solutions

Python versions

Class handout - Jupyter Notebook

Class handout - Jupyter Notebook - with solutions

Exploration of the WNBA data revealed that some seasons had incomplete (i.e., were missing) game records. We identified this issue by tallying the number of games recorded for each team within each season and noticing that the totals were occasionally inconsistent across teams.
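The tallying step described above can be sketched as follows. This is an illustrative pandas example under stated assumptions, not the worksheet's solution: the teams A, B, and C and their games are made up, and only the column names season, team_display_name, and game_id come from the data description.

```python
import pandas as pd

# Made-up team-game rows (hypothetical teams A, B, C; NOT real results).
# In the 2020 season, team C's row for game 6 is deliberately left out.
games = pd.DataFrame({
    "season": [2019] * 6 + [2020] * 5,
    "team_display_name": ["A", "A", "B", "B", "C", "C",
                          "A", "A", "B", "B", "C"],
    "game_id": [1, 2, 1, 3, 2, 3,
                4, 5, 4, 6, 5],
})

# Tally the number of recorded games for each team within each season.
games_per_team = (
    games.groupby(["season", "team_display_name"])
    .size()
    .rename("n_games")
    .reset_index()
)

# A season is flagged when its teams do not all have the same game total.
season_summary = games_per_team.groupby("season")["n_games"].agg(["min", "max"])
season_summary["consistent"] = season_summary["min"] == season_summary["max"]
incomplete_seasons = season_summary.index[~season_summary["consistent"]].tolist()

print(games_per_team)
print("Seasons with inconsistent totals:", incomplete_seasons)
```

Equal totals do not guarantee a complete season, since every team could be missing the same number of games, so comparing the totals against the known schedule length for each season is a useful second check.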

After identifying these data quality issues, we considered several possible ways to continue the analysis. For example, we could remove seasons with incomplete records, manually fill in missing games using outside sources, or use a different data source. Each approach involves tradeoffs between simplicity, accuracy, and the amount of additional work required.
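As one illustration of the first option, the sketch below drops every season in which teams have unequal game totals. It is a simplified example on made-up rows (hypothetical teams A, B, and C), and the rule "all teams have the same number of recorded games" is an assumption used here as a stand-in for a fuller completeness check.

```python
import pandas as pd

# Made-up team-game rows (hypothetical teams; NOT real results).
# Team C is missing one game record in 2020.
games = pd.DataFrame({
    "season": [2019] * 6 + [2020] * 5,
    "team_display_name": ["A", "A", "B", "B", "C", "C",
                          "A", "A", "B", "B", "C"],
    "game_id": [1, 2, 1, 3, 2, 3,
                4, 5, 4, 6, 5],
})

# For every row, the number of games recorded for that team in that season.
team_games = games.groupby(["season", "team_display_name"])["game_id"].transform("count")

# Smallest and largest team totals within each row's season.
season_min = team_games.groupby(games["season"]).transform("min")
season_max = team_games.groupby(games["season"]).transform("max")

# Keep only seasons where every team has the same number of recorded games.
complete = games[season_min == season_max]

print(sorted(complete["season"].unique().tolist()))
```

Removing seasons is the simplest of the three options, but it discards every game in the affected seasons, including the records that were present.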

After choosing a data-cleaning strategy, we used the cleaned data to create a two-way table comparing whether teams were ranked in the top eight at midseason with whether they ultimately made the playoffs. This analysis illustrates how data cleaning decisions can directly affect the conclusions we draw from sports data.
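A two-way table of this kind can be sketched as below. Several choices here are assumptions made for illustration rather than the worksheet's definitions: "midseason" is taken to be the first half of each team's regular-season games, a team is counted as a playoff team if it has at least one row with season_type equal to 3, and the cutoff TOP_N is set to 2 because the made-up league has only four hypothetical teams (the module uses the top eight).

```python
import pandas as pd

TOP_N = 2  # hypothetical cutoff for this 4-team toy league; the module uses 8

# Made-up win/loss sequences for four hypothetical teams (NOT real results).
results = {
    "A": [True, True, True, False, False, True],
    "B": [True, True, False, False, False, False],
    "C": [False, True, False, True, True, True],
    "D": [False, False, False, True, True, True],
}
rows = []
for team, wins in results.items():
    for i, won in enumerate(wins):
        rows.append({"season": 2019, "season_type": 2,
                     "game_date": pd.Timestamp("2019-06-01") + pd.Timedelta(days=7 * i),
                     "team_display_name": team, "team_winner": won})
# Teams A and C each get one playoff row (season_type == 3).
for team in ["A", "C"]:
    rows.append({"season": 2019, "season_type": 3,
                 "game_date": pd.Timestamp("2019-09-15"),
                 "team_display_name": team, "team_winner": team == "A"})
games = pd.DataFrame(rows)

keys = ["season", "team_display_name"]

# Number each team's regular-season games in date order.
regular = games[games["season_type"] == 2].sort_values(keys + ["game_date"]).copy()
regular["game_number"] = regular.groupby(keys).cumcount() + 1
regular["total_games"] = regular.groupby(keys)["game_number"].transform("max")

# Midseason record: win percentage over the first half of each team's games.
first_half = regular[regular["game_number"] <= regular["total_games"] / 2]
midseason = (
    first_half.groupby(keys)["team_winner"].mean().rename("mid_win_pct").reset_index()
)
midseason["mid_rank"] = midseason.groupby("season")["mid_win_pct"].rank(
    ascending=False, method="min"
)
midseason["top_at_midseason"] = midseason["mid_rank"] <= TOP_N

# Playoff status: the team has at least one playoff row in that season.
playoff_teams = games.loc[games["season_type"] == 3, keys].drop_duplicates()
midseason = midseason.merge(playoff_teams, on=keys, how="left", indicator=True)
midseason["made_playoffs"] = midseason["_merge"] == "both"

two_way = pd.crosstab(midseason["top_at_midseason"], midseason["made_playoffs"])
print(two_way)
```

With method="min", teams tied at the cutoff all count as being in the top group, so a season can have more than TOP_N such teams. If the data contain no playoff rows, playoff status would need to come from another source.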

Acknowledgements

Thumbnail image: “WNBA Barnstar.png” by Mungo Kitsch, licensed under CC BY-SA 4.0 via Wikimedia Commons. The image incorporates a WNBA logo element; use does not imply endorsement by the WNBA.