Beijing 2022 Olympics - Cleaning and Summarizing Data
Introduction
The Winter Olympics is an event hosted every 4 years last hosted in Beijing in 2022. In the 2022 Olympics, there were a total of 109 events across 15 disciplines. It also has any more information on the Beijing 2022 Olympics you might be looking for. As the whole goal of the Olympics is to win as many medals as you can, everything is tracked. From the athlete, where they were born, if they won, their height, how many medals they won. This provides large data sets that are very interesting to look at.
In this worksheet, we will look at some of these data sets and learn how to clean them into tidy formats and use these cleaned data to investigate questions such as which country won the most medals. Cleaning and manipulating data sets is a key component to your Data Science journey and opens many doors for what you can learn in the future.
Data
There are two data sets here, athletes and medals.The first athletes contains information on individual athletes. Each row in this data set represents a different athlete that competed at the Beijing 2022 Olympic Winter Games. This data set has 2897 rows and 14 columns.
Download non tidy data: athletes.csv
Athletes
Variable Descriptions
| Variable | Description |
|---|---|
| name | Athlete Name |
| short_name | Athlete Name(short) |
| gender | Athlete Gender |
| birth_date | Athlete Date of Birthday |
| birth_place | Athlete Place of Birthday |
| birth_country | Athlete Country of Birthday |
| country | Athlete Country |
| country_code | Athlete Country Code |
| discipline | Discipline |
| discipline_code | Discipline Code |
| residence_place | Athlete Residence Place |
| residence_country | Athlete Residence Country |
| height_m/ft | Athlete Height (meters and feet) |
| url | Athlete Profile Link |
The second medals contains information on medal winners. Each row in this data set represents a medal won in the Beijing 2022 Olympic Winter Games. This can mean Gold, Bronze, or Silver medals. This data set has 694 rows and 12 columns.
Download non tidy data: medals.csv
Download tidy data: medalsTidy.csv
Medals
Variable Descriptions
| Variable | Description |
|---|---|
| medal_type | Medal Type (Gold, Bronze, Silver) |
| medal_code | Medal Number (1, 2, 3) |
| medal_date | Date Medal Won |
| athlete_short_name | Athlete Name (Short) |
| athlete_name | Athlete Name |
| athlete_sex | Athlete Gender |
| athlete_link | Athlete Profile Link |
| event | Event Competed In |
| country | Country |
| country_code | Country Code |
| discipline | Discipline |
| discipline_code | Discipline Code |