What is Data Cleaning? #
Data Cleaning means fixing messy, incorrect, or inconsistent data so it becomes usable for analysis.
Raw data is often:
- Duplicated
- Misformatted
- Combined in one column
- Inconsistent
Removing Duplicates #
What are Duplicates? #
Duplicates are records that appear more than once in a dataset.
Example:
| Name | Email |
|---|---|
| Alex | a@gmail.com |
| Alex | a@gmail.com |
Why Remove Duplicates? #
- Avoid wrong analysis
- Prevent double counting
- Improve data quality
Steps to Remove Duplicates #
- Select your dataset
- Go to Data tab
- Click Remove Duplicates
- Select columns to check
- Click OK
Example #
Before:
Alex John Alex
After:
Alex John
Important Tips #
- Always keep a backup copy of the data before removing duplicates
- Choose the correct columns to compare (e.g., Email for uniqueness)
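For readers who also clean data outside Excel, the same idea can be sketched in Python. This is only an illustration, not how Excel implements the feature: the `remove_duplicates` helper, its `key_columns` parameter, and the `j@gmail.com` row are hypothetical names and sample data. It keeps the first occurrence of each record and compares only the chosen columns.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first occurrence of each row, comparing only key_columns."""
    seen = set()
    unique_rows = []
    for row in rows:
        # Build a comparison key from the selected columns only
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


# Sample data; the John row is made up for illustration
rows = [
    {"Name": "Alex", "Email": "a@gmail.com"},
    {"Name": "Alex", "Email": "a@gmail.com"},
    {"Name": "John", "Email": "j@gmail.com"},
]

# Email is used as the uniqueness column, as suggested in the tips
deduped = remove_duplicates(rows, key_columns=["Email"])
print(deduped)
```

The function returns a new list and leaves `rows` untouched, which plays the same role as keeping a backup before removing anything.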
Text to Columns #
What is Text to Columns? #
Text to Columns splits the contents of one column into multiple columns.
Example:
"Alex, 25, USA"
→ Split into:
- Name
- Age
- Country
Types #
| Type | Description |
|---|---|
| Delimited | Split by comma, space, etc. |
| Fixed Width | Split by position |
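The difference between the two types can be illustrated with a short Python sketch. The helper names `split_delimited` and `split_fixed_width` and the column widths are assumptions made for this example, and the helpers also strip surrounding spaces, which is a choice of this sketch rather than a description of Excel's behavior.

```python
def split_delimited(text, delimiter=","):
    """Split wherever the delimiter appears, then strip extra spaces."""
    return [part.strip() for part in text.split(delimiter)]


def split_fixed_width(text, widths):
    """Split at fixed character positions given by a list of column widths."""
    fields = []
    start = 0
    for width in widths:
        fields.append(text[start:start + width].strip())
        start += width
    return fields


# Delimited: the comma marks where each field ends
delimited = split_delimited("Alex, 25, USA")

# Fixed width: every field occupies a known number of characters
# (10 for the name, 3 for the age, 3 for the country; assumed widths)
fixed_line = "Alex".ljust(10) + "25".ljust(3) + "USA"
fixed = split_fixed_width(fixed_line, widths=[10, 3, 3])

print(delimited)
print(fixed)
```

Delimited splitting suits data with a reliable separator, while fixed-width splitting suits data where each field always starts at the same position.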
Steps (Delimited) #
- Select column
- Go to Data → Text to Columns
- Choose Delimited
- Select delimiter (comma, space, etc.)
- Click Finish
Example #
Before:
Alex,25,USA
After:
| Name | Age | Country |
|---|---|---|
| Alex | 25 | USA |
Use Cases #
- Split full names
- Separate addresses
- Clean imported data
Flash Fill #
What is Flash Fill? #
Flash Fill detects the pattern in an example you type and fills the remaining cells to match.
No formula is needed.
Example #
Dataset: #
| Full Name |
|---|
| Alex John |
Extract First Name: #
- Type Alex manually in the next column
- Press Ctrl + E
- Excel fills the rest automatically
More Examples #
Extract Last Name: #
John
Combine Names: #
Alex_John
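Flash Fill infers its rule from the example you type. In code, the rule has to be written out explicitly. The following Python sketch shows one possible rule for each of the three patterns above, assuming names are separated by spaces; the variable names are illustrative only.

```python
full_names = ["Alex John"]

# Extract first name: the first word of the full name
first_names = [name.split()[0] for name in full_names]

# Extract last name: the last word of the full name
last_names = [name.split()[-1] for name in full_names]

# Combine names: join the words with an underscore
combined = ["_".join(name.split()) for name in full_names]

print(first_names, last_names, combined)
```

Writing the rule out like this makes it repeatable on new data, whereas Flash Fill is quicker for one-off formatting.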
Steps #
- Type example manually
- Press Ctrl + E
Done
Use Cases #
- Split names
- Format phone numbers
- Create custom patterns
Combined Real Example #
Raw Data: #
" Alex John , 25 , USA "
Cleaning Steps: #
- TRIM → remove leading, trailing, and repeated spaces
- Text to Columns → split data
- Flash Fill → format names
- Remove Duplicates → clean repeats
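The same cleaning sequence can be sketched in Python as a rough equivalent. The `clean_record` helper and the second raw row are hypothetical, and the sketch splits before trimming so that the spaces around each comma are removed too.

```python
def clean_record(raw):
    """Split a raw comma-separated string and tidy each field."""
    # Split on commas, then trim each piece and collapse repeated inner
    # spaces (similar in effect to TRIM)
    fields = [" ".join(part.split()) for part in raw.split(",")]
    name, age, country = fields
    return {"Name": name, "Age": int(age), "Country": country}


# The second row is made up to show a repeat that only differs in spacing
raw_rows = [
    " Alex John , 25 , USA ",
    "Alex John,25,USA",
]

cleaned = [clean_record(raw) for raw in raw_rows]

# Remove duplicates while keeping the first occurrence
unique = []
for record in cleaned:
    if record not in unique:
        unique.append(record)

print(unique)
```

Trimming before removing duplicates matters here: the two raw rows only count as the same record once the stray spaces are gone.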
Table #
| Feature | Purpose | Menu Path / Shortcut |
|---|---|---|
| Remove Duplicates | Delete repeated data | Data → Remove Duplicates |
| Text to Columns | Split data | Data → Text to Columns |
| Flash Fill | Auto pattern fill | Ctrl + E |
- Data Cleaning is essential before analysis
- Remove Duplicates → clean repeated data
- Text to Columns → split combined data
- Flash Fill → automate formatting
These skills apply to almost every real-world dataset.

