What is Data Transformation? #
Data Transformation is the process of converting data into a suitable format or scale so it can be easily analyzed or used in machine learning models.
Why Data Transformation is Important #
- Makes data consistent and comparable
- Improves model performance
- Helps algorithms work efficiently
- Removes scale differences between features
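The last point is easy to see with a distance-based calculation. Below is a minimal sketch (with made-up age/salary values and assumed min/max bounds) showing that a raw Euclidean distance is dominated by the large-scale salary feature, while the min-max scaled distance lets both features contribute:

```python
import numpy as np

a = np.array([25, 50_000])   # [age, salary] — hypothetical values
b = np.array([35, 52_000])

# Raw Euclidean distance: the salary difference (2000) swamps the age difference (10).
raw_dist = np.linalg.norm(a - b)

# After min-max scaling each feature to [0, 1] (assumed bounds below),
# both features contribute on a comparable scale.
mins = np.array([20, 40_000])
maxs = np.array([60, 80_000])
a_scaled = (a - mins) / (maxs - mins)
b_scaled = (b - mins) / (maxs - mins)
scaled_dist = np.linalg.norm(a_scaled - b_scaled)

print(raw_dist)     # ≈ 2000.02 — essentially just the salary gap
print(scaled_dist)  # ≈ 0.255  — age and salary both matter now
```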
Normalization #
🔹 What is Normalization? #
Normalization scales data into a fixed range, usually 0 to 1.
🔹 Formula: #
X_norm = (X - Min) / (Max - Min)
🔹 Example: #
Data = [10, 20, 30]
Min = 10, Max = 30

Normalized: 10 → 0, 20 → 0.5, 30 → 1
🔹 Python Example: #

```python
from sklearn.preprocessing import MinMaxScaler
import numpy as np

data = np.array([[10], [20], [30]])
scaler = MinMaxScaler()
normalized = scaler.fit_transform(data)
print(normalized)  # [[0. ], [0.5], [1. ]]
```
Standardization #
🔹 What is Standardization? #
Standardization transforms data so that:
- Mean = 0
- Standard Deviation = 1
🔹 Formula: #
Z = (X - Mean) / Standard Deviation
🔹 Example: #
Data = [10, 20, 30]
After standardization → values centered around 0: mean = 20 and population standard deviation ≈ 8.165, giving ≈ [-1.225, 0, 1.225]
🔹 Python Example: #
```python
from sklearn.preprocessing import StandardScaler
import numpy as np

data = np.array([[10], [20], [30]])
scaler = StandardScaler()
standardized = scaler.fit_transform(data)
print(standardized)  # [[-1.2247...], [0.], [1.2247...]]
```
Difference Between Normalization & Standardization #
| Feature | Normalization | Standardization |
|---|---|---|
| Range | 0 to 1 | No fixed range |
| Formula | Min-Max | Mean & Std |
| Use Case | When data has known bounds | When data has outliers |
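The outlier row of the table can be demonstrated directly. In this sketch (with an invented outlier value of 1000), min-max scaling squeezes the three typical points into a tiny corner of [0, 1] because the outlier defines the range, while standardization has no fixed range but keeps the typical points clearly separated:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

data = np.array([[10.], [20.], [30.], [1000.]])  # 1000 is an outlier

mm = MinMaxScaler().fit_transform(data)
zs = StandardScaler().fit_transform(data)

# Min-max: the first three points land near 0 (≈ 0, 0.010, 0.020);
# the outlier alone reaches 1.
print(mm.ravel())

# Standardization: no fixed range, but the result still has
# mean 0 and standard deviation 1.
print(zs.ravel())
```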
When to Use What? #
- Use Normalization → when features have known, bounded ranges and few outliers (and their raw scales differ, e.g., age vs salary)
- Use Standardization → when data contains outliers or is approximately normally distributed
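Whichever scaler you choose, a common practice is to fit it on the training data only and reuse those statistics on the test data, so no information leaks from the test set. A minimal sketch (with made-up train/test values):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[10.], [20.], [30.]])
X_test = np.array([[25.]])  # hypothetical unseen sample

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

# (25 - 20) / 8.165 ≈ 0.612
print(X_test_scaled)
```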

