What is Data Transformation? #
Data Transformation is the process of converting data into a suitable format or scale so it can be easily analyzed or used in machine learning models.
Why Data Transformation is Important #
- Makes data consistent and comparable
- Improves model performance
- Helps algorithms work efficiently
- Removes scale differences between features
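The last point is easy to see with a distance-based calculation. Below is a minimal sketch (with made-up age/salary values and assumed min/max bounds) showing that a raw Euclidean distance is dominated by the large-scale salary feature, while the min-max scaled distance lets both features contribute:

```python
import numpy as np

a = np.array([25, 50_000])   # [age, salary] — hypothetical values
b = np.array([35, 52_000])

# Raw Euclidean distance: the salary difference (2000) swamps the age difference (10).
raw_dist = np.linalg.norm(a - b)

# After min-max scaling each feature to [0, 1] (assumed bounds below),
# both features contribute on a comparable scale.
mins = np.array([20, 40_000])
maxs = np.array([60, 80_000])
a_scaled = (a - mins) / (maxs - mins)
b_scaled = (b - mins) / (maxs - mins)
scaled_dist = np.linalg.norm(a_scaled - b_scaled)

print(raw_dist)     # ≈ 2000.02 — essentially just the salary gap
print(scaled_dist)  # ≈ 0.255  — age and salary both matter now
```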
Normalization #
🔹 What is Normalization? #
Normalization scales data into a fixed range, usually 0 to 1.
🔹 Formula: #
X_norm = (X - Min) / (Max - Min)
🔹 Example: #
Data = [10, 20, 30]
Min = 10, Max = 30

Normalized: 10 → 0, 20 → 0.5, 30 → 1
🔹 Python Example: #

```python
from sklearn.preprocessing import MinMaxScaler
import numpy as np

data = np.array([[10], [20], [30]])
scaler = MinMaxScaler()
normalized = scaler.fit_transform(data)
print(normalized)  # [[0. ], [0.5], [1. ]]
```
Standardization #
🔹 What is Standardization? #
Standardization transforms data so that:
- Mean = 0
- Standard Deviation = 1
🔹 Formula: #
Z = (X - Mean) / Standard Deviation
🔹 Example: #
Data = [10, 20, 30]
After standardization → values centered around 0: mean = 20 and population standard deviation ≈ 8.165, giving ≈ [-1.225, 0, 1.225]
🔹 Python Example: #
```python
from sklearn.preprocessing import StandardScaler
import numpy as np

data = np.array([[10], [20], [30]])
scaler = StandardScaler()
standardized = scaler.fit_transform(data)
print(standardized)  # [[-1.2247...], [0.], [1.2247...]]
```
Difference Between Normalization & Standardization #
| Feature | Normalization | Standardization |
|---|---|---|
| Range | 0 to 1 | No fixed range |
| Formula | Min-Max | Mean & Std |
| Use Case | When data has known bounds | When data has outliers |
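The outlier row of the table can be demonstrated directly. In this sketch (with an invented outlier value of 1000), min-max scaling squeezes the three typical points into a tiny corner of [0, 1] because the outlier defines the range, while standardization has no fixed range but keeps the typical points clearly separated:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

data = np.array([[10.], [20.], [30.], [1000.]])  # 1000 is an outlier

mm = MinMaxScaler().fit_transform(data)
zs = StandardScaler().fit_transform(data)

# Min-max: the first three points land near 0 (≈ 0, 0.010, 0.020);
# the outlier alone reaches 1.
print(mm.ravel())

# Standardization: no fixed range, but the result still has
# mean 0 and standard deviation 1.
print(zs.ravel())
```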
When to Use What? #
- Use Normalization → when features have known, bounded ranges and few outliers (and their raw scales differ, e.g., age vs salary)
- Use Standardization → when data contains outliers or is approximately normally distributed
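Whichever scaler you choose, a common practice is to fit it on the training data only and reuse those statistics on the test data, so no information leaks from the test set. A minimal sketch (with made-up train/test values):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[10.], [20.], [30.]])
X_test = np.array([[25.]])  # hypothetical unseen sample

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics

# (25 - 20) / 8.165 ≈ 0.612
print(X_test_scaled)
```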

