Data Transformation

What is Data Transformation? #

Data Transformation is the process of converting data into a suitable format or scale so it can be easily analyzed or used in machine learning models.

Why Data Transformation is Important #

  • Makes data consistent and comparable
  • Improves model performance
  • Helps algorithms work efficiently
  • Removes scale differences between features

Normalization #

🔹 What is Normalization? #

Normalization scales data into a fixed range, usually 0 to 1.

🔹 Formula: #

X_norm = (X - Min) / (Max - Min)

🔹 Example: #

Data = [10, 20, 30]

Min = 10, Max = 30

Normalized:
10 → 0  
20 → 0.5  
30 → 1  

🔹 Python Example: #

from sklearn.preprocessing import MinMaxScaler
import numpy as np

# One feature, three samples (scikit-learn expects a 2D array)
data = np.array([[10], [20], [30]])

# Scale each feature to the [0, 1] range
scaler = MinMaxScaler()
normalized = scaler.fit_transform(data)

print(normalized)
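To see that MinMaxScaler is just the Min-Max formula above, here is a minimal sketch that applies the formula directly with NumPy and reproduces the hand-computed values:

```python
import numpy as np

data = np.array([[10], [20], [30]], dtype=float)

# Apply the formula directly: X_norm = (X - Min) / (Max - Min)
manual = (data - data.min()) / (data.max() - data.min())

print(manual.ravel())  # [0.  0.5 1. ]
```

The result matches the worked example: 10 → 0, 20 → 0.5, 30 → 1.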

Standardization #

🔹 What is Standardization? #

Standardization transforms data so that:

  • Mean = 0
  • Standard Deviation = 1

🔹 Formula: #

Z = (X - Mean) / Standard Deviation

🔹 Example: #

Data = [10, 20, 30]

After standardization (Mean = 20, Standard Deviation ≈ 8.16):
10 → -1.22
20 → 0
30 → 1.22

The values are centered around 0.

🔹 Python Example: #

from sklearn.preprocessing import StandardScaler
import numpy as np

# One feature, three samples (scikit-learn expects a 2D array)
data = np.array([[10], [20], [30]])

# Center each feature to mean 0 and scale to unit standard deviation
scaler = StandardScaler()
standardized = scaler.fit_transform(data)

print(standardized)
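As with normalization, the scaler is just the Z formula above. A minimal sketch using plain NumPy (note that StandardScaler uses the population standard deviation, which is NumPy's default):

```python
import numpy as np

data = np.array([[10], [20], [30]], dtype=float)

# Z = (X - Mean) / Standard Deviation
z = (data - data.mean()) / data.std()

print(z.ravel())  # approximately [-1.22  0.  1.22]
```

The transformed values have mean 0 and standard deviation 1, matching the worked example.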

Difference Between Normalization & Standardization #

Feature  | Normalization              | Standardization
---------|----------------------------|------------------------
Range    | 0 to 1                     | No fixed range
Formula  | Min-Max                    | Mean & Std
Use Case | When data has known bounds | When data has outliers

When to Use What? #

  • Use Normalization → when data has known bounds or features on very different scales (e.g., age vs salary)
  • Use Standardization → when data has outliers or is approximately normally distributed
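The guidance above can be illustrated with a hypothetical feature containing a single outlier (the values below are made up for illustration). Min-Max scaling squashes the ordinary values into a tiny corner of the [0, 1] range, while standardization keeps their relative spread:

```python
import numpy as np

# Hypothetical feature with one extreme outlier (1000)
x = np.array([10.0, 20.0, 30.0, 1000.0])

# Min-Max: the outlier becomes 1, everything else is squashed near 0
minmax = (x - x.min()) / (x.max() - x.min())

# Z-score: values stay spread out relative to the mean and std
zscore = (x - x.mean()) / x.std()

print(minmax)  # first three values all land below 0.03
print(zscore)
```

This is why, when outliers are present, standardization (or a robust scaler) is usually the safer default.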