What is Feature Engineering?
Feature Engineering is the process of creating, selecting, and transforming variables (features) from raw data to improve model performance.
Why Feature Engineering Is Important
- Improves model accuracy
- Helps algorithms learn the patterns that actually matter
- Removes irrelevant or redundant data
- Converts raw data (dates, text, categories) into numeric features a model can use
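Feature Selection
🔹 What is Feature Selection?
Keeping the most informative features and removing irrelevant or redundant ones.
🔹 Why use it?
- Reduces model complexity and the risk of overfitting
- Can improve predictive performance
- Reduces training and prediction time
🔹 Methods:
- Filter methods: score each feature independently of any model (e.g. by its correlation with the target) and keep the best-scoring ones
- Wrapper methods: train a model on different combinations of features and keep the combination that performs best
- Embedded methods: selection happens as part of model training (e.g. L1/Lasso regularization shrinking some coefficients to exactly zero)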
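The three families of methods can be compared on a small synthetic dataset. This is a sketch under stated assumptions. The data is generated so that only columns 0, 1 and 2 influence the target. The scikit-learn tools are one possible choice among many:
- `SelectKBest` with `f_regression` for filtering. It scores each feature from its correlation with the target.
- Recursive feature elimination (`RFE`) as the wrapper.
- `SelectFromModel` with `Lasso` as the embedded method.

```python
import numpy as np
from sklearn.feature_selection import RFE, SelectFromModel, SelectKBest, f_regression
from sklearn.linear_model import Lasso, LinearRegression

# Hypothetical toy data: 6 random features, but the target
# depends only on columns 0, 1 and 2 (plus a little noise)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 4 * X[:, 0] + 3 * X[:, 1] - 5 * X[:, 2] + rng.normal(scale=0.1, size=200)

# Filter method: score each feature on its own (univariate F-test) and keep the top 3
filter_sel = SelectKBest(score_func=f_regression, k=3).fit(X, y)

# Wrapper method: fit a model, drop the weakest feature, refit, until 3 remain
wrapper_sel = RFE(LinearRegression(), n_features_to_select=3).fit(X, y)

# Embedded method: Lasso's L1 penalty drives unhelpful coefficients to zero during training
embedded_sel = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)

for name, sel in [("filter", filter_sel), ("wrapper", wrapper_sel), ("embedded", embedded_sel)]:
    # get_support() is a boolean mask over the original columns
    print(name, np.flatnonzero(sel.get_support()))
```

On this toy data each selector should keep columns 0, 1 and 2. On real data they can disagree. Wrapper methods are usually the most expensive of the three because they refit the model several times.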
🔹 Example:
```python
import pandas as pd

# Load the raw dataset (assumes data.csv has "age" and "salary" columns)
data = pd.read_csv("data.csv")
# Keep only the columns judged to be important
selected_features = data[["age", "salary"]]
print(selected_features.head())
```
Feature Extraction
🔹 What is Feature Extraction?
Creating new features from existing data.
🔹 Examples:
- Extract the year from a date
- Compute total price = quantity × price
- Convert text into numbers (e.g. word counts)
🔹 Python Example:
```python
data["total_price"] = data["quantity"] * data["price"]
# Extract the year from the date column
data["year"] = pd.to_datetime(data["date"]).dt.year
```
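Encoding (Categorical Data)
🔹 What is Encoding?
Converting categorical (text) data into numerical form.
🔹 Why needed?
Most machine learning algorithms operate on numbers, so text categories such as city names must be converted to numeric values before a model can use them.
Types of Encoding
Label Encoding
```python
from sklearn.preprocessing import LabelEncoder
# Replace each city name with an integer code (0, 1, 2, ... in sorted order)
# Note: scikit-learn documents LabelEncoder for target labels (y);
# OrdinalEncoder is its counterpart for input feature columns.
le = LabelEncoder()
data["city"] = le.fit_transform(data["city"])
```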
One-Hot Encoding
```python
# Alternative to label encoding: one 0/1 indicator column per city
data = pd.get_dummies(data, columns=["city"])
```
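The following sketch shows what each encoding actually produces, using a hypothetical four-row `city` column. `dtype=int` is passed to `get_dummies` so the indicator columns print as 0/1 rather than booleans, since recent pandas versions default to `bool`.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy column of city names
cities = pd.DataFrame({"city": ["Paris", "Tokyo", "Lima", "Paris"]})

# Label encoding: classes are sorted, then mapped to 0..n-1
le = LabelEncoder()
codes = le.fit_transform(cities["city"])
print(list(le.classes_), codes)      # ['Lima', 'Paris', 'Tokyo'] [1 2 0 1]
print(le.inverse_transform([2]))     # maps a code back to its city name

# One-hot encoding: one indicator column per city
one_hot = pd.get_dummies(cities, columns=["city"], dtype=int)
print(one_hot)
```

`LabelEncoder` assigns codes in sorted order (Lima=0, Paris=1, Tokyo=2). This implies an ordering the cities do not really have, and linear models can be misled by it. One-hot encoding avoids this at the cost of one extra column per category.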
| Concept | Definition | Methods / Types | Example |
|---|---|---|---|
| Feature Selection | Choosing the most important features | Filter, Wrapper, Embedded | `selected_features = data[["age", "salary"]]` |
| Feature Extraction | Creating new features from existing data | Mathematical transformations, date extraction, text features | `data["total_price"] = data["quantity"] * data["price"]` |
| Encoding | Converting categorical/text data to numeric form | Label Encoding, One-Hot Encoding | `data["city"] = LabelEncoder().fit_transform(data["city"])` or `pd.get_dummies(data, columns=["city"])` |

