Feature Engineering

What is Feature Engineering? #

Feature Engineering is the process of creating, selecting, and transforming variables (features) from raw data to improve model performance.

Why Feature Engineering is Important #

  • Often improves model accuracy
  • Makes patterns in the data easier for algorithms to learn
  • Removes irrelevant or redundant inputs
  • Converts raw data into features a model can use

Feature Selection #

🔹 What is Feature Selection? #

Selecting the most important features and removing unnecessary ones.

🔹 Why use it? #

  • Reduces model complexity and the risk of overfitting
  • Can improve predictive performance
  • Saves computation time in training and prediction

🔹 Methods: #

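  • Filter Method: Score each feature with a statistic, such as its correlation with the target, independently of any model
  • Wrapper Method: Train a model on different feature combinations and keep the best-performing subset
  • Embedded Method: Feature selection happens during model training, for example through L1 (Lasso) regularization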
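As a rough illustration of the filter method, the sketch below scores each feature by its absolute correlation with a target column and keeps those above a cut-off. The toy values, the `purchased` target, the `shoe_size` column, and the 0.5 threshold are all assumptions made up for this example, not part of any real dataset.

```python
import pandas as pd

# Toy data invented for illustration only
df = pd.DataFrame({
    "age":       [25, 32, 47, 51, 38, 29],
    "salary":    [30, 42, 80, 85, 60, 35],
    "shoe_size": [42, 38, 43, 37, 44, 39],
    "purchased": [0, 0, 1, 1, 1, 0],  # hypothetical target column
})

target = "purchased"
threshold = 0.5  # arbitrary cut-off; tune it for your own data

# Absolute Pearson correlation of each feature with the target
scores = df.drop(columns=target).corrwith(df[target]).abs()

# Keep only the features whose score reaches the threshold
selected = scores[scores >= threshold].index.tolist()

print(scores.sort_values(ascending=False))
print(selected)
```

With this toy data, `age` and `salary` pass the threshold and `shoe_size` does not. Correlation only captures linear relationships, so a low score does not prove that a feature is useless.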

🔹 Example: #

import pandas as pd

data = pd.read_csv("data.csv")

# Select only important columns
selected_features = data[["age", "salary"]]
print(selected_features.head())

Feature Extraction #

🔹 What is Feature Extraction? #

Creating new features from existing data.

🔹 Examples: #

  • Extract year from date
  • Create total price = quantity × price
  • Convert text into numbers

🔹 Python Example: #

# Total price per row
data["total_price"] = data["quantity"] * data["price"]

# Extract the year from the date column
data["year"] = pd.to_datetime(data["date"]).dt.year
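The "convert text into numbers" idea can be sketched in a similar way. The example below derives two simple numeric features, character count and word count, from a text column. The `review` column and its contents are hypothetical, and these length-based features are only one of many possible ways to turn text into numbers.

```python
import pandas as pd

# Hypothetical review text, invented for illustration
reviews = pd.DataFrame({
    "review": ["Great product", "Too expensive for what it does", "OK"]
})

# Number of characters in each review
reviews["char_count"] = reviews["review"].str.len()

# Number of whitespace-separated words in each review
reviews["word_count"] = reviews["review"].str.split().str.len()

print(reviews)
```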

Encoding (Categorical Data) #

🔹 What is Encoding? #

Converting categorical (text) data into numerical form.

🔹 Why needed? #

Most machine learning models operate on numbers, so text categories must be converted to numeric form before training.

Types of Encoding #

Label Encoding #

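from sklearn.preprocessing import LabelEncoder

# Map each distinct city to an integer (0, 1, 2, ...).
# Note: the integers imply an order that the cities do not actually have.
le = LabelEncoder()
data["city"] = le.fit_transform(data["city"])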

One-Hot Encoding

# Replace the text "city" column with one indicator column per distinct city
data = pd.get_dummies(data, columns=["city"])
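To see how the two encodings differ, here is a small side-by-side sketch on a toy `city` column. The city names are made up, and the integer codes are produced with pandas category codes rather than `LabelEncoder` so that the example needs only pandas. For plain string values, both approaches should assign integers in alphabetical order.

```python
import pandas as pd

# Toy data invented for illustration
cities = pd.DataFrame({"city": ["Paris", "London", "Paris", "Tokyo"]})

# Integer codes: categories are sorted, so London=0, Paris=1, Tokyo=2
label_encoded = cities["city"].astype("category").cat.codes

# One-hot: one 0/1 column per distinct city
one_hot = pd.get_dummies(cities, columns=["city"], dtype=int)

print(label_encoded.tolist())
print(one_hot)
```

Integer codes keep the data in a single column but suggest an ordering between cities. One-hot encoding avoids that at the cost of one extra column per category.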

| Concept | Definition | Methods / Types | Example |
|---|---|---|---|
| Feature Selection | Choosing the most important features | Filter, Wrapper, Embedded | selected_features = data[["age","salary"]] |
| Feature Extraction | Creating new features from existing data | Mathematical transformations, Date extraction, Text features | data["total_price"] = data["quantity"] * data["price"] |
| Encoding | Converting categorical/text data to numeric | Label Encoding, One-Hot Encoding | data["city"] = LabelEncoder().fit_transform(data["city"]); pd.get_dummies(data, columns=["city"]) |