
Support Vector Machine (SVM) Project

Support Vector Machine (SVM) is a powerful supervised machine learning algorithm used for both classification and regression problems. Its primary objective is to find the optimal boundary, known as a hyperplane, that separates the classes in a dataset.

SVM is widely used in binary classification tasks, such as:

  • Spam detection (Spam vs. Not Spam)
  • Image classification (Cat vs. Dog)

The key idea behind SVM is to maximize the margin, which is the distance between the separating boundary and the nearest data points from each class. A larger margin generally leads to better generalization, meaning the model performs well on unseen data.

Core Concepts of SVM #

1. Hyperplane #

A hyperplane is a decision boundary that divides the feature space into different classes.
In the linear case, it can be expressed as:

w · x + b = 0

Where:

  • w represents the weight vector
  • b is the bias term
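To make the notation concrete, here is a minimal Python sketch; the weight vector, bias, and sample point are made-up values for illustration. A point is classified by which side of the hyperplane it falls on, i.e., the sign of w · x + b:

import numpy as np

# Hypothetical values, chosen only for illustration
w = np.array([2.0, -1.0])   # weight vector
b = -0.5                    # bias term
x = np.array([1.0, 1.0])    # a sample point

score = np.dot(w, x) + b        # the signed value of w · x + b
label = 1 if score >= 0 else -1 # which side of the hyperplane x lies on
print(score, label)             # prints: 0.5 1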

2. Support Vectors #

Support vectors are the data points that lie closest to the hyperplane. These points are critical because they directly influence the position and orientation of the decision boundary.

3. Margin #

The margin is the distance between the hyperplane and the nearest data points (the support vectors). For a hyperplane w · x + b = 0 with the support vectors lying on w · x + b = ±1, the margin width is 2 / ‖w‖, so maximizing the margin amounts to minimizing ‖w‖.
SVM aims to maximize this margin, which improves the model’s robustness and reduces overfitting.

4. Kernel Trick #

Not all data can be separated using a straight line. SVM uses kernel functions to transform data into higher-dimensional space where it becomes easier to separate.

Common kernel types include (a usage sketch follows this list):

  • Linear Kernel
  • Polynomial Kernel
  • Radial Basis Function (RBF) Kernel
  • Sigmoid Kernel
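To see how the kernel choice affects a model, the following minimal sketch (our own illustration using scikit-learn's SVC on a synthetic two-moons dataset, which is not part of this project) fits one classifier per kernel:

from sklearn.datasets import make_moons
from sklearn.svm import SVC

# A synthetic dataset whose classes cannot be separated by a straight line
X, y = make_moons(n_samples=300, noise=0.2, random_state=42)

for kernel in ['linear', 'poly', 'rbf', 'sigmoid']:
    clf = SVC(kernel=kernel).fit(X, y)
    print(f"{kernel:8s} training accuracy: {clf.score(X, y):.3f}")

On a curved boundary like this one, the RBF kernel usually fits best, while the linear kernel underfits.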

5. Hard Margin vs. Soft Margin #

  • Hard Margin SVM
    Assumes data is perfectly separable with no errors. It creates a boundary with zero misclassification but is sensitive to noise.
  • Soft Margin SVM
    Allows some misclassification by introducing flexibility. It is more practical for real-world datasets that contain noise or overlapping classes.

6. Regularization Parameter (C) #

The parameter C controls the trade-off between maximizing the margin and minimizing classification errors (a comparison sketch follows this list):

  • Large C → fewer misclassifications but a smaller margin (risk of overfitting)
  • Small C → a larger margin but more tolerance for errors (better generalization)
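As a quick illustration (on generated data, not this project's dataset), training the same linear SVM with different values of C shows the trade-off directly:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic data with a little label noise (flip_y) so that C matters
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.05, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

for C in [0.01, 1.0, 100.0]:
    clf = LinearSVC(C=C, dual=False).fit(X_tr, y_tr)
    print(f"C={C:<7} train accuracy={clf.score(X_tr, y_tr):.3f}  test accuracy={clf.score(X_te, y_te):.3f}")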

7. Hinge Loss Function #

SVM uses hinge loss to measure errors. It penalizes:

  • Incorrect classifications
  • Points that lie within the margin

This helps the model focus on difficult or borderline cases.
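Concretely, for a label y ∈ {−1, +1} and a raw score s = w · x + b, the hinge loss is max(0, 1 − y·s). Below is a minimal sketch; hinge_loss is a hypothetical helper written for illustration:

import numpy as np

def hinge_loss(y_true, scores):
    """Mean hinge loss; labels must be -1 or +1, scores are w · x + b."""
    return np.mean(np.maximum(0, 1 - y_true * scores))

y = np.array([1, -1, 1, 1])
s = np.array([2.0, -1.5, 0.3, -0.8])  # last two: inside the margin / misclassified

print(hinge_loss(y, s))  # 0.625 — only the last two points contribute to the loss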

8. Dual Optimization Problem #

Instead of solving the problem directly, SVM often uses a dual formulation involving Lagrange multipliers.
This approach:

  • Makes computation more efficient
  • Enables the use of kernel functions
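For reference, the standard soft-margin dual problem (the textbook form, not anything specific to this project) is:

maximize over α:   Σᵢ αᵢ − ½ Σᵢ Σⱼ αᵢ αⱼ yᵢ yⱼ K(xᵢ, xⱼ)
subject to:        0 ≤ αᵢ ≤ C   and   Σᵢ αᵢ yᵢ = 0

The training data enter only through the kernel values K(xᵢ, xⱼ), which is precisely what makes the kernel trick possible, and the points with αᵢ > 0 are the support vectors.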

How Does the Support Vector Machine (SVM) Algorithm Work? #

The fundamental idea behind the Support Vector Machine (SVM) algorithm is to determine a decision boundary (hyperplane) that separates data points belonging to different classes in the most optimal way.


Instead of just finding any boundary, SVM focuses on selecting the one that creates the maximum possible margin between the classes. The margin is defined as the distance between the hyperplane and the closest data points from each class, which are known as support vectors.

A larger margin generally leads to better model performance because it improves the model’s ability to generalize to new, unseen data.

Choosing the Best Hyperplane #

In many cases, multiple hyperplanes can separate two classes. However, SVM selects the optimal hyperplane—the one that maximizes the margin between the classes.

When the data is perfectly separable, this optimal boundary is called a hard margin hyperplane. It ensures:

  • No misclassification of data points
  • Maximum separation between classes

For example, if several candidate lines (call them L1, L2, and L3) can all divide the dataset, SVM chooses the one with the largest distance from the nearest points of both classes (here, L2).
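This selection can be reproduced in code. The sketch below (on a tiny synthetic dataset generated here for illustration) fits a linear SVC with a very large C to approximate the hard-margin case, then reads off the support vectors and margin width:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Two well-separated blobs of 20 points each
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(20, 2)),
    rng.normal(loc=[3.0, 3.0], scale=0.5, size=(20, 2)),
])
y = np.array([0] * 20 + [1] * 20)

# A very large C approximates a hard-margin SVM
clf = SVC(kernel='linear', C=1e6).fit(X, y)

w = clf.coef_[0]
print("number of support vectors:", len(clf.support_vectors_))
print("margin width:", 2 / np.linalg.norm(w))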


Implementing Support Vector Machine (SVM) #

The implementation follows a simple flow: load the dataset, explore it visually, prepare the features, train a linear SVM, and evaluate the results.

Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, ConfusionMatrixDisplay

# Set visual style
sns.set_theme(style="whitegrid")

Load Dataset

# Load the dataset
df = pd.read_csv('/kaggle/input/datasets/organizations/ailearner-researchlab/iot-intrusion-detection-hybrid-ml-dl-dataset/final_dataset.csv')

Show the First 10 Rows

df.head(10)

Distribution of Normal (0) vs Attack (1) Traffic

# Visualize Class Distribution
plt.figure(figsize=(6, 4))
sns.countplot(x='Label', data=df)
plt.title('Distribution of Normal (0) vs Attack (1) Traffic')
plt.show()

Flow Duration vs. Total Forward Packet Length

plt.figure(figsize=(12, 7))
sns.scatterplot(
    data=df, 
    x='Flow Duration', 
    y='Total Length of Fwd Packet', 
    hue='Label', 
    style='Label', 
    alpha=0.5,
    palette='viridis',
    legend=False
)
plt.title('Flow Duration vs. Total Forward Packet Length')
plt.show()

Feature Correlation Heatmap for IoT Security

# Compute the correlation matrix
# (numeric_only=True restricts it to numeric columns)
corr = df.corr(numeric_only=True)

# Set the plot size
plt.figure(figsize=(15, 10))

# Build a mask so only the lower triangle is drawn, which keeps the plot readable
mask = np.triu(np.ones_like(corr, dtype=bool))

# Create the heatmap
heatmap = sns.heatmap(
    corr, 
    mask=mask, 
    annot=True,          # show the value in each cell
    fmt=".2f",           # two decimal places
    cmap='RdBu_r',       # red for high correlation, blue for low
    center=0,            # center the colormap at zero
    linewidths=.5,       # thin lines between the cells
    cbar_kws={"shrink": .8} # shrink the color bar slightly
)

# Labels and title
plt.title('Feature Correlation Heatmap for IoT Security', fontsize=16)
plt.xticks(rotation=45, ha='right')
plt.show()

Distribution of Normal vs Attack Traffic

plt.figure(figsize=(8, 6))

ax = sns.countplot(
    x='Label', 
    data=df, 
    hue='Label',           
    palette='viridis',
    legend=False        
)

# Annotate values on bars
for p in ax.patches:
    ax.annotate(
        f'{int(p.get_height())}', 
        (p.get_x() + p.get_width()/2., p.get_height()), 
        ha='center', va='bottom', fontsize=12
    )

plt.title('Distribution of Normal vs Attack Traffic', fontsize=15)
plt.xlabel('Traffic Type (0: Normal, 1: Attack)', fontsize=12)
plt.ylabel('Number of Samples', fontsize=12)

plt.show()

Histogram: Distribution of Flow Duration

# Set the figure size
plt.figure(figsize=(10, 6))

# Draw the histogram
# kde=True overlays a Kernel Density Estimate (a smooth curve)
sns.histplot(data=df, x='Flow Duration', bins=50, kde=True, color='teal')

# Title and axis labels
plt.title('Histogram: Distribution of Flow Duration', fontsize=15)
plt.xlabel('Flow Duration (μs)', fontsize=12)
plt.ylabel('Frequency (Count)', fontsize=12)

# Display the plot
plt.show()

Prepare the Data

# Define Features (X) and Target (y)
X = df.drop('Label', axis=1)
y = df['Label']

# Split the data (80% Training, 20% Testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Standardize the features (SVM margins are distance-based, so features
# on larger scales would otherwise dominate the decision boundary)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Train the Model

# Initialize the Linear SVM model
# 'dual=False' is preferred when n_samples > n_features
svm_model = LinearSVC(dual=False, C=1.0, random_state=42)

# Train the model
print("Training the SVM model... this may take a moment.")
svm_model.fit(X_train, y_train)
# Make predictions
y_pred = svm_model.predict(X_test)

# Print Evaluation Report
print("--- Classification Report ---")
print(classification_report(y_test, y_pred))

print(f"Accuracy Score: {accuracy_score(y_test, y_pred):.4f}")s
--- Classification Report ---
              precision    recall  f1-score   support

           0       0.79      0.78      0.78     61814
           1       0.92      0.92      0.92    173231

    accuracy                           0.89    235045
   macro avg       0.85      0.85      0.85    235045
weighted avg       0.89      0.89      0.89    235045

Accuracy Score: 0.8867

Evaluate the Model

# Generate the confusion matrix
cm = confusion_matrix(y_test, y_pred)

# Plot the matrix
# Pass an Axes into disp.plot(); otherwise it opens its own figure
# and the figsize set above would be ignored
fig, ax = plt.subplots(figsize=(8, 6))
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=['Normal', 'Attack'])
disp.plot(cmap='Blues', values_format='d', ax=ax)
plt.title('SVM Intrusion Detection: Confusion Matrix')
plt.grid(False)  # remove the grid for a cleaner layout
plt.show()