Logistic regression is a supervised learning algorithm commonly used for classification problems in machine learning. Unlike linear regression, which produces continuous predicted values, logistic regression outputs the predicted probability that an input belongs to a class.
Logistic regression is best suited to binary classification, where there are two possible outcomes such as Yes/No, True/False, or 0/1. To map its input to a probability between 0 and 1, logistic regression uses the sigmoid function (a mathematical function shaped like an “S”, illustrated below).
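As a quick stand-alone illustration (not part of the pipeline built later), the sigmoid can be sketched in a few lines of Python:

```python
import numpy as np

def sigmoid(z):
    # Squash any real-valued score into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# The curve's midpoint is at z = 0, where the probability is exactly 0.5
print(sigmoid(0.0))   # 0.5
print(sigmoid(5.0))   # close to 1
print(sigmoid(-5.0))  # close to 0
```

Large positive scores are pushed toward probability 1 and large negative scores toward 0, which is exactly how logistic regression turns a linear combination of features into a class probability.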

Types of Logistic Regression #
There are three major types of logistic regression, each based on the nature of the dependent variable.
1: Binomial logistic regression (also called binary logit regression) – used when the dependent variable has only two possible outcomes (e.g. yes/no, pass/fail). This is the most commonly used type of logistic regression and corresponds to binary classification problems.
2: Multinomial logit regression – used when the dependent variable has three or more possible outcomes and the outcomes are unordered (e.g. classifying animals into categories such as cat/dog/sheep). It extends binary logistic regression to support multiple classes.
3: Ordinal logit regression – used when the dependent variable has three or more possible outcomes and the outcomes have a natural hierarchy or ranking (e.g. a rating of low/medium/high). Ordinal logit regression takes the ordering of the outcomes into account during modeling.
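To make the multinomial case concrete, here is a minimal sketch using scikit-learn's built-in Iris dataset (a toy three-class problem, chosen only for illustration; it is unrelated to the intrusion dataset used below):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Iris has three unordered classes, a classic multinomial setting
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# When the target has more than two classes, scikit-learn's
# LogisticRegression fits a multinomial (softmax) model by default
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```

Note that ordinal logistic regression is not covered by scikit-learn's `LogisticRegression`; dedicated implementations (e.g. in statsmodels) are needed for the ordered case.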
The Implementation Pipeline #
Before we code, let’s look at the logical flow of our system: import the libraries, load and explore the dataset, preprocess the data, train the logistic regression model, and evaluate it.
Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
Load Dataset
df = pd.read_csv("/kaggle/input/datasets/organizations/ailearner-researchlab/iot-intrusion-detection-hybrid-ml-dl-dataset/final_dataset.csv")
df_sample = df.sample(frac=0.1, random_state=42)
df.head()
Target Variable Distribution (Label Balance)
plt.figure(figsize=(8, 5))
sns.countplot(x='Label', hue='Label', data=df_sample, palette='viridis', legend=False)
plt.title('Distribution of Normal (0) vs Attack (1) Traffic')
plt.xlabel('Traffic Type')
plt.ylabel('Count')
plt.show()
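Alongside the count plot, the label balance can be checked numerically with `value_counts`. A minimal sketch using a hypothetical toy frame standing in for `df_sample`:

```python
import pandas as pd

# Hypothetical toy frame standing in for df_sample
toy = pd.DataFrame({'Label': [0, 0, 1, 1, 1, 1]})

# Fraction of each class; on the real data this quantifies how skewed
# the normal/attack split is before training
proportions = toy['Label'].value_counts(normalize=True)
print(proportions)
```

A strongly skewed split is worth knowing about up front, since accuracy alone can look good on imbalanced data even for a weak model.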
Correlation Heatmap
plt.figure(figsize=(16, 10))
# Compute the correlation on numeric columns only
corr_matrix = df_sample.corr(numeric_only=True)
sns.heatmap(corr_matrix, annot=False, cmap='coolwarm', linewidths=0.5)
plt.title('Feature Correlation Heatmap')
plt.show()
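The heatmap is a visual aid; a programmatic follow-up is to list feature pairs whose absolute correlation exceeds a threshold, since near-duplicate features add little information. A minimal sketch on synthetic data (the 0.95 cutoff is an arbitrary assumption):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = pd.DataFrame({
    'a': a,
    'b': 2 * a + rng.normal(scale=0.01, size=200),  # near-duplicate of 'a'
    'c': rng.normal(size=200),
})

corr = toy.corr().abs()
# Keep only the upper triangle so each pair is considered once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
redundant = [col for col in upper.columns if (upper[col] > 0.95).any()]
print(redundant)  # ['b']
```

On the real `df_sample`, the same pattern would flag columns that are nearly redundant and could be dropped before training.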
Feature Distribution (Histograms)
features_to_plot = ['Flow Duration', 'Total Fwd Packet', 'Total Bwd packets', 'Flow IAT Min']
df_sample[features_to_plot].hist(bins=30, figsize=(15, 10), color='skyblue', edgecolor='black')
plt.suptitle('Distribution of Key Network Features')
plt.show()
Box Plots (Outlier Detection)
plt.figure(figsize=(12, 6))
sns.boxplot(x='Label', y='Flow Duration', data=df_sample)
plt.yscale('log') # Log scale, because flow duration values can be very large
plt.title('Flow Duration vs Label (Log Scale)')
plt.show()
Data Preprocessing
# 1. Separate Features (X) and Target (y)
X = df.drop('Label', axis=1)
y = df['Label']
# 2. Split data into Training (80%) and Testing (20%) sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# 3. Feature Scaling (Very important for Logistic Regression!)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
Train the Logistic Regression Model
# Initialize the model
# The 'sag' or 'saga' solver scales better to large datasets (1.2M rows)
log_model = LogisticRegression(solver='saga', max_iter=1000)

# Train the model
log_model.fit(X_train, y_train)

Evaluate the Model
# Make predictions
y_pred = log_model.predict(X_test)
# Print results
print("--- Accuracy Score ---")
print(f"{accuracy_score(y_test, y_pred):.4f}")
print("\n--- Classification Report ---")
print(classification_report(y_test, y_pred))
print("\n--- Confusion Matrix ---")
print(confusion_matrix(y_test, y_pred))
--- Accuracy Score ---
0.8855
--- Classification Report ---
              precision    recall  f1-score   support

           0       0.78      0.78      0.78     61522
           1       0.92      0.92      0.92    173523

    accuracy                           0.89    235045
   macro avg       0.85      0.85      0.85    235045
weighted avg       0.89      0.89      0.89    235045
--- Confusion Matrix ---
[[ 48058 13464]
[ 13457 160066]]
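As a sanity check, the headline numbers can be recomputed directly from the confusion matrix above (layout assumption: rows are actual labels, columns are predicted labels, the ordering scikit-learn uses):

```python
import numpy as np

# Confusion matrix reported above (rows: actual 0/1, cols: predicted 0/1)
cm = np.array([[48058, 13464],
               [13457, 160066]])
tn, fp = cm[0]
fn, tp = cm[1]

accuracy = (tn + tp) / cm.sum()
precision_attack = tp / (tp + fp)   # class 1 (attack) precision
recall_attack = tp / (tp + fn)      # class 1 (attack) recall

print(round(accuracy, 4))           # 0.8855, matching the score above
```

Working the metrics out by hand like this makes it clear where the model errs: roughly 13k attacks are missed (false negatives) and roughly 13k normal flows are flagged as attacks (false positives).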