
Cross-Validation

Cross-validation is a technique for evaluating machine learning models more reliably than a single train/validation split allows.

“You split your data into train, validation, and test. But what if your validation set is just lucky? What if it contains easy examples? What if the test set contains hard ones? Your score will be wrong. K-Fold Cross Validation fixes this by testing your model multiple times on different chunks of data instead of relying on a single split.”

The Problem with Single Validation Set #

A single validation set makes your evaluation depend on luck.

| Scenario | Problem |
|---|---|
| Easy validation set | Model looks better than it actually is |
| Hard validation set | Model looks worse than it actually is |
| Unrepresentative validation set | Your hyperparameters are wrong for real data |

Example: You are building a digit classifier. By random chance, your validation set has mostly the digit “1” (which is easy to classify). Your model scores 98%. But when deployed, it fails on digit “8”. The validation set lied to you.

Solution: Test your model on multiple different validation sets. Average the results.
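
A minimal sketch of that idea, using scikit-learn's train_test_split and assuming model, X, and y are defined as in the examples later in this article:

import numpy as np
from sklearn.model_selection import train_test_split

# Score the model on five different random splits, then average.
scores = []
for seed in range(5):
    X_train, X_valid, y_train, y_valid = train_test_split(
        X, y, test_size=0.2, random_state=seed)
    model.fit(X_train, y_train)
    scores.append(model.score(X_valid, y_valid))

print(np.mean(scores))  # a lucky or unlucky single split averages out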


What is Cross-Validation? #

Simple Definition: Split your data into k equal parts. Train on k-1 parts. Validate on the remaining 1 part. Repeat k times. Average the scores.

The K-Fold Process:

Fold 1: [VALID] [TRAIN] [TRAIN] [TRAIN] [TRAIN]  → Score 1
Fold 2: [TRAIN] [VALID] [TRAIN] [TRAIN] [TRAIN]  → Score 2
Fold 3: [TRAIN] [TRAIN] [VALID] [TRAIN] [TRAIN]  → Score 3
Fold 4: [TRAIN] [TRAIN] [TRAIN] [VALID] [TRAIN]  → Score 4
Fold 5: [TRAIN] [TRAIN] [TRAIN] [TRAIN] [VALID]  → Score 5

Final Score = (Score 1 + Score 2 + Score 3 + Score 4 + Score 5) / 5

Each data point gets validated exactly once. No data is wasted.
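
A quick NumPy check of that property (the sizes are arbitrary, chosen only for illustration):

import numpy as np

# Shuffle 20 sample indices and split them into 5 folds.
n_samples, k = 20, 5
indices = np.random.permutation(n_samples)
folds = np.array_split(indices, k)

# Every sample index appears in exactly one validation fold.
validated = np.sort(np.concatenate(folds))
assert np.array_equal(validated, np.arange(n_samples))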


K-Fold Cross-Validation (Standard) #

How it works:

  1. Shuffle the data randomly
  2. Split into k equal-sized folds
  3. For each fold: Train on k-1 folds, validate on the remaining fold
  4. Record the validation score
  5. After k rounds, compute the average and standard deviation
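
These steps map directly onto scikit-learn's KFold splitter. A minimal sketch, assuming model, X, and y (NumPy arrays) as defined in the example below:

import numpy as np
from sklearn.model_selection import KFold

kf = KFold(n_splits=10, shuffle=True, random_state=42)       # steps 1-2
scores = []
for train_idx, valid_idx in kf.split(X):                     # step 3
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[valid_idx], y[valid_idx]))   # step 4

print(np.mean(scores), np.std(scores))                       # step 5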

Choosing k (number of folds):

| k value | Pros | Cons | When to use |
|---|---|---|---|
| k=5 | Fast, less computation | Slightly higher variance | Large datasets (100k+ samples) |
| k=10 | Balanced, standard choice | More computation | Most common default |
| k=20 | Low variance, stable estimates | Slow | Small datasets |

Default recommendation: k=10

from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Example data: the digits dataset; any feature matrix X and labels y work here.
X, y = load_digits(return_X_y=True)
model = RandomForestClassifier()

# 10-fold cross-validation
scores = cross_val_score(model, X, y, cv=10)

print(f"Scores: {scores}")
print(f"Mean: {scores.mean():.4f}")
print(f"Std: {scores.std():.4f}")

Output interpretation:

  • Mean = expected performance on new data
  • Standard deviation = how much performance varies across folds
  • Large std = model is unstable (depends on which data it sees)

Stratified K-Fold (For Imbalanced Classes) #

The Problem: Regular K-Fold might put all samples of a rare class in one fold.

Example: You have 90% “No Fraud” and 10% “Fraud”. A random split might put every “Fraud” case into the validation fold of Fold 3. That fold then trains on 0% fraud and validates on almost 100% fraud, so its score will be terrible.

Solution: Stratified K-Fold preserves the class percentage in every fold.

How it works:

  • Each fold has the same % of each class
  • If dataset has 90% No Fraud, 10% Fraud → every fold has 90% No Fraud, 10% Fraud

Comparison:

| Method | Class Balance | Best for |
|---|---|---|
| Regular K-Fold | May be unbalanced | Balanced datasets |
| Stratified K-Fold | Maintains balance in every fold | Imbalanced datasets |

from sklearn.model_selection import StratifiedKFold, cross_val_score

# Reuses model, X, and y from the previous example.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=skf)
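
To see the guarantee directly, count the class ratio in each validation fold. This sketch uses a toy 90/10 label array rather than real data:

import numpy as np
from sklearn.model_selection import StratifiedKFold

y_toy = np.array([0] * 90 + [1] * 10)   # 90% "No Fraud", 10% "Fraud"
X_toy = np.zeros((100, 1))              # dummy features

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, valid_idx) in enumerate(skf.split(X_toy, y_toy), start=1):
    print(f"Fold {fold}: {y_toy[valid_idx].mean():.0%} fraud")  # 10% every time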

When to use: Always use Stratified K-Fold for classification; it is the safer default. Use regular K-Fold for regression, where stratification by class does not apply.


Leave-One-Out Cross-Validation (LOOCV) #

The Extreme Version: Set k = number of samples. Train on all samples except one. Validate on that one sample. Repeat for every sample.

Example with 100 samples:

  • Train on 99 samples, validate on sample 1
  • Train on 99 samples (different), validate on sample 2
  • Repeat 100 times
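
In scikit-learn terms, LOOCV is simply K-Fold with k equal to the number of samples, as this toy check shows:

import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

X_toy = np.zeros((100, 1))
# LeaveOneOut yields one split per sample - the same count as KFold with n_splits=100.
assert LeaveOneOut().get_n_splits(X_toy) == 100
assert KFold(n_splits=100).get_n_splits() == 100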

Pros and Cons:

| Aspect | Rating | Explanation |
|---|---|---|
| Bias | ✅ Very low | Almost all data is used for training each time |
| Variance | ❌ High | The n models are nearly identical, so their scores are highly correlated |
| Computation | ❌ Very slow | Must train n models (n = dataset size) |
| When to use | ⚠️ Rarely | Only for very small datasets (< 500 samples) |

from sklearn.model_selection import LeaveOneOut, cross_val_score

# Reuses model, X, and y from the earlier examples; trains one model per sample.
loo = LeaveOneOut()
scores = cross_val_score(model, X, y, cv=loo)

Warning: If you have 10,000 samples, LOOCV trains 10,000 models. This could take days. Do not use LOOCV on large datasets.


Comparison Table #

| Method | Number of Models | Computation | Best For |
|---|---|---|---|
| K-Fold (k=10) | 10 models | Fast | Most problems |
| Stratified K-Fold | 10 models | Fast | Classification with imbalanced classes |
| Leave-One-Out | n models | Very Slow | Tiny datasets (< 500 samples) |

When to Use Cross-Validation #

| Use Case | Do you need CV? | Why |
|---|---|---|
| Tuning hyperparameters | ✅ Yes | Need a stable estimate of performance |
| Comparing two models | ✅ Yes | Need to know which is truly better (see the sketch below) |
| Small dataset (< 1,000 samples) | ✅ Yes | Cannot afford a separate validation set |
| Large dataset (> 100,000 samples) | ⚠️ Maybe | A simple train/val split might be enough |
| Final test set evaluation | ❌ No | The test set is only for the final check |
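
For the “comparing two models” row, the key is to score both models on identical folds so the comparison is fair. A sketch, assuming X and y from the earlier examples:

from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# One fixed splitter guarantees both models see the same folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
rf_scores = cross_val_score(RandomForestClassifier(), X, y, cv=cv)
lr_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)

print(f"Random Forest:       {rf_scores.mean():.4f} ± {rf_scores.std():.4f}")
print(f"Logistic Regression: {lr_scores.mean():.4f} ± {lr_scores.std():.4f}")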

Quick Quiz #

Q1: Your dataset has 1,000 samples. You run 10-fold CV. How many models do you train? How many samples in each training set?

A1: 10 models. Each training set has 900 samples (90% of 1,000). Each validation set has 100 samples.

Q2: Your dataset has 90% Class A and 10% Class B. You use regular K-Fold. What could go wrong?

A2: Some fold might accidentally get 0% Class B in training. That model will never learn to predict Class B. Use Stratified K-Fold instead.

Q3: You have 200,000 samples. Should you use Leave-One-Out CV?

A3: No. That would train 200,000 models. It would take weeks. Use 5-fold or 10-fold CV instead.
