View Categories

Learning Curves

“Your model is performing badly. But why? Is it too simple? Too complex? Not enough data? More training needed? Learning curves answer all these questions with just two lines on a graph.”

What Are Learning Curves? #

Two lines plotted against training experience.

X-axis: Training set size (or number of training iterations)

Y-axis: Error (or accuracy)

Two lines:

  • Blue line = Performance on training data
  • Orange line = Performance on validation data

That is it. Two lines. They tell you everything wrong with your model.

Learning Curves

The Three Patterns #

Pattern 1: High Bias (Underfitting) #

What you see:

  • Training error is high
  • Validation error is also high
  • Both lines are close to each other
  • Neither line reaches good performance

Why this happens:
Model is too simple. Like trying to fit a straight line through a circle. No matter how much data you give, it will never learn.

The fix:

  • Use a more complex model
  • Add more features
  • Reduce regularization
Learning Curves 2

Pattern 2: High Variance (Overfitting) #

What you see:

  • Training error is very low (close to zero)
  • Validation error is high
  • Large gap between the two lines
  • Gap gets wider as training progresses

Why this happens:
Model is too complex. It memorizes the training data but cannot generalize to new data.

The fix:

  • Get more training data
  • Simplify the model
  • Add regularization
  • Early stopping
Learning Curves 4

Pattern 3: Good Fit #

What you see:

  • Training error is low
  • Validation error is also low
  • Small gap between lines
  • Both lines flatten at good values

This is what you want.

Learning Curves 6

What Learning Curves Tell You #

ProblemTraining ErrorValidation ErrorGap
High Bias (Underfitting)HighHighSmall
High Variance (Overfitting)Very LowHighLarge
Good FitLowLowSmall

One look at the graph. Instant diagnosis.

More Data Will Help Only One Problem #

If you have High Variance (Overfitting):
More data WILL help. Each new example makes it harder to memorize. Model forced to generalize.

If you have High Bias (Underfitting):
More data will NOT help. Model is too simple. It will never learn the pattern. You need a better model first.

The Ideal Learning Curve (What You Want) #

As you add more training data:

  1. Training error starts low and stays low
  2. Validation error starts higher but drops
  3. Both lines converge to similar low error
  4. Gap becomes small

If you see this, stop. Your model is ready.

Real Example #

You are building a house price predictor.

First attempt (Linear Regression):

  • Training error: $80k
  • Validation error: $85k
    → High Bias. Too simple.

Second attempt (Deep Neural Network with 10 layers):

  • Training error: $5k
  • Validation error: $70k
    → High Variance. Overfitting.

Third attempt (Random Forest with 100 trees):

  • Training error: $35k
  • Validation error: $38k
    → Good fit. Ready.
Learning Curves 8

Code (Scikit-learn)

from sklearn.model_selection import learning_curve

train_sizes, train_scores, val_scores = learning_curve(
    model, X, y, cv=5, 
    train_sizes=np.linspace(0.1, 1.0, 10)
)

When you see a problem, ask:

QuestionIf YesDiagnosis
Both errors high?YesUnderfitting (High Bias)
Large gap between lines?YesOverfitting (High Variance)
Gap shrinking as data increases?YesNeed more data
Gap not shrinking?YesModel too simple. Change model.
Validation error going up?YesOverfitting severely. Stop training.

Quick Quiz #

Q1: Your training error is 10%. Validation error is 50%. What is the problem?

A1: High Variance (Overfitting). Model memorized training but fails on validation.

Q2: Both errors are 45% and close together. What is the problem?

A2: High Bias (Underfitting). Model too simple. Need more complexity.

Q3: You add more data. Validation error drops from 50% to 30%. What does this tell you?

A3: You had overfitting. More data helped. Keep adding data.

Q4: You add more data. Both errors stay at 45%. What does this tell you?

A4: You have underfitting. More data will not help. Change the model.

Key Takeaways (5 Lines) #

  1. Learning curves = Two lines (training error, validation error) vs training size.
  2. High Bias (Underfitting) = Both errors high and close. Need stronger model.
  3. High Variance (Overfitting) = Large gap between errors. Need more data or simpler model.
  4. More data helps overfitting but does NOT help underfitting.
  5. Good fit = Both errors low and close together.

💬
AIRA (AI Research Assistant) Neural Learning Interface • Drag & Resize Enabled
×