What Is Overfitting?
Overfitting occurs when a machine learning model learns the training data too well — including its noise, outliers, and random fluctuations — rather than learning the underlying generalizable patterns. An overfit model performs excellently on training data but poorly on new, unseen data.
The Analogy
Imagine a student who memorizes every answer in a textbook word-for-word but can't solve a problem phrased differently. That student has "overfit" to the textbook — they've memorized examples instead of understanding concepts.
How to Detect Overfitting
The clearest signal is a gap between training and validation performance:
| Training Set | Validation Set | Diagnosis |
|---|---|---|
| High accuracy | High accuracy | ✅ Good fit |
| High accuracy | Low accuracy | ⚠️ Overfitting |
| Low accuracy | Low accuracy | ⚠️ Underfitting |
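As a sketch, the diagnosis logic in the table can be written as a small helper. The 0.10 gap and 0.70 accuracy floor here are illustrative thresholds, not standard values — sensible cutoffs depend on the task and metric:

```python
def diagnose(train_acc, val_acc, gap=0.10, floor=0.70):
    """Classify fit quality from training/validation accuracy.

    gap and floor are illustrative thresholds, not standard values.
    """
    if train_acc - val_acc > gap:
        return "overfitting"   # large train/validation gap
    if train_acc < floor and val_acc < floor:
        return "underfitting"  # poor everywhere
    return "good fit"
```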
Visual Indicators
- Learning curves — If training loss keeps decreasing while validation loss starts increasing, the model is overfitting
- Complexity vs. performance — If validation performance plateaus or worsens as model capacity grows, the added capacity is being spent memorizing the training set rather than learning general patterns
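The learning-curve signal can be checked programmatically: scan per-epoch losses for the first point where validation loss rises while training loss still falls. A minimal sketch — real monitoring would smooth the curves first to avoid reacting to single-epoch noise:

```python
def divergence_epoch(train_loss, val_loss):
    """Return the first epoch where validation loss increases while
    training loss keeps decreasing, or None if the curves never diverge.
    Assumes one loss value per epoch."""
    for t in range(1, len(val_loss)):
        if val_loss[t] > val_loss[t - 1] and train_loss[t] < train_loss[t - 1]:
            return t
    return None
```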
Common Causes
- Too little training data — The model doesn't see enough examples to learn general patterns
- Too complex model — A model with too many parameters relative to the data size can memorize examples
- Training too long — With enough iterations, even a well-sized model will start memorizing
- Noisy data — Errors and outliers in training data are learned as patterns
- Irrelevant features — Features that don't carry useful signal add noise
Prevention Techniques
1. Regularization
Add a penalty term to the loss function that discourages large weights:
- L1 (Lasso) — Pushes some weights to exactly zero (feature selection)
- L2 (Ridge) — Pushes weights toward zero without eliminating them
- Elastic Net — Combination of L1 and L2
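The three penalty terms are simple to compute directly; a minimal sketch, where `lam` (penalty strength) and `alpha` (L1/L2 mixing ratio) are illustrative parameter names:

```python
def l1_penalty(weights, lam):
    # Lasso penalty: lam * sum of absolute weights
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    # Ridge penalty: lam * sum of squared weights
    return lam * sum(w * w for w in weights)

def elastic_net_penalty(weights, lam, alpha):
    # Weighted mix: alpha=1 -> pure L1, alpha=0 -> pure L2
    return alpha * l1_penalty(weights, lam) + (1 - alpha) * l2_penalty(weights, lam)
```

During training the chosen penalty is added to the data loss, so the optimizer trades fit quality against weight size.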
2. Dropout
Randomly deactivate a percentage of neurons during each training step, forcing the network to learn redundant representations. Typically set to 20–50%.
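A minimal sketch of inverted dropout, the common formulation: survivors are scaled by 1/(1−p) so the expected activation is unchanged, and the layer becomes a no-op at inference time:

```python
import random

def dropout(activations, p, rng):
    """Zero each activation with probability p and scale survivors
    by 1/(1-p) so the expected value is preserved (inverted dropout).
    Applied only during training."""
    return [0.0 if rng.random() < p else a / (1.0 - p) for a in activations]
```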
3. Early Stopping
Monitor validation loss during training and stop when it begins to increase, even if training loss continues to decrease.
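A minimal sketch with a patience window — stop once validation loss has failed to improve for `patience` consecutive epochs:

```python
def early_stop_epoch(val_losses, patience=3):
    """Return the epoch at which training should stop, or None
    if validation loss was still improving at the end."""
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0  # new best: reset the patience counter
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None
```

In practice you would also checkpoint the model at each new best loss and restore that checkpoint when stopping.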
4. Data Augmentation
Artificially expand the training dataset by applying transformations:
- Images: rotation, flipping, cropping, color jittering
- Text: synonym replacement, back-translation
- Time series: noise injection, time warping
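Noise injection for time series is the simplest of these to sketch; `sigma` controls the jitter strength and is an illustrative parameter, tuned per dataset in practice:

```python
import random

def jitter(series, sigma, rng):
    # Time-series augmentation: add independent Gaussian noise to each point
    return [x + rng.gauss(0.0, sigma) for x in series]
```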
5. Cross-Validation
Use k-fold cross-validation to get a more robust estimate of model performance and detect overfitting earlier.
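A minimal sketch of the k-fold split itself: each example lands in exactly one validation fold, and the model is trained k times on the remaining data:

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k (train, validation) pairs.
    Earlier folds absorb the remainder when n is not divisible by k."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return [
        ([i for f, fold in enumerate(folds) if f != j for i in fold], folds[j])
        for j in range(k)
    ]
```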
6. Simplify the Model
Reduce the number of layers, neurons, or features. Sometimes a simpler model generalizes better.
7. Collect More Data
More diverse training data helps the model learn general patterns rather than memorizing specific examples.
Overfitting vs. Underfitting
- Overfitting (high variance): Model is too complex for the data; captures noise as signal
- Underfitting (high bias): Model is too simple for the data; misses important patterns
- Good fit: Model captures the underlying patterns without memorizing noise
The goal is to find the optimal balance — the bias-variance tradeoff.
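The tradeoff has a standard formal statement: for squared-error regression, expected test error decomposes into three terms, where σ² is the irreducible noise in the data:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \sigma^2
```

Overfit models minimize the bias term at the cost of a large variance term; underfit models do the reverse.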
ELMs and Overfitting
Extreme Learning Machines have a natural relationship with overfitting:
- Fewer hyperparameters — Less room for overfitting through misconfiguration
- Regularization built-in — The Moore–Penrose pseudoinverse yields the minimum-norm least-squares solution for the output weights, which has a mild regularizing effect; ridge-regularized ELM variants add an explicit penalty
- Fast retraining — Quick experimentation with different hidden node counts to find the optimal complexity
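A minimal ELM sketch, assuming NumPy is available. The hidden layer is random and never trained; only the output weights are solved for, here with an explicit ridge term (`lam = 0` would recover the plain pseudoinverse solution):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: 200 samples, 3 features
X = rng.normal(size=(200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

# Random hidden layer: weights are fixed at initialization, never trained
n_hidden = 50
W = rng.normal(size=(3, n_hidden))
b = rng.normal(size=n_hidden)
H = np.tanh(X @ W + b)

# Output weights via ridge-regularized least squares;
# lam is an illustrative value, tuned per problem in practice
lam = 1e-2
beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)
pred = H @ beta
```

Because fitting reduces to this single linear solve, retraining with a different `n_hidden` is nearly instant, which is what makes the capacity search cheap.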