This autumn I spent 200 hours on wafer anomaly detection with data from an Indian manufacturer. The last part was delivered from Thailand. Enjoyed every moment.

Result? A rule-based, interpretable model built from scratch beat all 7 black-box models examined, from Naive Bayes to XGBoost. All were optimized to minimize inspection costs. All trailed the custom system, which cut inspection cost by 72% compared with manual inspection.

Fully interpretable. No hyperparameters. Easily portable. Here's how...

The Challenge

Wafer anomaly detection for an Indian semiconductor manufacturer. 1,558 anonymized features. The standard approach would be to maximize AUC or another technical metric.

The Reframe

Instead of optimizing for technical metrics, I reframed the problem: minimize total inspection cost directly. This meant building a system for the actual business problem, not a proxy metric.
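
As a rough sketch of what scoring by cost instead of AUC can look like, the snippet below totals the cost of a set of flags. Everything in it is an assumption for illustration: the function names, the unit costs `cost_inspect` and `cost_miss`, and the baseline in which manual inspection means inspecting every wafer and missing nothing. The cost model actually used in the project is not described in this post.

```python
def total_inspection_cost(y_true, y_flag, cost_inspect=1.0, cost_miss=10.0):
    """Hypothetical cost model: pay for every flagged wafer, plus a penalty
    for every anomalous wafer that was not flagged."""
    if len(y_true) != len(y_flag):
        raise ValueError("y_true and y_flag must have the same length")
    n_inspected = sum(1 for flag in y_flag if flag)
    n_missed = sum(1 for truth, flag in zip(y_true, y_flag) if truth and not flag)
    return n_inspected * cost_inspect + n_missed * cost_miss


def cost_reduction_vs_manual(y_true, y_flag, cost_inspect=1.0, cost_miss=10.0):
    """Fractional saving relative to an assumed baseline of inspecting every wafer."""
    if len(y_true) == 0:
        raise ValueError("need at least one wafer")
    manual_cost = len(y_true) * cost_inspect  # assumed baseline: inspect everything
    model_cost = total_inspection_cost(y_true, y_flag, cost_inspect, cost_miss)
    return 1.0 - model_cost / manual_cost


# Toy data, not from the project: 10 wafers, 2 anomalous.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
flags_catch_all = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]  # 3 inspections, 0 misses
flags_miss_one = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # 2 inspections, 1 miss

cost_catch_all = total_inspection_cost(y_true, flags_catch_all)
cost_miss_one = total_inspection_cost(y_true, flags_miss_one)
saving_catch_all = cost_reduction_vs_manual(y_true, flags_catch_all)
```

With these made-up costs, one missed anomaly outweighs several extra inspections, so the second set of flags is more expensive than the first even though it inspects fewer wafers. A metric like AUC does not encode that asymmetry.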

The Work

Built a rule-based system from first principles. Compared it against 7 optimized model types: Random Forest, XGBoost, LightGBM, Naive Bayes variants, Logistic Regression, KNN, and Neural Network.

Results

The rule-based system beat all seven alternatives on inspection cost:

  • 72% cost reduction vs. manual inspection
  • 87% recall, 42% precision, 17% inspection load
  • Fully interpretable, every flag explained
  • No hyperparameters, robust to overfitting
  • Efficient, easy to deploy, easy to adapt
  • Stable across seeds, folds, and temporal splits
  • Consistent results on cost sensitivity analysis
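
The recall, precision, and inspection load figures above are linked by a simple identity: the fraction of wafers flagged equals recall times anomaly prevalence divided by precision. The sketch below uses that identity to back out the prevalence implied by the reported numbers. The function name is hypothetical, and the result is only approximate because the inputs are rounded percentages; the actual anomaly rate in the dataset is not stated in this post.

```python
def implied_prevalence(recall, precision, inspection_load):
    """Back out anomaly prevalence from three reported rates.

    flagged        = true_positives / precision
    true_positives = recall * positives
    => inspection_load = recall * prevalence / precision
    => prevalence      = inspection_load * precision / recall
    """
    if not (0 < recall <= 1 and 0 < precision <= 1 and 0 <= inspection_load <= 1):
        raise ValueError("rates must be fractions in (0, 1]")
    return inspection_load * precision / recall


# Rounded figures from the results list above.
prevalence = implied_prevalence(recall=0.87, precision=0.42, inspection_load=0.17)
print(f"implied anomaly prevalence: {prevalence:.3f}")  # roughly 0.08
```

This is a consistency check on the reported rates, not a measurement.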

Why It Matters

The best model is also the simplest. It was built for the actual business problem. Interpretability isn't a tradeoff. It's a feature.

Does this sound interesting, impossible, or wrong? Let me know.
quique@databirds.ai
