Interpretable ML vs. Black-Box Magic

This autumn I spent 200 hours on wafer anomaly detection with data from an Indian manufacturer. The last part was delivered from Thailand. Enjoyed every moment.

Result? An interpretable rule-based model built from scratch beat all 7 black-box models examined, from Naive Bayes to XGBoost. Every one of them was optimized to minimize inspection cost, and every one trailed the custom system, which cut cost by 72% vs. manual inspection.

Fully interpretable. No hyperparameters. Easily portable. Here's how...

The Challenge

Wafer anomaly detection for an Indian semiconductor manufacturer. 1,558 anonymized features. The standard approach would be to maximize AUC or another technical metric.

The Reframe

Instead of optimizing for technical metrics, I reframed the problem: minimize total inspection cost directly. This meant building a system for the actual business problem, not a proxy metric.
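To make the reframe concrete, here is a minimal sketch of a cost objective. It assumes a simple two-part cost model: every flagged wafer costs one inspection, and every missed anomaly costs more. The function names and both unit costs are illustrative assumptions, not figures from the project.

```python
def total_inspection_cost(y_true, y_flag, cost_inspect=1.0, cost_miss=10.0):
    """Total cost of a flagging policy under an assumed two-part cost model.

    y_true: 1 if the wafer is truly anomalous, else 0.
    y_flag: 1 if the wafer is flagged for inspection, else 0.
    cost_inspect, cost_miss: hypothetical unit costs (assumptions).
    """
    if len(y_true) != len(y_flag):
        raise ValueError("y_true and y_flag must have the same length")
    inspected = sum(y_flag)
    # A miss is an anomalous wafer that was not flagged.
    missed = sum(1 for t, f in zip(y_true, y_flag) if t == 1 and f == 0)
    return inspected * cost_inspect + missed * cost_miss


def cost_reduction_vs_manual(y_true, y_flag, cost_inspect=1.0, cost_miss=10.0):
    """Fractional saving relative to inspecting every wafer by hand."""
    manual = len(y_true) * cost_inspect
    model = total_inspection_cost(y_true, y_flag, cost_inspect, cost_miss)
    return 1.0 - model / manual


# Toy data: 10 wafers, 2 anomalies, 2 flags, 1 anomaly missed.
y_true = [0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
y_flag = [0, 1, 1, 0, 0, 0, 0, 0, 0, 0]
print(total_inspection_cost(y_true, y_flag))
print(cost_reduction_vs_manual(y_true, y_flag))
```

With an objective like this, any model or rule set can be scored in the same currency as the business problem, and a single missed anomaly can outweigh several unnecessary inspections.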

The Work

Built a rule-based system from first principles. Compared it against 7 optimized model types: Random Forest, XGBoost, LightGBM, Naive Bayes variants, Logistic Regression, KNN, and Neural Network.
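For readers unfamiliar with the idea, a rule-based flagger that explains every flag can be as small as the sketch below. The feature names, thresholds, and rule wording are entirely hypothetical; the actual rules in this project are not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Rule:
    feature: str                        # name of the feature the rule inspects
    predicate: Callable[[float], bool]  # returns True when the rule fires
    reason: str                         # human-readable explanation


# Hypothetical rules on hypothetical anonymized features.
RULES = [
    Rule("f12", lambda v: v > 3.5, "f12 above 3.5"),
    Rule("f407", lambda v: v < 0.1, "f407 below 0.1"),
]


def flag_wafer(wafer: Dict[str, float], rules: List[Rule] = RULES) -> List[str]:
    """Return the reason for every rule that fires; an empty list means no flag."""
    return [
        r.reason
        for r in rules
        if r.feature in wafer and r.predicate(wafer[r.feature])
    ]


print(flag_wafer({"f12": 4.2, "f407": 0.5}))  # one rule fires
print(flag_wafer({"f12": 1.0, "f407": 0.5}))  # no rule fires
```

Because the output is the list of reasons itself, the explanation is the prediction: there is nothing to reverse-engineer after the fact.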

Results

The rule-based system came out ahead of all 7 models on inspection cost:

72% cost reduction vs. manual inspection
87% recall, 42% precision, 17% inspection load
Fully interpretable, every flag explained
No hyperparameters, robust to overfitting
Efficient, easy to deploy, easy to adapt
Stable across seeds, folds, and temporal splits
Consistent results on cost sensitivity analysis
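The recall, precision, and inspection load figures above are linked by simple arithmetic, which makes for a quick sanity check. The sketch below assumes all three were measured on the same evaluation set and that inspection load means the share of all wafers flagged; the function name is illustrative.

```python
def implied_rates(recall, precision, inspection_load):
    """Derive the anomaly rate and missed-anomaly rate implied by three metrics.

    Assumes inspection_load = flagged wafers / all wafers, measured on the
    same data as recall and precision.
    """
    # True positives as a share of all wafers.
    tp_rate = precision * inspection_load
    # recall = TP / anomalies, so anomalies / all wafers = tp_rate / recall.
    prevalence = tp_rate / recall
    missed_rate = prevalence - tp_rate
    return prevalence, missed_rate


prevalence, missed_rate = implied_rates(recall=0.87, precision=0.42, inspection_load=0.17)
print(f"implied anomaly rate: {prevalence:.1%}")
print(f"missed anomalies: {missed_rate:.1%} of all wafers")
```

Under those assumptions the figures imply an anomaly rate of roughly 8%, with about 1% of all wafers being anomalies that slip through uninspected.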

Why It Matters

The best model is also the simplest. It was built for the actual business problem. Interpretability isn't a tradeoff. It's a feature.

Does this sound interesting, impossible, or wrong? Let me know.
quique@databirds.ai