· Hernán Pérez Rodal · Engineering · 5 min read
AI anomaly detection on traceability events: from detection to yield optimization
Detecting anomalies is 20% of the problem. The other 80% is turning alerts into real production savings. We share how we solve it at Darwin — with simple models that move the needle.

TL;DR — “AI anomaly detection” sounds like sophisticated deep learning. In practice, the models that move the needle in food production are simple, interpretable, and plugged directly into the operational process. We share how at Darwin we went from “we detect anomalies” to “we optimize yield” — and why the gap between the two is bigger than it seems.
The problem: detected anomalies ≠ captured value
Many platforms promise to “detect anomalies in supply chain with AI”. ML detects outliers — that’s easy. But a detected outlier is not value until:
- The operator understands it (it’s not a black box saying “anomaly”)
- It arrives on time (an alert 4 hours later is useless)
- It translates into a concrete action (not just “red-chart dashboard”)
- The impact is measured (how much did you save? how much yield did you gain?)
Most projects stop at step 1 — they detect things but don’t close the loop. At Darwin we built the system end-to-end. Here’s how.
What we detect (and what we don’t)
✅ Anomalies we do detect
| Type | Example | Model |
|---|---|---|
| Process deviations | Cold chain temperature out of range | Rules + rolling statistics |
| Lot inconsistencies | Declared weight vs. actual weight at dispatch | Simple regression per product |
| Potential fraud | Supplier reporting more volume than their historical production capacity | Isolation forest over supplier features |
| Degraded quality | Customer returns correlate with a specific lot | Correlation + basic causal analysis |
| Temporal patterns | Yield drops systematically on Mondays after shutdowns | Seasonal decomposition |
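To make the "rules + rolling statistics" row concrete, here is a minimal sketch for the cold-chain case. The 0–8 °C band, the window size, and the z-score threshold are illustrative values, not our production settings:

```python
import pandas as pd

def flag_temperature_anomalies(readings: pd.Series,
                               window: int = 12,
                               z: float = 3.0) -> pd.Series:
    """Flag readings that breach a hard rule or drift beyond a rolling z-score.

    readings: chronological cold-chain temperatures in °C.
    window and z are illustrative, not tuned production values.
    """
    # Hard business rule: cold chain must stay within 0–8 °C
    rule_breach = (readings < 0) | (readings > 8)
    # Rolling statistics: compare each reading to its recent local baseline
    mean = readings.rolling(window, min_periods=window).mean()
    std = readings.rolling(window, min_periods=window).std()
    drift = (readings - mean).abs() > z * std
    return rule_breach | drift.fillna(False)
```

The point of layering both: the rule catches absolute violations an auditor cares about, while the rolling statistic catches slow drift inside the "legal" range before it becomes a violation.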
❌ What we do NOT do (yet)
- Demand forecasting — that's not traceability; we don't play there
- Product computer vision — requires specific hardware; out of scope
- End-to-end deep learning — the overhead vs. value doesn’t justify it in this domain
Our rule: an interpretable model the operator understands > a complex model that is “better” on metrics. If a customer can’t explain to their auditor how an anomaly was detected, it doesn’t work for us.
The stack
Data pipeline:
- CTE/KDE events coming in via Captia → Pub/Sub → PostgreSQL (raw) + data warehouse (features)
- Scheduled feature engineering (every N events or every X minutes)
Models:
- Python (scikit-learn, statsmodels) — simple, interpretable models
- Prophet — for temporal patterns
- Isolation Forest — for multivariate outliers in supplier features
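As a sketch of the Isolation Forest setup, with hypothetical supplier features (the real feature set is richer, and `contamination` is illustrative):

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical per-supplier features; the last supplier reports far more
# volume than its historical capacity supports.
suppliers = pd.DataFrame({
    "reported_volume_kg":    [1200, 1150, 1300, 1250, 9800],
    "historical_capacity_kg": [1400, 1300, 1500, 1400, 1500],
    "lots_per_week":          [3, 3, 4, 3, 3],
})
# Ratio of reported volume to known capacity is the key fraud signal
suppliers["volume_vs_capacity"] = (
    suppliers["reported_volume_kg"] / suppliers["historical_capacity_kg"]
)

model = IsolationForest(contamination=0.2, random_state=42)
labels = model.fit_predict(suppliers)  # -1 = outlier, 1 = inlier
```

The deliberately engineered ratio feature is what keeps the model explainable: when a supplier is flagged, the alert can say "reported volume is 6.5× historical capacity" instead of "anomaly score: -0.12".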
Serving:
- Models trained offline, scoring in real time on a dedicated service
- Typical scoring: <100ms per event
- Rule-based alerts on top (we combine ML + business rules)
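The "ML + business rules" layering can be sketched as a gate in front of the notifier. Thresholds and field names here are assumptions for illustration:

```python
def should_alert(ml_score: float, lot: dict) -> bool:
    """Combine model output with business rules before notifying anyone.

    ml_score: anomaly score in [0, 1] from the offline-trained model.
    All thresholds are illustrative, not production values.
    """
    # Hard rules always alert, regardless of what the model says
    if lot.get("temperature_c", 4.0) > 8.0:
        return True
    # Raise the bar for low-value lots to cut alert noise
    if lot.get("value_usd", 0) < 500:
        return ml_score > 0.9
    return ml_score > 0.7
```

This is also where alert fatigue is managed: the rules layer is the knob operators can reason about and tune without retraining anything.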
Delivery:
- Alerts to operators via WhatsApp + email + dashboard
- Every alert includes: what was detected, why, which features contributed, suggested action
From outlier to yield: the step nobody talks about
This is where the real work happens. Detecting an outlier is easy. Turning it into value for the customer requires:
1. Actionable alerts, not informational ones
Bad:
“Anomaly detected in lot #12345”
Good:
“Lot #12345 is 7% under expected weight (based on 200 similar lots over the last 90 days). Possible cause: humidity loss from excessive ventilation. Check: compressor #3 (last maintenance 6 months ago) or chamber door (seal).”
The alert includes: what, why, what to check first. The operator acts in minutes, not hours.
2. Operator feedback loop
When the operator resolves an alert, the app asks:
- Was it a real anomaly? (model feedback)
- What was the root cause?
- What action did you take?
Those labels become training data for the next model iteration. Models learn from human decisions, not just historical data.
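A minimal sketch of how that feedback becomes a labeled row (the schema and field names are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AlertFeedback:
    alert_id: str
    was_real: bool                 # operator confirmed a genuine anomaly
    root_cause: Optional[str]      # free text, normalized later
    action_taken: Optional[str]

def to_training_label(fb: AlertFeedback) -> dict:
    """Turn operator feedback into a supervised label for the next iteration."""
    return {
        "alert_id": fb.alert_id,
        # False positives become negative labels, confirmed anomalies positive
        "label": 1 if fb.was_real else 0,
        "root_cause": fb.root_cause or "unknown",
    }
```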
3. Business metrics, not just ML metrics
The dashboards we show the customer do NOT display “precision: 0.94 / recall: 0.87”. They show:
- Preserved yield — tons recovered because the alert came in time
- Improved response time — from X hours to Y minutes
- Recalls avoided — incidents detected before they reached the consumer
Those are the numbers that get the customer to renew the contract, not the confusion matrix.
4. Causal analysis, not just correlation
Detecting that “yield drops on Mondays” is easy. What’s useful is: why.
We combine:
- Event sequence analysis — which events precede the yield drop
- Operator feedback — the local knowledge of the plant manager
- LLM-assisted root cause — the LLM proposes hypotheses the expert confirms
Result: we don’t just alert, we explain patterns. That turns “detection” into “optimization”.
A real case: yield loss in fruit processing
A customer noticed processing yield (raw material kg → finished product kg) was dropping for no clear reason. It varied between 72% and 78%, with no obvious pattern.
With Captia + Tracium we already had all the detailed events. We applied:
- Feature engineering — ambient temperature, supplier, source lot, shift, operator, time between harvest and processing
- Simple regression model — predicts expected yield given known features
- Residuals analysis — cases where actual yield differs >3% from expected
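The three steps above fit in a few lines. This is a sketch on synthetic data (slope, noise level, and the 3-point threshold are illustrative; the real model uses more features than time-to-processing):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 300
# Synthetic lots: hours between harvest and processing drives yield down
hours_to_processing = rng.uniform(6, 72, n)
expected_yield = 78 - 0.08 * hours_to_processing           # % yield, illustrative
actual_yield = expected_yield + rng.normal(0, 0.5, n)
actual_yield[-1] -= 5  # one lot with an unexplained 5-point drop

# Regression predicts expected yield given known features
X = hours_to_processing.reshape(-1, 1)
model = LinearRegression().fit(X, actual_yield)
residuals = actual_yield - model.predict(X)

# Residuals analysis: flag lots deviating >3 points from expected
flagged = np.flatnonzero(np.abs(residuals) > 3)
```

The interpretability payoff: the fitted coefficient directly answers "how much yield do we lose per hour of delay", which is the sentence that goes into the alert and into the supplier negotiation.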
Finding: time between harvest and processing was the strongest predictor. Lots that went into processing >48h after harvest had 5-8% lower yield.
Customer action: they reorganized the logistics flow to prioritize older lots. In 3 months:
- Average yield +3.2% → $180k/year in savings at this single plant
- Actionable alerts when a lot crosses the “risk” threshold
- Better supplier negotiation — data to demand faster deliveries
This isn’t sophisticated deep learning. It’s regression + feature engineering + disciplined distribution of the alert. The value is in closing the loop, not in model complexity.
What didn’t work
V0: end-to-end deep learning — we tried LSTMs on time series of events. Precision similar to simple regression, but not interpretable. Operators didn’t trust alerts they couldn’t explain. We dropped it.
Dashboards as main channel — many anomalies sat on the dashboard with no action. We migrated to push notifications (WhatsApp) with explicit call-to-action.
Global per-industry models — an "all fruits" model gave worse precision than product-plant-supplier-specific models. We now train specific ones, with clear versioning.
Lessons learned
- The gap between detection and value is 80% of the project — not the model
- Interpretability > precision in domains with human operators
- Feedback loop from day 0 — without operator labels, the model stalls
- Business metrics, not ML metrics in reports to the customer
- Simple models + domain knowledge > complex models without context
What’s next?
We’re exploring causal AI with tools like DoWhy — not just detecting correlations but inferring real causality of events. Combined with LLMs to formulate hypotheses, it can greatly accelerate root cause analysis.
If you're building anomaly detection in the supply chain and your model detects well but nobody acts on the alerts, the problem isn't your model — it's the action loop. Start there.
Do you have a yield, waste or quality loss problem in production? Let’s talk — we can show you real cases with measurable impact.
