Hernán Pérez Rodal · Engineering · 5 min read

AI anomaly detection on traceability events: from detection to yield optimization

Detecting anomalies is 20% of the problem. The other 80% is turning alerts into real production savings. We share how we solve it at Darwin — with simple models that move the needle.

TL;DR — “AI anomaly detection” sounds like sophisticated deep learning. In practice, the models that move the needle in food production are simple, interpretable, and plugged directly into the operational process. We share how at Darwin we went from “we detect anomalies” to “we optimize yield” — and why the gap between the two is bigger than it seems.

The problem: detected anomalies ≠ captured value

Many platforms promise to “detect anomalies in supply chain with AI”. ML detects outliers — that’s easy. But a detected outlier is not value until:

  1. The operator understands it (it’s not a black box saying “anomaly”)
  2. It arrives on time (an alert 4 hours later is useless)
  3. It translates into a concrete action (not just “red-chart dashboard”)
  4. The impact is measured (how much did you save? how much yield did you gain?)

Most projects stop at detection — they flag anomalies but never close the loop. At Darwin we built the system end-to-end. Here’s how.

What we detect (and what we don’t)

✅ Anomalies we do detect

Type | Example | Model
Process deviations | Cold chain temperature out of range | Rules + rolling statistics
Lot inconsistencies | Declared weight vs. actual weight at dispatch | Simple regression per product
Potential fraud | Supplier reporting more volume than their historical production capacity | Isolation Forest over supplier features
Degraded quality | Customer returns correlate with a specific lot | Correlation + basic causal analysis
Temporal patterns | Yield drops systematically on Mondays after shutdowns | Seasonal decomposition
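For the “rules + rolling statistics” row, the idea is a hard business limit plus a drift check against recent history. A minimal sketch, assuming a time-indexed pandas series of chamber temperature readings; the thresholds and window are illustrative, not our production values:

```python
import pandas as pd

def cold_chain_alerts(readings: pd.Series, hard_max=8.0, window=24, z_thresh=3.0):
    """Flag cold-chain temperature anomalies: absolute rule + rolling statistics.

    `readings` is a time-indexed series of chamber temperatures in °C.
    """
    rolling_mean = readings.rolling(window, min_periods=window).mean()
    rolling_std = readings.rolling(window, min_periods=window).std()
    z_score = (readings - rolling_mean) / rolling_std

    rule_breach = readings > hard_max        # business rule: absolute limit
    statistical = z_score.abs() > z_thresh   # drift relative to recent history
    return pd.DataFrame({
        "reading": readings,
        "rule_breach": rule_breach,
        "statistical_outlier": statistical,
    })
```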

❌ What we do NOT do (yet)

  • Demand forecasting — not traceability, we don’t play there
  • Product computer vision — requires specific hardware; out of scope
  • End-to-end deep learning — the overhead vs. value doesn’t justify it in this domain

Our rule: an interpretable model the operator understands > a complex model that is “better” on metrics. If a customer can’t explain to their auditor how an anomaly was detected, it doesn’t work for us.

The stack

Data pipeline:

  • CTE/KDE events coming in via Captia → Pub/Sub → PostgreSQL (raw) + data warehouse (features)
  • Scheduled feature engineering (every N events or every X minutes)
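A minimal sketch of what one of those scheduled feature-engineering jobs can look like, assuming events land in a PostgreSQL table called `events`; the connection string, table, and column names are hypothetical:

```python
import pandas as pd
from sqlalchemy import create_engine

# Hypothetical connection string and schema -- the real pipeline reads CTE/KDE
# events from PostgreSQL and writes features to the warehouse.
raw = create_engine("postgresql://user:pass@host/traceability")

def build_supplier_features(window_days: int = 90) -> pd.DataFrame:
    """Aggregate recent events into per-supplier features for scoring."""
    events = pd.read_sql(
        f"SELECT supplier_id, event_type, declared_kg, actual_kg, created_at "
        f"FROM events WHERE created_at > now() - interval '{window_days} days'",
        raw,
    )
    # Gap between what the supplier declared and what was actually received
    events["weight_gap_kg"] = events["declared_kg"] - events["actual_kg"]
    return events.groupby("supplier_id").agg(
        lots=("event_type", "count"),
        declared_total_kg=("declared_kg", "sum"),
        actual_total_kg=("actual_kg", "sum"),
        mean_weight_gap_kg=("weight_gap_kg", "mean"),
    )
```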

Models:

  • Python (scikit-learn, statsmodels) — simple, interpretable models
  • Prophet — for temporal patterns
  • Isolation Forest — for multivariate outliers in supplier features
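As a sketch of how the supplier outlier model is wired up with scikit-learn (the feature set and contamination rate are illustrative):

```python
from sklearn.ensemble import IsolationForest

def score_suppliers(features):
    """Fit an Isolation Forest on per-supplier features (e.g. declared vs.
    received volume, lot frequency) and flag the most anomalous suppliers."""
    model = IsolationForest(n_estimators=200, contamination=0.02, random_state=42)
    model.fit(features)
    # decision_function: the lower the score, the more anomalous the supplier
    scores = model.decision_function(features)
    flagged = model.predict(features) == -1
    return scores, flagged
```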

Serving:

  • Models trained offline, scoring in real time on a dedicated service
  • Typical scoring: <100ms per event
  • Rule-based alerts on top (we combine ML + business rules)
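Conceptually, the scoring path is: compute features, run the model, run the business rules, alert if either fires. A hedged sketch (function and rule names are hypothetical):

```python
def evaluate_event(event, model, feature_fn, rules):
    """Score one incoming event: ML score plus business rules on top.

    `rules` is a list of (name, predicate) pairs encoding domain limits
    (e.g. max allowed weight gap) that fire regardless of the model.
    """
    features = feature_fn(event)
    ml_anomalous = model.predict([features])[0] == -1  # e.g. Isolation Forest output

    fired = [name for name, predicate in rules if predicate(event)]
    return {
        "event_id": event["id"],
        "ml_anomalous": bool(ml_anomalous),
        "rules_fired": fired,
        "alert": bool(ml_anomalous) or bool(fired),
    }
```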

Delivery:

  • Alerts to operators via WhatsApp + email + dashboard
  • Every alert includes: what was detected, why, which features contributed, suggested action
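As a rough illustration of that alert contract, something like the following (field names are illustrative, not our actual schema):

```python
from dataclasses import dataclass

@dataclass
class Alert:
    """The four fields every alert carries, plus where it gets delivered."""
    what: str                    # "Lot #12345 is 7% under expected weight"
    why: str                     # "expected weight based on 200 similar lots, 90 days"
    contributing_features: dict  # {"time_since_harvest_h": 54, "chamber": 3, ...}
    suggested_action: str        # "check compressor #3 / chamber door seal"
    channels: tuple = ("whatsapp", "email", "dashboard")
```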

From outlier to yield: the step nobody talks about

This is where the real work happens. Detecting an outlier is easy. Turning it into value for the customer requires:

1. Actionable alerts, not informational ones

Bad:

“Anomaly detected in lot #12345”

Good:

“Lot #12345 is 7% under expected weight (based on 200 similar lots over the last 90 days). Possible cause: humidity loss from excessive ventilation. Check: compressor #3 (last maintenance 6 months ago) or chamber door (seal).”

The alert includes: what, why, what to check first. The operator acts in minutes, not hours.

2. Operator feedback loop

When the operator resolves an alert, the app asks:

  • Was it a real anomaly? (model feedback)
  • What was the root cause?
  • What action did you take?

Those labels become training data for the next model iteration. Models learn from human decisions, not just historical data.
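A sketch of how those answers might be joined back onto the emitted alerts to build the next training set (column names are hypothetical):

```python
import pandas as pd

def merge_feedback(alerts: pd.DataFrame, feedback: pd.DataFrame) -> pd.DataFrame:
    """Join operator feedback onto emitted alerts to produce labelled examples.

    `feedback` carries the three questions the app asks: was_real, root_cause,
    action_taken. Alerts marked as not real become negative labels.
    """
    labelled = alerts.merge(feedback, on="alert_id", how="inner")
    labelled["label"] = labelled["was_real"].astype(int)  # 1 = true anomaly, 0 = noise
    return labelled[["alert_id", "features", "label", "root_cause", "action_taken"]]
```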

3. Business metrics, not just ML metrics

The dashboards we show the customer do NOT display “precision: 0.94 / recall: 0.87”. They show:

  • Preserved yield — tons recovered because the alert came in time
  • Improved response time — from X hours to Y minutes
  • Recalls avoided — incidents detected before they reached the consumer

Those are the numbers that get the customer to renew the contract, not the confusion matrix.

4. Causal analysis, not just correlation

Detecting that “yield drops on Mondays” is easy. What’s useful is: why.

We combine:

  • Event sequence analysis — which events precede the yield drop
  • Operator feedback — the local knowledge of the plant manager
  • LLM-assisted root cause — the LLM proposes hypotheses the expert confirms

Result: we don’t just alert, we explain patterns. That turns “detection” into “optimization”.
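As an illustration of the event-sequence part, a naive sketch that counts which event types show up in the window before each yield drop. It is purely descriptive (it surfaces candidate causes for the operator and the LLM hypothesis step, not proofs), and the column names are assumptions:

```python
import pandas as pd

def events_preceding_drops(events: pd.DataFrame, drops: pd.Series, window="24h"):
    """Count event types occurring in the window before each yield drop.

    `events` has columns [timestamp, event_type]; `drops` is a series of
    timestamps where yield fell below expectation.
    """
    events = events.set_index("timestamp").sort_index()
    counts = pd.Series(dtype=float)
    for t in drops:
        before = events.loc[t - pd.Timedelta(window): t, "event_type"]
        counts = counts.add(before.value_counts(), fill_value=0)
    return counts.sort_values(ascending=False)
```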

A real case: yield loss in fruit processing

A customer noticed that processing yield (raw material kg → finished product kg) was dropping without a clear reason. It varied between 72% and 78%, with no obvious pattern.

With Captia + Tracium we already had all the detailed events. We applied:

  1. Feature engineering — ambient temperature, supplier, source lot, shift, operator, time between harvest and processing
  2. Simple regression model — predicts expected yield given known features
  3. Residuals analysis — cases where actual yield differs >3% from expected

Finding: time between harvest and processing was the strongest predictor. Lots that went into processing >48h after harvest had 5-8% lower yield.
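Steps 2 and 3 boil down to a few lines of scikit-learn. A simplified sketch with illustrative column names, not the production pipeline:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

def yield_residual_outliers(lots: pd.DataFrame, threshold: float = 0.03):
    """Predict expected yield from known features, then flag lots whose
    actual yield deviates from the prediction by more than the threshold."""
    features = pd.get_dummies(
        lots[["hours_since_harvest", "ambient_temp", "supplier", "shift"]],
        columns=["supplier", "shift"],
    )
    model = LinearRegression().fit(features, lots["yield_pct"])

    lots = lots.copy()
    lots["expected_yield"] = model.predict(features)
    lots["residual"] = lots["yield_pct"] - lots["expected_yield"]
    return lots[lots["residual"].abs() > threshold]  # e.g. |deviation| > 3 points
```

Inspecting the model coefficients (and the residual outliers themselves) is what pointed to time-since-harvest as the dominant factor.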

Customer action: they reorganized the logistics flow to prioritize older lots. In 3 months:

  • Average yield +3.2% → $180k/year in savings at this single plant
  • Actionable alerts when a lot crosses the “risk” threshold
  • Better supplier negotiation — data to demand faster deliveries

This isn’t sophisticated deep learning. It’s regression + feature engineering + disciplined distribution of the alert. The value is in closing the loop, not in model complexity.

What didn’t work

V0: end-to-end deep learning — we tried LSTMs on time series of events. Precision similar to simple regression, but not interpretable. Operators didn’t trust alerts they couldn’t explain. We dropped it.

Dashboards as main channel — many anomalies sat on the dashboard with no action. We migrated to push notifications (WhatsApp) with explicit call-to-action.

Global per-industry models — an “all fruits” model gave worse precision than product-, plant-, and supplier-specific models. We now train specific models and version them clearly.

Lessons learned

  1. The gap between detection and value is 80% of the project — not the model
  2. Interpretability > precision in domains with human operators
  3. Feedback loop from day 0 — without operator labels, the model stalls
  4. Business metrics, not ML metrics in reports to the customer
  5. Simple models + domain knowledge > complex models without context

What’s next?

We’re exploring causal AI with tools like DoWhy — not just detecting correlations but inferring real causality of events. Combined with LLMs to formulate hypotheses, it can greatly accelerate root cause analysis.
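A minimal sketch of that direction, using DoWhy’s CausalModel on a hypothetical lot-level table (the data path and variable names are illustrative):

```python
import pandas as pd
from dowhy import CausalModel

# Hypothetical lot-level dataset: one row per lot with treatment, outcome
# and confounders.
lots_df = pd.read_parquet("lots.parquet")  # illustrative path

model = CausalModel(
    data=lots_df,
    treatment="processed_within_48h",  # binary: lot processed within 48h of harvest
    outcome="yield_pct",
    common_causes=["supplier", "ambient_temp", "shift"],
)

estimand = model.identify_effect()
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")
print(estimate.value)  # estimated causal effect of fast processing on yield
```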

If you’re building anomaly detection in supply chain and your model detects well but nobody acts on the alerts, the problem isn’t your model — it’s the action loop. Start there.


Do you have a yield, waste or quality loss problem in production? Let’s talk — we can show you real cases with measurable impact.
