· Hernán Pérez Rodal · Engineering · 5 min read
AI anomaly detection on traceability events: from detection to yield optimization
Detecting anomalies is 20% of the problem. The other 80% is turning alerts into real production savings. We share how we solve it at Darwin — with simple models that move the needle.

TL;DR — “AI anomaly detection” sounds like sophisticated deep learning. In practice, the models that move the needle in food production are simple, interpretable, and plugged directly into the operational process. We share how at Darwin we went from “we detect anomalies” to “we optimize yield” — and why the gap between the two is bigger than it seems.
The problem: detected anomalies ≠ captured value
Many platforms promise to “detect anomalies in supply chain with AI”. ML detects outliers — that’s easy. But a detected outlier is not value until:
- The operator understands it (it’s not a black box saying “anomaly”)
- It arrives on time (an alert 4 hours later is useless)
- It translates into a concrete action (not just “red-chart dashboard”)
- The impact is measured (how much did you save? how much yield did you gain?)
Most projects stop at step 1 — they detect things but don’t close the loop. At Darwin we built the system end-to-end. Here’s how.
What we detect (and what we don’t)
✅ Anomalies we do detect
| Type | Example | Model |
|---|---|---|
| Process deviations | Cold chain temperature out of range | Rules + rolling statistics |
| Lot inconsistencies | Declared weight vs. actual weight at dispatch | Simple regression per product |
| Potential fraud | Supplier reporting more volume than their historical production capacity | Isolation forest over supplier features |
| Degraded quality | Customer returns correlate with a specific lot | Correlation + basic causal analysis |
| Temporal patterns | Yield drops systematically on Mondays after shutdowns | Seasonal decomposition |
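To make the "rules + rolling statistics" row concrete, here is a minimal sketch for the cold-chain case. The 0–8 °C band, the window size, and the z-score threshold are illustrative values, not our production settings:

```python
import pandas as pd

def flag_temperature_anomalies(readings: pd.Series,
                               window: int = 12,
                               z: float = 3.0) -> pd.Series:
    """Flag readings that breach a hard rule or drift beyond a rolling z-score.

    readings: chronological cold-chain temperatures in °C.
    window and z are illustrative, not tuned production values.
    """
    # Hard business rule: cold chain must stay within 0–8 °C
    rule_breach = (readings < 0) | (readings > 8)
    # Rolling statistics: compare each reading to its recent local baseline
    mean = readings.rolling(window, min_periods=window).mean()
    std = readings.rolling(window, min_periods=window).std()
    drift = (readings - mean).abs() > z * std
    return rule_breach | drift.fillna(False)
```

The point of layering both: the rule catches absolute violations an auditor cares about, while the rolling statistic catches slow drift inside the "legal" range before it becomes a violation.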
❌ What we do NOT do (yet)
- Demand forecasting — that's not traceability; we don't play there
- Product computer vision — requires specific hardware; out of scope
- End-to-end deep learning — the overhead vs. value doesn’t justify it in this domain
Our rule: an interpretable model the operator understands > a complex model that is “better” on metrics. If a customer can’t explain to their auditor how an anomaly was detected, it doesn’t work for us.
The stack
Data pipeline:
- CTE/KDE events coming in via Captia → Pub/Sub → PostgreSQL (raw) + data warehouse (features)
- Scheduled feature engineering (every N events or every X minutes)
Models:
- Python (scikit-learn, statsmodels) — simple, interpretable models
- Prophet — for temporal patterns
- Isolation Forest — for multivariate outliers in supplier features
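As a sketch of the Isolation Forest setup, with hypothetical supplier features (the real feature set is richer, and `contamination` is illustrative):

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical per-supplier features; the last supplier reports far more
# volume than its historical capacity supports.
suppliers = pd.DataFrame({
    "reported_volume_kg":    [1200, 1150, 1300, 1250, 9800],
    "historical_capacity_kg": [1400, 1300, 1500, 1400, 1500],
    "lots_per_week":          [3, 3, 4, 3, 3],
})
# Ratio of reported volume to known capacity is the key fraud signal
suppliers["volume_vs_capacity"] = (
    suppliers["reported_volume_kg"] / suppliers["historical_capacity_kg"]
)

model = IsolationForest(contamination=0.2, random_state=42)
labels = model.fit_predict(suppliers)  # -1 = outlier, 1 = inlier
```

The deliberately engineered ratio feature is what keeps the model explainable: when a supplier is flagged, the alert can say "reported volume is 6.5× historical capacity" instead of "anomaly score: -0.12".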
Serving:
- Models trained offline, scoring in real time on a dedicated service
- Typical scoring: <100ms per event
- Rule-based alerts on top (we combine ML + business rules)
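The "ML + business rules" layering can be sketched as a gate in front of the notifier. Thresholds and field names here are assumptions for illustration:

```python
def should_alert(ml_score: float, lot: dict) -> bool:
    """Combine model output with business rules before notifying anyone.

    ml_score: anomaly score in [0, 1] from the offline-trained model.
    All thresholds are illustrative, not production values.
    """
    # Hard rules always alert, regardless of what the model says
    if lot.get("temperature_c", 4.0) > 8.0:
        return True
    # Raise the bar for low-value lots to cut alert noise
    if lot.get("value_usd", 0) < 500:
        return ml_score > 0.9
    return ml_score > 0.7
```

This is also where alert fatigue is managed: the rules layer is the knob operators can reason about and tune without retraining anything.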
Delivery:
- Alerts to operators via WhatsApp + email + dashboard
- Every alert includes: what was detected, why, which features contributed, suggested action
From outlier to yield: the step nobody talks about
This is where the real work happens. Detecting an outlier is easy. Turning it into value for the customer requires:
1. Actionable alerts, not informational ones
Bad:
“Anomaly detected in lot #12345”
Good:
“Lot #12345 is 7% under expected weight (based on 200 similar lots over the last 90 days). Possible cause: humidity loss from excessive ventilation. Check: compressor #3 (last maintenance 6 months ago) or chamber door (seal).”
The alert includes: what, why, what to check first. The operator acts in minutes, not hours.
2. Operator feedback loop
When the operator resolves an alert, the app asks:
- Was it a real anomaly? (model feedback)
- What was the root cause?
- What action did you take?
Those labels become training data for the next model iteration. Models learn from human decisions, not just historical data.
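A minimal sketch of how that feedback becomes a labeled row (the schema and field names are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AlertFeedback:
    alert_id: str
    was_real: bool                 # operator confirmed a genuine anomaly
    root_cause: Optional[str]      # free text, normalized later
    action_taken: Optional[str]

def to_training_label(fb: AlertFeedback) -> dict:
    """Turn operator feedback into a supervised label for the next iteration."""
    return {
        "alert_id": fb.alert_id,
        # False positives become negative labels, confirmed anomalies positive
        "label": 1 if fb.was_real else 0,
        "root_cause": fb.root_cause or "unknown",
    }
```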
3. Business metrics, not just ML metrics
The dashboards we show the customer do NOT display “precision: 0.94 / recall: 0.87”. They show:
- Preserved yield — tons recovered because the alert came in time
- Improved response time — from X hours to Y minutes
- Recalls avoided — incidents detected before they reached the consumer
Those are the numbers that get the customer to renew the contract, not the confusion matrix.
4. Causal analysis, not just correlation
Detecting that “yield drops on Mondays” is easy. What’s useful is: why.
We combine:
- Event sequence analysis — which events precede the yield drop
- Operator feedback — the local knowledge of the plant manager
- LLM-assisted root cause — the LLM proposes hypotheses the expert confirms
Result: we don’t just alert, we explain patterns. That turns “detection” into “optimization”.
A real case: yield loss in fruit processing
A customer noticed processing yield (raw material kg → finished product kg) was dropping for no clear reason. It varied between 72% and 78%, with no obvious pattern.
With Captia + Tracium we already had all the detailed events. We applied:
- Feature engineering — ambient temperature, supplier, source lot, shift, operator, time between harvest and processing
- Simple regression model — predicts expected yield given known features
- Residuals analysis — cases where actual yield differs >3% from expected
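The three steps above fit in a few lines. This is a sketch on synthetic data (slope, noise level, and the 3-point threshold are illustrative; the real model uses more features than time-to-processing):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 300
# Synthetic lots: hours between harvest and processing drives yield down
hours_to_processing = rng.uniform(6, 72, n)
expected_yield = 78 - 0.08 * hours_to_processing           # % yield, illustrative
actual_yield = expected_yield + rng.normal(0, 0.5, n)
actual_yield[-1] -= 5  # one lot with an unexplained 5-point drop

# Regression predicts expected yield given known features
X = hours_to_processing.reshape(-1, 1)
model = LinearRegression().fit(X, actual_yield)
residuals = actual_yield - model.predict(X)

# Residuals analysis: flag lots deviating >3 points from expected
flagged = np.flatnonzero(np.abs(residuals) > 3)
```

The interpretability payoff: the fitted coefficient directly answers "how much yield do we lose per hour of delay", which is the sentence that goes into the alert and into the supplier negotiation.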
Finding: time between harvest and processing was the strongest predictor. Lots that went into processing >48h after harvest had 5-8% lower yield.
Customer action: they reorganized the logistics flow to prioritize older lots. In 3 months:
- Average yield +3.2% → $180k/year in savings at this single plant
- Actionable alerts when a lot crosses the “risk” threshold
- Better supplier negotiation — data to demand faster deliveries
This isn’t sophisticated deep learning. It’s regression + feature engineering + disciplined distribution of the alert. The value is in closing the loop, not in model complexity.
What didn’t work
V0: end-to-end deep learning — we tried LSTMs on time series of events. Precision similar to simple regression, but not interpretable. Operators didn’t trust alerts they couldn’t explain. We dropped it.
Dashboards as main channel — many anomalies sat on the dashboard with no action. We migrated to push notifications (WhatsApp) with explicit call-to-action.
Global per-industry models — an "all fruits" model gave worse precision than product-plant-supplier-specific models. We now train specific ones, with clear versioning.
Lessons learned
- The gap between detection and value is 80% of the project — not the model
- Interpretability > precision in domains with human operators
- Feedback loop from day 0 — without operator labels, the model stalls
- Business metrics, not ML metrics in reports to the customer
- Simple models + domain knowledge > complex models without context
What’s next?
We’re exploring causal AI with tools like DoWhy — not just detecting correlations but inferring real causality of events. Combined with LLMs to formulate hypotheses, it can greatly accelerate root cause analysis.
If you're building anomaly detection in the supply chain and your model detects well but nobody acts on the alerts, the problem isn't your model — it's the action loop. Start there.
Do you have a yield, waste or quality loss problem in production? Let’s talk — we can show you real cases with measurable impact.
