Hernán Pérez Rodal · Engineering · 6 min read

Offline-first architecture: data capture in rural areas with intermittent connectivity

In Latin America, the first production link typically has 3G signal or worse. We share how we designed Captia to work without connection — and what tradeoffs we accepted with eventual consistency.

TL;DR — Most apps assume connectivity. In the Latin American field, you can’t. Captia, our mobile data-capture app, is natively offline-first: operators log critical tracking events (CTEs) without signal and sync later. This post covers how we designed it, what tradeoffs we accepted, and how we resolve conflicts when they appear.

The problem: connectivity is NOT an optional feature

FSMA 204 and EUDR require events to be captured where they happen. For primary producers (field, fishing, cattle), that means:

  • Rural areas with intermittent 3G or no signal
  • Processing plants where coverage dies on certain floors
  • Workers moving between lots/silos/chambers

Asking an operator to “come back when you have signal” to record an event guarantees it never gets recorded. In real production, either you capture offline or you don’t capture at all.

Darwin tracks millions of events per year, the majority from operations in Latin America. We couldn’t assume connectivity, so we designed everything from the opposite assumption.

Principles we applied

1. Offline is the default, online is a bonus

The app works the same with or without connection. There’s no “offline mode” with reduced functionality. The user doesn’t have to think about network state.

Create event → save local → (when there's network) → sync

The operator never sees “error: no connection”. The app just works.
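
A minimal sketch of that flow, assuming a hypothetical local store and sync queue (illustrative interfaces, not WatermelonDB’s actual API), with the ID generator injected:

type SyncStatus = 'pending' | 'synced' | 'conflicted';

interface LocalStore {
  // persists the event in the on-device SQLite database
  insertEvent(event: { id: string; type: string; payload: unknown; syncStatus: SyncStatus }): Promise<void>;
}

interface SyncQueue {
  // remembered locally and drained by the background sync process when there is network
  enqueue(eventId: string): void;
}

async function captureEvent(
  store: LocalStore,
  queue: SyncQueue,
  makeId: () => string,   // client-side UUID v7 generator, no server round trip
  type: string,
  payload: unknown,
): Promise<string> {
  const id = makeId();
  await store.insertEvent({ id, type, payload, syncStatus: 'pending' });
  queue.enqueue(id);      // syncs whenever the network shows up
  return id;              // the UI can reference the event immediately
}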

2. Local database as the temporary source of truth

Every device has a local DB (SQLite via WatermelonDB) which is the source of truth until sync happens. Everything the operator sees comes from there, not from the server.

That changes how you design queries: there is no “GET /events”. There’s “read from my local store”. Sync is a separate background process.
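
In practice that reads roughly like this with WatermelonDB’s query builder; the table and column names are illustrative, not Captia’s actual schema:

import { Database, Q } from '@nozbe/watermelondb';

// The screen reads (or subscribes to) the local collection; the network is never involved.
async function eventsForLot(db: Database, lotId: string) {
  return db
    .get('critical_tracking_events')
    .query(
      Q.where('lot_id', lotId),
      Q.sortBy('timestamp', Q.desc),   // newest first, straight from SQLite
    )
    .fetch();                          // or .observe() for a reactive screen
}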

3. Events as first-class citizens

Instead of “form submissions”, we model everything as immutable timestamped events:

type CriticalTrackingEvent = {
  id: string;            // UUID generated locally
  type: 'harvest' | 'processing' | 'shipment'; // plus the other CTE types (elided here)
  timestamp: Date;       // when it HAPPENED, not when it was recorded
  deviceId: string;
  operatorId: string;
  payload: EventPayload;
  syncStatus: 'pending' | 'synced' | 'conflicted';
};

Events aren’t edited — they’re corrected with new events (event sourcing pattern). This simplifies sync and is consistent with how compliance works (no one “edits” a CTE from the past; a correction is issued).
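
A sketch of what a correction looks like under that model; correctionOf and reason are illustrative field names, and EventPayload is assumed to be an open record:

type EventPayload = Record<string, unknown>;   // assumed shape for this sketch

function buildCorrection(
  original: CriticalTrackingEvent,
  corrections: EventPayload,
  reason: string,
  newId: string,                               // fresh UUID v7 generated on the device
): CriticalTrackingEvent {
  return {
    ...original,
    id: newId,                                 // new identity: the original is never touched
    timestamp: new Date(),                     // when the correction was issued
    payload: { ...original.payload, ...corrections, correctionOf: original.id, reason },
    syncStatus: 'pending',
  };
}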

4. UUIDs generated on the client, not on the server

Each event has a UUID v7 generated on the device before sync. This allows:

  • Referencing the event offline from other events
  • Attaching photos/documents that link by UUID
  • Syncing in any order without dependency issues
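
Recent versions of the uuid package expose a v7 generator; assuming that dependency, this is the property we lean on:

import { v7 as uuidv7 } from 'uuid';

// UUID v7 packs a 48-bit millisecond timestamp into the high bits, so plain string
// comparison roughly follows creation order.
const first = uuidv7();
const second = uuidv7();
console.log(first < second);   // true, barring same-millisecond ties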

The stack

  • Mobile: React Native (Expo) for iOS + Android
  • Local DB: WatermelonDB (SQLite with reactivity)
  • Sync layer: Custom — based on light operational transform (OT), not CRDTs
  • Backend: FastAPI receives event batches, validates, anchors critical events on-chain
  • Firebase (Firestore): for some dynamic metadata (product catalog, rules) that does need near-realtime updates

The real tradeoffs

Eventual consistency, not strong consistency

Two operators on different lots can record events “at the same time” (by their clocks), and the order in which they reach the server may not match wall-clock order.

For compliance, this is OK because:

  • Each event has a device timestamp (local clock)
  • Each event has a server timestamp when received
  • Traceability is reconstructed per lot, not by global order (see the sketch below)

Exception: events that require global serialization (e.g., custody transfer) use server-side locking — if there’s no network, the event stays in “lock pending” and completes on sync.
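
A sketch of that per-lot reconstruction, assuming each synced event carries both clocks (field names are illustrative):

type SyncedEvent = {
  id: string;
  lotId: string;
  deviceTimestamp: number;   // epoch ms from the device clock: when it happened
  serverTimestamp: number;   // epoch ms assigned when the server received it
};

function lotTimeline(events: SyncedEvent[], lotId: string): SyncedEvent[] {
  return events
    .filter(e => e.lotId === lotId)
    .sort(
      (a, b) =>
        a.deviceTimestamp - b.deviceTimestamp ||   // primary: when it happened
        a.serverTimestamp - b.serverTimestamp,     // tie-break: arrival order at the server
    );
}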

Conflict resolution: last-write-wins with audit

When two devices edit the same resource (rare, but it happens), we apply last-write-wins based on the server receive timestamp (sketched below). But:

  • Both versions stay recorded in the audit trail
  • The conflict is flagged in the operator UI for human review if it matters
  • For compliance, evidence of both events is preserved

It’s not perfect, but it’s transparent — we don’t silence data, we flag it.
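
A sketch of that resolution step; AuditTrail and the field names are assumptions, not the production schema:

type VersionedResource = {
  resourceId: string;
  serverTimestamp: number;   // when this version reached the server
  data: Record<string, unknown>;
};

interface AuditTrail {
  record(entry: { resourceId: string; kept: VersionedResource; superseded: VersionedResource }): void;
}

function resolveConflict(
  current: VersionedResource,
  incoming: VersionedResource,
  audit: AuditTrail,
): { winner: VersionedResource; conflicted: boolean } {
  const winner = incoming.serverTimestamp >= current.serverTimestamp ? incoming : current;
  const loser = winner === incoming ? current : incoming;
  audit.record({ resourceId: current.resourceId, kept: winner, superseded: loser });
  return { winner, conflicted: true };   // flagged so the UI can surface it for review
}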

Sensitive data doesn’t go offline

There are things we don’t store on the device:

  • Prices and commercial terms
  • Info about other customers
  • Expired credentials

For those, if there’s no network, the operation is denied. Acceptable trade-off because they’re operational edge cases, not the core capture flow.
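
A sketch of that gate, assuming a hypothetical connectivity flag and local/remote accessors:

const SENSITIVE_RESOURCES = new Set(['pricing', 'commercial_terms', 'other_customers']);

async function readResource(
  name: string,
  online: boolean,
  local: { read: (name: string) => Promise<unknown> },    // on-device store
  remote: { get: (name: string) => Promise<unknown> },    // live API call
): Promise<unknown> {
  if (SENSITIVE_RESOURCES.has(name)) {
    if (!online) {
      throw new Error('This information requires a connection and is never stored on the device.');
    }
    return remote.get(name);   // fetched live, never persisted locally
  }
  return local.read(name);     // everything else comes from the local store
}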

Sync protocol: 80% of the work

When connectivity comes back, sync runs in five steps (the push half is sketched after the list):

  1. Push local → server: batch of pending events, with checksum
  2. Server validates: schema, auth, business rules (atomic batch)
  3. Server responds: { accepted: [...], rejected: [...], conflicts: [...] }
  4. Client reconciles: marks locally as synced, conflict, or retry
  5. Pull server → local: deltas the server has (other devices already synced)
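
A sketch of the push half (steps 1 and 3); the endpoint, field names and checksum are illustrative, not Captia’s actual API:

type PendingEvent = { id: string; type: string; payload: unknown };

type PushResponse = {
  accepted: string[];                                // event ids persisted server-side
  rejected: { id: string; reason: string }[];
  conflicts: { id: string; serverVersionId: string }[];
};

async function pushBatch(events: PendingEvent[], baseUrl: string): Promise<PushResponse> {
  const body = JSON.stringify({ events });
  const res = await fetch(`${baseUrl}/sync/events`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      // lets the server detect truncated or corrupted uploads over flaky links
      'X-Batch-Checksum': String(body.length),       // placeholder; a real hash in production
    },
    body,
  });
  if (!res.ok) throw new Error(`sync failed: ${res.status}`);
  return (await res.json()) as PushResponse;
}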

Critical details:

  • Batching by size + time — we group events to cut requests but don’t wait too long
  • Exponential backoff with jitter — if a request fails, retry without DDoS-ing the server (sketched below)
  • Checkpoints mid-batch — if the batch fails partway, we resume from the last confirmed event (we don’t resend everything)
  • Client + server observability — logs of when the event occurred, when sync was attempted, when it completed, how long it took
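
The backoff piece in isolation, as a generic sketch (full jitter: the delay is a random value up to an exponentially growing cap):

async function withBackoff<T>(
  attempt: () => Promise<T>,
  maxRetries = 5,
  baseDelayMs = 1_000,
  maxDelayMs = 60_000,
): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await attempt();
    } catch (err) {
      if (i >= maxRetries) throw err;
      const cap = Math.min(maxDelayMs, baseDelayMs * 2 ** i);
      const delay = Math.random() * cap;   // every device retries at a different moment
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}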

What didn’t work

V0: aggressive auto-sync — the app detected signal and synced immediately in the background. Problem: it burned through rural operators’ expensive data plans. We switched to on-demand sync plus WiFi-only by default, with an override option.

Full CRDTs — we evaluated CRDTs for multi-device collaboration. Over-engineering for our case (most events are single-device, single-user). We stuck with eventual consistency plus selective locks.

Long-running offline transactions — some flows tried “atomic multi-event transactions” offline. Result: UI hung if anything failed. We simplified to atomic events per minimal unit.

What did work

WatermelonDB as local DB — excellent performance even with hundreds of thousands of events on device.

UUIDs v7 — naturally sortable by timestamp, collisions vanishingly unlikely in practice, and they serve as the primary key both locally and remotely.

Server push notifications when sync is pending — the operator knows they need to open the app and connect to WiFi at the office.

“Mark as synced” locally only after server confirmation — prevents ghost data: the device’s idea of what is synced drifting from what the server actually has.

Client debug tooling — a hidden screen that shows which events are pending, when sync was attempted, what errors came up. Operators and our support team use it daily.

Lessons learned

  1. Offline-first is architecture, not a feature — retrofitting it costs 5x more
  2. Event sourcing simplifies everything — immutable events + timestamps + eventual sync
  3. Client-side UUIDs are indispensable for offline
  4. Eventual consistency is acceptable in compliance if your audit trail is explicit
  5. Honest conflict resolution > silent — flag conflicts, don’t hide data

What’s next?

We’re exploring peer-to-peer sync via Bluetooth/local WiFi for when multiple operators are in an internet-less area but close to each other. Cases like offshore fishing plants or producer cooperatives in remote areas.

If you’re building for contexts with fragile connectivity, my advice is: design as if there’s never a connection. What comes over the network later is a bonus. The other way around never works.


Do you need data capture in operations with fragile connectivity? Let’s talk — it’s one of our specialties.
