Handling offline recovery cleanly
Users drop offline mid-session. When they come back, the wrong recovery strategy will deliver the same survey three times. Here is the pattern Stream Sync Engage uses, and the gotchas to avoid.
A live notification engine looks great in a demo, where the user is online, focused, and clicks every CTA. In production, many of your users are on flaky transit Wi-Fi, switching apps mid-engagement, or coming back to a tab after the laptop slept for two hours. Recovery is where the real work lives.
What goes wrong by default
When a connection drops, the naive approach is to queue the next engagement and replay it when the client reconnects. That works once. It breaks the second time the same user reconnects to the same session, because nothing removed the queued event after the first replay.
The result: a user sees the onboarding tour every time they open the app for a week.
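To make the failure concrete, here is a minimal sketch of the naive approach. The class and method names (`NaiveReplayQueue`, `queue`, `on_reconnect`) are hypothetical and exist only for illustration; they are not part of Stream Sync Engage.

```python
from collections import defaultdict


class NaiveReplayQueue:
    """Hypothetical naive recovery: queue on disconnect, replay on reconnect."""

    def __init__(self):
        # user_id -> list of queued engagement ids
        self.pending = defaultdict(list)

    def queue(self, user_id, engagement_id):
        self.pending[user_id].append(engagement_id)

    def on_reconnect(self, user_id):
        # Bug: the queue is read but never cleared or acknowledged,
        # so every reconnect replays the same engagements again.
        return list(self.pending[user_id])


replay_queue = NaiveReplayQueue()
replay_queue.queue("user-1", "onboarding-tour")
first = replay_queue.on_reconnect("user-1")
second = replay_queue.on_reconnect("user-1")
```

Both `first` and `second` contain the onboarding tour, which is exactly the repeat delivery described above.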
The pattern we use
Stream Sync Engage treats every engagement as an idempotent delivery keyed by (user, engagement_id, session_window). The first delivery wins. Subsequent reconnects within the session window are no-ops.
Three things make this work:
- A session window, not a global key. A user re-seeing the same survey six months later is fine. Six hours later is not.
- Server-confirmed acknowledgement, not client claim. The client says “I rendered it”; the server records the ack and refuses to re-deliver.
- A bounded replay buffer. We hold undelivered engagements for the session window, then drop them. No infinite queue.
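The three pieces above can be sketched in a few lines. This is an illustration under stated assumptions, not the product's actual implementation: `SessionWindow`, `DeliveryLedger`, and their methods are hypothetical names, storage is an in-memory dict and set, and the window is modelled as a server-issued id plus an expiry on the server clock.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionWindow:
    """Assumed shape of a server-issued session window."""
    window_id: str
    expires_at: float  # server clock, seconds


class DeliveryLedger:
    """Hypothetical in-memory ledger keyed by (user, engagement_id, window)."""

    def __init__(self):
        self.acked = set()   # keys the server has confirmed as delivered
        self.pending = {}    # key -> expiry of the window it belongs to

    def enqueue(self, user_id, engagement_id, window):
        key = (user_id, engagement_id, window.window_id)
        if key not in self.acked:  # first delivery wins; later ones are no-ops
            self.pending[key] = window.expires_at

    def due(self, user_id, now):
        # Bounded replay buffer: drop anything whose window has expired.
        self.pending = {k: exp for k, exp in self.pending.items() if exp > now}
        return [k[1] for k in self.pending if k[0] == user_id]

    def ack(self, user_id, engagement_id, window):
        # Server-confirmed ack: record it and stop replaying this key.
        key = (user_id, engagement_id, window.window_id)
        self.acked.add(key)
        self.pending.pop(key, None)


ledger = DeliveryLedger()
window = SessionWindow("w1", expires_at=1000.0)

ledger.enqueue("u1", "survey-42", window)
first_reconnect = ledger.due("u1", now=10.0)    # survey is replayed once
ledger.ack("u1", "survey-42", window)

ledger.enqueue("u1", "survey-42", window)       # same window: no-op
second_reconnect = ledger.due("u1", now=20.0)

ledger.enqueue("u1", "tour", window)            # never acked
after_expiry = ledger.due("u1", now=2000.0)     # window over, entry dropped
```

The design choice worth noting is that the ack, not the replay, is what removes an entry. A client that reconnects before acknowledging gets the engagement again, while one that has acknowledged never does within the same window.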
Gotchas
A few things will burn you if you do not plan for them:
- Clock skew. Client timestamps lie. Use server-issued session windows.
- Multi-device sessions. A user on phone + laptop is one user, two clients. Decide whether the engagement is per-user or per-device, and key accordingly.
- Mid-engagement disconnects. If a user drops halfway through a survey, the ack is only partial. Recovery should resume multi-step flows, not restart them.
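Two of these gotchas come down to small data-modelling decisions. The sketch below shows one way to express them, assuming a hypothetical `delivery_key` helper with a `per_device` flag and a hypothetical `FlowProgress` record that tracks the last acknowledged step of a multi-step flow. None of these names come from Stream Sync Engage.

```python
from dataclasses import dataclass
from typing import Optional


def delivery_key(user_id, device_id, engagement_id, window_id, per_device):
    # Per-device engagements include the device in the key;
    # per-user engagements share one key across all of a user's clients.
    scope = (user_id, device_id) if per_device else (user_id,)
    return scope + (engagement_id, window_id)


@dataclass
class FlowProgress:
    """Assumed record of a partial ack for a multi-step engagement."""
    total_steps: int
    last_acked_step: int = 0  # 0 means no step has been acknowledged

    def ack_step(self, step):
        # Acks can arrive duplicated or out of order; only move forward.
        self.last_acked_step = max(self.last_acked_step,
                                   min(step, self.total_steps))

    @property
    def complete(self):
        return self.last_acked_step >= self.total_steps

    def resume_from(self) -> Optional[int]:
        # Resume at the step after the last ack, or None if finished.
        return None if self.complete else self.last_acked_step + 1


# Per-user keying: phone and laptop collapse to the same delivery.
phone_key = delivery_key("u1", "phone", "survey-42", "w1", per_device=False)
laptop_key = delivery_key("u1", "laptop", "survey-42", "w1", per_device=False)

# Per-device keying: each client gets its own delivery.
phone_own = delivery_key("u1", "phone", "survey-42", "w1", per_device=True)
laptop_own = delivery_key("u1", "laptop", "survey-42", "w1", per_device=True)

progress = FlowProgress(total_steps=5)
progress.ack_step(1)
progress.ack_step(2)
resume_step = progress.resume_from()  # user dropped after step 2
```

The window id passed in here is assumed to be issued by the server, which keeps client clock skew out of the key entirely.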
When to break the rule
If an engagement is genuinely time-sensitive (a payment confirmation, a security prompt), idempotency by session is too lenient. Use a short window or fire-and-forget with monitoring. Default to the safe pattern and override deliberately.
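One way to keep the override deliberate is to make it an explicit, per-kind entry rather than a code path. The sketch below assumes a 24-hour default window and much shorter windows for two time-sensitive kinds. The durations, the kind names, and the `should_deliver` helper are all illustrative assumptions, not values from Stream Sync Engage.

```python
from typing import Optional

DEFAULT_WINDOW_SECONDS = 24 * 60 * 60  # assumed default window

# Assumed overrides for time-sensitive engagement kinds.
WINDOW_OVERRIDES = {
    "payment_confirmation": 60,
    "security_prompt": 30,
}


def window_seconds_for(kind: str) -> int:
    # Anything not listed explicitly falls back to the safe default.
    return WINDOW_OVERRIDES.get(kind, DEFAULT_WINDOW_SECONDS)


def should_deliver(kind: str, last_acked_at: Optional[float], now: float) -> bool:
    # Never acked: deliver. Otherwise deliver again only once the
    # window for this kind of engagement has fully elapsed.
    if last_acked_at is None:
        return True
    return now - last_acked_at >= window_seconds_for(kind)


one_hour = 60 * 60
survey_again = should_deliver("survey", last_acked_at=0.0, now=one_hour)
prompt_again = should_deliver("security_prompt", last_acked_at=0.0, now=one_hour)
```

An hour after the last ack, the survey stays suppressed while the security prompt is eligible again. Every exception to the default lives in one reviewable table.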