A latency budget, hop by hop

How to actually reduce camera notification latency: a hop-by-hop budget for the chain between “person walks past the camera” and “phone buzzes in someone’s pocket.”

Most advice on this stops at “use ethernet, update firmware, lower motion sensitivity.” That advice is not wrong, but it tunes the detector, which is rarely the bottleneck. The real lag lives in the four hops after detection: a cloud event bus, a server-side rules engine, a push fan-out service, and your phone’s battery saver. This guide walks the whole chain in order, names the millisecond cost at each step, and tells you which ones you can shrink and which ones are physics.

Matthew Diakonov
10 min read

Direct answer (verified 2026-05-08)

Most camera notification latency is downstream of detection. The model takes 100 to 400 milliseconds to decide there is a person in the frame. The seconds (and sometimes the half-minute or more) come from the cloud event bus, the server-side rules engine, the push fan-out service, and your phone’s battery saver.

To collapse the gap, move the rules engine and the dispatcher onto the same device as the model and use a real outbound phone call (not just a push) for high-priority events. On a properly local pipeline the chain lands in the 2 to 5 second band, with the SMS carrier hop (1 to 3 seconds) as the irreducible floor.

The chain has six hops, not one

The phrase “notification latency” flattens a pipeline that has at least six distinct stops. Each one has its own dominant cost, its own failure modes, and its own knobs. Tuning the wrong one buys you nothing. Below is the chain in order, with the typical millisecond cost when everything is healthy.

The six hops between motion and phone

  1. Camera: frame on bus, <100 ms LAN
  2. Recorder: encode and tile, <300 ms
  3. Inference: model on box, 100-400 ms
  4. Dispatcher: tier rule, <50 ms local
  5. Carrier: SMS plus call, 1-3 s
  6. Phone: OS deliver, 0-30 s+

Add those up and a healthy local pipeline lands at roughly 2 to 5 seconds. A consumer-grade chain that round-trips through a cloud rules engine, a third-party push service, and a doze-batched phone can extend the same chain to 30, 60, or 90 seconds. Same model. Same camera. The difference is which hops live where.
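
The arithmetic behind that 2 to 5 second claim can be sketched directly from the per-hop numbers above. This is an illustrative budget, not instrumentation: each value is the worst healthy-case figure quoted in this article, and the 800 ms phone-delivery entry assumes a phone that is not doze-batching (the 800 ms figure from the phone-OS section below).

```python
# Hypothetical latency budget for the six hops, using the worst
# healthy-case numbers quoted in the article. The hop names are
# illustrative labels, not any vendor's API.
HOPS_MS = {
    "camera_to_recorder": 100,   # wired LAN, <100 ms
    "recorder_encode":    300,   # encode and tile, <300 ms
    "model_inference":    400,   # 100-400 ms per tile
    "dispatcher_rule":     50,   # local tier rule, <50 ms
    "carrier_sms":       3000,   # SMS carrier hop, 1-3 s
    "phone_os_deliver":   800,   # assumed: healthy phone, no doze batching
}

def total_seconds(hops: dict[str, int]) -> float:
    """Sum the per-hop budget and convert milliseconds to seconds."""
    return sum(hops.values()) / 1000

# Worst healthy case still lands inside the 2-5 s band.
print(f"{total_seconds(HOPS_MS):.2f} s")  # 4.65 s
```

Swap the carrier and phone entries for their cloud-pipeline equivalents (a WAN round trip for the rule, 30 s of doze batching) and the same sum lands in the 30-plus second band the article describes.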


Hop 1: camera to recorder

A frame leaves the image sensor and lands on the recorder. On a wired LAN this is sub-second and effectively free. On a flaky Wi-Fi camera with a weak signal, the same hop can spike to multiple seconds when packets retransmit. This is the one hop where the standard advice (wired ethernet, place the router closer, reduce contention) actually buys you the milliseconds it claims.

If your alerts are arriving 30 seconds late on a wired system, this hop is not your problem. If you are on Wi-Fi and the camera is two walls away from the router, it could easily be hundreds of milliseconds of variable delay. Move it to ethernet and stop tuning.

Hop 2: recorder to inference

This is where most architectures secretly bleed time, and most product pages will not tell you about it. There are three patterns:

  • Camera-side detector. Each camera runs a small model on its own chip. The detection event is local to that camera, but it then ships out to wherever the rules engine lives. The hop after the model is the one that matters.
  • Cloud detector. The recorder uploads frames or clips to a cloud GPU pool. The model latency is competitive but you have just paid for an entire upload to start the inference. Add the WAN.
  • Local sidecar reading the recorder. A box on the same LAN as the recorder reads the recorder’s output (over HDMI, on Cyrano) and runs inference on the same machine. The hop is a memory copy. Cost is on the order of milliseconds, not seconds.

The third pattern is the only one where this hop is essentially free. The first two pay anywhere from a hundred milliseconds to several seconds, plus all the failure modes that come with any link that has to be up and healthy at the moment something happens.

Hop 3: the model

On a current edge accelerator, a person-and-vehicle detector running on an HDMI tile grid takes about 100 to 400 milliseconds per tile, depending on resolution and how many objects are in frame. The variance is in the second number, not the first. A clean tile resolves fast. A tile with five overlapping people in low light takes longer.

This is the hop everyone assumes is the bottleneck and almost never is. Unless the system is running on under-spec hardware, the model is hundreds of milliseconds, not seconds, and tuning it harder mostly buys you false-positive reductions, not raw speed. The wins are on either side.

Hop 4: the dispatcher decision

The model produces a structured event: person, side-gate zone, 0.91 confidence, 02:11:03. Something has to decide whether that becomes an SMS, an outbound phone call, both, or just a row in the morning digest. That “something” is the dispatcher, and where it lives determines whether your alerts are 2 seconds late or 30 seconds late.

On most consumer camera systems and many small business systems, this decision lives in a cloud control plane. Every detection round-trips to a server, the server evaluates rules like “after-hours equals HIGH,” and the dispatcher fires from the cloud. Round trip on a healthy uplink: 200 to 600 milliseconds. Round trip on a saturated upload during a storm: anywhere from a couple of seconds to no answer at all. This is the hop that turns a 3-second chain into a 60-second one when the WAN is unhappy.

The fix is structural, not configuration. The dispatcher needs to be on the same machine as the model. Cyrano keeps detect, dedup, classify, and dispatch on the box itself. The cloud is on the egress path only, used to actually send the SMS, place the call, and update the dashboard. The decision that fired them was already local.
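
The dispatch decision itself is tiny, which is why keeping it local costs under 50 ms. A minimal sketch of an "after-hours equals HIGH" tier rule follows; the event shape, the 0.5 confidence floor, and the after-hours window are illustrative assumptions, not Cyrano's actual rule format.

```python
from datetime import time

# Assumed after-hours window for this sketch (10 p.m. to 6 a.m.).
AFTER_HOURS_START, AFTER_HOURS_END = time(22, 0), time(6, 0)

def tier(event: dict) -> str:
    """Classify a detection event locally, with no WAN round trip.

    The 0.5 confidence floor is an illustrative threshold.
    """
    if event["confidence"] < 0.5:
        return "IGNORE"
    t = event["time"]
    after_hours = t >= AFTER_HOURS_START or t < AFTER_HOURS_END
    return "HIGH" if after_hours else "LOW"

# The example event from the text: person, side-gate, 0.91, 02:11:03.
evt = {"label": "person", "zone": "side-gate",
       "confidence": 0.91, "time": time(2, 11, 3)}
print(tier(evt))  # HIGH
```

The point of the sketch is what is absent: no network call sits between the model's output and the tier decision, so the rule's latency is a function call, not an uplink.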

Hop 5: the SMS carrier hop

This is the floor. Once the dispatcher decides to send, the SMS goes out through a carrier gateway and an outbound phone call goes through a voice provider. End to end, both land on the destination phone in 1 to 3 seconds when everything is healthy. Sometimes 4 if the carrier is congested.

You cannot collapse this hop. It is not in your settings menu. It is not in your camera app. It is the cost of running on the public phone network, which is the only network the operator’s phone is reliably reachable on at 2 a.m. The right move is to budget for it and not waste effort trying to optimize underneath it.

Hop 6: the phone OS

Last hop, and the one most people never think about. iOS Low Power Mode and Android battery saver delay non-priority pushes so the radio can sleep. A push that would have arrived in 800 milliseconds can land 30 seconds later, sometimes longer if the phone has been idle. The behavior is documented; it is not a bug. The OS is doing exactly what it was designed to do.

The reliable workaround for high-priority events is a real outbound phone call rather than a push. Phone calls bypass the doze-batching logic because they go through the cellular voice path, not the data push path. This is also why an SMS plus an outbound call is the right pair for HIGH events, and a push alone is fine for LOW events that can wait until the operator wakes the phone. If you are evaluating systems and the only delivery channel is an in-app push, you are downstream of the phone’s mood.
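
The tier-to-channel pairing above can be written down as a small table. The channel names and worst-case figures are illustrative, taken from the numbers in this article: a doze-batched push can sit for 30 seconds or more, while SMS and a voice call ride the 1 to 3 second carrier hop.

```python
# Assumed worst-case delivery per channel, in seconds, from the
# article's figures. Channel names are illustrative.
WORST_CASE_S = {"push": 30.0, "sms": 3.0, "voice_call": 3.0}

def channels_for(tier: str) -> tuple[str, ...]:
    """HIGH events get SMS plus an outbound call; LOW events get a push."""
    return ("sms", "voice_call") if tier == "HIGH" else ("push",)

def worst_case(tier: str) -> float:
    """With channels fired in parallel, the fastest channel's worst
    case bounds when the alert lands."""
    return min(WORST_CASE_S[c] for c in channels_for(tier))

print(worst_case("HIGH"), worst_case("LOW"))  # 3.0 30.0
```

The design choice the table encodes: HIGH events pay for redundancy so their worst case stays on the carrier floor, and LOW events accept the phone's batching in exchange for not waking anyone up.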

Timing one event end to end on a local pipeline

Stylized to show the shape of the timing budget on a healthy run. Real field numbers vary with carrier load and phone state, but the proportions are what matters: the first 200 ms is everything except the carrier, and the carrier is the rest.


At one Class C multifamily property in Fort Worth, Cyrano caught 20 incidents including a break-in attempt in the first month. The alert chain stayed under 5 seconds end to end, which is what made the difference between a recorded incident and an interrupted one. Customer renewed after 30 days.

Fort Worth, TX deployment, first 30 days

What you can change today, in order of impact

Sorted by how much millisecond-or-second time the change actually buys you. Top of the list first.

  1. Move the dispatcher onto the LAN. If your camera vendor evaluates rules in a cloud control plane, every event is paying a WAN round trip on the way to the SMS. The fix is a box on the same network as the recorder that runs detect, dedup, classify, and dispatch in one process. This is the largest single change you can make to your alert latency, and it is the one no settings menu offers.
  2. Add a phone call channel for HIGH events. Pushes and SMS are subject to OS batching and carrier weirdness. A real outbound voice call goes through a different path that the phone’s battery saver does not throttle. The cost is low and the worst-case latency improvement is minutes, not milliseconds.
  3. Move duplicate suppression into the inference layer. If your pipeline merges duplicates in a cloud notification layer, every detection is paying WAN time before the merge decision. Local merging using the model’s per-zone tracks is faster and produces better merges (one walk-through becomes one alert, not four phone calls).
  4. Wire up cameras and recorders, not just suggest it. If the LAN hop is on Wi-Fi, every retransmission costs you. This is the smallest of the four wins on this list, but it is real and free if you have ports.

What is not on this list, despite being the standard advice: lowering motion sensitivity, shortening clip length, factory-resetting the camera, upgrading firmware. Those are useful, but they tune the detector. The detector is rarely the bottleneck. Tuning past it just buys diminishing returns on the cheapest hop in the chain.

How to test where your own lag actually is

One short test, no instrumentation needed. Stand in front of one of your cameras, wave for two seconds, and step out of frame. Note the wall-clock time. Wait for the SMS or push to land on the operator’s phone and note that wall-clock time. Subtract.

  • Under 5 seconds. You are mostly waiting on physics. The carrier hop and the phone OS are the dominant terms. There is not much left to cut. Stop tuning.
  • 5 to 15 seconds. A WAN round trip is showing up somewhere, probably the rules engine or the push fan-out. Worth investigating but not catastrophic.
  • 15 to 90 seconds. Cloud round trip plus phone-OS batching. Two separate problems compounding. The dispatcher is in the cloud and your phone is in low power mode. Move the dispatcher local and add an outbound call channel.
  • Over 90 seconds, or never. The link is failing some events outright. This is no longer a tuning problem; it is a design problem. The alert chain is depending on a cloud round trip that does not always succeed.

Repeat the test once on a healthy day and once on a busy weekday evening (when the property uplink is congested). The gap between the two is the gap that vendors with a cloud dispatcher quietly hide.
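
If you want a stopwatch instead of a wall clock, the test above is a few lines of Python. The `measure_gap` helper and the band boundaries are taken straight from this section; run it, wave at the camera on the first prompt, press Enter when the SMS lands.

```python
import time

def measure_gap() -> float:
    """Interactive stopwatch for the wave-and-wait test."""
    input("Press Enter the moment you step into frame... ")
    start = time.monotonic()
    input("Press Enter the moment the SMS or push lands... ")
    return time.monotonic() - start

def diagnose(gap_s: float) -> str:
    """Map a measured gap onto the bands described above."""
    if gap_s < 5:
        return "mostly physics: carrier hop plus phone OS"
    if gap_s < 15:
        return "a WAN round trip is showing up somewhere"
    if gap_s < 90:
        return "cloud dispatcher plus phone-OS batching"
    return "events are failing outright: design problem"

print(diagnose(3.2))   # mostly physics: carrier hop plus phone OS
print(diagnose(42.0))  # cloud dispatcher plus phone-OS batching
```

`time.monotonic` is used rather than `time.time` so a clock adjustment mid-test cannot skew the measurement.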

Why the structural fix beats every settings-page fix

Most pages on this topic give you a settings checklist. Lower the sensitivity, raise the retrigger window, update firmware, plug in ethernet. Each item buys back a small fraction of the chain. None of them touch the structural problem, which is that the rules engine and the dispatcher are on the wrong side of the WAN. A 50-millisecond improvement on the LAN does not help when the cloud round trip is the dominant term and behaves nondeterministically under load.

The honest answer for property operators with existing CCTV is: keep the cameras and the recorder you have. Add a sidecar box on the same LAN that ingests the recorder’s tile grid, runs the model and the dispatcher locally, and uses the WAN only for egress (SMS, voice call, dashboard sync). That is the architecture that keeps the alert chain under 5 seconds on a good day and under 30 seconds on the worst day. It is also the architecture that survives the property uplink dropping for nine minutes during a storm without losing any events. The settings checklist does not survive that. The structure does.

Walk through your own latency budget on one of your properties

A 15-minute call. Bring one recorder model, a rough camera count, and the wall-clock gap you measured between motion and SMS. We will trace which hop is eating the time and what would change if the dispatcher moved onto a box on your LAN.

Reducing camera notification latency: frequently asked questions

What is a realistic latency target for a camera notification, end to end?

On a healthy local pipeline with the model and dispatcher on the same box as the cameras, the chain from a person crossing the frame to the on-call manager's phone buzzing lands in the 2 to 5 second band. Most of that is the SMS carrier hop, which is 1 to 3 seconds and you cannot collapse from the settings menu. Anything claiming sub-second 'instant' alerts is either timing detection only (not delivery) or assuming the operator has the live app open at that exact moment.

Where is most consumer-camera lag actually hiding?

After detection, not before it. A typical consumer camera detects in well under a second, then ships the event to a cloud event bus, where a server-side rules engine decides whether to fire a push, a third-party push service fans out to your phone, and your phone's battery saver decides whether to deliver it now or batch it. The detection step is fast. The four hops after it are where the seconds (and sometimes minutes) accumulate. Settings pages let you tune the detector. They do not let you change the four hops behind it.

Why does a cloud rules engine add latency I cannot see?

Because it is a WAN round trip on the critical path. The detector says 'person at side gate, 0.91 confidence' and instead of the dispatcher firing locally, that event is shipped to a cloud control plane where rules like 'after-hours zone equals HIGH' are evaluated. If your uplink is slow, congested, or having a bad day, that round trip adds anything from 200 milliseconds to 'the alert never arrived because the link dropped during the round trip.' You cannot fix this in the camera app. The dispatcher has to live on the same box as the model.

Will a wired ethernet connection actually help?

Yes, but the magnitude depends on which hop your system is bottlenecked on. Wired ethernet helps on the camera-to-recorder LAN hop and on the recorder-to-uplink hop if those are over Wi-Fi today. It does nothing for the cloud-rules-engine round trip, the SMS carrier hop, or your phone's push delivery delay. If your alerts are arriving 30 to 90 seconds late, ethernet alone almost certainly will not close the gap. The lag is downstream of the LAN.

What does Apple or Android battery optimization do to push notifications?

It batches them. iOS Low Power Mode and Android battery saver both delay non-priority pushes so the radio can sleep. A push that would have arrived in 800 milliseconds can land 30 seconds later, sometimes longer if the phone has been idle for a while. The fix is a real phone call on HIGH events, not just a push, because phone calls bypass the push-batching logic. That is also why an SMS plus an outbound call beats relying on an in-app notification alone.

How fast is the model itself, on a current edge accelerator?

On a current edge accelerator running detection on an HDMI tile grid, the model takes 100 to 400 milliseconds per tile depending on resolution and how many people are in frame. On a 16-camera tile grid, the box runs that detector across the whole grid every frame interval. The model is rarely the bottleneck. The hops on either side of the model usually are.

Does an edge box still need internet at all to send the alert?

It needs internet to deliver the SMS and the outbound phone call, push to the dashboard, and receive central rule updates. It does not need internet to detect, deduplicate, classify, or queue. The internet is only on the egress path. When the uplink drops, detection keeps running and events queue locally with a monotonic counter. When the uplink returns, the queue drains in order. The alert chain never depends on a cloud round trip to make the dispatch decision.

What does duplicate suppression have to do with latency?

More than people expect. If duplicate suppression happens in a cloud notification layer instead of in the inference layer, every detection has to ride the WAN to the merger before it can be paged. That adds round-trip time to every event, and in the failure case (cloud is slow or unreachable) the merger either over-batches and delays alerts, or under-merges and pages the operator four times for one walk-through. Local dedup using the model's per-zone tracks cuts latency and false multiples in the same step.

Is there a way to test how much of my lag is downstream of the camera?

Yes. Time the detection event in the camera or recorder UI, then time the SMS arrival on the operator's phone, and subtract. If the gap is under 5 seconds you are mostly waiting on physics (carrier hop, push delivery). If the gap is 15 to 90 seconds you are paying for a cloud round trip somewhere and your camera vendor is not telling you about it. Above 90 seconds, your phone is batching pushes or your uplink is failing some events outright. Each of those needs a different fix.

What is the single highest-impact change for an existing CCTV system?

Move the dispatch decision off the cloud and onto a box on the same LAN as the recorder. Whatever the model is, whatever the cameras are, whatever the recorder is. The latency problem is structural, not configuration. Any pipeline where 'should this fire an SMS' is decided behind the WAN is going to be intermittently slow in exactly the conditions (storms, ISP hiccups, congested uplinks) where you needed the alert most.

🛡️ Cyrano: Edge AI Security for Apartments
© 2026 Cyrano. All rights reserved.
