“Edge AI” for real time CCTV alerts is mostly a story about the dispatcher, not the detector. If the part that decides who to page lives in the cloud, you do not have an edge alert system. You have a faster forensics tool.
Most pages on this topic talk about edge AI as if “edge” is a property of the model. It is not. Edge is a property of the pipeline. There are four pieces in any real time CCTV alert system: detect, dedup, classify, dispatch. The market is comfortable putting the first one near the camera. The other three usually still live in a cloud control plane, which is exactly where they break the moment a property uplink misbehaves. This guide walks through what each piece does, which ones have to be local for “real time” to mean anything, and what changes when the dispatcher itself runs on the same box as the model.
The four pieces of any CCTV alert pipeline
Pulling the pipeline apart the right way is the prerequisite for the rest of this argument. The shape is the same whether the system is built on Verkada, on a smart NVR, on a hand-rolled Frigate stack, or on a sidecar box reading the recorder's HDMI output; a sketch of the four stages as code follows the list:
- Detect. A model evaluates a frame and produces structured output: a person, a vehicle, a package, with a bounding box and a confidence. This is the part everyone talks about when they say “edge AI.” It is also the easiest part to put on the edge, because it does not need any state.
- Dedup. The same intruder walking through four cameras produces four detections. Without dedup, that is four phone calls. Dedup needs short-lived per-zone tracks across the camera grid and a merge rule that respects spatial and temporal continuity. Dedup needs state.
- Classify. Every event gets a threat tier (LOW or HIGH on Cyrano) based on zone, time of day, behavior, and context. The package room at 2 a.m. is not the leasing office at 11 a.m. Classification needs zone definitions, schedule rules, and the running track from dedup.
- Dispatch. A HIGH event fires an SMS to the on-call manager and triggers an outbound phone call. A LOW event updates the dashboard and rolls into the daily portfolio digest. Dispatch needs the carrier gateway, the on-call rotation, and the tier output from classify.
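Rendered as typed hand-offs, the four stages look like the sketch below. Every name in it (Detection, Incident, Event, Tier, the dispatch hooks) is illustrative, not any vendor's actual API:

```python
# The four stages as data passed between functions. All names here are
# illustrative assumptions, not a real product's API.
from dataclasses import dataclass, field
from enum import Enum


class Tier(Enum):
    LOW = "LOW"    # dashboard row, daily digest
    HIGH = "HIGH"  # SMS plus outbound phone call


@dataclass
class Detection:
    """Detect: stateless, one structured output per frame."""
    camera: int
    label: str         # "person", "vehicle", "package"
    box: tuple         # (x, y, w, h) bounding box
    confidence: float
    ts: float          # epoch seconds


@dataclass
class Incident:
    """Dedup: one track merged across cameras. Needs state."""
    track_id: int
    detections: list = field(default_factory=list)


@dataclass
class Event:
    """Classify: incident plus zone, schedule, and tier. Needs rules."""
    incident: Incident
    zone: str
    tier: Tier


def dispatch(event: Event) -> None:
    """Dispatch: the only stage whose output leaves the building."""
    if event.tier is Tier.HIGH:
        print("page the on-call manager: SMS + outbound call")
    else:
        print("dashboard event + entry in the daily digest")
```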
The interesting question is not which of these the model can do. It is which of these have to live on the same box for the chain to remain intact when the property network is unhappy.
[Diagram: What flows through the box, and what flows out]
Where the rest of the market quietly leaves things in the cloud
Read the architecture pages on most edge AI camera products carefully and you will find the same shape: detection on the camera, everything else on the back end. The marketing reads as if “edge AI” means the whole pipeline is at the edge, but the implementation note in the small print says inferences are streamed to a cloud event bus and a server-side rules engine decides what becomes an alert. That is fine for forensics. It is not fine for alerts.
The reason vendors do this is structural. Cloud control planes are easier to build than on-device control planes. Pushing tier rules to a server lets a single team iterate on logic that affects every customer. Per-camera detector chips also have very little memory headroom for the on-device state machines that dedup and classify need. The cloud is where the easy answers go. So that is where they end up.
The cost shows up at exactly the moment you needed the system. A coastal property loses cable for nine minutes during a thunderstorm. A multifamily complex switches to cellular failover and an LTE re-association takes 40 seconds. An ISP rolls out a misconfigured BGP change at 2 a.m. on a Sunday. In all three cases the cameras kept working, the recorder kept recording, and the on-camera detector kept detecting. None of that produced an SMS, because the dispatcher was on the other side of the dead link. The forensic story is intact: tomorrow you can review what happened. The alert story is gone: you did not get the call.
What link-independence actually looks like in practice
On a properly local pipeline, the cameras and the recorder talk to each other on the LAN, the box reads the recorder over HDMI on the same LAN, and the box runs detect, dedup, classify, and dispatch on its own CPU and accelerator. The internet is only on the egress path. SMS, the outbound phone call, the dashboard event, and the digest entry all leave the building, but the decision that triggered them did not.
That has a specific operational consequence. When the uplink drops, detection keeps running. Events keep being generated. The dispatcher keeps writing to the local queue. SMS attempts fail and retry. As the uplink recovers, the queue drains in strict order. The dashboard backfills with correct timestamps. Nothing about the chain notices that the WAN was gone for nine minutes, except that the SMS arrival times are clustered.
The mechanism is intentionally boring. There is one outbox file. Events are appended one JSON line at a time. There is a monotonically increasing counter. A drain worker walks the file forward, posts events, and advances a checkpoint. If the uplink is healthy, the checkpoint is approximately current; if it is degraded, the checkpoint trails. There is no distributed state, no clever queuing, no reordering. Boring queues survive 2 a.m. better than clever ones.
The sketch below is stylized to show the shape of the queue and drain. The real file format and field names are similar; the point is the monotonic counter and the strict-order replay.
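```python
# A minimal sketch of the outbox and drain, under assumed file names and
# fields. One outbox file, one monotonic counter, strict-order replay.
import json
import os

OUTBOX = "outbox.jsonl"      # append-only: one JSON event per line
CHECKPOINT = "outbox.ckpt"   # sequence number of the last event posted

_seq = 0  # a real system would recover this from the outbox on restart


def append_event(event: dict) -> None:
    """Dispatcher side: stamp the next counter value and append."""
    global _seq
    _seq += 1
    event["seq"] = _seq
    with open(OUTBOX, "a") as f:
        f.write(json.dumps(event) + "\n")
        f.flush()
        os.fsync(f.fileno())  # the event must survive a mid-outage power cut


def load_checkpoint() -> int:
    try:
        with open(CHECKPOINT) as f:
            return int(f.read().strip())
    except FileNotFoundError:
        return 0


def drain_once(post_event) -> None:
    """Drain worker: walk the outbox forward from the checkpoint, post each
    event, and advance. No distributed state, no reordering."""
    if not os.path.exists(OUTBOX):
        return
    done = load_checkpoint()
    with open(OUTBOX) as f:
        for line in f:
            event = json.loads(line)
            if event["seq"] <= done:
                continue               # already posted on an earlier pass
            if not post_event(event):  # uplink degraded: checkpoint trails
                return
            done = event["seq"]
            with open(CHECKPOINT, "w") as c:
                c.write(str(done))
```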
Why dedup and classify also have to be local
It would be tempting to say only the dispatcher has to be local: ship raw detections to a cloud rules engine, dedup and classify there, and call it a day. That fails for two reasons.
First, dedup needs the running per-zone tracks the model is already maintaining. A cloud rules engine that tries to merge by event timestamp alone ends up either collapsing real distinct incidents or splitting one intruder's walk across four cameras into four phone calls. The merge logic that works uses spatial continuity across the tile grid, which is in-frame information the model already has and the cloud does not. Pushing dedup off the box means re-deriving state the box already knew.
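A sketch of that merge rule, assuming a 5x5 tile layout and a 90 second continuity window. Both numbers are illustrative, as is the raw-adjacency rule; a real deployment would map tiles to physically adjacent cameras:

```python
# Cross-camera dedup via spatial continuity on the tile grid. The 5x5 grid,
# the 90 second window, and the raw-adjacency rule are all assumptions.
from collections import namedtuple

Sighting = namedtuple("Sighting", ["tile", "ts"])  # tile index, epoch seconds

TILE_COLS = 5  # a 5x5 HDMI tile grid covers up to 25 feeds


def tiles_adjacent(a: int, b: int) -> bool:
    """True when two tiles touch on the grid, diagonals included."""
    ra, ca = divmod(a, TILE_COLS)
    rb, cb = divmod(b, TILE_COLS)
    return max(abs(ra - rb), abs(ca - cb)) <= 1


def same_incident(tail: Sighting, new: Sighting, max_gap_s: float = 90.0) -> bool:
    """Merge when the new sighting is temporally and spatially continuous
    with the tail of an existing track. Timestamps alone cannot make this
    call, which is why the decision belongs on the box."""
    return 0 <= new.ts - tail.ts <= max_gap_s and tiles_adjacent(tail.tile, new.tile)


# One intruder crossing from tile 3 to tile 4 within a minute: one incident.
assert same_incident(Sighting(3, 100.0), Sighting(4, 160.0))
```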
Second, classify is the part the operator is most sensitive to. The difference between “HIGH means SMS plus a phone call” and “HIGH means an unread row in a dashboard” is the difference between an alert system and an inbox. If the classify decision lives in the cloud, then HIGH events have to round-trip to a remote control plane before the dispatcher fires. That round trip is not the bottleneck on a healthy day; it is the bottleneck on a bad day, and bad days are the only ones that matter for an alert pipeline.
The argument simplifies cleanly. Dispatch has to be local because that is where the alert leaves the building. Classify has to be local because dispatch depends on it. Dedup has to be local because classify and operator hygiene both depend on it. Detection has to be local because everything else does. The cloud is for analytics, fleet-wide rule updates, and the things that are allowed to be late.
“At one Class C multifamily property in Fort Worth, Cyrano caught 20 incidents including a break-in attempt in the first month. The property's uplink also flapped twice during that period; the alert chain stayed intact through both. Customer renewed after 30 days.”
Fort Worth, TX deployment, first 30 days
What this looks like to a property manager who never wanted to think about edge AI
None of this matters to the operator at the level of architecture. At the operator level, two things matter. One: when something HIGH happens, a phone rings, fast, even if the building's internet is having a bad night. Two: when something LOW happens, it is in the morning digest and not on the phone. Everything in this guide is plumbing in service of those two outcomes.
The HDMI ingestion path matters because it lets one box cover up to 25 feeds without a camera-replacement project. The on-device dispatcher matters because it is what makes the “phone rings, fast, even on a bad night” promise honest. The boring outbox matters because it is the difference between a system that drops 0.5 percent of alerts during weekly uplink hiccups and one that does not. None of these are the kind of feature that lives in a sales deck. They are the kind of feature that the operator notices, six months in, when they realize the alert app on their phone still works and they have not muted it.
The pricing follows the same shape: 450 dollars one-time for the device, 200 dollars a month after the first month, no camera replacement, no cloud minutes per feed. For comparison, an overnight security guard runs 3,000 to 5,000 dollars a month per property and one guard cannot watch 25 feeds at once. The alert channel is the part that compounds.
How to test whether a candidate “edge AI” system actually qualifies
The phrase “edge AI” is now in the marketing glossary of every camera vendor. The cheap version is the one this argument has been describing all along: detector on the camera, everything else in the cloud. The honest version puts detect, dedup, classify, and dispatch on the same device. Three concrete questions separate them.
First: pull the property uplink for ten minutes mid-pilot. Trip a known event during the outage. After the uplink returns, confirm an SMS arrives, the dashboard shows the event in the right slot in the timeline, and the event ordering is correct. If the answer is “the SMS never arrived” or “it arrived but timestamps are wrong,” the dispatcher and the queue are not actually local.
Second: have a tester walk through the fields of view of four to five cameras in 90 seconds. Confirm that one incident, one phone call, and one thumbnail strip are produced, not four separate phone calls. If the answer is four phone calls, dedup is happening at the notification layer, not at the inference layer, and it is not going to scale to a real property.
Third: ask where threat-tier rules execute. If the answer is “in our cloud rules engine,” HIGH events are paying a WAN round trip on the way to the dispatcher. If the answer is “on the box, with central rule updates pushed nightly,” the chain is honest. Both can be marketed as edge AI; only one keeps its promise on a thunderstorm Sunday.
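The ordering half of test one can be checked mechanically. A sketch, assuming each replayed event carries a `seq` counter and a `ts` epoch timestamp (both field names are assumptions):

```python
# Pass/fail check for the ordering half of test one: replayed events must
# be strictly ordered and must include the event tripped during the outage.
def replay_is_honest(events: list, outage_start: float, outage_end: float) -> bool:
    seqs = [e["seq"] for e in events]
    strictly_ordered = all(a < b for a, b in zip(seqs, seqs[1:]))
    covers_outage = any(outage_start <= e["ts"] <= outage_end for e in events)
    return strictly_ordered and covers_outage
```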
Walk through the dispatcher question on one of your properties
A 15-minute call. Bring a recorder model and a rough camera count for one property. We will walk through which of the four pieces (detect, dedup, classify, dispatch) live where, and what would change if all four moved onto the box.
Real time CCTV alerts with edge AI: frequently asked questions
Is edge AI for CCTV alerts just 'cloud AI but faster'?
It is not. Most products marketed as edge AI run a detector on a chip near the camera, then ship the bounding box, the timestamp, and a thumbnail to a cloud notification service that decides whether to send the SMS. That is edge inference with cloud alerting, and it breaks the moment the property uplink burps. A real edge AI alert pipeline runs detection, deduplication, threat-tier classification, and the dispatcher on the same device, so the alert chain has no WAN dependency between the door opening and the on-call manager's phone ringing.
Why does the dispatcher matter as much as the model?
Because the dispatcher is the part the operator actually feels. The model decides what is in the frame; the dispatcher decides whether that triggers an SMS, an outbound phone call, both, or just a row in the morning digest. If the dispatcher lives in the cloud and the cloud is unreachable, your detector can be running at 2 millisecond latency and it does not matter, because the SMS never leaves the building. The dispatcher is the line between 'we recorded an incident' and 'somebody got the call.'
What actually happens during a property uplink outage on Cyrano?
The model is on the box and the camera feed never crosses the WAN to begin with, so detection keeps running on every frame. Each event the dispatcher fires gets appended to a local outbox file with a monotonic counter, one line of JSON per event. SMS and outbound call attempts fail and are queued. When the uplink returns, the drain worker walks the outbox forward in strict order and posts the events. The dashboard backfills, the missed SMS sequence catches up, and ordering is preserved. Cloud-only architectures cannot do this, because the inference itself lives on the other side of the dead link.
How fast does an alert actually arrive when everything is healthy?
On a 16 to 25 camera property with the whole chain on the box, door opening to phone buzz lands in the 2 to 5 second band. Most of that is the SMS carrier hop, which is the part nobody can collapse. Hop one (camera to recorder) is sub-second on the LAN, hop two (recorder to inference) is a memory copy on the same machine because the box reads the recorder's HDMI tile grid, hop three (the model itself) is 100 to 400 milliseconds per tile on a current edge accelerator, and hop four (the SMS plus an outbound phone call) is 1 to 3 seconds end to end through the carrier gateway.
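Summing those bands gives the back-of-envelope budget below; the numbers are the ones quoted in this answer, not measurements:

```python
# Back-of-envelope sum of the four hops; bands are the ones quoted above.
HOPS_S = {
    "camera_to_recorder":    (0.1, 1.0),  # LAN, sub-second
    "recorder_to_inference": (0.0, 0.1),  # memory copy off the HDMI grid
    "model_per_tile":        (0.1, 0.4),  # current edge accelerator
    "sms_and_call":          (1.0, 3.0),  # carrier gateway, the slow hop
}

low = sum(lo for lo, _ in HOPS_S.values())
high = sum(hi for _, hi in HOPS_S.values())
print(f"door-to-buzz budget: {low:.1f}s to {high:.1f}s")
# The carrier hop dominates; retries and paging push real-world numbers
# toward the 2 to 5 second band quoted above.
```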
What does 'edge AI' buy me beyond speed?
Three things, in order of what an operator notices first. One, the alert chain stays whole during a property uplink outage; you do not lose 2 a.m. coverage to a flaky cable modem. Two, the dispatcher can hold tier rules and zone state without round-tripping to a remote control plane, so a HIGH event fires immediately instead of waiting for cloud confirmation. Three, the camera frames never leave the building, which keeps the privacy and bandwidth profile clean and lets the same box scale to 25 feeds without paying per-camera cloud minutes. Speed is real, but it is the third thing that matters, not the first.
Does an edge box still need internet at all?
It needs internet to deliver outbound SMS and phone calls, to push events to the cloud dashboard, and to receive central tier-rule updates. It does not need internet to detect, classify, dedup, or queue. That split is the whole point. Detection without the WAN keeps the recorder honest. The WAN is only on the egress path, and the outbox absorbs egress outages without losing ordering.
Why is dedup part of the edge AI story and not a notification-layer feature?
Because dedup needs the per-zone tracks the model is already maintaining. A naive notification layer that tries to merge by timestamp ends up either over-merging (collapsing two real incidents that happened minutes apart) or under-merging (firing four phone calls when one person walked past four cameras). The merge logic that actually works needs spatial continuity across the tile grid, which is in-frame information the model already has. Push that decision to the cloud and you have already lost the context that makes it correct.
What kind of recorder does this work with?
Anything that drives a wall monitor over HDMI. The Cyrano box reads what the recorder is already drawing on the tile grid for the leasing office monitor, so it does not care whether the cameras are five years old, mixed brands, mixed resolutions, or analog over coax. There is no per-camera RTSP handshake, no codec negotiation, and no auth dance with each camera. If the recorder can render a tile grid, the model can detect on it.
How is this different from putting AI on the camera itself?
Smart cameras run a detector per camera, in isolation. They can tell you a person is in frame on camera 7. They cannot easily tell you that the person on camera 7 is the same person who was on camera 4 eighty seconds ago, because each camera is its own island. A box that ingests the recorder's tile grid sees all the cameras at once, which is what makes cross-tile incident merging tractable. It also keeps the dispatcher unified: one box decides who gets paged, instead of 25 cameras each firing their own opinion at a notification service.
What is the failure mode if the box itself dies?
The recorder keeps recording, because the box never stood between the cameras and the disk. You lose the alert layer until the box is replaced, but you do not lose the forensic capture. That is the inverse of replacing the recorder with a smart NVR, where a dead unit takes both the alert pipeline and the recording with it. Plug-and-play HDMI ingestion means the box is a sidecar, not a chokepoint.