The latency budget on a 25-camera property

Real-time CCTV alerting is not a feature. It is a budget. Where the seconds go decides whether your team gets the buzz in 3 seconds or in 30.

Every product page on this topic uses the phrase the same way: cameras detect, you get an alert, done. That framing skips the question that actually matters once a property is live. The wait between the door opening and the manager's phone buzzing is a sum of four hops, and one of them owns most of the variance. This page walks through the hops in order, shows why an on-prem box reading the recorder's HDMI output collapses the slow one, and explains why a LOW vs HIGH threat tier is the part that keeps the alert channel useful past week two.

Matthew Diakonov
10 min read
4.9 from 50+ properties
  • Per-tile inference on the recorder's HDMI output, up to 25 feeds in parallel
  • On-device LOW vs HIGH threat classification before dispatch
  • HIGH events: SMS plus a phone call. LOW events: dashboard only.
  • Outbox keeps events ordered through a property uplink outage

What “real time” means as a number

For an operator who has to act (call a guard, dispatch on-site staff, decide whether to call police), the working definition of real time is anything under 5 seconds. Anything over 30 seconds is a forensic clip with notification chrome on top, not an alert. Most of the spread between those two numbers comes from a single hop, and the rest of this page is about which hop and how to collapse it.

The total budget on a CCTV alert has four parts:

  1. Camera to recorder. Sub-second on a healthy property LAN. PoE camera, switch, recorder. Already paid for, already in place.
  2. Recorder to inference. This is where the variance lives. On a cloud architecture, the frame has to leave the building over the property uplink. On an on-prem architecture, the frame travels a few milliseconds to a box on the same LAN segment.
  3. Inference itself. 100 to 400 ms per frame on a current edge accelerator running a detection model on a single tile. Multiplied by tile count if the box is sweeping across the recorder's grid, but parallelized across the inference engine's batch.
  4. Notification. 1 to 3 seconds for an SMS that crosses a carrier gateway. Longer for email, longer still for an in-app push if the operator's phone is asleep.

Hops one, three, and four are roughly fixed. Hop two is where you live or die. A property on a clean fiber uplink to a cloud inference cluster spends 1 to 3 seconds there. A property on a typical business cable uplink with peak-hour congestion spends 8 to 20 seconds there. A property on a cellular failover spends a coin flip. Real time is whatever survives that hop.
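As a sanity check, the four-hop budget can be summed in a quick back-of-envelope script. The numbers are the illustrative ranges from this page, not measurements from any specific property:

```python
# Hop ranges in seconds, taken from the illustrative figures above.
HOPS = {
    "on_prem": {
        "camera_to_recorder": (0.1, 0.5),        # healthy property LAN
        "recorder_to_inference": (0.001, 0.01),  # same-LAN memory copy
        "inference": (0.1, 0.4),                 # per frame on an edge accelerator
        "notification": (1.0, 3.0),              # SMS across a carrier gateway
    },
    "cloud_busy_cable": {
        "camera_to_recorder": (0.1, 0.5),
        "recorder_to_inference": (8.0, 20.0),    # contended property uplink
        "inference": (0.1, 0.4),
        "notification": (1.0, 3.0),
    },
}

def total_range(hops):
    """Sum the best case and worst case across all four hops."""
    lo = sum(l for l, _ in hops.values())
    hi = sum(h for _, h in hops.values())
    return lo, hi

for arch, hops in HOPS.items():
    lo, hi = total_range(hops)
    print(f"{arch}: {lo:.1f}-{hi:.1f} s door-open to phone-buzz")
```

The on-prem chain lands in roughly the 1 to 4 second band; the cloud chain on a busy cable uplink lands in the 9 to 24 second band, and almost all of the difference is hop two.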

The alert hops, on-prem vs cloud

[Diagram] On-prem path: camera → recorder (PoE/coax frame, sub-second LAN) → edge box reading the HDMI tile grid → SMS + call (HIGH) in 2-5 s total. Cloud path: camera → recorder → frame upload over the property uplink → push notification, 8-20 s total on a busy link.

Why HDMI ingestion is the trick that collapses the slow hop

The Cyrano box does not pull RTSP from each camera. It reads the HDMI output of the recorder, which is the same image the wall monitor in the leasing office is already showing. That sounds like a cosmetic detail. It changes the latency budget materially.

By reading HDMI, the box gets the frame after the recorder has already decoded, demuxed, and laid out a tile grid. There is no per-camera RTSP handshake, no codec negotiation, no auth dance with each camera, and no dependency on whether the camera vendor exposed a clean stream. The model sees what the wall monitor sees, in the same instant. The hop from frame-on-glass to model-input is a memory copy on the same machine.

One Cyrano unit handles up to 25 camera feeds because that is the size of the tile grid a typical multifamily recorder draws. The box runs detection per tile in parallel using the edge accelerator's batch dimension. The inference cost scales with tile count, not with the round-trip count, which is the part that would have killed a per-camera RTSP poller.
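The tile-grid step can be sketched in a few lines. This is a hypothetical illustration, not the Cyrano implementation: it assumes a 1080p HDMI capture and a 5x5 grid, and simply reshapes the frame into a batch a detector could consume in one pass:

```python
import numpy as np

GRID = 5  # 5x5 = 25 tiles on a typical multifamily recorder layout

def frame_to_batch(frame: np.ndarray, grid: int = GRID) -> np.ndarray:
    """Slice one captured HDMI frame into a (grid*grid)-tile batch."""
    h, w, c = frame.shape
    th, tw = h // grid, w // grid                  # per-tile height and width
    return (frame[: th * grid, : tw * grid]        # drop any remainder pixels
            .reshape(grid, th, grid, tw, c)
            .transpose(0, 2, 1, 3, 4)              # (rows, cols, th, tw, c)
            .reshape(grid * grid, th, tw, c))      # batch of 25 tiles

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # one 1080p HDMI frame
batch = frame_to_batch(frame)
print(batch.shape)  # (25, 216, 384, 3)
```

The batch then goes through the detector once per frame, which is why the cost scales with tile count rather than with per-camera round trips.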

The two alert chains, side by side

Same incident, same property, same recorder. The only thing that changes is where the model runs.

Same incident, two architectures

A break-in attempt at a side door at 02:11. Recorder writes the frame to disk. Frame is queued for upload to a cloud inference cluster across the property uplink. The uplink is shared with resident traffic and is mid-burst, so the frame queues for 9.4 seconds. Cloud model evaluates, decides this is suspicious, fires a push notification through a third-party push service. Notification arrives on the on-call manager's phone 14.6 seconds after the door opened. By then the second person is already inside.

  • Frame upload contended for the property uplink
  • Push notification depends on third-party service availability
  • Detection silently stops if the uplink drops
  • No way to dispatch a phone call separately from the SMS

At one Class C multifamily property in Fort Worth, Cyrano caught 20 incidents including a break-in attempt in the first month. Customer renewed after 30 days.

Fort Worth, TX property deployment

The de-duplication problem nobody talks about

A single intruder does not produce a single detection. They walk through the parking lot, pass under the breezeway camera, cross the courtyard, and arrive at the package room. That is four cameras and four detections. A naive alert pipeline buzzes the on-call manager four times in 90 seconds, and the manager assumes the system is broken. By the third incident the following week, the alerts get muted.

The fix is not at the notification layer; it is at the inference layer. The edge box keeps short-lived per-zone tracks and merges detections that share spatial and temporal continuity into a single incident. The first triggering frame fires the alert; subsequent detections that are clearly the same person are appended to the existing incident as a thumbnail strip showing the path through the property.

What the operator gets is one phone call, with a strip of four thumbnails showing the path. What the operator does not get is four phone calls in 90 seconds. The difference between those two outcomes is whether the operator still uses the alert channel at month three.
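A minimal sketch of that merge rule, using a plain time window as a stand-in for the real spatial-and-appearance continuity check. The names and the 120-second window are illustrative, not the shipped logic:

```python
from dataclasses import dataclass, field

MERGE_WINDOW_S = 120  # "same person" window, illustrative

@dataclass
class Incident:
    first_seen: float
    last_seen: float
    thumbnails: list = field(default_factory=list)  # the path strip

class IncidentMerger:
    def __init__(self):
        self.open: Incident | None = None
        self.alerts_fired = 0

    def on_detection(self, ts: float, camera: str):
        inc = self.open
        if inc and ts - inc.last_seen <= MERGE_WINDOW_S:
            inc.last_seen = ts
            inc.thumbnails.append(camera)  # extend the strip, no new alert
        else:
            self.open = Incident(ts, ts, [camera])
            self.alerts_fired += 1         # first triggering frame rings

merger = IncidentMerger()
for ts, cam in [(0, "parking"), (25, "breezeway"),
                (60, "courtyard"), (90, "package_room")]:
    merger.on_detection(ts, cam)
print(merger.alerts_fired, merger.open.thumbnails)
# 1 ['parking', 'breezeway', 'courtyard', 'package_room']
```

Four detections, one alert, one thumbnail strip, which is exactly the outcome the operator needs.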

Where each event goes, by tier

Tiering is the second half of why this works. Every detection gets a LOW or HIGH classification on-device based on zone, time of day, behavior, and context. The dispatcher then routes by tier. The diagram below shows the routing.

Per-event dispatch, by threat tier

  1. Detection on-device. Per-tile inference on the recorder's HDMI grid.
  2. Classify. Zone + time + behavior + context.
  3. LOW threat. Dashboard only; rolls into the daily digest.
  4. HIGH threat. Immediate SMS plus an outbound phone call.
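The routing contract is small enough to state as code. This is a sketch, not the shipped dispatcher; the three transport callables are placeholders for whatever SMS, telephony, and dashboard services are actually wired in:

```python
def dispatch(event: dict, send_sms, place_call, post_dashboard):
    """Route one classified event. LOW never rings; HIGH always rings twice."""
    post_dashboard(event)        # every event lands in the dashboard
    if event["tier"] == "HIGH":
        send_sms(event)          # immediate SMS...
        place_call(event)        # ...plus an outbound phone call
    # LOW: dashboard only; rolls into the daily digest

log = []
noop = lambda name: (lambda e: log.append(name))
dispatch({"tier": "HIGH"}, noop("sms"), noop("call"), noop("dash"))
dispatch({"tier": "LOW"}, noop("sms"), noop("call"), noop("dash"))
print(log)  # ['dash', 'sms', 'call', 'dash']
```

The useful property is that the contract is hard-coded, not tunable: there is no configuration path by which a LOW event can ring a phone.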

Why untiered alerts get muted by week two

Most camera systems treat the alert channel as a stream, not a budget. Every detection that crosses a motion threshold gets the same notification weight, which means the channel fills with leaves, headlights, residents walking dogs, and maintenance workers. An operator who gets 40 alerts a day, of which 38 are noise, mutes the app inside two weeks. From that point on, the camera system is back to being a forensic recorder, and the property is back to finding out about incidents three days later.

The fix is not more sensitivity tuning. It is a hard contract on what is allowed to ring a phone. On Cyrano, the LOW tier never rings; it shows up in the dashboard for review and rolls into the daily portfolio digest. The HIGH tier always rings, twice (an SMS and an outbound call). An operator who answers a HIGH alert is answering something that the on-device classifier already decided needed a human. That is what keeps the channel useful at month six.

The other half of the fix is portfolio-level: the daily digest gives the regional or asset manager the LOW-tier picture across all properties. They do not need to be in the loop on every loitering event at every property; they need to see patterns. Tiering produces the data shape that supports both: the on-call manager gets the urgent stuff, the regional gets the trend.
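That data shape is easy to sketch: group LOW events by property and zone, and leave HIGH events out because they already rang a phone. The field names here are assumptions, not the actual event schema:

```python
from collections import defaultdict

events = [
    {"property": "Fort Worth", "tier": "LOW", "zone": "parking"},
    {"property": "Fort Worth", "tier": "LOW", "zone": "parking"},
    {"property": "Arlington", "tier": "LOW", "zone": "mailroom"},
    {"property": "Arlington", "tier": "HIGH", "zone": "side_door"},  # already rang
]

# Digest shape: property -> zone -> LOW-event count.
digest = defaultdict(lambda: defaultdict(int))
for e in events:
    if e["tier"] == "LOW":
        digest[e["property"]][e["zone"]] += 1

for prop, zones in sorted(digest.items()):
    print(prop, dict(zones))
# Arlington {'mailroom': 1}
# Fort Worth {'parking': 2}
```

Counts per zone per property are what make week-over-week patterns visible to a regional manager without burying them in individual events.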

What to test on a 30 day pilot before you trust the channel

Before signing off on any real time alert system at portfolio scale, run the pilot through these four steps on one property. Skipping any of them is how operators end up with a muted alert app three months later.

The 30-day pilot, in order

  1. Wall-clock latency. Stage a known event at a known door. Measure seconds from door open to phone buzz.
  2. Uplink failure drill. Pull the property uplink for 30 minutes mid-pilot. Confirm events are queued in order and drained on reconnect.
  3. Multi-zone walk. Have a tester walk through 4 to 5 cameras. Confirm one incident, one phone call, one thumbnail strip.
  4. Tier discipline. After 14 days, count HIGH alerts that were noise. The count should be near zero; if it is not, the tier rules are wrong.
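The tier-discipline step reduces to a few lines over the pilot's event log. The log format here is hypothetical; the metric is the point, the HIGH-tier noise rate:

```python
# Hypothetical pilot log: each HIGH alert reviewed and marked real or noise.
pilot_log = [
    {"tier": "HIGH", "confirmed_real": True},
    {"tier": "HIGH", "confirmed_real": True},
    {"tier": "LOW",  "confirmed_real": False},
    {"tier": "HIGH", "confirmed_real": False},  # the one to chase down
]

high = [e for e in pilot_log if e["tier"] == "HIGH"]
noise = [e for e in high if not e["confirmed_real"]]
print(f"{len(noise)}/{len(high)} HIGH alerts were noise")
# 1/3 HIGH alerts were noise
```

Anything meaningfully above zero means a zone or schedule rule needs to change before the portfolio rollout.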

What this looks like on a 16 to 25 camera multifamily property

Concretely: one Cyrano unit per property, plugged into the existing recorder over HDMI and onto the property network. Up to 25 feeds covered by that one unit. The recorder keeps doing 24x7 capture to its existing disk, which is where forensic clips still come from. The Cyrano box adds the inference layer and the dispatch layer; nothing about the recorder changes.

For the operator, the day-to-day shape is two things: a phone call when something HIGH happens, and a digest of LOW events to skim in the morning. The natural-language search surface lets you go back through past footage in plain English (“masked person near the gate last night”) when an after-the-fact question comes up, but most of the value is in the live channel staying useful past month two.

Pricing: 450 dollars one-time for the device, 200 dollars a month after the first month. For comparison, a single overnight security guard runs 3,000 to 5,000 dollars a month per property and one guard cannot watch 25 feeds at once anyway.

Walk through the latency budget on one of your properties

A 15 minute call. Bring the recorder model and a rough camera count for one property and we will walk through the four hops on real numbers. No deck, no contract.

Real time CCTV alerts: frequently asked questions

What does 'real time' actually mean for a CCTV alert in seconds?

There is no industry contract on this. In practice, an operator who needs to act (call a guard, dispatch on-site staff, decide whether to call police) treats anything under 5 seconds as real time and anything over 30 seconds as a forensic clip rather than an alert. The budget covers four hops: the camera-to-recorder hop (sub-second on the LAN), the recorder-to-inference hop (this is where most of the variance lives, ranging from milliseconds on an on-prem box to several seconds over a cellular uplink to a cloud model), the inference itself (typically 100 to 400 ms per frame on a current edge accelerator), and the notification hop to the operator's phone (1 to 3 seconds on SMS, longer on email). On-prem inference keeps the whole chain in the 2 to 5 second band; cloud inference on a building DSL link spends 8 to 20 seconds just on the second hop.

Why would an on-device box be faster than a cloud service that runs on bigger GPUs?

Because the bottleneck is not the GPU, it is the path. A cloud system has to push the camera frame across the building uplink to a remote inference cluster before the model can see it. On a typical multifamily DSL or business cable uplink, that hop is the one that bursts, queues, and adds variance. The on-prem box runs the model on the same LAN segment as the recorder, so the frame travels a few milliseconds to inference. Slower silicon close to the camera beats faster silicon at the end of an uplink for an alert workload, because alerts are tail-latency sensitive and the WAN is what owns the tail.

How does Cyrano avoid generating five alerts when one person walks past five cameras?

An incident is not the same thing as a frame with a detection. The edge box maintains short-lived per-zone tracks across the camera grid and merges detections that share spatial and temporal continuity into a single incident. The first frame that crosses a triggering rule fires the alert; subsequent detections that are clearly the same person walking through additional zones are appended to the existing incident, not dispatched as new ones. The operator sees one alert with a thumbnail strip showing the path through the property, not five separate phone buzzes.

What is the difference between a LOW THREAT and a HIGH THREAT classification?

Threat tier is determined on-device from the combination of zone (the package room at 2 AM is not the same as the leasing office at 11 AM), behavior (loitering with face partially obscured vs walking through with a known resident pattern), and context (forced-entry posture at a door vs a delivery worker at the package room). LOW THREAT events update the dashboard and roll into the daily digest. HIGH THREAT events trigger an immediate SMS plus a phone call to the on-call manager. Operators do not need to think about which alerts to ignore; the box already decided.

What happens to alerts during a property uplink outage?

Detection keeps running because the model is on the box and the camera feed never crossed the WAN to begin with. Events are appended to a local outbox file with a monotonic counter (one line of JSON per event). When the uplink returns, the drain worker walks the file forward and posts the events in strict order, so the dashboard backfills and the SMS sequence catches up without dropping anything. Cloud-only architectures cannot do this, because the inference itself lives on the other side of the dead link.
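A minimal sketch of an outbox with that shape: append-only JSON lines, a monotonic counter, and a strict-order drain. Paths and names are illustrative, and a production drain would also checkpoint how far it has posted:

```python
import json
import os
import tempfile

class Outbox:
    """Append-only JSONL outbox with a monotonic counter (sketch)."""

    def __init__(self, path):
        self.path = path
        # Recover the counter from the file so ordering survives a reboot.
        self.seq = sum(1 for _ in open(path)) if os.path.exists(path) else 0

    def append(self, event: dict) -> int:
        with open(self.path, "a") as f:
            f.write(json.dumps({"seq": self.seq, **event}) + "\n")
        self.seq += 1
        return self.seq - 1

    def drain(self, post) -> int:
        """Walk the file forward and post every event in strict order."""
        sent = 0
        with open(self.path) as f:
            for line in f:
                post(json.loads(line))
                sent += 1
        return sent

# Uplink down: events accumulate locally, in order.
path = os.path.join(tempfile.mkdtemp(), "outbox.jsonl")
box = Outbox(path)
box.append({"tier": "HIGH", "zone": "side_door"})
box.append({"tier": "LOW", "zone": "parking"})

# Uplink back: the drain worker posts in sequence.
posted = []
box.drain(posted.append)
print([e["seq"] for e in posted])  # [0, 1]
```

Because the counter is rebuilt from the file on startup, a power cycle mid-outage does not reorder or renumber anything.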

Why does the alert channel get muted after two weeks on most camera systems?

Untiered motion-based alerts trip on every leaf, headlight, or maintenance worker. An operator who gets 40 alerts a day, of which 38 are noise, mutes the app within two weeks. From that point forward the camera system is back to being a forensic recorder. The fix is not better motion sensitivity; it is treating the alert channel as a scarce resource: only the events that actually require a human decision should leave the building and ring a phone. Tiering is what makes that work in practice.

Does this work on the cameras a property already has?

Yes, that is the entire point of the HDMI ingestion path. The Cyrano box reads what the recorder is already drawing on the wall monitor, so it does not care whether the cameras are five years old, mixed brands, mixed resolutions, or analog over coax. If the recorder can render a tile grid, the edge model can detect on it. There is no camera replacement, no rewiring, and no NVR upgrade involved.

Are operators on the hook for tuning rules per property?

There is a per-property zone configuration step (mailroom, parking lot, leasing office, after-hours doors), but it is done once at deploy time through the dashboard, and most properties reuse a small set of templates. The threat-tier rules are shared across the fleet and updated centrally. A property manager does not babysit detection thresholds; if alerts skew noisy, the answer is a zone or schedule change, not pixel-level tuning.

🛡️ Cyrano: Edge AI Security for Apartments
© 2026 Cyrano. All rights reserved.
