Cyrano Security
10 min read
AI Video Surveillance Guide

AI video surveillance should be audited, not benchmarked. Four artifacts predict whether it is still running in month three.

Vendor pages advertise accuracy numbers. Installed deployments succeed or fail on something much more concrete: a small set of files, log lines, and payload fields that either exist and look sensible or do not. This guide defines those four artifacts, what to check on each, what a healthy reading looks like, and what the failure mode is when one is missing. If you read nothing else on AI video surveillance this quarter, make it this checklist.

Walk the audit on a live install
4.9 from 50+ properties

Four audit artifacts that predict deployment health
  • Per-layout overlay mask subtracts DVR glyphs at inference time
  • 50:1 raw-to-delivered compression ratio target band
  • Under 5 minutes to run the full on-site audit

The benchmark number is the wrong unit

Every AI video surveillance vendor publishes an accuracy figure. Ninety-seven percent, ninety-nine percent, sometimes a detection chart against an open dataset. Those numbers are not lies, but they are also not useful to a property operator. The model inside the box is rarely the thing that decides whether the system is still delivering alerts by month three. The filter stack that runs on top of the model, the overlay handling, the layout awareness, and the output channel do. None of those show up on a benchmark sheet.

The alternative is an audit. Walk up to a deployed device, pull four specific artifacts, and look at what they contain. If the artifacts exist and the values are within a small target band, the system is working. If one is missing or the value is outside the band, the failure mode is obvious and usually fixable before it kills the deployment. This works because the artifacts are generated as a side effect of the pipeline actually running, so they cannot be faked by a marketing deck.

The rest of this page walks each artifact, what a healthy reading looks like, what a bad one means, and where on a Cyrano install you find it.

The four artifacts produced by a working AI video surveillance pipeline

[Pipeline diagram: the DVR HDMI out and operator actions feed the Cyrano pipeline, where detection events pass through the overlay mask, layout cache, compression ratio, and event payload stages before reaching the delivery channel]

The four artifacts, one at a time

Each artifact sits at a specific stage of the pipeline. Each has a single question it answers. Each has a healthy shape and a failure shape. If you run the four checks in order, an on-site audit takes about five minutes.

Audit order, top to bottom


Artifact 1. The per-layout overlay mask file

The question it answers: is the detector ignoring the DVR's burned-in graphics? Every DVR draws a live clock, a camera-name strip across each tile, and a channel bug with a recording indicator over the composite frame. Without a mask, inference fires phantom bounding boxes on those glyphs. On a healthy Cyrano install there is one mask file per layout_id (for example 4x4-std.mask and 5x5-std.mask), with polygon coordinates covering the clock, the name strip, and the channel bug. The mask is computed once at install and applied in constant time on every frame. Missing mask = wasted inference cycles and phantom detections.
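The mechanics are simple enough to sketch. The snippet below is illustrative only: the real mask file format and the box coordinates are Cyrano internals and hypothetical here. It shows the core idea, a binary mask rasterized once at install, then one elementwise multiply per frame.

```python
import numpy as np

def build_mask(height, width, overlay_boxes):
    """Rasterize overlay regions (clock, name strips, channel bug)
    into a binary mask, computed once at install time."""
    mask = np.ones((height, width), dtype=np.uint8)
    for top, left, bottom, right in overlay_boxes:
        mask[top:bottom, left:right] = 0  # pixels the detector must ignore
    return mask

def apply_mask(frame, mask):
    # One elementwise multiply per frame: constant time, no per-glyph logic.
    return frame * mask[..., None]

# Hypothetical 1080p composite with a clock glyph and one name strip.
frame = np.full((1080, 1920, 3), 128, dtype=np.uint8)
mask = build_mask(1080, 1920, [(0, 0, 40, 300), (1040, 0, 1080, 400)])
clean = apply_mask(frame, mask)
assert clean[10, 10].sum() == 0      # clock region blanked before inference
assert clean[500, 500].sum() == 384  # scene pixels untouched (128 * 3)
```

Because the mask is a precomputed array, the per-frame cost does not grow with the number of glyphs, which is what "applied in constant time" means in practice.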


Artifact 2. The layout_id cache

The question it answers: does the system know when the DVR layout changes? Guards routinely switch from the full multiview to fullscreen on a single camera during an active incident, and inference at fullscreen is a completely different geometry from inference on a 4x4 grid. A healthy deployment has at least two layout_id entries in its cache after a week of operator use, typically 4x4-std and 1x1-std. If the cache only ever contains the boot-default layout, either no one on site is using the system, or it is not detecting layout changes, which means inference silently continues with the wrong mask when the operator drills in.
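Since the guide describes the cache as one mask file per layout_id in a directory, the check reduces to a directory listing. A minimal sketch, assuming that file layout (the `.mask` extension and directory contents here are stand-ins, not a documented format):

```python
import tempfile
from pathlib import Path

def audit_layout_cache(cache_dir):
    """Return observed layout_ids and whether the cache looks healthy
    (two or more entries after a week of operator use)."""
    layouts = sorted(p.stem for p in Path(cache_dir).glob("*.mask"))
    return layouts, len(layouts) >= 2

# Example against a temporary stand-in for the on-device directory.
with tempfile.TemporaryDirectory() as d:
    for name in ("4x4-std.mask", "1x1-std.mask"):
        Path(d, name).touch()
    layouts, healthy = audit_layout_cache(d)

assert layouts == ["1x1-std", "4x4-std"]
assert healthy
```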


Artifact 3. The 24-hour compression ratio

The question it answers: is the filter stack doing its job? Divide raw person detections by delivered alerts over the last 24 hours. The healthy band on a 16 to 25 camera property is 40 to 100. Below 25, alert fatigue will get the channel muted inside two weeks. Above 200, either the zones are drawn too tight, the dwell threshold is too aggressive, or the system is silently dropping real incidents. Cyrano prints a summary line at the end of each day with raw, dropped, delivered, and compression_ratio fields ready to read.


Artifact 4. The event payload shape

The question it answers: is each delivered alert actionable in under a minute? Open the most recent delivered event and confirm the payload has seven fields: tile thumbnail, zone label, dwell_seconds, camera name, layout_id, latency_ms from frame to send, and event class (pre_action_zone_entry, loitering_dwell_exceeded, and so on). If any field is missing, the responder cannot triage the alert from the message alone, which is the single most common reason AI video surveillance output gets ignored even when it is firing correctly.
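The payload check is a field-presence test. A sketch, assuming the payload arrives as a dict; the exact field names here mirror this guide but are not a published schema:

```python
REQUIRED_FIELDS = (  # the seven fields named above
    "thumbnail", "zone_label", "dwell_seconds", "camera",
    "layout_id", "latency_ms", "event_class",
)

def missing_fields(payload):
    """Return whichever of the seven required fields are absent or empty."""
    return [f for f in REQUIRED_FIELDS if payload.get(f) in (None, "")]

# Hypothetical delivered event with a complete payload.
event = {
    "thumbnail": "tile-07.jpg", "zone_label": "pre_action_zone",
    "dwell_seconds": 42, "camera": "North Gate", "layout_id": "4x4-std",
    "latency_ms": 8200, "event_class": "loitering_dwell_exceeded",
}
assert missing_fields(event) == []               # deployment-ready
assert "zone_label" in missing_fields({"camera": "Gate"})  # triage-blocking gap
```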

What the on-device log shows during an audit

Every artifact the audit depends on shows up as a readable line in the Cyrano event stream. During the five-minute walkthrough on an actual 16-camera property, tailing the last 200 lines of the log was enough to read all four: no screen scraping, no vendor dashboard, no API call. On that property, the 24-hour summary line reported a compression ratio of 48:1.

AI video surveillance should be audited, not benchmarked. A deployment that prints a healthy mask, a populated layout cache, a 40-to-100 compression ratio, and a complete payload is a working one, regardless of whatever accuracy figure the vendor quoted in the deck.

Cyrano field audit notes, 16-camera multifamily baseline

A single day on a healthy deployment

241 raw person detections, narrowed by zone matching and dwell thresholds to 5 delivered to WhatsApp.

Same 16-camera Class B multifamily property, one Tuesday. The 241 to 5 ratio works out to 48 to 1. Right in the middle of the healthy band. Every delivered event carried a thumbnail, a zone label, a dwell time, and a camera name.

What each artifact is actually telling you

Overlay mask: the detector is not chasing ghosts

A mask with sensible coordinates for the clock glyph, the per-tile name strip, and the channel bug proves the detector is running on pixels that are part of a real camera scene, not on DVR chrome. Every installed Cyrano unit keeps these mask files in a small directory keyed by layout_id, readable with a single ls.

Layout cache: operator drill-in is respected

Two or more layout_id entries after a week of use means the engine is swapping masks when a guard goes fullscreen. A single entry after a week means no one drilled in, which is its own warning.

Compression ratio: the channel will stay active

A 40-to-100 raw-to-delivered ratio keeps the staff WhatsApp thread read. Under that, it gets muted. Over that, the filter stack is suppressing real events.

Event payload: the responder can act in under a minute

Tile thumbnail, zone label, dwell, camera, layout_id, latency, event class. Seven fields. If any are missing, the message becomes ambiguous and triage stops happening.

Failure modes are obvious at a glance

Missing mask = phantom boxes. Stale layout cache = wrong inference geometry after operator drills in. Compression under 25 = imminent mute. Compression over 200 = zones too tight or suppressed true positives. Payload gaps = alerts not being acted on. Every failure points at its own fix.

No vendor dashboard required

Every artifact is a file or a log line on the edge device itself. The audit does not depend on a cloud portal being up or on anyone producing a report. The device itself is the evidence.

Audit framework vs the usual RFP checklist

Feature | Accuracy and class-count RFP | Artifact-level audit
Primary question answered | Does the box claim high detection accuracy? | Is the deployment producing sensible artifacts right now?
Takes place | Before purchase, from a deck | After install, from files on the device
Detects overlay-induced phantom detections | No | Yes, via mask file inspection
Detects operator drill-in not being respected | No | Yes, via layout_id cache size
Predicts channel mute rate | No | Yes, via 24h compression ratio
Predicts responder triage speed | No | Yes, via payload field completeness
Time required | Days of procurement cycle | Under 5 minutes on site
Can the vendor fake the result? | Yes, the deck is controlled | No, the evidence is on the device

Two artifacts most audits miss

The layout_id written into every event

Every delivered alert should carry the layout_id active at the moment of detection. If you audit a week of events and every single one is tagged 4x4-std, nobody on the property used fullscreen during a real incident, and the re-scope behavior has never been exercised. If some events are tagged 1x1-std, the fullscreen path is working. This is a far better indicator of real-world use than a dashboard login count.
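Counting layout_id tags over a week of exported events makes this check mechanical. A sketch with hypothetical event counts:

```python
from collections import Counter

# Hypothetical week of delivered events; only the layout_id tag matters here.
week_of_events = ["4x4-std"] * 180 + ["1x1-std"] * 6
tags = Counter(week_of_events)

# If fullscreen (1x1-std) never appears, the re-scope path was never exercised.
fullscreen_exercised = tags["1x1-std"] > 0
assert fullscreen_exercised
```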

latency_ms as a silent regression alarm

End-to-end latency from frame capture to WhatsApp send is usually a few to tens of seconds, dominated by the messaging platform rather than inference. If the rolling latency_ms doubles week over week without a change on the device, something upstream (network carrier, WhatsApp API, internet uplink) is degrading. The field is a free early warning system built into the payload you are already sending.
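A week-over-week comparison of median latency_ms is enough to raise that alarm. A minimal sketch; the sample values and the 2x threshold are illustrative, not a Cyrano default:

```python
from statistics import median

def latency_regression(prev_week_ms, this_week_ms, factor=2.0):
    """Flag a week-over-week doubling of median end-to-end latency."""
    return median(this_week_ms) >= factor * median(prev_week_ms)

baseline = [6000, 7000, 8000, 7500]          # hypothetical latency_ms samples
degraded = [14000, 16000, 15000, 17000]      # uplink or API degrading upstream
assert not latency_regression(baseline, baseline)
assert latency_regression(baseline, degraded)
```

Using the median rather than the mean keeps one slow WhatsApp send from tripping the alarm on its own.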

The five-minute on-site audit, in order

  • ssh to the Cyrano unit, list the mask directory, confirm at least one mask file per active layout_id
  • Open one mask and confirm clock, name strip, and channel bug coordinates look sensible for that DVR model
  • List layout_id cache entries; two or more after a week of use is healthy
  • Tail the event stream log, read the 24h summary line, confirm compression_ratio between 40:1 and 100:1
  • Open the latest delivered WhatsApp message, confirm thumbnail + zone_label + dwell_seconds + camera + layout_id + latency_ms + event_class are all present
  • If all five checks pass, the deployment is healthy; if one fails, the failure mode is named by the check that failed
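The automatable portion of those checks collapses into one pass/fail report. A sketch under stated assumptions: the inputs would come from the mask directory, the log summary, and the latest payload, and all names here are illustrative. The second bullet, eyeballing the mask coordinates against the DVR model, stays a manual step.

```python
def run_audit(mask_files, layouts_seen, ratio_24h, payload_fields):
    """Evaluate the on-site checks that can be scripted."""
    required = {"thumbnail", "zone_label", "dwell_seconds", "camera",
                "layout_id", "latency_ms", "event_class"}
    checks = {
        "mask file per active layout_id": layouts_seen <= mask_files,
        "layout cache has >= 2 entries": len(layouts_seen) >= 2,
        "24h compression in 40-100 band": 40 <= ratio_24h <= 100,
        "payload carries all seven fields": required <= set(payload_fields),
    }
    return checks, all(checks.values())

checks, healthy = run_audit(
    mask_files={"4x4-std", "1x1-std"},
    layouts_seen={"4x4-std", "1x1-std"},
    ratio_24h=48.2,
    payload_fields=["thumbnail", "zone_label", "dwell_seconds", "camera",
                    "layout_id", "latency_ms", "event_class"],
)
assert healthy  # every scripted check passes on this hypothetical unit
```

When `healthy` is false, the failing key in `checks` names the failure mode directly, which is the whole point of the audit order above.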

DVRs this audit works on

Hikvision DS series
Dahua XVR and NVR
Lorex LNR and LHD
Swann DVR and NVR
Uniview NVR
Annke NVR
Reolink NVR
Amcrest NVR
Night Owl DVR
Q-See and rebrands

The audit framework is device-agnostic. Any DVR or NVR that outputs a multiview over HDMI works, because the four artifacts live on the Cyrano unit, not on the recorder.

The short version

Stop comparing AI video surveillance vendors by their accuracy numbers. Compare them by whether a five-minute audit on a real install can produce a healthy reading on all four artifacts: the per-layout overlay mask, the layout_id cache, the 24-hour compression ratio, and the event payload shape. If the artifacts are there and the values are in range, the system is going to still be delivering alerts three months from now. If any are missing, it is not going to.

Cyrano runs this audit against itself on every deployment, watching every tile of the multiview off one HDMI cable out of the DVR, and lands each surviving alert on a WhatsApp thread within seconds of the tile event. That is the whole contract.

Walk the audit on a live deployment

Book a 15-minute demo. You will ssh into a real Cyrano unit running on a production DVR, pull the four artifacts, and confirm the values for yourself. Same five checks described on this page.

Book a demo

AI Video Surveillance: Frequently Asked Questions

Why should I audit an AI video surveillance system instead of comparing accuracy scores?

Accuracy scores are measured on curated vendor datasets. Your property is not a curated dataset. Two systems with the same 99 percent claim can produce wildly different daily event counts, because the difference in the field is not the model, it is the filter stack, the overlay handling, and the output channel. The thing that predicts whether an installed AI video surveillance system is still being used in month three is not its accuracy number, it is a handful of verifiable artifacts on the device: the overlay mask file, the layout_id cache, the compression ratio over the last 24 hours, and the event payload shape. Those artifacts either exist and look sensible, or they do not. No marketing language in between.

What is the per-layout overlay mask and why does it matter?

A DVR burns graphics into its HDMI output that are not part of any camera feed: a live clock in the corner, a camera-name strip across each tile, and a channel bug with a recording indicator. If you run person detection on that composite frame without subtracting those glyphs, the model produces phantom bounding boxes around the text and the recording dot. On Cyrano, the overlay mask is computed once per DVR layout at install (for example 4x4-std or 5x5-std) and cached as a file keyed by layout_id. On every frame, inference multiplies the composite by the cached mask in constant time. If the mask file is missing or empty, the system is wasting cycles and producing noise. If it exists and has sensible coordinates for the clock and name strip, you are off to a good start.

What is the layout_id cache and what should it contain?

The layout_id cache is the set of DVR multiview layouts the engine has observed and built a mask for. On a new install, it will have one entry, typically the default grid layout the guard monitor boots into, for example 4x4-std. After a guard drills into a single camera in fullscreen and back at least once, it will also contain 1x1-std. A healthy AI video surveillance deployment where the operator actually uses the system will show multiple layout_id entries after a week. A deployment where the cache only ever has one entry is not being used by anyone, which is its own failure mode, independent of model accuracy.

What compression ratio should I expect on a 16 to 25 camera property?

The compression ratio is raw camera events (mostly person detections) divided by delivered alerts over a 24-hour window. On a typical 16-camera multifamily property, a day produces 200 to 300 raw person detections from residents, vendors, and passersby. A healthy Cyrano install compresses those to 3 to 8 delivered alerts. That is roughly 50 to 1. If the ratio is below 25 to 1, the channel is going to be muted within two weeks. If it is above 200 to 1, either you drew your zones too tight, your dwell threshold is too aggressive, or the system is dropping real incidents. The target band for an actionable AI video surveillance feed is 40 to 100 raw events per delivered alert.

What fields should be present in every delivered event payload?

At minimum: a tile thumbnail cropped from the DVR multiview (not the full multiview), the zone label that was crossed, the dwell in seconds, the camera name, the layout_id active at the time of the event, the end-to-end latency in milliseconds from frame capture to message send, and the event class such as pre_action_zone_entry or loitering_dwell_exceeded. These are not decorative. The thumbnail is what the responder reviews in under a minute. The zone and dwell are what the responder uses to decide whether to talk down, call, or log. The layout_id and latency_ms are what the integrator uses to diagnose regressions. If any of those fields is missing from the payload, the system is not deployment-ready, regardless of what its accuracy number is.

Does audit-first AI video surveillance still require new cameras?

No. The whole point of the artifact-level audit framework is that it applies to the deployment, not the sensor. Cyrano runs on whatever composite HDMI signal the DVR is already driving out to the guard monitor, which covers almost any DVR or NVR shipped in the last decade, including Hikvision, Dahua, Lorex, Swann, Uniview, and most rebrands. No camera firmware changes, no RTSP credentials, no ONVIF negotiation. The audit artifacts are generated regardless of camera brand, because they are properties of the detection pipeline, not the cameras.

How long does an on-site AI video surveillance audit take?

Under five minutes once you know where to look. You pull the layout_id cache (a small directory of mask files on the device) and confirm at least one layout is present with a sensible mask. You pull the last 24 hours of event stream logs, count raw person_detected lines, count event_delivered lines, and confirm the ratio is in the 40 to 100 band. You open the last delivered WhatsApp message and confirm the payload has thumbnail, zone label, dwell seconds, camera, layout_id, latency_ms, and event class. If all four artifacts pass, the system is healthy. If one fails, the failure mode is usually obvious from the missing artifact.

Why is this framework better than the usual RFP checklist (resolution, framerate, number of object classes)?

RFP checklists optimize for spec-sheet language, not for whether the system is still delivering useful output six months after install. Resolution and framerate are hardware properties that you cannot change after purchase, so they are important. But the number of supported object classes is actively harmful: a model that can recognize 300 classes but alerts on all of them will produce hundreds of events per day and get its channel muted. The four audit artifacts in this guide are the minimum floor for a working deployment. Everything else is a preference.

🛡️ Cyrano: Edge AI Security for Apartments
© 2026 Cyrano. All rights reserved.
