Matthew Diakonov, Written with AI

Published April 19, 2026 | 11 min read

Theft Detection Guide

The spec that decides whether a theft detection system works is the raw-to-delivered alert ratio, not the accuracy number.

Every vendor pitches an accuracy percentage. Every working deployment in the field is judged on a different number: how many raw camera events get compressed into each alert that hits the staff phone. At roughly 50 to 1 or tighter, the channel stays open and alerts get read. At looser ratios, the WhatsApp thread gets muted inside a week and the system becomes a recording. This guide is about that compression ratio, the three-layer filter stack our system uses to keep it tight, and the shape of the event payload that lands on the other side.

4.9 from 50+ properties

200+ raw detections per day compressed to 3-8 alerts

Three filter layers: zone, dwell, time window

Under 60 seconds, tile to WhatsApp notification

25 camera tiles per unit off one DVR HDMI output

Theft detection is a filter-ratio problem

Keep compression near 50 to 1 or the channel gets muted in a week.

Raw person detections on a 16-camera property: 200-300 per day

Filter 1: is the person inside a drawn polygon zone?

Filter 2: have they been there longer than the dwell threshold?

Filter 3: is the current time inside the zone's armed window?

Delivered alerts after all three filters: 3 to 8 per day

The real failure mode is the muted channel

Walk any property manager through their security stack and ask which alerting channels are still active. Most of them can name a motion-alarm thread, an access-control alert thread, and maybe a leasing inbox, and every one of those has been muted at least once. The failure pattern is identical: the system fires too many times, a human gets woken up at 3 a.m. for a raccoon, and the channel gets silenced. From that point onward, every true positive also goes unseen.

The industry talks about this in the wrong vocabulary. Vendor pages advertise 98 or 99 percent detection accuracy. That framing is misleading. What matters for a theft detection system staying in service is not detection rate, it is the number of events per day that make it through to a reader. If the system is 99 percent accurate and still delivers 80 messages per day, the channel is dead in a week. If it is 95 percent accurate and delivers 5 a day, the channel stays open for years.

Our system is built around that second number. The rest of this page is about the filter stack that produces it, the event payload on the other side, and the specific patterns of theft that this approach covers.

How 200+ raw detections compress to 3-8 delivered alerts

The three filter layers, in order

Person detection on its own is already a strong filter: it removes the entire weather, animal, and shadow class that makes pixel-based motion alarms unusable. But person detection alone still fires on every resident, delivery driver, and vendor crossing the camera. The three scoped filters that follow are what cut the daily event count down to a readable number.

Filter stack running on top of person detection

Layer 1. Polygon zone

On every camera tile at install, operators draw the boundaries that matter: mailroom door, package shelf, transformer pad, HVAC cage, fire lane, dock apron. A person outside every drawn polygon is silently dropped. This single filter removes roughly 80 percent of raw detections on a typical multifamily property, because most person-bearing frames happen along paths, driveways, and open lots that do not need alerting.
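A minimal sketch of the layer 1 test, assuming each detection is reduced to one anchor point (for example the bottom center of the person's bounding box) and each zone is stored as a list of pixel vertices on the tile. It uses a standard ray-casting check. The function names, zone name, and coordinates are illustrative and are not taken from the product.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Polygon = List[Point]


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray casting: inside if a ray to the right crosses an odd number of edges."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x position where this edge meets that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def first_matching_zone(point: Point, zones: Dict[str, Polygon]) -> Optional[str]:
    """Return the first zone containing the point, or None (a layer 1 drop)."""
    for name, polygon in zones.items():
        if point_in_polygon(point, polygon):
            return name
    return None


# Hypothetical zone drawn on one camera tile, in tile pixel coordinates.
ZONES = {"mailroom_door": [(100, 100), (200, 100), (200, 220), (100, 220)]}

print(first_matching_zone((150, 150), ZONES))  # person standing at the door
print(first_matching_zone((50, 150), ZONES))   # person on a path outside the zone
```

A detection that returns None here never reaches the dwell stage, which is how the bulk of path and driveway traffic gets discarded cheaply.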

Layer 2. Dwell threshold

The person must remain inside the polygon longer than the dwell threshold, typically 4 seconds or more. A resident walking past a transformer in 2 seconds never clears it. A delivery driver checking a package and moving on in 3 seconds never clears it. An actor positioning at the target typically crosses it inside the first 5 to 10 seconds. This filter cuts another large fraction of candidate events.
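One way to picture the dwell check is a small per-person timer. This sketch assumes an upstream tracker supplies a stable `track_id` for each person and that the zone test has already named the zone (or returned None). The class name, the 4.0 second default, and the timestamps are illustrative assumptions.

```python
from typing import Dict, Optional, Tuple


class DwellTracker:
    """Tracks how long each person has stayed inside the same zone."""

    def __init__(self, threshold_s: float = 4.0) -> None:
        self.threshold_s = threshold_s  # assumed default, tuned per zone in practice
        self._state: Dict[int, Tuple[str, float]] = {}  # track_id -> (zone, entered_at)

    def update(self, track_id: int, zone: Optional[str], now_s: float) -> float:
        """Return seconds spent in the current zone (0.0 when outside or just entered)."""
        if zone is None:
            # Leaving every zone resets the clock for this person.
            self._state.pop(track_id, None)
            return 0.0
        current = self._state.get(track_id)
        if current is None or current[0] != zone:
            # First frame inside this zone: start counting from now.
            self._state[track_id] = (zone, now_s)
            return 0.0
        return now_s - current[1]

    def cleared(self, track_id: int, zone: Optional[str], now_s: float) -> bool:
        """True once the person has stayed longer than the dwell threshold."""
        return self.update(track_id, zone, now_s) > self.threshold_s


tracker = DwellTracker()

# Resident walking past a transformer: in the zone for 2 seconds, then gone.
tracker.cleared(1, "transformer_pad", 0.0)
resident_fired = tracker.cleared(1, "transformer_pad", 2.0)
tracker.update(1, None, 2.5)

# Actor positioning at the target: still there 4.5 seconds after entering.
tracker.cleared(2, "transformer_pad", 10.0)
actor_fired = tracker.cleared(2, "transformer_pad", 14.5)

print(resident_fired, actor_fired)
```

Resetting the timer whenever the person leaves the zone is a design choice in this sketch; a production system might tolerate brief tracking gaps instead.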

Layer 3. Armed time window

Each zone has a schedule, and an alert only fires when the current wall-clock time is inside its armed window. A maintenance crew on the HVAC pad at 10 a.m. on a scheduled morning is not an event. The same crew on the same pad at 1 a.m. on a weekend is. Property teams tune these windows once at install and rarely touch them again.
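A sketch of the armed-window check, assuming each zone stores a single daily start and end time and that a window may wrap past midnight. The 22:00 to 06:00 window and the function name are assumptions for illustration. A per-weekday schedule would extend the same idea.

```python
from datetime import datetime, time


def is_armed(now: datetime, start: time, end: time) -> bool:
    """True if the wall-clock time is inside [start, end), wrapping past midnight."""
    t = now.time()
    if start <= end:
        # Same-day window, e.g. 09:00 to 17:00.
        return start <= t < end
    # Overnight window, e.g. 22:00 to 06:00.
    return t >= start or t < end


# Hypothetical overnight window for an HVAC pad zone.
ARMED_START, ARMED_END = time(22, 0), time(6, 0)

crew_at_10am = is_armed(datetime(2026, 4, 20, 10, 0), ARMED_START, ARMED_END)
crew_at_1am = is_armed(datetime(2026, 4, 18, 1, 0), ARMED_START, ARMED_END)
print(crew_at_10am, crew_at_1am)
```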

Result. The compression ratio holds

The three filters multiply rather than add. A frame that would make it through any one of them alone has to pass all three to become a delivered alert. That is the arithmetic that takes a 200-detection day down to a 3-to-8-alert day on the same hardware. Every delivered alert is, by construction, a person inside a drawn zone, longer than the dwell threshold, during an armed window.
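To make the "pass all three" rule concrete, here is a hedged sketch of the stack as a single decision function. It assumes the zone, dwell, and schedule checks have already produced a zone name, a dwell in seconds, and an armed flag for each detection. The record type, stage labels, and sample detections are invented for illustration and are not real deployment data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Detection:
    zone: Optional[str]  # polygon the person is inside, or None
    dwell_s: float       # seconds spent inside that polygon
    armed: bool          # whether the zone's time window is armed right now


def classify(d: Detection, dwell_threshold_s: float = 4.0) -> str:
    """Return the layer that dropped the detection, or 'delivered'."""
    if d.zone is None:
        return "dropped_layer1_zone"
    if d.dwell_s <= dwell_threshold_s:
        return "dropped_layer2_dwell"
    if not d.armed:
        return "dropped_layer3_window"
    return "delivered"


# Made-up examples, one per outcome.
sample = [
    Detection(None, 0.0, True),               # resident on a general path
    Detection("transformer_pad", 1.2, True),  # transient corner crossing
    Detection("hvac_cage", 600.0, False),     # scheduled vendor, window disarmed
    Detection("mailroom_door", 9.0, True),    # survives all three layers
]

outcomes = Counter(classify(d) for d in sample)
print(outcomes)
```

Ordering the checks from cheapest and most selective (zone) to last (schedule) also means most detections exit after a single comparison.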

What the pipeline actually prints

The shape below is the internal event log for a single mailroom camera on a typical 16-camera property across one evening. Most person detections are silently dropped by one of the three filters. The ones that survive become delivered events.

cyrano event stream, 1 camera, 3 hours

50:1

“A theft detection system only stays in service if the raw-event to delivered-alert compression ratio stays aggressive. Everything else is a recording product wearing a detection label.”

Our system field deployment notes, 16-camera multifamily baseline

A typical weekday in numbers

248 raw person detections

Events that crossed a zone

Events that cleared the dwell threshold

5 delivered to WhatsApp

Sample day on a 16-camera Class B multifamily property. The 248 to 5 compression ratio is roughly 50 to 1. Every one of the 5 delivered events was person-in-zone, dwell above threshold, during an armed window.
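The arithmetic behind that sample day is short enough to write down. This sketch uses only the two figures quoted above (248 raw detections, 5 delivered alerts); the function name is illustrative.

```python
def compression_ratio(raw_detections: int, delivered_alerts: int) -> float:
    """Raw events consumed per delivered alert."""
    if delivered_alerts <= 0:
        raise ValueError("ratio is undefined when no alerts are delivered")
    return raw_detections / delivered_alerts


ratio = compression_ratio(248, 5)      # 248 / 5 = 49.6, roughly 50 to 1
pass_fraction = 5 / 248                # share of raw detections that reach a phone

print(f"{ratio:.1f} to 1, {pass_fraction:.1%} delivered")
```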

What each filter is actually dropping

Paths and driveways

A person walking a dog, a resident moving a grocery cart, a vendor heading to a unit. No polygon covers general-purpose paths, so these frames end at layer 1 and never reach the dwell stage.

Transient zone crossings

Someone cutting through the corner of a transformer pad polygon on their way elsewhere. Dwell of 1.2 seconds dies at layer 2.

Scheduled vendors

HVAC crew, landscapers, trash hauler. Inside the polygon, dwelling as long as the work takes, but the time window is disarmed during scheduled hours. Dies at layer 3.

Short check-ins

A property manager swinging by the mailroom to verify a package. 3 to 4 seconds of dwell. Under the threshold, dropped.

Animals and weather

A dog, a raccoon, a plastic bag, a shadow at sunset. None of them are a person. Never made it past person detection, never reached the filter stack at all.

What survives

A person inside a drawn polygon, stopped for more than the dwell threshold, during the armed window. Every survivor becomes a WhatsApp message with a tile thumbnail, zone label, dwell seconds, and timestamp.

Filter-stack theft detection vs. generic motion alerting

Feature	Generic motion alarm	Our system
Trigger primitive	Pixel change in a frame	Person detection, then zone, dwell, and time
Typical daily events (16 cameras)	40 to 60 per night	3 to 8 per day
False positive sources filtered	None; all pixel change is an event	Animals, weather, shadows, paths, transients, scheduled vendors
Configuration	Sensitivity slider per camera	Polygons + dwell seconds + armed schedule per zone
Compression ratio (raw to delivered)	~1 to 1	~50 to 1
Mute rate on staff phones	Silenced within 7 to 14 days	Stays active across quarterly reviews
Event payload	Camera name, timestamp	Tile thumbnail, zone label, dwell seconds, camera, layout_id, latency_ms

The DVR detail most vendors skip

Overlay masks are cached per layout_id

Every DVR burns a clock glyph, camera-name strip, and channel bug into its multiview. Running inference over those glyphs wastes cycles and produces phantom bounding boxes. Our system measures and stores one overlay mask per DVR layout_id (for example 4x4-std or 5x5-std) at install, then applies it in constant time per frame. The work only happens once, not on every inference pass.
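A rough sketch of the caching idea, assuming the mask can be represented as a few rectangles rather than a per-pixel bitmap and that the expensive measurement is an injected callable. The class, the `fake_measure` stand-in, and the rectangle values are hypothetical. Only the `layout_id` keys come from the text above.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Rect = Tuple[int, int, int, int]  # x, y, width, height in frame pixels


class OverlayMaskCache:
    """Measure the overlay mask once per layout_id, then reuse it on every frame."""

    def __init__(self, measure: Callable[[str], Sequence[Rect]]) -> None:
        self._measure = measure
        self._masks: Dict[str, Tuple[Rect, ...]] = {}

    def get(self, layout_id: str) -> Tuple[Rect, ...]:
        if layout_id not in self._masks:
            # Expensive step: runs only the first time a layout is seen.
            self._masks[layout_id] = tuple(self._measure(layout_id))
        return self._masks[layout_id]


def in_overlay(point: Tuple[int, int], mask: Sequence[Rect]) -> bool:
    """True if the point falls on a burned-in glyph and should be ignored."""
    px, py = point
    return any(x <= px < x + w and y <= py < y + h for x, y, w, h in mask)


measure_calls: List[str] = []


def fake_measure(layout_id: str) -> Sequence[Rect]:
    """Hypothetical stand-in for the install-time measurement; values are made up."""
    measure_calls.append(layout_id)
    return [(0, 0, 300, 40)]  # e.g. a clock glyph in the top-left corner


cache = OverlayMaskCache(fake_measure)
mask = cache.get("4x4-std")
mask_again = cache.get("4x4-std")  # served from the cache, no second measurement

print(len(measure_calls), in_overlay((10, 10), mask), in_overlay((500, 500), mask))
```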

Fullscreen re-scope on operator focus

If the guard on site switches the DVR to fullscreen on a single camera during an active incident, that tile becomes the full frame at full resolution. Our system detects the layout change, swaps to single-camera mode, and runs inference at higher effective pixels per person. Per-tile accuracy improves during exactly the moments when accuracy matters most.
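The pixel gain from going fullscreen is simple geometry. This sketch assumes a 1920x1080 HDMI frame and a uniform, row-major grid, neither of which is stated in the text. The function names are illustrative.

```python
from typing import Tuple


def tile_rect(index: int, grid: int, frame_w: int = 1920, frame_h: int = 1080) -> Tuple[int, int, int, int]:
    """Pixel rectangle (x, y, w, h) of tile `index` in a grid x grid multiview."""
    if grid < 1 or not 0 <= index < grid * grid:
        raise ValueError("tile index out of range for this grid")
    tile_w, tile_h = frame_w // grid, frame_h // grid
    row, col = divmod(index, grid)  # tiles numbered left to right, top to bottom
    return (col * tile_w, row * tile_h, tile_w, tile_h)


def fullscreen_pixel_gain(grid: int, frame_w: int = 1920, frame_h: int = 1080) -> float:
    """How many times more pixels one camera gets in fullscreen than in its tile."""
    _, _, tile_w, tile_h = tile_rect(0, grid, frame_w, frame_h)
    return (frame_w * frame_h) / (tile_w * tile_h)


print(tile_rect(0, 5))           # one tile of a 5x5 multiview
print(tile_rect(0, 1))           # the same camera in fullscreen
print(fullscreen_pixel_gain(5))  # area ratio between the two
```

Under these assumptions a camera in a 5x5 layout occupies a 384x216 tile, so fullscreen gives it 25 times the pixel area for the same scene.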

What you need on the property to run this

A DVR or NVR with an HDMI output (Hikvision, Dahua, Lorex, Swann, Uniview, any of them)
A wall monitor or guard screen that currently receives that HDMI signal
One free HDMI cable length between DVR and monitor for the passthrough tap
Power and a network cable at the DVR location
A WhatsApp thread that the property team already reads
Zones, dwell thresholds, and time windows worth drawing once

The theft categories this filter stack covers

Package theft

Copper theft at transformer pads

HVAC condenser theft

Cargo theft at loading docks

Catalytic converter prep

Trailer yard pilfering

Jobsite conex box break-ins

Mailroom loitering

Fire lane staging

After-hours dock apron

Every one of these shares the same structural shape: a defined target zone, a pre-action pause while the actor positions, and a time window during which legitimate presence is near zero. The filter ratio math is identical across all of them.

The one-minute version

Accuracy percentages are marketing. The spec that predicts whether a theft detection system is still delivering alerts in month three is its compression ratio. On a 16-camera property, aim for roughly 50 to 1: hundreds of raw person detections compressed into a handful of alerts a day. Hit that, and the WhatsApp thread stays active. Miss it, and you have built a recording.

The way to hit it is not a better model. It is three scoped filters running after person detection: a polygon zone, a dwell threshold over 4 seconds, and an armed time window. The three multiply, which is why a modest dwell raise combined with a smaller zone combined with a tighter schedule cuts the daily event count faster than any accuracy tuning.
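A back-of-the-envelope illustration of why the filters compound. The 20 percent zone pass rate follows from the roughly 80 percent that the zone filter removes; the dwell and window pass rates are assumptions picked so the baseline lands at 5 alerts a day, and the 250 raw detections is a round figure inside the 200 to 300 range.

```python
def delivered_per_day(raw: float, zone_pass: float, dwell_pass: float, window_pass: float) -> float:
    """Expected delivered alerts when each layer passes a fixed fraction."""
    return raw * zone_pass * dwell_pass * window_pass


# Baseline: 250 raw detections, zone passes 20%, dwell 25% (assumed), window 40% (assumed).
baseline = delivered_per_day(250, 0.20, 0.25, 0.40)

# Tighten every layer so it passes 20% less than before.
tightened = delivered_per_day(250, 0.20 * 0.8, 0.25 * 0.8, 0.40 * 0.8)

# Three 20% trims multiply: 0.8 * 0.8 * 0.8 = 0.512 of the baseline remains.
reduction = 1 - tightened / baseline

print(f"{baseline:.2f} -> {tightened:.2f} alerts/day ({reduction:.1%} fewer)")
```

Three modest adjustments cut the daily count nearly in half in this example, without touching the detection model.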

Our system runs that stack on a single HDMI tap from the DVR you already have, covers up to 25 tiles per unit, and lands each surviving event in a WhatsApp thread in under 60 seconds from tile to notification. That is the whole shape of the product.

See the filter stack on a real DVR

Book a 15-minute walk through an actual deployment. You will see the raw event feed on one side, the three filter layers in the middle, and the delivered WhatsApp messages on the other end, including the compression ratio for the last 24 hours.

Theft Detection: Frequently Asked Questions

What do you mean by the filter ratio in theft detection?

Every camera on a property generates raw events all day: a person walks past a lobby, a car crosses a lot, a dog runs across a lawn. A theft detection system that forwards all of those to a human is a system that gets muted. The filter ratio is the number of raw camera events consumed for every delivered alert. On a 16-camera multifamily property, a typical day produces 200 to 300 raw person detections. A system keeping its alert channel usable delivers 3 to 8 alerts. That is roughly a 50-to-1 compression ratio, and it is achieved by stacking three scoped filters on top of person detection: a polygon zone, a dwell threshold, and a time window.

Why is alert channel mute rate the real failure mode, not detection accuracy?

Because accuracy on its own is not a deployment metric. A model that is 99 percent accurate but fires 150 times a day at a 16-camera property will have its WhatsApp thread muted by the property manager before the week is out. Once the channel is muted, the true positives go unseen too. The system is now a recording, not a detection product. In the field, the question that actually determines whether alerts continue to be read is: how many events does the staff see per shift? Keep that number in the single digits per day and the channel stays active. Push it into the dozens and the mute happens every time.

How does our system compress 200 plus raw detections down to a handful of alerts?

Three filter layers running after person detection. First, the polygon zone: on every camera tile at install, operators draw the boundaries that matter, mailroom door, transformer pad, HVAC cage, dock apron, fire lane. A person outside every drawn polygon is not an event. Second, the dwell threshold: the person must stay inside the polygon longer than the threshold, typically 4 seconds or more. A resident walking past a transformer in 2 seconds never crosses it. Third, the time window: each zone has an armed schedule, and an alert only fires when the current time is inside it. A maintenance crew on the pad at 10 a.m. during a scheduled window is not an event. The three filters combine multiplicatively, which is why the compression ratio compounds so aggressively.

What makes this different from a generic motion alarm on a DVR?

A DVR motion alarm triggers on pixel change. A raccoon, a tree shadow at sunset, a plastic bag blowing across the lot, and an actual person are all pixel changes. Most DVR motion alarms wake staff dozens of times per night, which is why most of them are disabled within a month. Our system runs object-aware person detection first, which removes the entire animal, weather, and lighting class of false positives. Then the zone, dwell, and time filters run on top of that. The result is that the same 16-camera property that generated 40 to 60 motion alarms a night generates 3 to 8 zone-verified alerts a day, every one of which is a person inside a polygon longer than the dwell threshold during an armed window.

What does the event payload look like when a theft detection alert fires?

Each delivered event carries a tile thumbnail (cropped from the DVR multiview), the zone label that was crossed, the dwell seconds counted, a timestamp, the camera name, the DVR's layout_id (e.g. 4x4-std or 5x5-std, used to apply the overlay mask at inference time), and the end-to-end latency in milliseconds from frame capture to message send. The event class for a pre-action pattern is pre_action_zone_entry. These fields are what make an alert actionable rather than decorative: a dispatcher reading the message in under a minute can verify the scene, read the dwell, and decide whether to talk down, dispatch, or log.
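As a sketch of what such a payload could look like in code: `layout_id`, `latency_ms`, and the `pre_action_zone_entry` class are named in this guide, while every other field name and all of the example values below are assumptions made for illustration, not the actual wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TheftAlertEvent:
    event_class: str     # e.g. "pre_action_zone_entry"
    camera: str          # camera name as shown on the DVR
    zone_label: str      # polygon zone that was entered
    dwell_seconds: float # time counted inside the zone
    timestamp: str       # ISO 8601 time of the triggering frame
    layout_id: str       # DVR multiview layout, used to pick the overlay mask
    latency_ms: int      # frame capture to message send
    thumbnail_path: str  # hypothetical reference to the cropped tile image


# Illustrative values only.
event = TheftAlertEvent(
    event_class="pre_action_zone_entry",
    camera="Mailroom",
    zone_label="package_shelf",
    dwell_seconds=6.4,
    timestamp="2026-04-19T23:14:05Z",
    layout_id="4x4-std",
    latency_ms=41000,
    thumbnail_path="thumbs/example.jpg",
)

payload = json.dumps(asdict(event), indent=2)
print(payload)
```

Keeping the dwell and latency as numeric fields, rather than baking them into message text, lets the same record feed both the WhatsApp message and any later review of compression ratio.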

How does our system install on an existing DVR without replacing cameras?

Our system taps the DVR's HDMI multiview output, the same composite signal that drives the guard monitor. That signal already has every camera on the recorder mosaiced into tiles, so one HDMI tap gives inference access to every feed at once. Install takes under 2 minutes: HDMI in from the DVR, HDMI out to the monitor, network cable, power. It works on Hikvision, Dahua, Lorex, Swann, Uniview, and any DVR or NVR with an HDMI port. No ONVIF negotiation. No per-camera credentials. No camera firmware changes. Each unit covers up to 25 camera tiles and re-scopes to full resolution on whichever camera the operator switches to fullscreen during an active incident.

What categories of theft does this actually detect in production?

Package theft at mailroom doors and lobby shelves, cable and copper theft at transformer pads and conduit chases, HVAC theft at condenser cages and line-set runs, cargo theft at loading-dock aprons and trailer yards, parking-lot theft including catalytic converter prep, and jobsite theft from conex boxes and staging areas. Each of these scenes shares the same structural pattern: a defined target zone, a pre-action pause while the actor positions, and a time window during which legitimate presence is zero or near zero. The filter ratio math works identically across all of them.

Why deliver over WhatsApp instead of a dashboard or a monitoring portal?

Because the channel staff already reads is the channel that stays read. WhatsApp is already on every property manager's phone for vendors, maintenance, and move-outs. Adding a single thread for theft alerts is a zero-friction change. Dedicated monitoring dashboards start around 250 dollars per camera per month because they assume a human watching a feed. One WhatsApp thread per property, with a tile thumbnail and a timestamp on every delivered alert, matches the communication pattern that Class B and C multifamily, commercial, and construction teams already use. It is a deliberate product choice tied to the filter-ratio thesis: the compression ratio only matters if the channel carrying the output is read.