Cyrano Security
13 min read
AI Theft Detection Guide

AI theft detection outside retail runs on a pre-action window, not a pose model.

Search for “AI theft detection” and the first ten results are retail: a shopper, an aisle, a POS register, a pose classifier looking for concealment. That stack does not apply to the theft that actually generates alerts on apartment, commercial, and jobsite properties. Those incidents happen at mailrooms, transformer pads, HVAC pads, parking lots, and loading docks, and the detection loop that works there is a different shape: person detection tuned against a zone, a dwell threshold, and a time window, firing inside a 30 to 180 second pre-action intervention window. This guide is about that loop, and how Cyrano runs it by tapping the DVR's HDMI multiview output on cameras you already own.

See pre-action detection on a live DVR
4.9 from 50+ properties
Zone + dwell + time, not pose classification
Up to 25 tiles per unit off one DVR HDMI output
Under 60 seconds, tile to WhatsApp notification
Works on analog and IP cameras already installed

Where the SERP ends and where real theft starts

The top organic results for this keyword are written for retail loss prevention. Veesion, ECAM, Pavion, viAct, Avigilon, and the rest all describe the same stack: a pose-based behavior classifier watching a shopper inside a store, cross-checked against the POS to detect concealment or a walk-out. That stack is real and it works for the context it was built for. It does not extend to package theft, cable theft, HVAC theft, cargo theft, trailer theft, or parking-lot theft, because none of those have a shopper, an aisle, or a POS.

Outside retail, the actor is not identifiable by pose. They are identifiable by where they are, how long they have been there, and whether that combination is allowed at the current time. A resident walking across the lot is identical to a thief walking across the lot, until you add a zone around the transformer and a time window that says “no one should be in this polygon after midnight.” At that point the signal is unambiguous and the false-positive rate drops from dozens per night to a handful per week.

The rest of this page is about that signal. What it is made of, how to capture it on cameras that already exist, and why the 30 to 180 second pre-action window is the number that actually matters.

Retail AI theft detection vs. the detection that works outside retail

The retail stack is designed for a shopper inside an aisle. The model scores concealment gestures, unusual dwell near high-shrink SKUs, and item removal without a POS scan. It runs on the store's IP cameras or a ceiling-mounted analytics device. The alert goes to a loss prevention dashboard. Everything about the stack assumes a retail environment.

  • Actor is assumed to be a shopper
  • Scene is an aisle or a register line
  • Signal is pose, gait, concealment
  • Cross-check is against the POS
  • Does not map to mailrooms, lots, pads, docks

What happens inside the pre-action window

Between the actor arriving at the target and the act itself there is almost always a measurable pause. They scan, they open a tool bag, they verify they are not being watched, they take position. That pause is the window. Every number below is what a live deployment sees on typical non-retail theft.

The 30 to 180 seconds between arrival and act

1 · Second 0 to 5. Zone entry

The actor crosses the polygon boundary around the target: mailroom, transformer pad, HVAC cage, fire lane, dock apron. Person detection fires on the tile. Dwell timer starts.

2 · Second 5 to 30. Dwell stabilizes

The dwell timer clears its false-positive threshold. A resident walking past the pad in 4 seconds never crosses this line. A thief who parks and walks in does. The event candidate is now scored against the zone's time window.

3 · Second 20 to 60. Alert delivered

Tile thumbnail is cropped. Zone label, dwell seconds, time, and camera name are packaged. Message lands in the property's WhatsApp thread. End-to-end under 60 seconds in measured deployments.

4 · Second 30 to 180. Intervention window is still open

This is where a live talkdown, a two-way speaker ping, or a dispatched responder changes the outcome. The actor has not yet cut, pried, or lifted. Because the alert is verified (zone + dwell + time), dispatch can treat it as priority, not a generic motion alarm.

5 · After second 180. The act has probably happened

If the alert lands after this, detection has collapsed into documentation. The footage is good for insurance and good for a police report. Recovery rates on copper, packages, and cargo once the actor leaves the property are low. This is why the window is the spec.
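The five steps above reduce to a small state machine: a dwell timer per tracked person, reset on zone exit, scored against the zone's armed window. A minimal sketch in Python (class and field names are illustrative, not Cyrano's actual code):

```python
from dataclasses import dataclass, field

@dataclass
class Zone:
    """A drawn polygon with a dwell threshold and an armed time window."""
    name: str
    dwell_threshold_s: float   # e.g. 10 to 20 s for a mailroom
    armed_hours: range         # hours of day the zone is armed, e.g. range(0, 6)

@dataclass
class DwellTracker:
    entered_at: dict = field(default_factory=dict)  # track_id -> entry time

    def update(self, zone: Zone, track_id: str, in_zone: bool,
               now: float, hour: int) -> bool:
        """True when a track clears the dwell threshold inside an armed window."""
        if not in_zone:
            # Left the polygon: the resident walking past never alerts.
            self.entered_at.pop(track_id, None)
            return False
        self.entered_at.setdefault(track_id, now)
        dwell = now - self.entered_at[track_id]
        # Event only if dwell cleared the threshold AND the zone is armed now.
        return dwell >= zone.dwell_threshold_s and hour in zone.armed_hours
```

In production the `track_id` would come from an object tracker across frames; the point of the sketch is that the alert condition is three booleans, not a pose score.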

The capture point: DVR HDMI multiview, not the IP stream

Most AI theft detection vendors require IP cameras with their own analytics SDK, or a cloud ingest path where every camera pushes its stream to a remote inference service. Both assume a modernised camera stack. Most real properties do not have that. They have a DVR or NVR from 2015, 16 to 32 cameras wired into it, and a guard monitor displaying the multiview grid. Cyrano's capture point is that monitor signal.

Tap the signal that already shows every camera at once

Existing analog + IP cameras
DVR or NVR composite
Multiview HDMI output
Cyrano edge unit
WhatsApp thread per property
SMS fallback
Webhook for PMS

Because the capture point is downstream of the camera stream, the vendor of the camera does not matter. Hikvision, Dahua, Lorex, Amcrest, Reolink, Uniview, and every rebrand that outputs to a DVR with an HDMI port are all compatible. There is no ONVIF negotiation, no credentialing per camera, no firmware assumption. The physical install on a running DVR is under 2 minutes: HDMI in from the DVR, HDMI out to the monitor, network cable, power.

The anchor numbers

No invented benchmarks, no “99% accuracy” marketing numbers. These are the operating constants of the detection loop as it actually runs.

25 · Tiles inferred per unit, off one DVR HDMI
60 s · Tile to WhatsApp, end-to-end
180 s · Upper edge of pre-action window
2 min · Physical install on a running DVR

Seven places pre-action AI theft detection has measurable leverage

Every one of these has a defined target, a pre-action pause, and a time window where the zone should be empty. Every one of them is outside the scope of a retail shoplifting model.

Package theft at mailrooms and lobbies

The pre-action pause is the moment an outsider enters the package alcove and scans for cameras. Zone around the package shelves, dwell threshold of 10 to 20 seconds, armed outside delivery windows.

Copper and cable theft at transformer pads

60 to 180 seconds between a vehicle stopping and bolt cutters touching wire. Detection fires on zone entry, not on the cut.

HVAC theft: condensers and line sets

Cage or condenser pad is a tight zone with an explicit time window. Any zone entry outside service hours is an event.

Cargo theft at loading docks

Trailer yards and dock aprons are defined polygons with shift-based time rules. A person or a second vehicle entering outside shift is the event.

Parking-lot theft and catalytic converter cuts

A person lying down between vehicles in a lot is a zone+dwell signal that pose models miss entirely.

Jobsite theft of tools and wire

Staging area, conex box, spool rack are tight zones. Armed outside work hours with a short dwell threshold.

Trailer theft and tow-away attempts

Kingpin, landing gear, and rear doors are all in-frame. Dwell near any of them during a closed window is the alert.
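All seven scenarios reduce to the same three parameters: a zone, a dwell threshold, and an armed window. A hypothetical configuration sketch, with values drawn from the ranges above (names and windows are illustrative):

```python
# Illustrative zone configs; polygon coordinates omitted for brevity.
# (zone name, dwell threshold in seconds, armed window on a 24h clock)
ZONES = [
    ("mailroom_shelves", 15, ("19:00", "07:00")),  # outside delivery windows
    ("transformer_pad",  10, ("00:00", "23:59")),  # always armed
    ("hvac_cage",        10, ("18:00", "07:00")),  # outside service hours
    ("dock_apron",       20, ("22:00", "06:00")),  # outside shift
    ("parking_rows",     30, ("23:00", "05:00")),  # catalytic converter cuts
    ("conex_staging",    10, ("17:00", "06:30")),  # outside work hours
    ("trailer_kingpin",  15, ("20:00", "06:00")),  # closed window
]

def is_armed(window: tuple, hhmm: str) -> bool:
    """True if hhmm falls inside the armed window.
    Zero-padded HH:MM strings compare correctly lexicographically;
    windows that wrap midnight (e.g. 19:00 -> 07:00) are handled."""
    start, end = window
    if start <= end:
        return start <= hhmm <= end
    return hhmm >= start or hhmm <= end
```

Most of these windows wrap midnight, which is exactly the case a naive `start <= t <= end` check gets wrong.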

How a WhatsApp alert is actually assembled

The payload is deliberately boring. A thumbnail, a camera label, a zone name, a dwell count, a timestamp, a latency number. That is what a property manager reads in two seconds while walking between units.
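Those six fields can be sketched as a plain dict; the field names here are illustrative, not Cyrano's actual schema:

```python
from datetime import datetime

def build_alert(camera: str, zone: str, dwell_s: int,
                detected_at: datetime, delivered_at: datetime,
                thumbnail_path: str) -> dict:
    """Package the six fields a manager reads in two seconds."""
    latency_s = (delivered_at - detected_at).total_seconds()
    return {
        "camera": camera,               # camera label from the tile strip
        "zone": zone,                   # drawn polygon name
        "dwell_s": dwell_s,             # seconds past the threshold
        "time": detected_at.isoformat(),
        "latency_s": round(latency_s),  # tile-to-notification; spec is under 60 s
        "thumbnail": thumbnail_path,    # cropped tile image
    }
```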

Event payload · single alert, anonymised from a real deployment

On a Class C multifamily property in Fort Worth, pre-action zone detection flagged 20 incidents in the first 30 days, including a break-in attempt. The DVR had been recording silently for months before anyone reviewed anything.

Fort Worth, TX deployment

Cyrano pre-action detection vs. retail AI theft detection

Same keyword, two different products. Use the right one for the problem you actually have.

| Feature | Retail pose classification | Cyrano, zone + dwell + time |
|---|---|---|
| Target environment | Retail store interiors | Apartment, commercial, jobsite, yard |
| Primary signal | Shopper pose, concealment gesture | Person in zone, dwell, time of day |
| Cross-check | POS transaction log | Zone schedule |
| Capture point | Store IP cameras or cloud stream | DVR HDMI multiview, existing cameras |
| Cameras supported | Modern IP cameras only | Analog + IP, any DVR brand |
| Install time | Per-camera integration | Under 2 minutes per DVR |
| Alert delivery | LP dashboard or portal | WhatsApp thread per property |
| Pre-action window usable | In-the-act only | 30 to 180 seconds |
| False positives from residents | Not modelled | Filtered by zone + time |

DVR and NVR brands the capture point works on

Because Cyrano taps the HDMI multiview output rather than the per-camera IP stream, compatibility is at the recorder level, not the camera level. If the recorder has an HDMI port driving a monitor, it works.

Hikvision DS-7xxx
Dahua XVR / NVR
Lorex
Amcrest
Reolink NVR
Uniview
Swann
Night Owl
Q-See
ANNKE
EZVIZ
Honeywell Performance
Bosch DIVAR
Panasonic WJ-NX series
Any DVR with HDMI out

The part of the pipeline most vendors skip

A DVR composite is not a clean video feed. Before inference runs, three kinds of glyphs have to be handled. Cyrano does this once per DVR layout at install time. Skipping the step is why a lot of “AI on existing cameras” trials never get their false-positive rate down to a usable level.

Overlay 1 · DVR clock

Fixed-position timestamp, pinned to the top corner of the composite. Masked before inference. Otherwise detection models occasionally fire bounding boxes on the colon glyph.

Overlay 2 · Camera name strip

Per-tile label strip along the top of each tile. Where it overlaps the scene, it triggers spurious person detections on tall letterforms. Masked per-tile using the layout id.

Overlay 3 · Channel bug

DVR brand watermark, often bottom-right per-tile. Small but consistent; masked because it otherwise steals attention from the actual scene at dusk.
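The masking step itself is simple once the rectangles are known. A minimal sketch, assuming fixed-position overlay rectangles recorded at install time (coordinates are illustrative):

```python
import numpy as np

def mask_overlays(frame: np.ndarray, rects: list) -> np.ndarray:
    """Zero out fixed-position DVR glyphs (clock, name strips, channel bugs)
    so the detector only sees the scene. rects are (x, y, w, h) in pixels,
    recorded once per DVR layout at install time, not once per frame."""
    out = frame.copy()
    for x, y, w, h in rects:
        out[y:y + h, x:x + w] = 0
    return out

# e.g. clock in the top-left corner, a channel bug bottom-right of a tile
layout_masks = [(0, 0, 220, 28), (600, 440, 40, 40)]
```

Because the rectangles are fixed per layout, the per-frame cost is a few slice assignments, negligible next to inference.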

Have a DVR today? Want pre-action detection by this weekend?

15-minute demo. We connect to a running DVR's HDMI, show per-tile inference, and walk through a real pre-action alert landing on WhatsApp.

Book a demo

How to evaluate any AI theft detection vendor in one question

Run the test that exposes whether the vendor is solving the pose problem or the zone problem. Ask them: “On a Tuesday afternoon at a 16-camera apartment property with 200 residents, how many raw person-detection events will your model produce, and how many alerts will land on the property manager's phone?” If the ratio is not at least 50 to 1 between raw detections and delivered alerts, the event layer is not doing its job. That ratio is what separates a spec-sheet product from a product a property will still be using 90 days in.

50+ to 1

Minimum ratio of raw person detections (from residents, contractors, deliveries, and staff at a 16-camera apartment property) to delivered alerts.

A handful

Delivered alerts per day after zone, dwell, and time filters are applied. This is the read-able number.

Under 60 s

End-to-end latency from tile capture to notification, well inside the pre-action window.

Frequently asked questions

Why does retail AI theft detection not transfer to apartments, jobsites, and parking lots?

Retail models classify shopper pose and gait: a hand going to a pocket, a coat held in front of a shelf, an item leaving the aisle without a POS scan. Outside retail, the actor is not a shopper and the scene is not an aisle. A person walking toward a transformer at 3 a.m. has no pose signature. A person standing near a mailroom package at 7 p.m. has no pose signature. What separates the ordinary resident from the thief is zone (are they in a place they should not be), dwell (how long have they been there), and time (is this within the window this zone is armed). Pose classification on its own fires on half of the residents walking their dog. The reliable signal for non-retail theft is person detection plus zone and dwell and time, not a pose model.

What is the pre-action intervention window and why does it matter for AI theft detection?

Watch footage of any non-retail theft and time the segment between the actor stopping in the target area and the actual act (cutting, lifting, prying). It is almost always between 30 and 180 seconds. That is the window where intervention still works. Inside it, a talkdown, a dispatch, or a loud speaker ping changes the outcome. Outside it, you are doing an insurance claim and reviewing footage. Detection that fires in minute 4 is a recording. Detection that fires in second 20 is a prevention. Non-retail AI theft detection has to be tuned around that window, which is why latency from tile to notification is a core spec, not a nice-to-have.

How does Cyrano capture video without replacing my cameras?

Cyrano does not pull the IP streams from individual cameras. It taps the DVR's HDMI multiview output, the same signal that drives the guard monitor. That composite video already has every camera on the recorder mosaiced into tiles, so one HDMI tap gives inference access to every feed at once. This works identically for analog cameras, IP cameras, rebranded Hikvision, Dahua, Lorex, Uniview, and any DVR or NVR with an HDMI port. No camera firmware involvement, no ONVIF negotiation, no credentials to coordinate. Physical install on a running DVR is under 2 minutes: HDMI in from the DVR, HDMI out to the monitor, network cable, power.

What about DVR on-screen graphics like the clock, camera names, and channel bug? Do they confuse the model?

Yes, and this is the part most vendors skip. Every DVR overlays a clock, a per-tile camera name strip, and a channel bug. To a detection model, those glyphs are visual noise that can trigger on anything from a bounding box on a camera name to a false positive on the clock's colon. Cyrano's install step labels each tile and masks the fixed-position overlays before inference so the model only sees the scene. This happens once per DVR layout, not once per frame.

How many cameras can one Cyrano unit cover?

Up to 25 tiles per unit off a single DVR multiview. If the DVR is set to a 4x4 grid the unit runs inference across 16 tiles in parallel. If it is a 5x5 it runs across 25. If the operator switches the DVR to fullscreen on a specific camera, the unit re-scopes to that single camera at full resolution, so during an active incident you actually get higher per-tile accuracy, not lower. A property with more than 25 cameras typically runs one Cyrano unit per DVR.
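The tile split described above amounts to slicing the composite frame into a grid. A minimal sketch, assuming numpy frames and an exact grid with no border pixels (real DVR layouts may need per-layout offsets):

```python
import numpy as np

def split_tiles(frame: np.ndarray, rows: int, cols: int) -> list:
    """Slice a DVR multiview composite into per-camera tiles.
    A 4x4 grid yields 16 tiles, a 5x5 yields 25;
    fullscreen on one camera is simply rows = cols = 1."""
    h, w = frame.shape[:2]
    th, tw = h // rows, w // cols
    return [frame[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(rows) for c in range(cols)]
```

This is also why fullscreen mode raises accuracy: with `rows = cols = 1`, the single tile carries the full HDMI resolution instead of one sixteenth of it.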

Why is the alert delivered over WhatsApp instead of a dedicated portal?

Because the channel that gets read is the channel already open on the phone. Property staff already use WhatsApp for maintenance, move-outs, and vendor coordination. Adding one more thread is a free change. Adding another app with another login is how alerts end up unread. Dedicated monitoring portals start around $250 per camera per month because they assume a human watching a feed. One WhatsApp thread per property, with a tile thumbnail and a timestamp on every alert, matches how Class B and C multifamily, construction, and small commercial teams already communicate. It is a deliberate product choice, not a shortcut.

What kinds of theft does this actually catch?

The categories where pre-action detection has the most leverage: package theft from mailrooms and lobbies, cable and copper theft at transformer pads and conduit runs, HVAC theft (copper line sets, condenser units), cargo theft at loading docks and trailer yards, parking-lot theft of vehicles and catalytic converters, and jobsite theft of tools, wire, and material. Every one of these has a pre-action window where the actor enters a defined zone, pauses, assesses, and then acts. Retail shoplifting detection models do not cover any of these, because none of them involve a register, an aisle, or a shopper gait.

Does this actually produce fewer false positives than a generic motion alarm?

Yes, because the entire pipeline is scoped. A raccoon, a trash bag blowing across the lot, a delivery driver turning around, or a shadow at sunset will all trigger a motion sensor, and that is how an alert channel gets muted within a week. Person detection plus a drawn zone plus a dwell threshold plus a time window stacks three filter layers on top of detection, where a motion sensor has one. In production deployments, a typical 16-camera property generates a handful of read-able alerts per day rather than dozens per night. That is the difference between an alert channel that stays in service and one that gets muted.

Worth saying plainly

AI theft detection is not one product. It is two. The retail version classifies pose inside an aisle and is built around the POS. The non-retail version watches zones against a schedule and is built around the pre-action intervention window. The SERP is saturated with the first one, and most property teams searching for “AI theft detection” are actually shopping for the second. Cyrano is the second.

If you are evaluating for apartment, commercial, or construction use, the checklist is short: does the capture point work on the DVR you already have, does the detection fire inside the pre-action window, does the alert land somewhere a human will actually read it. Every other marketing claim is noise.

🛡️ Cyrano · Edge AI Security for Apartments
© 2026 Cyrano. All rights reserved.
