Matthew Diakonov, Written with AI

Published April 21, 2026. 13 min read.

Theft Detection System Architecture

A theft detection system has seven components. On your property, five of them are already installed.

Every vendor page for a theft detection system quietly assumes you are buying the whole stack: sensors, cabling, recorder, analytics, alert channel, and dispatch protocol. On any building that has had CCTV for three years or more, five of those layers are already running. This guide decomposes the stack, names the two components that actually need to be added, and walks through the one architectural choice that decides whether it is a two-minute install or a twelve-week project.


4.9 from 50+ properties

$450 hardware, $200/mo software, 2-minute HDMI install

5 of 7 components reused from existing DVR stack

25 camera tiles per unit across one HDMI multiview

Edge inference, no raw video leaves the building

You already own a theft detection system

Five of the seven components are already installed. Only two need to be added.

Components 1-3: cameras, cabling, DVR. Already installed.

Component 7: staff WhatsApp thread. Already in use.

Missing: component 4 (signal access) and components 5-6 (detection + scope).

Our system is a single node that occupies exactly those missing layers.


The stack the market refuses to decompose

Search for “theft detection system” and the first results describe integrated stacks you buy as a unit. Retail EAS pages itemize transmitter pedestals, AM or RF tags, detection gates, and deactivation pads, sold together. Vehicle anti-theft pages bundle alarms, immobilizers, and GPS trackers as a single subscription. Enterprise computer-vision platforms sell cameras, cloud analytics, and a monitoring portal as an 18-month integration project. Even the residential alarm market packages sensors, a central host, and a keypad as one SKU.

None of those pages will tell you that on a typical Class B or C multifamily property, the last time somebody installed CCTV was three to eight years ago. The cameras, the coax, the DVR, the guard-monitor HDMI cable, and the staff WhatsApp thread are all already there. So when a vendor quotes $50,000 for a theft detection system, the line items you are actually being asked to pay for are mostly replacements for hardware that works.

The honest framing is that a theft detection system is a stack of seven layers, five of which are on your property today, and the only real purchase decision is how to fill components four, five, and six without disturbing the rest.

The seven components, mapped to your property

Each card below names one layer of the stack, what it does, and whether it is already installed or needs to be added. Five of them read as “already installed.” Two are the actual work. This is the map the industry does not draw on its pricing pages.

Component 1. Optical capture

Camera lens, image sensor, IR illuminator. Already installed on every property with CCTV. Our system runs on whatever the installers put up in 2018, 2020, or last year. Brand does not matter.

Component 2. Transport

Coax, siamese cable, or PoE ethernet runs back to the recorder. Also already installed. Replacing this is rewiring, and rewiring is a six-figure job.

Component 3. DVR / NVR

The box that composites feeds, records to disk, and drives the guard monitor. Already installed. Hikvision, Dahua, Lorex, Swann, Uniview, or a rebrand. All of them expose an HDMI multiview.

Component 4. Signal access (MISSING)

The bridge from the existing recorder to a detection process. Industry default is per-camera ONVIF/RTSP pull, which means credentials, firmware checks, and multicast config per camera. Our system's choice: tap the HDMI multiview. One cable. 25 tiles at once.

Component 5. Detection (MISSING)

Object-aware person detection and classification. Replaces pixel-based motion alarms. This is where the AI actually lives. Runs on the edge AI device next to the DVR.

Component 6. Scope filters (MISSING)

Zone, dwell, and armed-time-window gates. Without this layer, a detection model floods the alert channel with every resident, vendor, and dog. With it, the 200 to 300 daily detections on a 16-camera property compress to 3 to 8 delivered alerts.
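
The three gates can be sketched as one short function. This is a minimal illustration, not the product's implementation: the `Zone` class, its field names, the `drop:*` reason strings, and the use of a single foot point per detection are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Zone:
    """Hypothetical scope rule for one polygon zone on one camera tile."""
    label: str
    polygon: list          # [(x, y), ...] vertices in tile pixel coordinates
    min_dwell_s: float     # dwell threshold before an alert may fire
    armed_from: time       # start of the armed window
    armed_until: time      # end of the armed window (may wrap past midnight)


def point_in_polygon(x, y, polygon):
    """Ray-casting test: toggle on each edge a rightward ray from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def is_armed(zone, now):
    """True if `now` falls inside the armed window, including overnight windows."""
    if zone.armed_from <= zone.armed_until:
        return zone.armed_from <= now < zone.armed_until
    return now >= zone.armed_from or now < zone.armed_until


def scope_decision(zone, foot_xy, dwell_s, now):
    """Return 'deliver', or the name of the first gate that dropped the detection."""
    if not point_in_polygon(foot_xy[0], foot_xy[1], zone.polygon):
        return "drop:zone"
    if dwell_s < zone.min_dwell_s:
        return "drop:dwell"
    if not is_armed(zone, now):
        return "drop:window"
    return "deliver"


# Illustrative zone: a 100x100 px square, 8 s dwell, armed 22:00 to 06:00.
mailroom = Zone("mailroom-door", [(0, 0), (100, 0), (100, 100), (0, 100)],
                8.0, time(22, 0), time(6, 0))

print(scope_decision(mailroom, (50, 50), 12.0, time(23, 30)))   # deliver
print(scope_decision(mailroom, (150, 50), 12.0, time(23, 30)))  # drop:zone
print(scope_decision(mailroom, (50, 50), 3.0, time(23, 30)))    # drop:dwell
print(scope_decision(mailroom, (50, 50), 12.0, time(14, 0)))    # drop:window
```

Ordering the gates from cheapest to most specific means most daytime detections are dropped without any further work, which is consistent with the compression from hundreds of detections to a handful of alerts described above.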

Component 7. Egress channel

The communication channel that carries the alert to a reader. On most properties this is WhatsApp, because it is already on every manager's phone. Already installed. Our system posts into the existing thread as a webhook; no new app, no new login.

Where the signal enters and exits

The diagram below shows the boundary between what is already on site and what our system's node adds. Everything on the left edge and everything on the right edge is reused as-is. The hub is the three missing components, collapsed onto one device.

Existing hardware (left) and existing channels (right), with the missing three components on one node

The architectural choice that collapses the install

Component 4 is where most theft detection systems die on install. The industry default is per-camera ingress: the detection process pulls an RTSP or ONVIF stream from each camera individually. Every camera needs a reachable IP, a credential, and a firmware that speaks a compatible profile. On a 16-camera property with four camera vintages and two brands, that integration step alone runs weeks.

Our system swaps that default for a different ingress: the DVR's HDMI multiview output. That signal already has every camera composited into a tile grid (4x4 for 16, 5x5 for 25), because that is what the DVR has been showing on the guard monitor the whole time. Tapping it takes one HDMI-to-HDMI splitter. Nothing is configured on the camera side of the tap. The cameras do not know the node exists.

The price of the choice is that our system has to handle every DVR's burned-in on-screen graphics cleanly, which is the subject of the next section.
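
To make the tile grid concrete, here is a small sketch of how a multiview frame could be cut into per-camera rectangles. It assumes an evenly divided grid with no borders on a 1920x1080 output; the function name `tile_rects` and the row-major ordering are illustrative choices, and real DVR layouts may add gutters or uneven tiles.

```python
def tile_rects(rows, cols, frame_w=1920, frame_h=1080):
    """Return (x, y, w, h) for each tile, row-major, assuming an even borderless grid."""
    rects = []
    for r in range(rows):
        for c in range(cols):
            # Integer edges so tiles cover the frame exactly with no gaps.
            x0 = c * frame_w // cols
            x1 = (c + 1) * frame_w // cols
            y0 = r * frame_h // rows
            y1 = (r + 1) * frame_h // rows
            rects.append((x0, y0, x1 - x0, y1 - y0))
    return rects


grid_4x4 = tile_rects(4, 4)
grid_5x5 = tile_rects(5, 5)
print(len(grid_4x4), grid_4x4[0])    # 16 (0, 0, 480, 270)
print(len(grid_5x5), grid_5x5[24])   # 25 (1536, 864, 384, 216)
```

Under these assumptions a 4x4 grid gives each camera 480x270 pixels and a 5x5 grid gives 384x216, which is the resolution cost of reading every camera from one composited signal.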

The overlay mask, cached by layout_id

Every DVR prints graphics onto its multiview output: a clock in one corner, a channel bug in another, a per-tile camera name strip at the top or bottom of each tile. To a vision model those glyphs are visual noise that generates false positives (the colon in a clock flickers; a bounding-box model can classify a camera name strip as a small rectangular object). Our system handles this at install, not per frame.

The layout_id overlay pipeline

Step 1. Identify the DVR layout at install

On first boot, the node reads the multiview and classifies the tile grid into a layout_id string: 4x4-std, 5x5-std, 2x2-pip, 9-grid-plus-one, etc. Most installations are one of four common layouts.

Step 2. Compute the mask once

For that layout_id, the node computes a static pixel mask that blacks out the clock region, the channel bug region, and each tile's name strip. The mask is a binary PNG sized to the DVR's native output (typically 1920x1080 or 1280x720).

Step 3. Cache the mask and key it

The mask is stored on the device and keyed by layout_id. If the operator later switches the DVR from a 4x4 layout to a 5x5 layout, the node recognizes the new layout_id and loads (or computes once) the matching mask. No retraining.

Step 4. Apply before every inference frame

At frame time, the mask is applied to the captured frame as a bitwise-AND before the person detection model runs. The model sees the scene without the clock, the channel bug, or the name strip. Applying the cached mask is a cheap operation; computing the mask is the expensive part, and that happens once per layout, not once per frame.

Step 5. Log the layout_id with every event

Every delivered event payload includes the layout_id so downstream replay tools know which mask was in effect when the alert fired. This is what appears in the terminal log below.
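
The compute-once, apply-every-frame pattern in the five steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the device's code: frames are treated as flat grayscale byte strings, the overlay rectangles in `OVERLAYS` are invented for a tiny 8x4 demo frame, and `MaskCache`, `make_mask`, and `apply_mask` are hypothetical names. A production pipeline would more likely use an array library than pure Python.

```python
class MaskCache:
    """Caches one overlay mask per layout_id; a mask is built on first use only."""

    def __init__(self, build_mask):
        self._build_mask = build_mask   # hypothetical builder: layout_id -> bytes
        self._masks = {}
        self.builds = 0                 # how many masks were actually computed

    def get(self, layout_id):
        if layout_id not in self._masks:
            self._masks[layout_id] = self._build_mask(layout_id)
            self.builds += 1
        return self._masks[layout_id]


def make_mask(width, height, blocked_rects):
    """Build a mask where 0xFF keeps a pixel and 0x00 blacks it out."""
    mask = bytearray(b"\xff" * (width * height))
    for x, y, w, h in blocked_rects:
        for row in range(y, y + h):
            start = row * width + x
            mask[start:start + w] = bytes(w)   # zero out one row of the rectangle
    return bytes(mask)


def apply_mask(frame, mask):
    """Bitwise-AND a grayscale frame with the mask, byte for byte."""
    if len(frame) != len(mask):
        raise ValueError("frame and mask must have the same size")
    anded = int.from_bytes(frame, "big") & int.from_bytes(mask, "big")
    return anded.to_bytes(len(frame), "big")


# Invented overlay regions (x, y, w, h) on an 8x4 demo frame.
OVERLAYS = {
    "4x4-std": [(0, 0, 3, 1)],   # e.g. a clock in the top-left corner
    "5x5-std": [(5, 3, 3, 1)],   # e.g. a channel bug in the bottom-right corner
}
cache = MaskCache(lambda layout_id: make_mask(8, 4, OVERLAYS[layout_id]))

frame = bytes([200] * 32)                      # uniform gray demo frame
masked = apply_mask(frame, cache.get("4x4-std"))
masked_again = apply_mask(frame, cache.get("4x4-std"))
print(list(masked[:4]), cache.builds)          # [0, 0, 0, 200] 1
```

The second `get("4x4-std")` returns the cached mask without rebuilding it, and switching to `"5x5-std"` triggers exactly one more build, mirroring the layout-change behavior in Step 3.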

What the event log actually prints

Below is a compressed segment of the internal event stream for a single mailroom camera across three evening hours. Every line names the camera, the layout_id the mask is keyed to, the zone the person entered, the dwell seconds counted, and the reason the event was or was not delivered. Most detections are silently dropped by one of the three scope filters. The survivors become WhatsApp messages.

cyrano event stream, layout_id=4x4-std, camera=mailroom-01

5 of 7

“A theft detection system is seven components deep. Five of them are already installed on any property with working CCTV. The purchase decision is which node fills the remaining two without replacing the five you own.”

Our system architecture notes, property retrofit deployments

The retrofit economics, in one row

$450 one-time hardware

$200/mo software subscription

25 camera tiles per unit

2 minutes to install

Hardware is $450 one-time per device. Software starts at $200 per month beginning month two. Install is measured in minutes because components one through three are not being touched.

The bundled-vendor model, priced against the retrofit

Below is the side-by-side that follows from the seven-component decomposition. A bundled theft detection system is charging you to replace hardware that works. A retrofit system charges you only for the components that are actually missing.

Feature	Bundled smart-camera vendor	Our system
Components 1-3 (cameras, cabling, DVR)	Replace. $30,000 to $70,000 per property.	Reuse. $0. Any brand, any age.
Component 4 (signal access)	Per-camera IP pull, ONVIF negotiation, credentials.	HDMI multiview tap. One cable, 25 tiles at once.
Component 5 (detection)	Cloud inference. Uploads footage continuously.	Edge inference on the device next to the DVR.
Component 6 (scope filters)	Vendor-set defaults. Hard to tune per camera.	Polygon zones drawn per tile, dwell and window per zone.
Component 7 (alert channel)	New portal. New app. New login per staff member.	Existing WhatsApp thread. Webhook, no client.
Install timeline	8 to 12 weeks of install and integration.	Under 2 minutes of physical install, under 1 hour of zone config.
Year-one cost, 16-camera property	$50,000 to $100,000 plus $250/camera/month cloud.	$2,850 total.

The signal path, end to end

Every delivered alert travels the same path. The existing capture, transport, and recording components sit at the front, one node from our system sits in the middle, and the existing staff channel sits at the back.

capture -> compose -> tap -> mask -> detect -> scope -> deliver

Camera: already installed
DVR: composites tiles
HDMI tap: cable split
Mask: layout_id lookup
Detect: person + object
Scope: zone + dwell + time
Deliver: staff thread

Where this leaves the buyer

The market for theft detection systems sorts into two honest positions and one dishonest one. The first honest position is greenfield: if you have no CCTV at all, buy a bundled system. You are paying for all seven components because you have none. The second honest position is retrofit: if you have working cameras and a recorder, buy a node that fills only components four, five, and six and leaves the rest alone. That is what we sell.

The dishonest position is the bundled retrofit, where a vendor arrives at a property with working cameras and charges $50,000 to rip them out because their system assumes it owns every layer. That arrangement is often what “theft detection system” means in an enterprise RFP, and it is why the line item gets pushed out of the budget cycle after cycle.

The seven-component map is how to notice the difference in advance, before the quote lands.

A compact summary

A theft detection system has 7 components; 5 are already on any established CCTV site.
The missing layers are signal access, detection, and scope filters.
Our system occupies exactly those three layers on one device.
Signal access is an HDMI multiview tap, not a per-camera IP pull.
An overlay mask keyed by layout_id handles DVR on-screen graphics.
Alerts exit into the staff WhatsApp thread; no new app, no new login.

Want the seven-layer map against your property?

Book 20 minutes. We will walk your exact DVR and show which layers you already own and which ones our system fills.

Frequently asked questions

What are the seven components of a theft detection system?

In order from the optical end to the operator: (1) the camera sensor and lens, (2) the coax or ethernet transport, (3) the DVR or NVR that composites feeds and records to disk, (4) the signal-access layer that exposes video to a detection process, (5) the detection and classification model itself, (6) the scope filters that gate which detections become alerts (zone, dwell, armed time window), and (7) the egress channel that carries alerts to a reader. On any property with CCTV that is more than three years old, components 1 through 3 and component 7 are already installed and running. The actual work of adding a theft detection system is confined to components 4, 5, and 6.

Why should I care about the seven-component decomposition instead of just buying a product?

Because every 'all-in-one' theft detection system on the market sells you the five components you already have bundled in with the two you do not. That is where the $50,000 to $100,000 per-property rip-and-replace quotes come from. The bundle math only works in vendor spreadsheets. If you separate the layers, the question 'what does it cost to add theft detection to this property' becomes a question about components 4, 5, and 6 only. Those three components can be delivered as a single $450 device and a $200 per month software subscription. The rest of the stack does not get touched.

How does our system handle the signal-access layer (component 4) without replacing my cameras?

By tapping the DVR's HDMI multiview output, which is the same composite signal that already drives the guard monitor. That signal carries every camera on the recorder mosaiced into tiles, so a single HDMI cable gives the inference pipeline access to every feed at once. This avoids ONVIF negotiation, per-camera credential management, firmware compatibility checks, and multicast network configuration, because all of that work has already been done by the DVR when it composed the multiview. The physical install is HDMI in from the DVR, HDMI out to the guard monitor, network cable, power. Under two minutes on a running system.

What is the overlay mask, and why does it live at the install step instead of the frame step?

Every DVR burns its own on-screen graphics into the multiview output: a clock in one corner, a channel bug in another, and a per-tile camera name strip at the top or bottom of each tile. Those glyphs are visible in the pixels the inference model receives. Without special handling, a bounding-box model can misclassify a camera name text strip as an object, or a clock colon flicker as motion. Our system computes an overlay mask once per DVR layout_id (for example 4x4-std or 5x5-std), caches it, and applies it before every inference frame so the model never sees those fixed pixel regions. The mask is a lookup by layout_id, not a per-frame computation.

What goes in the event payload when a theft detection system alert fires?

A delivered alert carries a tile thumbnail cropped from the DVR multiview, the polygon zone label that was crossed, the dwell seconds that were counted before the alert fired, a timestamp, the camera name, the DVR's layout_id (4x4-std, 5x5-std, etc., used to apply the overlay mask), and the end-to-end latency in milliseconds from frame capture to message send. The event class for a pre-action pattern is pre_action_zone_entry. Those fields are what make an alert actionable: a dispatcher opening the WhatsApp message can verify the scene, read the dwell count, and decide to talk down, dispatch, or log.
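
As a rough picture of that payload, the fields listed above could be carried in a structure like the one below. Only `layout_id` and the event class `pre_action_zone_entry` come from the description; every other field name, the JSON encoding, and all sample values are assumptions made up for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AlertEvent:
    """Hypothetical shape of one delivered alert; field names are illustrative."""
    event_class: str      # e.g. "pre_action_zone_entry"
    camera: str           # camera name as shown on the DVR tile
    layout_id: str        # which overlay mask was in effect
    zone: str             # label of the polygon zone that was crossed
    dwell_s: float        # dwell seconds counted before the alert fired
    timestamp: str        # ISO 8601 time of the triggering frame
    latency_ms: int       # frame capture to message send
    thumbnail_b64: str    # cropped tile thumbnail, base64-encoded


def to_json(event):
    """Serialize an event with stable key order for logging or replay."""
    return json.dumps(asdict(event), sort_keys=True)


# Made-up sample values, not taken from a real deployment.
event = AlertEvent(
    event_class="pre_action_zone_entry",
    camera="mailroom-01",
    layout_id="4x4-std",
    zone="mailroom-door",
    dwell_s=12.0,
    timestamp="2026-04-21T23:30:00Z",
    latency_ms=850,
    thumbnail_b64="",
)
print(to_json(event))
```

Keeping `layout_id` in every event is what lets a replay tool pick the same mask that was applied when the alert fired.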

How many cameras does one unit cover?

Up to 25 tiles off a single DVR HDMI multiview. If the DVR is set to a 4x4 grid the unit runs inference across 16 tiles in parallel; if it is set to a 5x5 it runs across 25. When an operator switches the DVR to fullscreen on a specific camera during an active incident, the unit re-scopes to that single camera at full resolution, so per-tile accuracy goes up during the moments that matter. A property with more than 25 cameras typically runs one unit per DVR.

Does video leave the property for inference?

No. All detection and classification happens on the device, which sits in the same rack as the DVR. Only the alert payload (a small thumbnail, the zone label, dwell seconds, timestamp) leaves the building, and only when an event fires that is zone-verified, has cleared the dwell threshold, and falls inside the armed window. Raw video is never uploaded. This keeps bandwidth costs zero for the detection layer and keeps the system compatible with tenant-privacy expectations in multifamily.

Which categories of theft does this architecture actually cover in production?

Package theft at mailroom doors and lobby shelves, cable and copper theft at transformer pads and conduit runs, HVAC theft at condenser cages and line-set chases, cargo theft at loading-dock aprons and trailer yards, parking-lot theft including vehicle entry and catalytic converter removal, and jobsite theft from conex boxes and staging areas. Each of these shares a structural pattern the filter stack maps onto cleanly: a defined target zone, a pre-action pause while the actor positions, and a time window during which legitimate presence is near zero.

How long does it take from purchase to the first delivered alert?

Hardware install is under 30 minutes, counting cable routing. Zone configuration (drawing polygons on each camera tile and setting the armed time windows) typically takes 15 to 45 minutes per property depending on camera count and how many zones the operator wants. A 16-camera multifamily property with 8 zones is usually alert-ready inside one afternoon. The tagline is not marketing filler: 'live in one afternoon, not one quarter' is a deliberate contrast with the 8-to-12-week timeline of a camera-replacement project.

What does this cost compared to the bundled alternatives?

Our system is $450 one-time for the hardware plus $200 per month for the software starting month two. For a 16-camera multifamily property that is $450 + ($200 * 12) = $2,850 in year one and $2,400 per year thereafter. A full-replacement smart-camera deployment on the same property typically runs $50,000 to $100,000 up front plus $250 per camera per month in cloud subscription fees. On a 16-camera site the cloud fees alone come to $48,000 per year, so even the low-end quote crosses $100,000 early in year two. A single on-site guard runs $3,000 to $5,000 per month, which is more per month than our system costs for a full year.
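
The arithmetic behind those figures can be checked with a short sketch. The function names are hypothetical, and the code follows the calculation as written above (12 billed software months in year one); if the first month is in fact unbilled, year one would be one $200 payment lower.

```python
def retrofit_cost(months, hardware=450, software_per_month=200):
    """Cumulative retrofit cost: one-time hardware plus monthly software."""
    return hardware + software_per_month * months


def bundled_cost(months, cameras, upfront, cloud_per_camera_month=250):
    """Cumulative bundled cost: up-front replacement plus per-camera cloud fees."""
    return upfront + cloud_per_camera_month * cameras * months


print(retrofit_cost(12))               # 2850
print(retrofit_cost(24))               # 5250
print(bundled_cost(12, 16, 50_000))    # 98000
print(bundled_cost(13, 16, 50_000))    # 102000
print(bundled_cost(36, 16, 50_000))    # 194000
```

At the low-end $50,000 quote the bundled total passes $100,000 in month 13, while the retrofit total after three years is $450 + $200 * 36 = $7,650.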