Real-time CCTV event detection is only useful if a human reads the alert.
Every r/cctv and r/homedefense thread about adding AI to a camera system converges on the same two arguments. One side says get an 80-class object detector, it sees everything. The other side says you will ignore every notification inside a week. They are both right, and that is the actual problem. This guide is about the shortlist approach: six events, one delivery channel, one thread per property, read rate that stays high after the first week.
Object detection is not the same thing as event detection
The most common mistake in spec sheets is conflating the two. Object detection is the model saying that is a person, that is a car, that is a bicycle. Event detection is the operator deciding that person in that zone at that hour with that dwell time is worth interrupting someone for. A modern YOLO-class model does the first job out of the box. The second job is a scoping decision, not a model choice.
The practical effect of treating object detection like event detection is alert fatigue. At a typical 16-camera property, raw person and vehicle detection fires hundreds of times per day during normal operations: residents walking to the mailbox, cars entering the lot, contractors arriving, kids heading to the pool. None of it is a security event. All of it generates a notification if the AI is wired directly to the alert channel. Within five days, the person receiving those notifications either mutes the channel or stops looking at their phone.
Real time event detection is the layer that decides which object detections become alerts. The decision is specific to a property, a time window, a zone, and a behavior. That layer is what the rest of this guide is about.
The six events that make the shortlist
Each of these has a hard operational trigger. A model classification on its own is never enough. The event only fires when classification plus dwell plus zone plus time window all agree.
Cyrano event taxonomy
- After-hours restricted-zone entry. Person detected in a zone (pool deck, leasing office, rooftop, gym) outside its configured open hours.
- Loitering past dwell threshold. Person stationary inside a specific zone longer than a per-tile dwell setting, typically 60 to 180 seconds.
- Tailgating at a gate or garage arm. Two persons or a person + vehicle passing through a single access event.
- Package handling anomaly. Unattended package beyond a dwell threshold, or a person carrying a package away from a delivery zone outside delivery hours.
- Vehicle in fire lane or tow zone. Vehicle detected inside a painted-off zone for longer than a short grace period.
- Crowd formation at an entry. More than N people clustering within a defined polygon at a main entry, outside of the configured egress windows.
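The trigger logic shared by all six events can be sketched as a single predicate. This is an illustrative sketch, not Cyrano's actual code: the function and parameter names are invented here, and real zone rules carry more state. The point is that classification alone never fires; zone arming and dwell must agree with it.

```python
from datetime import time

def zone_armed(now_t: time, open_t: time, close_t: time) -> bool:
    """A zone is armed outside its configured open hours.

    Handles windows that wrap midnight, e.g. open 22:00, close 06:00.
    """
    if open_t <= close_t:
        inside_hours = open_t <= now_t < close_t
    else:  # window wraps past midnight
        inside_hours = now_t >= open_t or now_t < close_t
    return not inside_hours

def should_alert(obj_class: str, trigger_classes: set,
                 now_t: time, open_t: time, close_t: time,
                 dwell_s: float, dwell_threshold_s: float) -> bool:
    """Classification AND zone arming AND dwell must all agree."""
    return (obj_class in trigger_classes
            and zone_armed(now_t, open_t, close_t)
            and dwell_s >= dwell_threshold_s)

# A person on the pool deck at 02:00 with 90 s of dwell fires;
# the same person at noon does not.
fires = should_alert("person", {"person"}, time(2, 0),
                     time(8, 0), time(22, 0), 90.0, 60.0)
```

The same predicate shape covers the whole shortlist: only the trigger classes, the time window, and the dwell threshold change per event and per tile.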
Signal path: DVR to notification in one diagram
The inputs on the left are the existing camera feeds running into the DVR. The DVR composites them to its HDMI multiview output, the same signal that drives the guard monitor. Cyrano taps that signal, runs per-tile inference, filters through the six-event rules, and routes to the delivery channels on the right.
The six-event pipeline, end to end
What the payload looks like when the alert lands
The difference between a message that gets read and a message that gets ignored is almost entirely payload quality. Raw model output (person, 0.87, bbox 214,418,298,612) is for machines. The message below is what actually hits a property manager's phone.
The numbers, not the adjectives
Operational constants of the event detection layer. No marketing padding, no imagined scenarios.
“At a Class C multifamily property in Fort Worth, the six-event shortlist flagged 20 incidents in the first 30 days, including a break-in attempt. The DVR had been recording silently for months before anyone reviewed anything.”
Fort Worth, TX deployment
80 classes vs. six, as an operator actually experiences it
Compare the two side by side. The 80-class system is not wrong in the model-accuracy sense. It is wrong in the alert-consumption sense.
A Tuesday afternoon at a 16-camera property
The model fires on every person, vehicle, bicycle, package, dog, umbrella, stroller, and handbag that enters frame. A 16-camera property running during a normal weekday generates 200+ notifications by 4 p.m. The on-site manager has muted the alert channel by Thursday. When a real incident fires on Saturday night, the notification arrives in a muted thread and is seen Monday morning.
- 200+ alerts per day during normal operations
- Channel muted inside one week
- Real incidents lost in the noise
- No per-zone, per-time scoping
- Reviewer reads confidence scores, not scenes
Watch a live shortlist fire on a DVR
15-minute demo. We connect to a running DVR's HDMI, show per-tile inference, and walk through a real alert landing on WhatsApp.
Book a demo →

What happens in the 60 seconds between event and notification
Tile capture
HDMI frame grabber pulls the current composite frame from the DVR's multiview output at ~30 fps. The composite is split into per-tile crops using the layout detected at install time.
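The tile split is plain array slicing once the grid layout is known. A minimal sketch, assuming an even grid detected at install time (real DVR layouts can have uneven tiles, which is one reason layout detection happens at install rather than being hardcoded):

```python
import numpy as np

def split_tiles(frame: np.ndarray, rows: int, cols: int) -> list:
    """Split a composite multiview frame (H x W x C) into per-tile crops.

    Illustrative sketch assuming an even rows x cols grid; integer
    division drops any remainder pixels at the right/bottom edges.
    """
    h, w = frame.shape[:2]
    th, tw = h // rows, w // cols
    return [frame[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(rows) for c in range(cols)]

# A 1080p composite in a 4x4 layout yields sixteen 270x480 crops.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
tiles = split_tiles(frame, rows=4, cols=4)
```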
Per-tile object detection
A detection model runs on each tile. Output is a list of object classes and bounding boxes, masked for DVR overlay graphics (clock, camera name strip, channel bug).
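Masking the DVR's overlay graphics before inference can be sketched as zeroing fixed regions of each tile. The box coordinates here are invented for illustration; in practice they are pinned per tile at install time, as described in the FAQ below.

```python
import numpy as np

def mask_overlays(tile: np.ndarray, overlay_boxes: list) -> np.ndarray:
    """Zero out DVR overlay regions (name strip, channel bug, clock).

    overlay_boxes are (x0, y0, x1, y1) in tile coordinates, fixed per
    tile at install. Illustrative sketch, not the shipped masking code.
    """
    masked = tile.copy()  # leave the original crop untouched
    for x0, y0, x1, y1 in overlay_boxes:
        masked[y0:y1, x0:x1] = 0
    return masked

# Example: a camera-name strip across the top 14 rows of a tile.
tile = np.ones((120, 160, 3), dtype=np.uint8)
clean = mask_overlays(tile, [(0, 0, 160, 14)])
```

Zeroing is the simplest choice; it guarantees the detector never sees overlay pixels, at the cost of a small blind region at the top of each tile.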
Event rule evaluation
Each detection is checked against the tile's event rules: is this zone armed right now, does the object class match a shortlist trigger, has the dwell threshold been crossed?
Debounce and deduplicate
A short debounce window prevents the same loitering incident from firing ten alerts. Only the first alert per event window becomes a message.
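The debounce step is essentially a per-key cooldown. A minimal sketch, with invented names, keyed on the tile and event class so one loitering incident produces one message instead of ten:

```python
import time

class Debouncer:
    """Suppress repeat alerts for the same (tile, event) within a window.

    Illustrative sketch: only the first alert per event window passes.
    """
    def __init__(self, window_s: float = 300.0):
        self.window_s = window_s
        self._last_fired = {}  # (tile_id, event_class) -> last timestamp

    def allow(self, tile_id, event_class, now=None) -> bool:
        now = time.time() if now is None else now
        key = (tile_id, event_class)
        last = self._last_fired.get(key)
        if last is not None and now - last < self.window_s:
            return False  # still inside the event window, stay quiet
        self._last_fired[key] = now
        return True

d = Debouncer(window_s=300.0)
```

The window length is a judgment call: too short and one incident fragments into several messages, too long and a genuinely new incident at the same tile gets swallowed.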
Payload assembly
The triggering tile is cropped to a thumbnail, the event class + tile label + zone + dwell + timestamp are formatted into a short WhatsApp-friendly message.
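Payload assembly is deliberately boring: a few labeled fields in a fixed order. A hedged sketch of the text portion (the field layout is illustrative; the shipped payload also attaches the tile thumbnail, which is omitted here):

```python
from datetime import datetime

def format_alert(event_class: str, tile_label: str, zone: str,
                 dwell_s: float, ts: datetime) -> str:
    """Assemble a short WhatsApp-friendly alert message.

    Illustrative field layout: event class, human-readable camera
    name, zone, dwell, timestamp. No confidence scores, no bboxes.
    """
    return (
        f"{event_class} - {tile_label}\n"
        f"Zone: {zone} | Dwell: {int(dwell_s)} s\n"
        f"{ts.strftime('%a %b %d, %H:%M:%S')}"
    )

msg = format_alert("Loitering", "pool gate", "pool deck",
                   92.4, datetime(2024, 6, 1, 23, 14, 5))
```

Note what is absent: the confidence score and bounding box from the raw detection. Those are for machines; the reader gets a scene.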
Delivery
The message is posted into the property's WhatsApp thread. This is where the only latency jitter lives; WhatsApp fan-out is usually fast but not bounded.
Event-scoped detection vs. raw object detection
Same cameras, same property, same week. The difference is at the event layer, not the model layer.
| Feature | 80-class object detection | Six-event shortlist |
|---|---|---|
| Daily alerts at a 16-camera property | 200+ | 3 to 8 |
| Per-zone time windows | Rarely | Yes, per tile |
| Dwell thresholds per event | None | 60 to 180 s tunable |
| DVR overlay masking | Not applicable | Per-tile at install |
| Delivery channel | App notification or email | WhatsApp thread per property |
| Payload includes tile thumbnail | Sometimes | Yes |
| Typical read rate after week one | Channel muted | High |
| Latency, event to phone | Variable, sometimes minutes | Under 60 s |
How an event earns a spot on the shortlist
A new event class is not added because a customer says “it would be cool to detect X.” It is added when a pattern repeats at a specific property with a specific operational cost, and ignoring it costs more than the alert fatigue of listening to it. The decision criteria are explicit.
An event earns the shortlist when:
- There is a clear physical action an operator takes in response
- The scene has a time window where it is non-trivially different
- The event has a verifiable dwell or zone condition
- The false-positive rate at the chosen threshold is under a few per week
- Ignoring the event has a measurable cost on incident reports
An event gets rejected when:
- The response is “hmm, good to know” (that is a report, not an alert)
- The scene looks normal 23 hours a day (zone and time cannot scope it)
- The dwell threshold needed to stabilize it is longer than the response window
- Detection only works in one unusual lighting condition
- It duplicates an existing shortlist event under a different label
What the system deliberately does not alert on
This covers everything the model can classify that the event layer refuses to turn into an alert: the dozens of object classes that never made the shortlist. The refusal is the product.
How the under-60-second number actually breaks down
The “real time” in real time event detection is a latency budget, not a marketing adverb. Here is what fills it.
Capture
~33 ms
One composite frame off the DVR's HDMI at ~30 fps.
Per-tile inference
<1 s
On-device object detection across all visible tiles.
Dwell + debounce
~15 s
Typical loitering dwell before the event stabilizes.
Delivery
a few seconds
WhatsApp fan-out. The one part we do not control.
Dwell is the big chunk of the budget, and it is intentional. A loitering alert fired at second 3 has a 40% false positive rate against residents waiting for rides; fired at second 15 the rate drops to near zero. The latency budget is spent on precision.
Frequently asked questions
Why six events and not eighty? My IP camera vendor advertises eighty detection classes.
Eighty classes is an object detection inventory, not an event list. Detecting a bicycle, a stroller, a cat, a suitcase, and a fire hydrant are all things a YOLO model can do. None of them are security events. The six-event shortlist (after-hours restricted-zone entry, loitering past a configured dwell, tailgating, unattended or mishandled package, vehicle in a fire lane, crowd formation at an entry) was chosen to match the events an operator on a property actually responds to. A property manager who gets a bicycle alert three times a day stops reading alerts inside a week. That is worse than no alerts, because now there is a false sense of coverage.
What is the end-to-end latency from event happening to phone notification?
Under 60 seconds. The bottleneck is not inference, it is WhatsApp's fan-out delivery. Capture runs at roughly the DVR's HDMI output frame rate (typically 30 fps composite), per-tile inference is sub-second on the edge device, the event passes through a short debounce so you do not get ten alerts for one loitering incident, the thumbnail is cropped from the triggering tile, the payload is pushed to the WhatsApp thread for the property, and WhatsApp delivers. In a real deployment test, the perceived latency from the event occurring to the notification landing was a few seconds.
What does the actual alert look like when it lands?
A thumbnail cropped from the triggering tile, the tile label (the human-readable camera name, for example 'pool gate' or 'dumpster corral'), the event class (one of the six), the timestamp, and a short description. It lands in one WhatsApp thread per property, so a regional manager covering twelve buildings gets twelve separate threads they can mute or escalate individually. No custom app to install, no separate portal to log into.
How does the device filter out DVR overlay graphics like the clock, camera name strips, and the channel bug?
Tile labeling pins the DVR's on-screen graphics to a fixed position within each tile's bounding box and masks those regions before inference. The camera name strip at the top of each tile and the channel bug in the corner are part of the grid, not the scene, so they get zeroed out before the detection model sees the frame. This is why the install includes a 'walk the property and label each tile' step rather than just auto-detecting a 4x4 grid.
What happens when an operator switches the DVR from the multiview grid to single-camera fullscreen?
The device detects the layout change and re-scopes to the single camera being shown. Inference now runs on one full 1080p tile instead of twenty-five cropped tiles, so per-event accuracy actually improves. When the operator switches back to the grid, the tile layout is re-detected and per-tile inference resumes. This is important in practice because guards routinely go full-screen on a specific camera during an active incident.
Why WhatsApp and not a dedicated dashboard with an SLA-backed monitoring service?
Property staff already use WhatsApp. Adding another app is the surest way to get alerts ignored. A dedicated monitoring service with guaranteed response times starts around $250 per camera per month and assumes a trained human is watching your feed continuously. For a 24-camera building, that is $6,000 per month, ten times the cost of real-time event detection plus WhatsApp delivery. The tradeoff is honest: WhatsApp does not come with a monitoring SLA. It comes with the channel your team actually opens. For most Class B and C multifamily properties, 'alerts that get read' beats 'alerts that go to a portal nobody logs into.'
Can the system learn new event types, or am I stuck with six?
New event types get added when a specific property has a repeated operational problem the shortlist does not cover, for example a recurring dumpster-diving pattern or a specific vehicle that should never be on the lot. What is deliberately avoided is adding events just to pad a marketing page. The working assumption is that every event you add to the shortlist shortens the attention budget of every other event. An event only earns a place on the list when ignoring it costs the property more than the alert fatigue of listening to it.
Worth saying out loud
The six-event shortlist is not a ceiling on what is technically detectable. It is a ceiling on what is operationally readable. Real time CCTV event detection is an alert-economics problem wearing a machine-learning costume. The model is the cheap part. The shortlist, the zones, the dwell, and the delivery channel are the part that decides whether the system is used a month after install.
If you are evaluating vendors for a multifamily or commercial property, run the test that matters: ask how many alerts a 16-camera property will send on a typical Tuesday. Anything north of 20 is a spec sheet, not an operational plan.