Real-time CCTV event detection is only useful if a human reads the alert.
Every r/cctv and r/homedefense thread about adding AI to a camera system converges on the same two arguments. One side says get an 80-class object detector, it sees everything. The other side says you will ignore every notification inside a week. They are both right, and that is the actual problem. This guide is about the shortlist approach: six events, one delivery channel, one thread per property, read rate that stays high after the first week.
Object detection is not the same thing as event detection
The most common mistake in spec sheets is conflating the two. Object detection is the model saying that is a person, that is a car, that is a bicycle. Event detection is the operator deciding that person in that zone at that hour with that dwell time is worth interrupting someone for. A modern YOLO-class model does the first job out of the box. The second job is a scoping decision, not a model choice.
The practical effect of treating object detection like event detection is alert fatigue. At a typical 16-camera property, raw person and vehicle detection fires hundreds of times per day during normal operations: residents walking to the mailbox, cars entering the lot, contractors arriving, kids heading to the pool. None of it is a security event. All of it generates a notification if the AI is wired directly to the alert channel. Within five days, the person receiving those notifications either mutes the channel or stops looking at their phone.
Real time event detection is the layer that decides which object detections become alerts. The decision is specific to a property, a time window, a zone, and a behavior. That layer is what the rest of this guide is about.
The six events that make the shortlist
Each of these has a hard operational trigger. A model classification on its own is never enough. The event only fires when classification plus dwell plus zone plus time window all agree.
Cyrano event taxonomy
- After-hours restricted-zone entry. Person detected in a zone (pool deck, leasing office, rooftop, gym) outside its configured open hours.
- Loitering past dwell threshold. Person stationary inside a specific zone longer than a per-tile dwell setting, typically 60 to 180 seconds.
- Tailgating at a gate or garage arm. Two persons or a person + vehicle passing through a single access event.
- Package handling anomaly. Unattended package beyond a dwell threshold, or a person carrying a package away from a delivery zone outside delivery hours.
- Vehicle in fire lane or tow zone. Vehicle detected inside a painted-off zone for longer than a short grace period.
- Crowd formation at an entry. More than N people clustering within a defined polygon at a main entry, outside of the configured egress windows.
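The trigger logic shared by all six events can be sketched as a single predicate. This is an illustrative sketch, not Cyrano's actual code: the function and parameter names are invented here, and real zone rules carry more state. The point is that classification alone never fires; zone arming and dwell must agree with it.

```python
from datetime import time

def zone_armed(now_t: time, open_t: time, close_t: time) -> bool:
    """A zone is armed outside its configured open hours.

    Handles windows that wrap midnight, e.g. open 22:00, close 06:00.
    """
    if open_t <= close_t:
        inside_hours = open_t <= now_t < close_t
    else:  # window wraps past midnight
        inside_hours = now_t >= open_t or now_t < close_t
    return not inside_hours

def should_alert(obj_class: str, trigger_classes: set,
                 now_t: time, open_t: time, close_t: time,
                 dwell_s: float, dwell_threshold_s: float) -> bool:
    """Classification AND zone arming AND dwell must all agree."""
    return (obj_class in trigger_classes
            and zone_armed(now_t, open_t, close_t)
            and dwell_s >= dwell_threshold_s)

# A person on the pool deck at 02:00 with 90 s of dwell fires;
# the same person at noon does not.
fires = should_alert("person", {"person"}, time(2, 0),
                     time(8, 0), time(22, 0), 90.0, 60.0)
```

The same predicate shape covers the whole shortlist: only the trigger classes, the time window, and the dwell threshold change per event and per tile.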
Signal path: DVR to notification in one diagram
The inputs on the left are the existing camera feeds running into the DVR. The DVR composites them to its HDMI multiview output, the same signal that drives the guard monitor. Cyrano taps that signal, runs per-tile inference, filters through the six-event rules, and routes to the delivery channels on the right.
The six-event pipeline, end to end
What the payload looks like when the alert lands
The difference between a message that gets read and a message that gets ignored is almost entirely payload quality. Raw model output (person, 0.87, bbox 214,418,298,612) is for machines. The message below is what actually hits a property manager's phone.
The numbers, not the adjectives
Operational constants of the event detection layer. No marketing padding, no imagined scenarios.
“At a Class C multifamily property in Fort Worth, the six-event shortlist flagged 20 incidents in the first 30 days, including a break-in attempt. The DVR had been recording silently for months before anyone reviewed anything.”
Fort Worth, TX deployment
80 classes vs. six, as an operator actually experiences it
Compare the two side by side. The 80-class system is not wrong in the model-accuracy sense. It is wrong in the alert-consumption sense.
A Tuesday afternoon at a 16-camera property
The model fires on every person, vehicle, bicycle, package, dog, umbrella, stroller, and handbag that enters frame. A 16-camera property running during a normal weekday generates 200+ notifications by 4 p.m. The on-site manager has muted the alert channel by Thursday. When a real incident fires on Saturday night, the notification arrives in a muted thread and is seen Monday morning.
- 200+ alerts per day during normal operations
- Channel muted inside one week
- Real incidents lost in the noise
- No per-zone, per-time scoping
- Reviewer reads confidence scores, not scenes
Watch a live shortlist fire on a DVR
15-minute demo. We connect to a running DVR's HDMI, show per-tile inference, and walk through a real alert landing on WhatsApp.
Book a demo →

What happens in the 60 seconds between event and notification
Tile capture
HDMI frame grabber pulls the current composite frame from the DVR's multiview output at ~30 fps. The composite is split into per-tile crops using the layout detected at install time.
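The tile split is plain array slicing once the grid layout is known. A minimal sketch, assuming an even grid detected at install time (real DVR layouts can have uneven tiles, which is one reason layout detection happens at install rather than being hardcoded):

```python
import numpy as np

def split_tiles(frame: np.ndarray, rows: int, cols: int) -> list:
    """Split a composite multiview frame (H x W x C) into per-tile crops.

    Illustrative sketch assuming an even rows x cols grid; integer
    division drops any remainder pixels at the right/bottom edges.
    """
    h, w = frame.shape[:2]
    th, tw = h // rows, w // cols
    return [frame[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
            for r in range(rows) for c in range(cols)]

# A 1080p composite in a 4x4 layout yields sixteen 270x480 crops.
frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
tiles = split_tiles(frame, rows=4, cols=4)
```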
Per-tile object detection
A detection model runs on each tile. Output is a list of object classes and bounding boxes, masked for DVR overlay graphics (clock, camera name strip, channel bug).
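Masking the DVR's overlay graphics before inference can be sketched as zeroing fixed regions of each tile. The box coordinates here are invented for illustration; in practice they are pinned per tile at install time, as described in the FAQ below.

```python
import numpy as np

def mask_overlays(tile: np.ndarray, overlay_boxes: list) -> np.ndarray:
    """Zero out DVR overlay regions (name strip, channel bug, clock).

    overlay_boxes are (x0, y0, x1, y1) in tile coordinates, fixed per
    tile at install. Illustrative sketch, not the shipped masking code.
    """
    masked = tile.copy()  # leave the original crop untouched
    for x0, y0, x1, y1 in overlay_boxes:
        masked[y0:y1, x0:x1] = 0
    return masked

# Example: a camera-name strip across the top 14 rows of a tile.
tile = np.ones((120, 160, 3), dtype=np.uint8)
clean = mask_overlays(tile, [(0, 0, 160, 14)])
```

Zeroing is the simplest choice; it guarantees the detector never sees overlay pixels, at the cost of a small blind region at the top of each tile.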
Event rule evaluation
Each detection is checked against the tile's event rules: is this zone armed right now, does the object class match a shortlist trigger, has the dwell threshold been crossed?
Debounce and deduplicate
A short debounce window prevents the same loitering incident from firing ten alerts. Only the first alert per event window becomes a message.
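The debounce step is essentially a per-key cooldown. A minimal sketch, with invented names, keyed on the tile and event class so one loitering incident produces one message instead of ten:

```python
import time

class Debouncer:
    """Suppress repeat alerts for the same (tile, event) within a window.

    Illustrative sketch: only the first alert per event window passes.
    """
    def __init__(self, window_s: float = 300.0):
        self.window_s = window_s
        self._last_fired = {}  # (tile_id, event_class) -> last timestamp

    def allow(self, tile_id, event_class, now=None) -> bool:
        now = time.time() if now is None else now
        key = (tile_id, event_class)
        last = self._last_fired.get(key)
        if last is not None and now - last < self.window_s:
            return False  # still inside the event window, stay quiet
        self._last_fired[key] = now
        return True

d = Debouncer(window_s=300.0)
```

The window length is a judgment call: too short and one incident fragments into several messages, too long and a genuinely new incident at the same tile gets swallowed.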
Payload assembly
The triggering tile is cropped to a thumbnail, the event class + tile label + zone + dwell + timestamp are formatted into a short WhatsApp-friendly message.
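Payload assembly is deliberately boring: a few labeled fields in a fixed order. A hedged sketch of the text portion (the field layout is illustrative; the shipped payload also attaches the tile thumbnail, which is omitted here):

```python
from datetime import datetime

def format_alert(event_class: str, tile_label: str, zone: str,
                 dwell_s: float, ts: datetime) -> str:
    """Assemble a short WhatsApp-friendly alert message.

    Illustrative field layout: event class, human-readable camera
    name, zone, dwell, timestamp. No confidence scores, no bboxes.
    """
    return (
        f"{event_class} - {tile_label}\n"
        f"Zone: {zone} | Dwell: {int(dwell_s)} s\n"
        f"{ts.strftime('%a %b %d, %H:%M:%S')}"
    )

msg = format_alert("Loitering", "pool gate", "pool deck",
                   92.4, datetime(2024, 6, 1, 23, 14, 5))
```

Note what is absent: the confidence score and bounding box from the raw detection. Those are for machines; the reader gets a scene.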
Delivery
The message is posted into the property's WhatsApp thread. This is where the only latency jitter lives; WhatsApp fan-out is usually fast but not bounded.
Event-scoped detection vs. raw object detection
Same cameras, same property, same week. The difference is at the event layer, not the model layer.
| Feature | 80-class object detection | Six-event shortlist |
|---|---|---|
| Daily alerts at a 16-camera property | 200+ | 3 to 8 |
| Per-zone time windows | Rarely | Yes, per tile |
| Dwell thresholds per event | None | 60 to 180 s tunable |
| DVR overlay masking | Not applicable | Per-tile at install |
| Delivery channel | App notification or email | WhatsApp thread per property |
| Payload includes tile thumbnail | Sometimes | Yes |
| Typical read rate after week one | Channel muted | High |
| Latency, event to phone | Variable, sometimes minutes | Under 60 s |
How an event earns a spot on the shortlist
A new event class is not added because a customer says “it would be cool to detect X.” It is added when a pattern repeats at a specific property with a specific operational cost, and ignoring it costs more than the alert fatigue of listening to it. The decision criteria are explicit.
An event earns the shortlist when:
- There is a clear physical action an operator takes in response
- The scene has a time window where it is non-trivially different
- The event has a verifiable dwell or zone condition
- The false-positive rate at the chosen threshold is under a few per week
- Ignoring the event has a measurable cost on incident reports
An event gets rejected when:
- The response is “hmm, good to know” (that is a report, not an alert)
- The scene looks normal 23 hours a day (zone and time cannot scope it)
- The dwell threshold needed to stabilize it is longer than the response window
- Detection only works in one unusual lighting condition
- It duplicates an existing shortlist event under a different label
What the system deliberately does not alert on
This covers everything the model can classify that the event layer refuses to turn into an alert: the dozens of object classes that never made the shortlist. The refusal is the product.
How the under-60-second number actually breaks down
The “real time” in real time event detection is a latency budget, not a marketing adverb. Here is what fills it.
Capture
~33 ms
One composite frame off the DVR's HDMI at ~30 fps.
Per-tile inference
<1 s
On-device object detection across all visible tiles.
Dwell + debounce
~15 s
Typical loitering dwell before the event stabilizes.
Delivery
a few seconds
WhatsApp fan-out. The one part we do not control.
Dwell is the big chunk of the budget, and it is intentional. A loitering alert fired at second 3 has a 40% false positive rate against residents waiting for rides; fired at second 15 the rate drops to near zero. The latency budget is spent on precision.
Frequently asked questions
Why six events and not eighty? My IP camera vendor advertises eighty detection classes.
Eighty classes is an object detection inventory, not an event list. Detecting a bicycle, a stroller, a cat, a suitcase, and a fire hydrant are all things a YOLO model can do. None of them are security events. The six-event shortlist (after-hours restricted-zone entry, loitering past a configured dwell, tailgating, unattended or mishandled package, vehicle in a fire lane, crowd formation at an entry) was chosen to match the events an operator on a property actually responds to. A property manager who gets a bicycle alert three times a day stops reading alerts inside a week. That is worse than no alerts, because now there is a false sense of coverage.
What is the end-to-end latency from event happening to phone notification?
Under 60 seconds. The bottleneck is not inference, it is WhatsApp's fan-out delivery. Capture runs at roughly the DVR's HDMI output frame rate (typically 30 fps composite), per-tile inference is sub-second on the edge device, the event passes through a short debounce so you do not get ten alerts for one loitering incident, the thumbnail is cropped from the triggering tile, the payload is pushed to the WhatsApp thread for the property, and WhatsApp delivers. In a real deployment test, the perceived latency from the event occurring to the notification landing was a few seconds.
What does the actual alert look like when it lands?
A thumbnail cropped from the triggering tile, the tile label (the human-readable camera name, for example 'pool gate' or 'dumpster corral'), the event class (one of the six), the timestamp, and a short description. It lands in one WhatsApp thread per property, so a regional manager covering twelve buildings gets twelve separate threads they can mute or escalate individually. No custom app to install, no separate portal to log into.
How does the device filter out DVR overlay graphics like the clock, camera name strips, and the channel bug?
Tile labeling pins the DVR's on-screen graphics to a fixed position within each tile's bounding box and masks those regions before inference. The camera name strip at the top of each tile and the channel bug in the corner are part of the grid, not the scene, so they get zeroed out before the detection model sees the frame. This is why the install includes a 'walk the property and label each tile' step rather than just auto-detecting a 4x4 grid.
What happens when an operator switches the DVR from the multiview grid to single-camera fullscreen?
The device detects the layout change and re-scopes to the single camera being shown. Inference now runs on one full 1080p tile instead of twenty-five cropped tiles, so per-event accuracy actually improves. When the operator switches back to the grid, the tile layout is re-detected and per-tile inference resumes. This is important in practice because guards routinely go full-screen on a specific camera during an active incident.
Why WhatsApp and not a dedicated dashboard with an SLA-backed monitoring service?
Property staff already use WhatsApp. Adding another app is the surest way to get alerts ignored. A dedicated monitoring service with guaranteed response times starts around $250 per camera per month and assumes a trained human is watching your feed continuously. For a 24-camera building, that is $6,000 per month, ten times the cost of real-time event detection plus WhatsApp delivery. The tradeoff is honest: WhatsApp does not come with a monitoring SLA. It comes with the channel your team actually opens. For most Class B and C multifamily properties, 'alerts that get read' beats 'alerts that go to a portal nobody logs into.'
Can the system learn new event types, or am I stuck with six?
New event types get added when a specific property has a repeated operational problem the shortlist does not cover, for example a recurring dumpster-diving pattern or a specific vehicle that should never be on the lot. What is deliberately avoided is adding events just to pad a marketing page. The working assumption is that every event you add to the shortlist shortens the attention budget of every other event. An event only earns a place on the list when ignoring it costs the property more than the alert fatigue of listening to it.
Worth saying out loud
The six-event shortlist is not a ceiling on what is technically detectable. It is a ceiling on what is operationally readable. Real time CCTV event detection is an alert-economics problem wearing a machine-learning costume. The model is the cheap part. The shortlist, the zones, the dwell, and the delivery channel are the part that decides whether the system is used a month after install.
If you are evaluating vendors for a multifamily or commercial property, run the test that matters: ask how many alerts a 16-camera property will send on a typical Tuesday. Anything north of 20 is a spec sheet, not an operational plan.