Cyrano Security
13 min read
Suppression is the product

The part of an AI surveillance system nobody writes about is the pipeline between the detector and the phone.

Every guide on AI surveillance systems describes the same seven capabilities: face detection, license plate reading, weapon detection, anomaly detection, object tracking, tailgating, loitering. What none of them describe is what turns a raw model firing into an alert a human actually reads. That is the suppression pipeline, and on a real deployment it has seven stages, most of which have no public writeup. This page walks each of them, in order, on one running system.

See the seven stages on a live deployment
4.9 from 50+ properties
Raw detector firings per hour per property: 200 to 500 daytime
Delivered HIGH alerts per day to operator phone: 5 to 20
Stages between raw firing and phone buzz: 7
End-to-end frame to buzz latency: about 1.1 s

The public writing on AI surveillance systems is about capability. The deployed reality is about suppression.

Search “ai surveillance systems” and the first page is the Wikipedia article, the Brookings piece, the Pelco buyer's guide, the Avigilon explainer, and Coram AI's 2025 guide. Every one of them organizes the category by capability. Can the system recognize faces. Can it detect a weapon. Can it read a license plate. Can it flag an anomaly. That is a useful way to shop for a demo. It is not the shape of a deployed system at day 30.

At day 30 the question a property manager asks is not “can the AI detect a person.” It can, obviously. The question is how many times per day does the phone buzz, and how many of those buzzes were the thing the operator needed to see. That ratio is not a capability. It is the output of a pipeline that sits between the detector and the phone, and on a well-configured system it suppresses ninety-nine percent of raw firings before they reach a human.

This page is about that pipeline, specifically. Seven stages, in order, each with a named job and a named class of firings it drops.

7 — Stages between detector and operator
99%+ — Raw firings suppressed before delivery
~1.1 s — Frame to phone buzz
~240 KB — Outbound payload per delivered event

The pipeline, stage by stage

A raw person-class firing starts at stage 1 and either reaches stage 7 or gets dropped along the way. Most firings get dropped before stage 4. The ones that survive are the ones the operator was meant to see.

Stages 1 through 7 on a Cyrano deployment

Stage 1. Overlay mask

Zero out the DVR clock, the per-tile channel name strip, and the channel bug before the detector ever sees the composite frame. Drops: Fires from moving digits, thin high-contrast bars, and logo bugs.

Stage 2. Class filter

Keep only person, vehicle, and package detections. Drop animals, foliage, light artifacts, reflections. Drops: Raccoons on jobsites, squirrels on patios, bobbing trees, headlight sweeps.

Stage 3. Zone-in-polygon

Check whether the bounding box is inside an operator-drawn polygon for that tile. No polygon, no alert. Drops: People on public sidewalks, cars on the access road, residents in their own doorway.

Stage 4. Dwell timer

Require 2 to 8 seconds of persistence in the zone before the event is real. No brief walkthroughs. Drops: A resident cutting through a corner of the loading dock, a delivery driver pivoting at a gate.

Stage 5. Threat classifier

Second model. Reads zone class, time of day, dwell seconds, detection class, and recent event history. Outputs LOW or HIGH. Drops: Daytime loiter in a public amenity. Morning trash drop in the compactor. The tenth benign event this hour.

Stage 6. Deduplicator

Suppress if the same zone on the same tile fired in the last N seconds. One event, one notification. Drops: A single visitor ringing for 90 seconds becomes one event, not ninety separate firings.

Stage 7. Delivery router

Pick the right operator WhatsApp thread for this property and zone class. Push the 240 KB payload. Log the LOW event to the dashboard. Drops: Cross-property noise. Wrong-operator delivery. The silent HIGH that nobody saw.
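Stage 3's check can be sketched as a point-in-polygon test on the bounding-box center. This is an illustrative ray-casting version with hypothetical zone coordinates, not Cyrano's production code:

```python
# Illustrative ray-casting point-in-polygon test for stage 3.
# Zone coordinates and bounding boxes here are hypothetical.

def bbox_center(bbox):
    """Center of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = bbox
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def point_in_polygon(point, polygon):
    """Ray casting: count edge crossings of a ray going right from the point."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Operator-drawn polygon for a hypothetical compactor-alcove zone on one tile.
zone = [(40, 60), (200, 60), (200, 180), (40, 180)]
print(point_in_polygon(bbox_center((80, 90, 120, 160)), zone))   # inside: True
print(point_in_polygon(bbox_center((300, 90, 340, 160)), zone))  # outside: False
```

No polygon, no alert: a tile with no operator-drawn zone never produces a candidate event in the first place.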

One firing, traced through every stage

This is what a single raw detection actually looks like as it moves through the pipeline. Captured on a 25-camera Class C multifamily property, tile index 2 by 3, overnight.

frame_ts=03:12:04.410 tile=[2,3] camera=compactor_alcove

Where each stage drops firings, with examples

The suppression rate is not a single tunable number. It is the sum of what each stage drops. Here are the specific classes of firings each stage is responsible for, at a typical multifamily property.

Stage 1 drops: the DVR's own chrome

The digital clock advancing a digit fires an edge detector. The channel name strip looks like a pedestrian at low resolution. The channel bug produces reliable false hits at sunset as global contrast shifts. All three are zeroed in pixel rectangles before the detector sees the frame.

Stage 2 drops: non-security classes

Raccoons, possums, deer, stray cats, flag poles flapping, trees in high wind, headlight sweeps across a garage wall. All detected, none escalated. Property security only needs person, vehicle, package.

Stage 3 drops: legitimate presence

A resident in their own doorway. A visitor on a public sidewalk. A maintenance worker at the dumpster during business hours. The zone polygon and active hours encode which pixels on which tile actually matter.

Stage 4 drops: transient passthrough

Someone pivoting at a gate, a delivery driver cutting a corner, a jogger crossing a frame. Two to eight seconds of dwell separates presence from intent in the zones where it matters.

Stage 5 drops: low-stakes events

A single-party loiter in the pool amenity at 6 PM on a summer evening is LOW. The same loiter in the compactor alcove at 3 AM is HIGH. Same detection, different zone class, different label.

Stage 6 drops: the same event, again

A visitor standing at a gate for 90 seconds produces roughly ninety detector firings at 1 fps. The deduplicator collapses them into one event. The operator sees one notification, not ninety.
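A minimal sketch of that deduplication logic, assuming a per-(tile, zone) suppression window; the 180-second window matches the chip shorthand later on this page, but everything else here is illustrative:

```python
# Minimal sketch of stage 6: suppress repeat firings from the same
# (tile, zone) pair inside a time window. Not Cyrano's production code.
import time

class Deduplicator:
    def __init__(self, window_s=180):
        self.window_s = window_s
        self.last_fired = {}  # (tile, zone) -> timestamp of last delivery

    def should_deliver(self, tile, zone, now=None):
        now = time.monotonic() if now is None else now
        key = (tile, zone)
        last = self.last_fired.get(key)
        if last is not None and now - last < self.window_s:
            return False  # same zone fired recently: collapse into one event
        self.last_fired[key] = now
        return True

dedup = Deduplicator(window_s=180)
# One firing per second over a 90-second stand at a gate -> one delivered event.
delivered = sum(dedup.should_deliver((2, 3), "gate", now=t) for t in range(90))
print(delivered)  # 1
```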

Stage 7 drops: wrong-operator delivery

An event on property A does not ring the B operator. A HIGH event routes to the night on-call thread, not to the property manager's personal phone. A LOW event logs to the dashboard and does not ring.

How the stages fit into the pixel path

The detector does not run in isolation. Inference pulls from the composite HDMI frame the DVR already renders. The seven suppression stages are the fan-out that converts one inference pass into zero, one, or a few operator-facing events.

From composite frame to operator phone

Composite HDMI frame
Tile crops
Overlay masks
Suppression pipeline
HIGH to WhatsApp
LOW to dashboard
Dropped

What the operator actually sees, as a sequence

The same event, viewed from the operator's side. The WhatsApp thread is the last stage of the pipeline, and the format is not decorative. The three attachments map one-to-one with what survived all seven stages.

HIGH event delivery, operator view

1. Phone buzz

New WhatsApp message in the night on-call thread, property name and zone in the first line.

2. Thumbnail

18 KB JPEG cropped to the tile that fired, bounding box overlaid. Operator reads the room before opening the clip.

3. Clip

220 KB H.264, 6 seconds bracketing the detection. No other tile visible. No frames before or after.

4. Metadata

612-byte JSON. Zone id, dwell seconds, threat label, tile index, timestamp. Parseable for downstream dispatch.

5. Operator action

Acknowledge, escalate to dispatch, or mark false positive. The feedback flows back into stage 5 training for that property.
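The metadata leg of the payload can be illustrated as follows. Field names and values here are hypothetical, shaped only from the fields this page lists; the actual schema may differ:

```python
# Hypothetical shape of the per-event metadata JSON described above.
import json

payload = json.dumps({
    "event_id": "evt-000123",        # illustrative identifiers
    "property_id": "prop-ftw-01",
    "timestamp": "2026-01-15T03:12:04.410Z",
    "class": "person",
    "confidence": 0.87,
    "bbox": [412, 220, 468, 330],
    "tile": [2, 3],
    "zone_id": "compactor_alcove",
    "dwell_s": 4,
    "threat": "HIGH",
})

event = json.loads(payload)
# Downstream dispatch can branch on the threat label without touching pixels.
if event["threat"] == "HIGH":
    print(f"{event['zone_id']} tile={event['tile']} dwell={event['dwell_s']}s")
```

Because the object is plain JSON, a dispatch integration can route on zone id and threat label without ever downloading the clip.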

Suppression stack versus raw detector, side by side

A typical AI surveillance system sold on capability against an AI surveillance system sold on operator outcome. The difference is measured on the operator's phone, not in the product datasheet.

Detector-only systems vs suppression-pipeline systems

Same underlying model. Radically different day-30 experience.

Feature | Typical detector-only AI surveillance | Cyrano (7-stage suppression pipeline)
Pipeline stages after raw detection | 1 to 2 (sensitivity slider) | 7 (mask, class, zone, dwell, threat, dedup, route)
Second model after the detector | no | yes (LOW vs HIGH threat classifier)
Per-tile polygon zones | global bounding box | operator-drawn, per-property
Per-zone active hours | rare | yes
Dedup against recent same-zone firings | often missing | yes
Delivery routing by zone class | single channel per property | yes
Typical delivered alerts per day | 80 to 300, then muted | 5 to 20 HIGH
Operator feedback path to stage 5 | none, or ticket system | acknowledge / escalate / false-positive buttons
Frame to phone buzz latency | 1 to 3 s | ~1.1 s
Payload per delivered event | cloud link + full-frame archive | ~240 KB (thumb + clip + JSON)

Four questions that tell you whether a vendor has a suppression pipeline

Ask these on the procurement call. The answers separate a system with seven stages from a system with one.

The suppression pipeline audit

  • Ask: per tile per day, how many raw detector firings produced a delivered operator alert? A vendor with a suppression pipeline records both numbers separately. A vendor without one reports a single count because the two are the same.
  • Ask: show me the per-zone polygon and active-hours configuration on my property. A vendor with stage 3 pulls it up on the dashboard. A vendor without stage 3 points at a sensitivity slider or a global bounding box.
  • Ask: how does the system distinguish a person in a public hallway at 6 PM from a person in a trash alcove at 3 AM? Stage 5 is the answer. A vendor without a threat classifier describes both as equal alerts with identical delivery.
  • Ask: what is the typical delivered alerts per day per property on your deployed fleet? Without a deduplicator and a threat classifier, the honest answer is in the hundreds per day, and operators mute the channel within two weeks.
~1 in 200

On a 25-camera Class C multifamily property in Fort Worth, the raw detector produced roughly four thousand person-class firings in the first twenty-four hours of deployment. After the full seven-stage pipeline, twenty events reached the operator WhatsApp thread as HIGH, and every one of them corresponded to a real security-relevant moment (a tailgate, a package linger, an after-hours loiter, a break-in attempt). The suppression ratio is the product.

Cyrano deployment, Fort Worth, TX

What each stage hands to the next

A shorthand for the pipeline, as chips. Read left to right.

raw: class=person conf=0.87 bbox
S1: mask(clock, strip, bug) -> pass
S2: class in {person, vehicle, package}
S3: polygon(tile[2,3]).contains(bbox)
S4: dwell >= 4s -> pass
S5: threat(zone, time, dwell, hist) -> HIGH
S6: no same-zone event in 180s
S7: route(whatsapp, oncall-night)
out: thumb.jpg + clip.mp4 + meta.json
delivery: ~240 KB in ~1.1 s
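The chip shorthand above can be paraphrased as runnable code: each stage is a predicate applied in order, and an event either survives all of them or is dropped. Every function name and threshold here is illustrative, not Cyrano's implementation:

```python
# Illustrative paraphrase of the stage chips: predicates applied in order.

def s2_class_filter(e):
    return e["class"] in {"person", "vehicle", "package"}

def s3_zone(e):
    return e.get("in_zone", False)  # stage 3's polygon test, precomputed here

def s4_dwell(e):
    return e["dwell_s"] >= 4

def s5_threat(e):
    # Stand-in for the second model: overnight restricted zones are HIGH.
    e["threat"] = "HIGH" if e["zone_class"] == "restricted" and e["overnight"] else "LOW"
    return e["threat"] == "HIGH"

def s6_dedup(e, recent):
    return (e["tile"], e["zone_id"]) not in recent

STAGES = [s2_class_filter, s3_zone, s4_dwell, s5_threat]

def run_pipeline(event, recent=frozenset()):
    """Return True if the event should buzz a phone; drop otherwise."""
    if not all(stage(event) for stage in STAGES):
        return False
    return s6_dedup(event, recent)  # survivor goes to stage 7 routing

raw = {"class": "person", "in_zone": True, "dwell_s": 5,
       "zone_class": "restricted", "overnight": True,
       "tile": (2, 3), "zone_id": "compactor_alcove"}
print(run_pipeline(raw))  # True -> stage 7 routes it to WhatsApp
```

The ordering matters: cheap pixel and geometry checks run first, and the second model in stage 5 only ever sees events that already earned its attention.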

Three failure modes the pipeline is built to avoid

These are the outcomes the suppression pipeline exists to prevent. Each one is a failure mode we see on deployed detector-only systems inside the first thirty days of use.

Failure 1. The muted channel

A detector-only system delivers 80 to 300 alerts per day to an operator phone. Within two weeks the channel is muted, and the HIGH event that matters arrives on a phone nobody is watching. Suppression is how the channel stays loud enough to be heard when it matters.

Failure 2. The legitimate-presence flood

A system with a global bounding box and no zone polygon alerts on every resident walking through every hallway at every hour. The operator learns to ignore the alerts, including the ones that are not residents. Stage 3 is the difference.

Failure 3. The equal-severity alert stream

Without a stage 5 threat classifier, a pool loiter at 6 PM arrives with the same tone and the same urgency as a compactor-alcove loiter at 3 AM. Operators lose the ability to triage in real time, which defeats the point of real-time alerts.

See all seven stages fire on a real composite frame.

We bring a Cyrano unit to a running DVR for a 15-minute demo, tap the HDMI multiview, and walk one raw detection through every stage from overlay mask to WhatsApp delivery. You see what gets dropped, where, and why, and the 240 KB payload that lands on the phone at the end.

Book the suppression-pipeline demo

When a detector-only system is the right pick

Not every deployment needs seven stages. A single-camera doorway, a test bench, a proof-of-concept on one tile, a demonstrator for a board meeting: these are fine with a raw detector and a sensitivity slider. The suppression pipeline earns its complexity when the camera count is large enough and the zones are varied enough that a global rule cannot separate interesting presence from routine presence.

At the scale Cyrano targets (mid-market multifamily, construction jobsites with 16 to 32 camera trailers, small commercial with existing DVR infrastructure), a detector-only system fails inside the first thirty days because the operator mutes the channel. The suppression pipeline is not a differentiator there. It is the precondition for the system being used at all.

Frequently asked questions

What is the suppression pipeline in an AI surveillance system, and why is it the thing that matters more than the detector?

The detector is the part everyone shows in a marketing demo: a model fires a person-class bounding box on a camera frame. On a 25-camera property running at even conservative inference rates, the raw detector fires hundreds of times per hour. Almost none of those firings are events a human needs to see. The suppression pipeline is the chain of stages that sit between the raw firing and the operator's phone, and its job is to drop everything that is not actionable. On Cyrano the chain has seven named stages (per-tile overlay mask, class filter, zone-in-polygon check, dwell timer, threat classifier, deduplicator, delivery router). The detector is a commodity; the suppression pipeline is the product, because it is the difference between a system that buzzes an operator ten times a day with real events and a system that buzzes two thousand times and gets muted.

What are the seven stages on Cyrano, in order?

Stage 1, per-tile overlay mask: zero out the DVR clock, the per-tile channel name strip, and the channel bug on every tile, so those pixels cannot fire the detector. Stage 2, class filter: keep only person, vehicle, package; drop animals, foliage, light artifacts. Stage 3, zone-in-polygon: check whether the bounding box center is inside an operator-configured polygon for that tile (trash area, entry, package room, restricted zone). Stage 4, dwell timer: require a minimum persistence in the zone (typically 2 to 8 seconds depending on zone class) before the event is real. Stage 5, threat classifier: a second model that looks at the event context and returns LOW or HIGH, where LOW goes to the dashboard but not to a phone and HIGH wakes a human. Stage 6, deduplicator: suppress if the same zone on the same tile fired in the last N seconds. Stage 7, delivery router: pick the right operator WhatsApp thread for this property and zone class, and push a 240 KB payload to it.

Why are stage 1 (overlay mask) and stage 3 (zone polygon) necessary if the detector is already good?

Because the composite multiview frame the DVR renders to HDMI has three always-on features that look like edges and contrast to a detector trained on natural images: a digital clock in the bottom corner, a per-tile camera name strip, and a channel bug. A raw detector will fire occasionally on the clock (because digits change and the model sees motion), on the name strip (because a narrow horizontal bar of high-contrast pixels resembles certain learned features), and on the channel bug (for the same reason). The overlay mask zeros those pixel rectangles before the detector ever sees them. That is stage 1. Stage 3 exists because most of the pixels in a property camera feed (sidewalks, hallways, parking aisles) are places where a person is not interesting. A person walking down a public sidewalk at 2 PM does not need to wake a human. A person in the trash compactor alcove at 3 AM does. The polygon is how the system encodes that difference per property, per tile.
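The mask itself is simple in principle: zero out fixed pixel rectangles before inference. A toy sketch with hypothetical coordinates, using plain lists in place of a real frame buffer:

```python
# Illustrative stage 1 mask: zero the pixel rectangles (clock, name strip,
# channel bug) in a frame before the detector sees it. Coordinates are
# hypothetical; a real frame would be a numpy array, not nested lists.

def apply_overlay_mask(frame, rects):
    """frame: list of rows of pixel values; rects: (x, y, w, h) tuples."""
    for x, y, w, h in rects:
        for row in frame[y:y + h]:
            row[x:x + w] = [0] * w
    return frame

frame = [[255] * 8 for _ in range(4)]               # toy 8x4 all-white frame
masked = apply_overlay_mask(frame, [(0, 3, 3, 1)])  # "clock" in the bottom-left
print(masked[3])  # [0, 0, 0, 255, 255, 255, 255, 255]
```

Because only per-tile coordinates are stored, a DVR layout change means re-recording rectangles, not retraining the model.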

How does stage 5, the threat classifier, decide LOW versus HIGH?

It is a second model that runs only on events that already passed stages 1 through 4. Its inputs are the event context, not raw pixels: the zone class (public, semi-private, restricted, after-hours-only), the time of day (business hours, evening, overnight), the dwell seconds in the zone, the class of the detection (person, vehicle, package), and the event history for that tile in the last 30 minutes (is this the tenth loiter this hour or the first). The output is a score plus a label. HIGH goes to WhatsApp with a phone buzz. LOW goes to the dashboard as a logged event but does not ring. A concrete example: a person detected in the pool area at 11:45 PM with an 18-second dwell is LOW because the pool closes at 10 PM and single-party loiters in that zone have a high false-alarm rate during the summer. The same detection in the trash compactor alcove with the same dwell is HIGH because that zone has no legitimate overnight traffic.

What is the suppression rate that the pipeline actually achieves?

On a typical Class B or C multifamily deployment on a 16 to 25 camera DVR, the raw detector fires at roughly 200 to 500 hits per hour across all tiles during daylight and 20 to 80 per hour overnight. The operator sees roughly 5 to 20 HIGH events per day delivered to WhatsApp. That is a suppression rate in the 99 percent plus range between stages 1 and 7. The rate is not a tunable marketing number; it is the natural consequence of how restrictive the zone polygons, dwell thresholds, and threat rules are set during install calibration. A property that configures twenty restrictive zones will see fewer events than a property that leaves most cameras in permissive default zones.
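The quoted rate follows from the numbers themselves. A back-of-envelope check, where the firing ranges come from this answer but the 14-hour daytime and 10-hour overnight split is an assumption:

```python
# Back-of-envelope check of the 99%+ suppression rate quoted above.
daytime_hours, overnight_hours = 14, 10          # assumed split of the day
raw_low = 200 * daytime_hours + 20 * overnight_hours   # 3,000 firings/day
raw_high = 500 * daytime_hours + 80 * overnight_hours  # 7,800 firings/day
delivered = 20                                   # upper end of HIGH events/day

for raw in (raw_low, raw_high):
    rate = 1 - delivered / raw
    print(f"{raw} raw -> {delivered} delivered: {rate:.1%} suppressed")
```

Even taking the low end of raw firings against the high end of delivered events, the rate stays above 99 percent.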

Why is the suppression pipeline hidden from public writing on AI surveillance systems?

Because the public writing is produced by three constituencies, and none of them have a reason to describe it. Policy researchers write about capability (Brookings, ScienceDirect, the ACLU) because the policy argument is at the level of what the system could do if turned up to eleven, not what a deployed system actually emits after filtering. Vendor marketing pages (Pelco, Avigilon, Coram, Volt) list capabilities because capabilities sell demos; a section titled Stage 5 Threat Classifier in a buyer's guide is not a lead magnet. Encyclopedia pages (Wikipedia) define the category rather than describing a specific deployed stack. The suppression pipeline is the thing a property manager actually lives with on day 30, but nobody has a commercial or editorial reason to publish it. That is the gap this page fills.

Can I tune the suppression pipeline per property, or is it a fixed setting?

Per property, and per tile within a property. During install calibration the operator draws polygons on each tile (this is the restricted zone, this is the public zone, this is a no-zone area) and sets per-zone parameters: dwell seconds, active hours, threat weight. A dormant outdoor amenity zone that only matters after 10 PM gets a tight polygon with short dwell and high threat weight during overnight hours, and either a permissive polygon or no polygon during the day. The threat classifier reads those per-zone parameters as inputs. If a property changes operationally (a pool opens earlier for summer, a construction area gets fenced off), the operator re-draws the polygon from a phone and the pipeline picks up the change on the next inference tick.
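A hypothetical per-zone configuration shaped like the knobs described here (polygon, dwell seconds, active hours, threat weight); all names and values are illustrative, not a real property's config:

```python
# Hypothetical per-zone configuration for one property. Illustrative only.
zones = {
    "pool_amenity": {
        "polygon": [(120, 40), (480, 40), (480, 300), (120, 300)],
        "dwell_s": 8,
        "active_hours": (22, 6),   # only armed 10 PM to 6 AM
        "threat_weight": 0.9,
    },
    "compactor_alcove": {
        "polygon": [(40, 60), (200, 60), (200, 180), (40, 180)],
        "dwell_s": 2,
        "active_hours": (0, 24),   # armed around the clock
        "threat_weight": 1.0,
    },
}

def zone_armed(zone, hour):
    """True if the zone's active-hours window covers this hour of day."""
    start, end = zones[zone]["active_hours"]
    # Windows that cross midnight (start > end) wrap around.
    return start <= hour < end if start < end else (hour >= start or hour < end)

print(zone_armed("pool_amenity", 23))  # True: after closing
print(zone_armed("pool_amenity", 14))  # False: daytime, zone dormant
```

When the property changes operationally, only this per-zone data changes; the detector and the pipeline code stay fixed.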

What happens when the DVR layout or overlays change, and how does the mask stage handle it?

The overlay mask and the per-tile polygon are both tied to a reference composite frame captured at install. If the DVR layout changes (2x2 to 3x3, a channel renamed, the clock moved), the installer re-runs the calibration from the property dashboard. Re-calibration captures a new reference frame, the operator confirms which tile is which camera in a dropdown per tile, and new pixel rectangles are recorded for the clock, name strip, and channel bug. The detector model is unchanged; only the per-tile coordinates move. The whole re-calibration is under a minute. No firmware flash, no model retrain.

What specifically crosses the property boundary after all seven stages have fired?

About 240 KB per delivered HIGH event. The payload is three parts: an 18 KB JPEG thumbnail cropped from the tile that fired, a 220 KB H.264 clip of roughly six seconds bracketing the detection on that tile, and a 612 byte JSON metadata object (event id, property id, timestamp, detection class, bounding box, confidence, tile index, zone id, dwell seconds, threat label). That payload lands in the operator WhatsApp thread. Nothing else leaves the device. No continuous upload of the full multiview frame, no face embedding, no license plate string, no gait vector, no upload of any of the 24 other tiles that did not fire. The same payload shape is what has already been published in detail on /t/ai-and-surveillance for readers who want to audit the egress contract.

How can a property manager verify that a vendor actually has a suppression pipeline, not just a detector?

Four concrete questions the vendor should be able to answer on demand. One, per tile, how many raw detector firings became delivered alerts in the last 24 hours; a vendor with no suppression stack cannot report the first number because it is not recorded separately from the second. Two, show me the per-zone polygon and dwell configuration for my property; a vendor with no zone stage can only point at sensitivity sliders. Three, how does the system distinguish a person walking through a public hallway at 6 PM from a person loitering in a trash alcove at 3 AM; a vendor with no threat classifier will answer with alert rules that treat both as equal. Four, what is the delivery rate to operator phones per camera per day; a vendor without a deduplicator will produce more events than a phone can render without muting. All four answers are either present or they are marketing.

The one ratio that tells you if an AI surveillance system works

Raw detector firings per hour, divided by delivered operator alerts per hour. A healthy system runs two to three orders of magnitude. A system with a single stage between the model and the phone runs close to one to one, and that system gets muted before it finishes its first month. Every stage this page describes exists to move that ratio.

Cyrano publishes the seven stages because the stages are the product. A $450 device, a $200 monthly subscription, up to 25 camera tiles on one HDMI port, and seven suppression stages that turn a noisy detector into an operator channel that is still loud enough to be heard when something real happens.

🛡️ Cyrano · Edge AI Security for Apartments
© 2026 Cyrano. All rights reserved.
