Smart camera alert filtering is a four-stage pipeline. Every top search result talks about stage three.
The usual advice is: turn down sensitivity, draw a detection zone, toggle person-vs-vehicle, set a confidence threshold. That is one stage of four, and it only works if your input is a clean per-camera IP stream. It is not. If you are running this against a DVR, your input is an HDMI multiview carrying 16 or 25 tiles at once, stamped with a running clock, a per-tile camera name strip, and a channel bug that animates. This guide is the four stages that actually filter that input.
Stage one masks the overlays. Stage two turns grid coordinates into per-tile zones. Stage three does the confidence and persistence work every vendor talks about. Stage four deduplicates on tile.label so one loiterer produces one alert, not 18,000.
What the top SERP results all quietly assume
Scylla, Actuate, March Networks, Eufy, Reolink, TP-Link, and the IP Cam Talk threads that dominate the first page on this keyword share one assumption: your cameras deliver clean, per-camera RTSP or ONVIF streams to whatever is doing the filtering. Under that assumption, every piece of advice is reasonable. Tune sensitivity per camera. Draw a polygon zone in the camera app. Toggle person vs vehicle. Set a confidence threshold.
That world does not match a real commercial DVR closet. The cameras are analog or proprietary HD-over-coax. The only output anyone ever looks at is the HDMI multiview going to a wall monitor. The recorder is a black box running firmware from 2019 that refuses to give you RTSP, or gives it to you after two hours on a forum. Every top result skips past that reality because it is inconvenient for their pitch.
The angle of this page is that smart filtering on the HDMI multiview is genuinely different. You need a filter at stage one that nobody talks about (overlay masking), and you need a dedup key at stage four that is keyed on tile.label, not on camera ID, because the multiview does not give you camera IDs. Those two stages are where Cyrano does most of its noise reduction.
The four stages, end to end
This is the pipeline between the HDMI cable you unplugged from the wall monitor and the WhatsApp message that lands on your on-call person's phone. Four filters, in order. Each one drops a different class of noise.
HDMI multiview -> filtered alert
Stage one: overlay_mask
Every DVR stamps three classes of pixels on top of the multiview before it leaves the HDMI port. A running clock (changes every second). A per-tile camera name strip (usually at the top or bottom edge of each tile). A channel bug or mode badge (often animated, in a corner). A naive object detector run on the raw composite frame will lock onto those regions because they are the only pixels that change reliably. Half the detections will be clock digits.
Cyrano ships with mask templates for every supported recorder. At boot, it reads the recorder model and layout_id, pulls the matching template, and blanks those regions before the frame ever reaches the detector. The regions that got blanked are recorded on the event itself as overlay_mask. You can grep any event to see exactly which overlays were dropped, per frame, per tile. Here is what one looks like.
overlay_mask: ["clock", "cam_name_strip", "channel_bug"]
That literal array shows up on every Cyrano event that comes off an HDMI multiview. It is the per-event audit trail proving the detector never saw the pixels that change every second. No IP-camera vendor emits this field, because no IP-camera vendor has an HDMI overlay to mask. If you run a general-purpose detector on an unmasked multiview you will generate one false detection per second per tile, forever.
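To make the pre-inference masking concrete, here is a minimal sketch in Python with NumPy. The region coordinates and the `MASK_TEMPLATE` name are hypothetical; real Cyrano templates are keyed on recorder model plus layout_id, and the exact rectangles vary per recorder.

```python
import numpy as np

# Hypothetical mask template: overlay name -> (x, y, w, h) rectangle in
# composite-frame coordinates. Real templates come from the library keyed
# on recorder model + layout_id; these numbers are illustrative only.
MASK_TEMPLATE = {
    "clock":          (8, 8, 200, 32),
    "cam_name_strip": (0, 238, 480, 20),
    "channel_bug":    (1840, 8, 72, 40),
}

def apply_overlay_mask(frame, template):
    """Blank overlay regions before the frame reaches the detector.
    Returns the masked frame and the overlay_mask audit list."""
    masked = frame.copy()
    applied = []
    for name, (x, y, w, h) in template.items():
        masked[y:y + h, x:x + w] = 0   # zero the pixels the detector must never see
        applied.append(name)
    return masked, applied

# A flat gray 1080p composite stands in for a captured HDMI frame.
frame = np.full((1080, 1920, 3), 128, dtype=np.uint8)
masked, overlay_mask = apply_overlay_mask(frame, MASK_TEMPLATE)
# overlay_mask is exactly the audit array stamped on the event:
# ["clock", "cam_name_strip", "channel_bug"]
```

The point of returning `applied` alongside the frame is that the audit trail and the masking are produced by the same code path, so the event field can never drift from what the detector actually saw.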
Stage two: tile-grid zones
Zones on an IP camera are drawn on that one camera's frame. Zones on an HDMI multiview have to be drawn on the composite and then resolved per tile. Cyrano does that through a two-key lookup: layout_id gives you the grid (4x4-std, 5x5-std, 3x3-std, and so on), and tile.index gives you the row-major cell position. Together they pin a zone to a specific rectangle on the composite frame.
The trick is what happens when the recorder switches layouts (which maintenance staff do more often than you would believe). tile.index shifts, but tile.label (the camera name the DVR stamps on the strip) does not. Cyrano keys zones on tile.label, so a layout change automatically re-anchors every zone. This is the opposite of how IP-camera VMS software works, where zones are keyed on camera ID and every layout change requires a support ticket.
How a zone resolves on the composite frame:
1. Read layout_id at boot.
2. Look up the tile rectangle for each tile.index.
3. Match tile.label from the name strip.
4. Apply the zone in tile coordinates.
5. Run detection, zone-gated.
Stage three: confidence hysteresis and multi-frame persistence
This is the stage every top-ranking page covers. A detection only counts if it survives across 3 to 5 consecutive frames. A new alert only fires above 0.70 confidence, but an existing detection stays alive down to 0.55 so it does not flicker at the threshold. Single-frame noise from shadows, compression, insects, and headlight sweeps never makes it out of stage three.
This is the stage where the familiar advice actually applies. Tune persistence window. Tune confidence upper and lower bounds. Tune minimum bounding-box size in tile coordinates. Cyrano ships reasonable defaults and exposes the tunables in the dashboard.
Stage four: dedup on property + tile.label + event_class
A person who walks into a detection zone and stands there for ten minutes will produce tens of thousands of true-positive detections from stage three. Every one of them is a real person on a real tile. Every one of them is also the same alert. The dedup stage collapses them.
The dedup key is property + tile.label + event_class, plus a cooldown window. Default cooldowns: 30 seconds for person_in_zone, 5 minutes for loiter, 2 minutes for vehicle_dwell, 10 minutes for tamper. Inside the window, new detections with the same key are logged as suppressed but not emitted as alerts. Outside the window, the key resets and the next detection fires a new alert. This is the single biggest reduction in operator-facing noise.
What each stage actually drops
Every suppressed detection gets logged with a reason code so you can audit the funnel. Here is what the stages drop on a representative 25-tile multiview over a 24-hour window at one apartment property.
Before the pipeline, after the pipeline
Same 24-hour window, same recorder, same 25 tiles. What the operator actually sees in their inbox.
Operator inbox on a single apartment property
Sensitivity-based motion on the HDMI composite with no overlay handling. Every clock tick, every channel-bug frame, every compression block on a dark tile triggers. The operator's phone becomes useless within an afternoon and gets silenced by the end of the week.
- ~412,000 candidate detections per 24 hours
- ~280,000 fired on clock / name strip / bug pixels
- One true loitering event = tens of thousands of alerts
- Phone gets muted, real break-in attempts get missed
Filtering on HDMI multiview vs. filtering on a per-camera IP stream
If you have clean RTSP streams for every camera, most of the work on this page does not apply. But most commercial DVRs do not give you clean RTSP streams. This is what changes when the only output is HDMI.
| Feature | Per-camera IP filter | HDMI multiview filter |
|---|---|---|
| Input signal | One RTSP stream per camera, clean pixels | One HDMI composite with 16-25 tiles and overlays |
| Overlay handling | Not needed, no overlays | overlay_mask blanks clock + strip + bug pre-inference |
| Zones | Drawn in camera coords, keyed on camera ID | Drawn in tile coords, keyed on tile.label |
| Layout changes | Not applicable | tile.index shifts, tile.label does not, zones survive |
| Dedup key | camera_id + event_class | property + tile.label + event_class |
| Per-frame inference cost | N streams, N detector passes | 1 composite, 1 detector pass, scatter by tile.index |
| Install effort | Per-camera RTSP setup, IT involvement | Unplug HDMI from monitor, plug into Cyrano, done |
Try the pipeline on your own DVR
15-minute demo on the recorder already in your office closet. We run the four-stage filter live on your multiview and show you the suppression audit log.
The four stages as a tuning order
When alerts feel wrong, tune in this order. Each stage filters a different class of noise, and the later stages can only compensate so much for a miscalibrated earlier stage.
Re-verify overlay_mask on the noisy tile
If you see a steady firehose on one tile, pull a recent event JSON and confirm overlay_mask is ["clock", "cam_name_strip", "channel_bug"]. If the clock position is non-standard, recalibrate that tile's mask in the dashboard. This fixes the 70 percent case in one minute.
Tighten the zone on that tile.label
If overlays are masked but you still get daytime-traffic noise, the zone is too wide. Redraw the zone in the dashboard. Because zones are keyed on tile.label, a DVR layout change will not break it.
Raise persistence or fire threshold
If the zone is tight but single-frame artifacts still slip through, push persistence from 3 to 5 frames or raise the fire threshold from 0.70 to 0.80. Watch the dropped:persistence_failed counter climb.
Lengthen dedup cooldown on dwell events
If you are seeing multiple alerts for the same person standing in the same zone over ten minutes, extend cooldown on loiter from 5 minutes to 10. Dedup keys reset when tracked identity changes, so this does not suppress a different person entering the zone.
Why each stage belongs where it is
Stage one is first because overlays never sleep
The clock changes every second, forever. If overlay_mask runs after inference instead of before, the detector has already wasted its budget on the pixels that were going to be thrown away. Masking pre-inference is cheaper and cleaner.
Stage two is before the detector
Zones gate which tile-rectangles the detector runs on, not which detections it emits. Pre-inference zoning cuts compute in half on properties where half the tiles are parking-lot views you do not care about at night.
Stage three is after the detector
Persistence and hysteresis only make sense once you have bounding boxes to track frame-to-frame. They do not belong earlier in the pipeline.
Stage four is last because dedup needs event_class
You cannot deduplicate until you know the detection type. A person_in_zone and a loiter on the same tile.label at the same time are different alerts and should both fire.
Every drop is reversible
Stages one through four all log suppressed detections with a reason code. You can replay them offline to see what would have fired under different parameters. No drop is silent.
Works with the recorder already in your closet
The overlay_mask template library ships with coverage for the DVR and NVR brands Cyrano sees in the field. Each entry is a clock position, a cam-name-strip region, and a channel-bug region per supported layout_id.
A representative 24-hour funnel
One apartment property in Fort Worth. One recorder. 16 active tiles on a 4x4-std layout. 24 hours of traffic. The funnel numbers come from a single suppression_audit.log export, with percentages relative to candidate_detections_total.
“At one Class C multifamily property in Fort Worth, Cyrano caught 20 incidents including a break-in attempt in the first month. Customer renewed after 30 days.”
Fort Worth, TX property deployment
Frequently asked questions
What exactly is smart camera alert filtering on an HDMI multiview, and why is it different from filtering on IP cameras?
Smart camera alert filtering on an HDMI multiview means applying object detection and gating rules to a single composite frame that the DVR has already tiled into a 4x4 or 5x5 grid, with a running clock, per-tile camera name strip, and a channel bug overlaid on top. It is different from filtering on IP cameras because the input is one decoded stream carrying 16 or 25 scenes, not 16 or 25 independent streams. That one difference changes every filter. Zones are grid coordinates, not camera coordinates. Object detection runs once per composite frame, then results get scattered back out by tile.index. And every detection has to survive an overlay mask step, because otherwise the model will chase the clock digits flipping once per second.
What is the overlay_mask field and why does it show up on every Cyrano event?
overlay_mask is a JSON array field on every event Cyrano emits. Its value is typically ["clock", "cam_name_strip", "channel_bug"]. It lists the three classes of DVR overlay pixels that got blanked out before inference ran. Cyrano records it on every event so you have a per-event audit trail proving the detector never saw the pixels that change every second. No IP-camera vendor emits this field because they have no DVR overlay to mask. On an HDMI multiview, skipping this step produces a firehose of motion alerts pinned to the running clock.
How does Cyrano define detection zones on a tiled multiview?
Zones are drawn on the composite frame but resolved per tile. Cyrano reads layout_id at boot (for example 4x4-std or 5x5-std), computes the pixel rectangle for each tile.index, and stores your zone in tile-relative coordinates. When the recorder switches layouts, tile.index shifts but tile.label (the camera name the DVR overlays on the strip) does not, so your zones re-anchor automatically. This means a property manager can draw one zone once, on the monitor they are already looking at, and have it follow that camera through layout changes and firmware updates.
What is the multi-frame persistence filter and how many frames does it require?
Multi-frame persistence means a detection has to survive across 3 to 5 consecutive frames of the multiview stream before it can fire an alert. At 30 fps that is roughly 100 to 170 ms. Single-frame detections are intrinsically noisy: a headlight glare, a shadow sweep across a tile, a compression artifact, or the channel-bug animation cycle can all produce a confident bounding box on one frame. Requiring the bounding box to stay on the same tile across several frames collapses those artifacts to zero without losing real people or vehicles, who persist for seconds.
How does event dedup work when the same person stays in frame for ten minutes?
Every candidate alert gets a dedup key of property + tile.label + event_class. If that key has already fired within a configurable cooldown window (default 5 minutes for loiter, 30 seconds for person_in_zone), the new detection is suppressed rather than emitted. Practically, this means one loiterer standing at a mailbox for ten minutes produces one alert, not 18,000 frames of alerts. The cooldown is per key, so a second person walking into the same tile while the first is still dwelling still fires, because the dedup key resets when the tracked object identity changes.
What does one Cyrano event payload actually look like after filtering?
Nine fields plus a 480x270 JPEG thumbnail. tile.label is the camera name the DVR stamps on the strip (for example "Loading Dock NE"). tile.index is the row-major grid position starting at 0. tile.coords is x, y, w, h in the composite frame. property is the site identifier. layout_id is the recorder layout. overlay_mask is the array of overlays that were blanked. event_class is the detection label (person_in_zone, vehicle_dwell, loiter, tamper). iso8601_ts is the recorder clock timestamp. latency_ms is capture-to-delivery time. The thumbnail is a crop of just the triggering tile. Same shape from every Cyrano unit in the portfolio.
How fast is the end-to-end filtered alert on a 25-tile multiview?
Median capture to delivery is 7 to 8 seconds, with a 5 to 15 second envelope across the portfolio. That includes HDMI capture, overlay masking, batched inference across all visible tiles, multi-frame persistence, dedup, thumbnail crop, and WhatsApp or webhook delivery. Adding tiles does not linearly add latency because the detector runs once on the composite frame, not once per tile. Every event records latency_ms, so you can chart p50 and p95 per property and catch a slow uplink before staff notice missed alerts.
Does the filter pipeline require per-camera configuration or does it calibrate automatically?
Most of it auto-calibrates. layout_id detection, overlay_mask template matching, and the persistence filter are all derived from the multiview itself at boot. The only thing that benefits from manual tuning is zones. Out of the box, Cyrano treats every tile as a full-frame zone and runs person, vehicle, and loiter detection on all of them. You narrow zones later ("only alert on person_in_zone inside the dashed box on Loading Dock NE") once you have seen a week of what the cameras actually see. Calibration is done through the dashboard, not by SSHing into anything.
Can this filtering approach work if my DVR runs a weird custom layout?
Yes, because the overlay_mask template is keyed on recorder model plus layout_id, not on Cyrano's guess. Supported recorders include Hikvision DS-7xxx, Dahua XVR and NVR, Lorex, Amcrest, Reolink NVR, Uniview, Swann, Night Owl, Q-See, ANNKE, EZVIZ, Bosch DIVAR, Honeywell Performance, Panasonic WJ-NX, and most white-label rebrands. Each has its own conventions for where the clock sits, how the camera-name strip renders, and what the channel bug looks like. Cyrano ships with templates for all of them. A custom layout that is not in the library just gets a one-time mask calibration and then behaves the same.
Why not just buy new AI cameras instead of filtering the HDMI output of the old ones?
Cost, time, and downtime. A full camera replacement at a single apartment property runs $50,000 to $100,000+ including new cameras, new wiring, and a new NVR, plus months of installation and an ongoing cloud subscription. One Cyrano unit plugs into the HDMI output of whatever DVR you already have in the office closet. Hardware is $450 one-time, software is $200/month, install is under 2 minutes, and the guard monitor on the wall keeps showing the same multiview because Cyrano passes the HDMI signal through. The filter pipeline described on this page is what makes that HDMI tap useful instead of noisy.
What happens to alerts that the filter pipeline drops? Is there an audit trail?
Dropped alerts are logged with the reason they got dropped: overlay_masked, persistence_failed, zone_miss, deduped_within_cooldown. You can query that log by property, tile.label, and event_class to see what would have fired before filtering and how much noise was removed. In practice this is how property managers tune zones in weeks 2 through 4 of a deployment. They see that overlay_masked drops fire every second on the south camera because its clock position is non-standard, and they recalibrate that one tile's mask in a minute.
Adjacent reading
Security Camera False Alarm Reduction
Where AI processing happens determines alert accuracy. A companion to this pipeline deep dive.
Live Monitoring vs Automated Alert Fatigue
200 alerts a day means zero alerts get read. How hybrid verification actually works.
Security Video Monitoring Systems: The Plural Problem
What running eight recorders at eight properties through one event stream looks like.