You bought a local NVR to stay off the cloud. Then the AI add-on put you back on it.
Every top result for “local NVR security cameras” treats “local” as a storage property. Footage on disk, not in a vendor cloud. Subscription-free retention. Those guides end at the NVR. What happens when you want smart alerts on top of that NVR is the part the SERP skips, and it's where local-by-default quietly stops being local. This guide walks through the overlooked third path: tapping the NVR's own HDMI multiview output with an on-box classifier, so the footage, the inference, and the event index all stay on premises.
See a live HDMI tap on a real NVR
What “local” actually means when you ask for it
When a Redditor in r/HomeSecurity or r/homeautomation or r/FrigateNVR writes “I want a local NVR,” they are almost always making three requests at once. One: storage stays on site, no monthly fee to pull up last Tuesday. Two: nothing phones home to a vendor cloud, because a recorder that phones home is a recorder whose retention and access policy aren't really yours. Three: if the system makes decisions (motion, person, package, loiter), those decisions also happen on site, because an inference call to someone else's GPU is a round-trip of your camera frames.
The retail category answers request one. Swann, Lorex, Reolink, Hikvision-branded NVR kits, Dahua OEM boxes, Amcrest kits, all of them ship a local NVR with an internal disk. Request two is mostly handled too, if you leave the companion mobile app uninstalled and keep the NVR off the internet. Request three is the one that collapses. The NVR's own built-in “AI” is a pixel-diff motion detector dressed up in a menu labelled “smart detection.” Adding real classification (person vs. vehicle, zone dwell, loitering, tamper) means one of three things, and two of them break the locality promise.
The three ways to add AI to a local NVR
Two of these break the reason you picked local in the first place.
Option A: Cloud AI service
Sign up for a vendor cloud, point the NVR or a bridge at it, let motion clips upload for classification. Works, but every frame containing an event transits the vendor's infrastructure. Footage is local on disk, but the decision is made off-site. Monthly fee. Uplink dependency. Clip retention on their terms.
Option B: Replace the NVR
Rip out the consumer NVR, put Frigate or CodeProject.AI or a commercial VMS in its place, pull RTSP directly off each camera. Genuinely local once it's working. But now you own the recorder too: retention policy, disk failures, firmware, warranty, evidence export, all of it. You rebuilt the recording layer just to add the inference layer.
Option C: HDMI tap on the existing NVR
A pass-through capture on the HDMI port that already drives the monitor in the office. Composite frame gets tile-cropped, overlay-masked, classified on the Cyrano unit, and an event row goes to a local index. NVR keeps recording. Cameras stay on the NVR's PoE switch. Nothing opens on the LAN side.
Cloud AI vs. HDMI tap, at the pixel level
The difference is easiest to see in terms of where the frame lives during the decision.
Where does a camera frame travel before becoming an alert?
The NVR's companion bridge or a cloud camera app pulls an RTSP clip when motion fires, encrypts it, and uploads it; the vendor's GPU runs detection; the vendor stores the clip for N days in their bucket; a push notification lands on your phone.
- Clip leaves the LAN for every event
- Per-device cloud credentials stored with the vendor
- Uplink saturation becomes a detection reliability issue
- Vendor outage = no alerts, even though your NVR is fine
The physical topology, on one line
Four cables on the edge unit: HDMI in, HDMI out, ethernet, power. None of them route through a cloud.
Signal flow: HDMI in, HDMI out, local event row
The single preprocessing step that makes HDMI AI viable
If you ever tried pointing a generic object detector at an NVR's HDMI multiview, you already know the failure mode: the event log fills with phantom detections on the clock strip and channel bug. The colon glyph inside 14:32:07 scores as a tiny vertical person. The letterforms of “Mailroom Interior” score as low-confidence text-shaped objects. The red recording dot scores as a small object. A naive classifier turns an NVR's own chrome into an event firehose.
The fix is a per-layout overlay mask, keyed by layout_id. Each mask is a list of pixel rectangles the pipeline paints black on the composite frame before any classifier runs. The mask state at inference time is written onto every event row in the overlay_mask field, so months later a reviewer can reconstruct what chrome was blanked. The mask set covers the layouts the NVR can produce: 1x1-std (fullscreen), 2x2-std, 3x3-std, 4x4-std, 5x5-std, plus any custom grids the operator saved on the NVR.
Overlay regions blanked before the classifier runs
- Top clock strip (HH:MM:SS, updating colon glyph)
- Per-tile camera-name banner ('Cam01 Front Door', 'Mailroom Interior')
- Per-tile channel bug (CH1, CH2, CH3 small pill)
- Recording indicator dot (red, varies by NVR brand)
- Disk / network status icon (bottom right on most Hikvision NVRs)
- Date badge (sometimes embedded in clock, sometimes separate)
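The blanking step itself is mechanically simple. Here is a minimal pure-Python sketch, assuming each mask entry is an (x, y, w, h) rectangle in composite-frame pixels; the function name and example coordinates are hypothetical, not the shipped implementation:

```python
def apply_overlay_mask(frame, mask_rects):
    """Paint each mask rectangle black on the composite frame, in place.

    frame      -- H x W x 3 nested list of RGB pixels (illustrative layout)
    mask_rects -- list of (x, y, w, h) rectangles covering NVR chrome
    """
    height, width = len(frame), len(frame[0])
    for x, y, w, h in mask_rects:
        for row in range(y, min(y + h, height)):
            for col in range(x, min(x + w, width)):
                frame[row][col] = [0, 0, 0]  # blank the chrome pixel
    return frame

# Hypothetical 4x4-std mask: top clock strip plus one channel bug.
MASKS = {
    "4x4-std": [(0, 0, 1920, 24),     # clock strip along the top
                (8, 1056, 60, 18)],   # CH1 pill, bottom-left tile
}
```

Everything outside the rectangles is untouched, so the classifier still sees every camera pixel; only the chrome goes dark.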
Layout change = mask swap, on the same box
When a guard maximises one tile to fullscreen, or a property manager flips to a 2x2 view on a smaller monitor, the NVR's on-screen chrome moves. The clock now lives at different pixel coordinates. The camera-name banner changes size. A classifier pointed at the old mask starts scoring events on uncovered chrome. The engine detects the layout change from the tile geometry, loads the mask for the new layout_id out of its local cache, and resumes inference. Typical swap latency is under 20 ms. No frames are dropped past the swap.
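The swap itself is little more than a cache lookup. A minimal sketch, assuming masks are keyed by layout_id strings and pre-computed at install time; the class and method names are hypothetical:

```python
class MaskCache:
    """Per-layout overlay masks, computed once at install (sketch)."""

    def __init__(self, masks):
        self.masks = masks            # layout_id -> list of (x, y, w, h) rects
        self.active_layout = None
        self.active_mask = []

    def mask_for(self, layout_id):
        """Return the mask for the current layout, swapping on change."""
        if layout_id != self.active_layout:
            if layout_id not in self.masks:
                # Unseen custom layout: the engine must capture a reference
                # frame and compute a mask before inference resumes.
                raise KeyError(f"no mask for layout {layout_id!r}")
            self.active_layout = layout_id
            self.active_mask = self.masks[layout_id]
        return self.active_mask
```

When the guard flips from 4x4-std to 1x1-std, `mask_for("1x1-std")` returns the fullscreen mask on the next frame; an unknown layout_id raises instead of silently scoring chrome.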
The event row that lands in the local index
Nine scalar fields plus one small tile object. Flat JSON, one row per event. Written to a local index first; the dashboard sync is opt-in on top of that.
The numbers that matter on the local side
Four metrics a local-NVR buyer should hold any HDMI AI solution to. Cyrano reports each on the event row so regressions show up at query time, not buried in a monitoring dashboard nobody opens.
- Event-row schema: nine fields on every row written to the local index. Tile label, tile index, tile coords, property, layout_id, overlay_mask, event_class, iso8601_ts, latency_ms.
- Capture-to-searchable latency: median 7 to 8 seconds per event on a 25-tile multiview, recorded in the latency_ms field.
- Physical install time on a running NVR: under two minutes. HDMI in, HDMI out to the existing monitor, one ethernet cable, one power plug.
- Changes to the NVR configuration: zero. Passive HDMI pass-through; the NVR still sees one sink, exactly as before the tap.
Cloud NVR AI vs. HDMI-tapped local AI
Both land a push notification on your phone. The architecture is different enough that the trade-offs are different too.
| Feature | Cloud-NVR AI subscription | HDMI-tapped local AI (Cyrano) |
|---|---|---|
| Footage leaves the building | Yes, clips for every event upload | No, frames are classified on the edge unit |
| Monthly fee | Yes, per-camera or per-site | No, unit is on-prem with optional portfolio sync |
| Detection when uplink is down | Degrades to motion-only or silent | Unchanged, detection runs on the LAN |
| Credentials stored with vendor | RTSP or cloud account per camera / NVR | None. HDMI has no auth. |
| NVR config changes required | Enable RTSP, forward ports, install companion app | None. HDMI pass-through only. |
| Survives NVR vendor going out of business | Usually not. Cloud dashboard disappears. | Yes. The NVR and Cyrano both keep working. |
| Evidence export for the police | Must be pulled from the vendor cloud | Pulled from the NVR itself, same as before |
| Works on mixed-brand NVR fleets | Only within one vendor's ecosystem | Yes. HDMI is brand-agnostic. |
Five minutes from out-of-box to first local event
The NVR doesn't change. The monitor on the wall shows the same multiview. Nothing gets rewired on the PoE side.
Installing on top of an existing local NVR
Unplug the HDMI from the NVR to the monitor
One cable. The monitor briefly goes dark. The NVR keeps recording.
HDMI from NVR into Cyrano 'HDMI IN'
Pass-through capture. The Cyrano unit is now seeing the same composite frame the monitor was showing, typically 1080p at 25 or 30 fps.
HDMI from Cyrano 'HDMI OUT' into the monitor
The monitor comes back on. Guard or PM sees the exact same multiview they were watching before the swap.
Ethernet into the LAN, power plug in
LAN is used for optional webhook delivery and dashboard sync. Everything works without it; the unit just queues events locally until it comes back.
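The queue-until-the-uplink-returns behaviour is worth pinning down, since it's what keeps detection independent of the WAN. A minimal sketch, assuming `send` is whatever callable delivers one row to a webhook or the dashboard and raises on failure; the class and method names are hypothetical:

```python
from collections import deque

class EventSpooler:
    """Queue event rows locally; deliver in order once the uplink is back."""

    def __init__(self, send):
        self.send = send          # delivery callable; raises OSError on failure
        self.pending = deque()    # rows not yet delivered (local index has them)

    def emit(self, row):
        self.pending.append(row)
        self.flush()

    def flush(self):
        while self.pending:
            try:
                self.send(self.pending[0])
            except OSError:
                return            # uplink down: keep the row queued, retry later
            self.pending.popleft()
```

Detection never pauses; only delivery waits. The local index write happens before the row ever reaches this spooler.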
Layout sweep
Walk the NVR through the layouts you use (1x1-std, 4x4-std, plus whatever custom grids). The engine captures a reference frame per layout and computes the overlay mask on-box. Typically 30 seconds.
Tile label OCR
The engine OCRs the camera-name banners the NVR is already painting ('Cam01 Front Door', 'Mailroom Interior') and uses those as the stable key for search. No per-camera naming needed.
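OCR output needs light normalization before it can serve as a stable search key. The sketch below shows one plausible cleanup (collapse whitespace, strip a trailing channel bug, lowercase); the function name and the exact rules are assumptions, not the shipped implementation:

```python
import re

def tile_key(ocr_text):
    """Normalize an OCR'd camera-name banner into a stable search key."""
    text = re.sub(r"\s+", " ", ocr_text).strip()          # collapse OCR jitter
    text = re.sub(r"\s*CH\d+\s*$", "", text,              # drop trailing channel bug
                  flags=re.IGNORECASE)
    return text.lower()
```

The point is that 'Cam01  Front Door' and 'Cam01 Front Door' land on the same key, so search stays stable across minor OCR variation.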
First event in the local index
Median 7 to 8 seconds from capture to searchable on a 25-tile multiview. You see the event row appear in the dashboard, with the thumbnail rendered, while still standing at the rack.
What the dashboard looks like after a week of events
The CLI surface is the same shape as the dashboard filters: property, tile label, event class, time window. The benchmark scenario, “anyone walk up to the front door overnight” on a local NVR, comes back in 184 ms against the local event index.
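What that query looks like against a local index can be sketched in a few lines. The snippet assumes the index is an ordinary SQLite table with the nine-field schema described earlier; the table name, column names, and sample rows are illustrative, not the actual Cyrano schema:

```python
import sqlite3

# In-memory stand-in for the local event index (schema is an assumption).
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE events (
    tile_label TEXT, tile_index INTEGER, tile_coords TEXT, property TEXT,
    layout_id TEXT, overlay_mask TEXT, event_class TEXT,
    iso8601_ts TEXT, latency_ms INTEGER)""")
db.executemany(
    "INSERT INTO events VALUES (?,?,?,?,?,?,?,?,?)",
    [("Cam01 Front Door", 0, "{}", "maple-street", "4x4-std", "4x4-std",
      "person", "2025-01-14T02:17:09+00:00", 7400),
     ("Mailroom Interior", 5, "{}", "maple-street", "4x4-std", "4x4-std",
      "person", "2025-01-14T14:02:41+00:00", 7900)])

# "Anyone walk up to the front door overnight?"
rows = db.execute(
    """SELECT iso8601_ts, event_class FROM events
       WHERE property = ? AND tile_label = ? AND event_class = ?
         AND iso8601_ts BETWEEN ? AND ?""",
    ("maple-street", "Cam01 Front Door", "person",
     "2025-01-13T22:00:00+00:00", "2025-01-14T06:00:00+00:00")).fetchall()
```

ISO-8601 timestamps in a fixed zone sort lexicographically, so the BETWEEN on plain strings is a correct time-window filter; the mailroom event falls outside both the tile and the window.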
Which local NVRs the HDMI path covers
The overlay mask templates are stored per recorder family because each brand paints its chrome in a slightly different spot. The classifier upstream doesn't care which mask produced a given event, so mixed fleets (Hikvision at one property, Dahua at another) produce a single unified event index.
Three ways to sanity-check a “local NVR AI” pitch
If you're evaluating any vendor that promises local AI on top of a consumer NVR, three checks tell you fast whether “local” is load-bearing.
1. Does the pitch require RTSP credentials?
If yes, the inference box needs a route to each camera. That means the NVR's RTSP service is open on the LAN, credentials are stored somewhere, and a firmware update can break the stream URL format. HDMI has no credentials and no URL.
2. What leaves the building on every event?
Ask specifically. Is it the clip, the thumbnail, or just a JSON row? If the answer is “the clip” your locality is storage-only. If the answer is “nothing unless you opt in,” the inference itself is local.
3. What happens when the uplink is down?
If detection pauses, classification was happening off-site. If detection continues and events queue locally until the uplink returns, classification was on-box. This is the hardest question to get a straight answer on and the most telling one.
See the HDMI tap on your NVR brand
15 minutes. Pick your NVR model on the call and we show a live overlay-mask calibration plus the first event landing in the local index.
Book a call →
Questions from the Reddit thread that landed here
I bought a local NVR to stay off the cloud. Why does adding smart alerts suddenly push me back to the cloud?
Because almost every AI bolt-on for consumer NVRs is architected as an RTSP consumer that lives somewhere else. The common shapes are: a mobile app that pulls camera feeds through the vendor's SaaS (footage transits their cloud even if it's also stored locally on the NVR), a Wi-Fi bridge that uploads motion clips for cloud inference, or an enterprise VMS that replaces the NVR entirely. Each one either opens a port, ships video off-site, or forces you to rewire cameras out of the NVR's PoE switch and into a different box. The only path that keeps everything local is inference that runs on a unit physically attached to the NVR, reading a signal the NVR is already emitting on its own. The HDMI multiview output is exactly that signal.
Why HDMI instead of RTSP? Isn't RTSP the standard for NVR integrations?
RTSP is the standard when you're willing to pay three costs: (1) credentials that must be stored, rotated, and scoped per camera; (2) network topology changes, because an RTSP consumer needs a route to each camera or to the NVR's RTSP proxy; and (3) the fragility of a vendor-specific stream URL format that breaks on firmware updates. HDMI avoids all three. The NVR is already decoding every camera into a composite frame for the monitor on the office wall. Tapping that HDMI port with a pass-through capture means one cable, no credentials, no port forwarding, no per-camera stream URLs, and zero changes to the NVR configuration. The trade-off is that you see one composite frame instead of N individual high-resolution streams, which is fine for event indexing (the use case) and not for evidence export (still the NVR's job).
What exactly is in the composite frame that an HDMI tap sees?
The NVR composites every camera into a grid, typically 2x2, 3x3, 4x4, or 5x5 depending on how many cameras are live and how the operator configured the display. That gives a hard ceiling of 25 tiles on one HDMI port, because 5x5 is the largest simultaneous multiview consumer NVRs render; past 25 cameras the recorder starts cycling groups of cameras through the slots on a timer. Each tile is a live view of one camera, scaled down, typically 384x216 or 480x270 at the 5x5 scale. On top of the tile grid, the NVR paints its own chrome: a clock strip along the top or bottom, a per-tile camera-name banner that reads something like 'Cam01 Front Door' or 'Mailroom Interior', and a small channel bug (CH1, CH2, CH3). On some models there's a red recording dot and a disk-status icon. All of that gets baked into the HDMI frame before it reaches the capture card.
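The tile geometry falls straight out of the grid dimensions. A sketch of the crop-rectangle computation, assuming tiles evenly divide a 1080p composite with no gutters, which matches the typical consumer-NVR multiview; the function name is hypothetical:

```python
def tile_rects(grid, frame_w=1920, frame_h=1080):
    """Pixel rectangles (x, y, w, h) for each tile in a cols x rows multiview."""
    cols, rows = grid
    tile_w, tile_h = frame_w // cols, frame_h // rows
    return [(c * tile_w, r * tile_h, tile_w, tile_h)
            for r in range(rows) for c in range(cols)]

rects = tile_rects((5, 5))
# A 5x5 grid on 1920x1080 yields 25 rectangles of 384x216,
# matching the per-tile scale quoted above.
```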
What's the overlay mask and why is it the thing that makes the HDMI approach work?
A person-detection classifier that sees those burned-in overlays will score false-positive bounding boxes on them. The colon glyph inside 14:32:07 looks like a tiny vertical person at the right scale. The letterforms of 'Mailroom Interior' score as low-confidence text-shaped objects. The red recording dot scores as a small object. Run the classifier on the raw HDMI frame and the event log fills with phantom detections on chrome pixels, not camera pixels. The fix is a per-layout overlay mask: a binary rectangle list that the pipeline paints black on the composite frame before any classifier runs. It's keyed by layout_id (4x4-std, 5x5-std, 3x3-std, 1x1-std, custom) because the chrome moves when the operator changes the grid. Store one mask per layout the NVR can produce, swap at inference time when layout_id changes, and every event after the swap is scored on pixels inside a tile. This is the preprocessing step every other HDMI-based analytics approach gets wrong, which is why people assume HDMI is unusable for AI.
What happens when someone switches the NVR from multiview to fullscreen on one camera?
That's a layout change, usually 4x4-std to 1x1-std or similar. The Cyrano engine notices the tile geometry changed, loads the overlay mask for the new layout_id out of its local cache, and resumes inference on the new grid. The event record written after the swap carries the new layout_id in the overlay_mask field, so the mask state at index time is reconstructable months later. The cache is populated at install time from a one-time layout sweep (the installer walks the NVR through each layout it supports). If an operator creates a brand-new custom layout that the cache hasn't seen, the engine flags it, captures a reference frame, and the mask gets computed on-box before inference resumes.
Which NVRs does this actually work on?
Any NVR that drives a monitor drives Cyrano, because the interface is the HDMI port, not a vendor-specific API. Confirmed on Hikvision DS-7xxx NVRs, Dahua NVR4xxx / NVR5xxx, Lorex N-series, Reolink RLN series, Uniview NVR301/302, Swann NVR-7xxx, Amcrest NV4xxx, ANNKE, EZVIZ NVR, Bosch DIVAR, Honeywell Performance series, Panasonic WJ-NX200/400, Samsung SRN, and the long tail of rebrands. Mixed fleets work the same way because the overlay mask is keyed per recorder model at install, and the event index upstream of the mask doesn't care which mask produced a given event. A property with one Hikvision NVR and one Dahua NVR renders a single unified search experience.
Does tapping the HDMI change anything on the NVR itself?
No. The Cyrano unit sits in pass-through: HDMI in from the NVR, HDMI out to the monitor the guard or property manager was already watching. The NVR sees one HDMI sink, exactly as it did before. No config change, no firmware change, no port opened on the NVR's LAN side. The monitor stays lit with the same multiview. If you unplug the Cyrano, the monitor goes straight back to direct-from-NVR. This matters for two reasons specific to local-NVR buyers: warranty and compliance. The NVR vendor can't void a warranty over a passive HDMI sink, and the compliance posture of 'local-only footage' doesn't change because nothing leaves the NVR.
If I'm already running Frigate or CodeProject.AI, why would I use this instead?
Different problem. Frigate and CodeProject.AI are full NVRs; they want RTSP streams and they replace (or parallel) your existing NVR. That's a fine architecture if you're building a home lab from scratch. It's a bad architecture if you already have a Lorex or Hikvision NVR installed, cabled, and under warranty, because you'd be rebuilding the recording layer just to add the inference layer. The HDMI approach keeps the NVR as the recording layer (warranty, retention, evidence export all untouched) and adds inference alongside it. The event index writes back out to webhooks or a dashboard; it doesn't try to take over storage. If you're on Frigate and happy, stay on Frigate. If you have a consumer NVR and don't want to replace it, the HDMI path is the least-invasive option.
How many cameras can a single unit handle?
Up to 25 per HDMI tap, because that's the 5x5 multiview ceiling consumer NVRs render simultaneously on one output port. Past 25 cameras the NVR starts cycling groups through the slots on a timer, which breaks continuous per-camera inference. Properties with more than 25 cameras typically have two things: either the NVR already has a second HDMI output (most mid-range and higher NVRs do, labelled 'HDMI2' or 'SPOT'), or they're running multiple NVRs. Either way you add a second Cyrano unit on the second HDMI output, and both event streams land in the same index keyed by the property field. The 25-tile ceiling is per HDMI port, not per property.
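The sizing rule above reduces to one line of arithmetic, shown here as a sketch (function name hypothetical):

```python
import math

def units_needed(camera_count, tiles_per_port=25):
    """HDMI taps required for continuous per-camera inference.

    25 is the 5x5 multiview ceiling per HDMI output; past that, a second
    output (or a second NVR) gets its own unit on the same index.
    """
    return math.ceil(camera_count / tiles_per_port)
```

A 16-camera Hikvision needs one unit; a 40-camera property needs two, one per HDMI output.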
Does any of this leave the premises?
Only the event row by default, and only if you want it to. The composite frame is cropped per tile, masked, classified, and discarded on the Cyrano unit. What gets written out is a nine-field JSON record per event (tile label, tile index, tile coords, property, layout_id, overlay_mask, event_class, iso8601 timestamp, latency_ms) plus a 480x270 thumbnail. You can route that to a local webhook for a PMS or ticketing system and have nothing leave the LAN. You can also route it to the Cyrano dashboard for portfolio search, in which case only the JSON record and thumbnail go over the wire, not the clip. The clip itself stays on the NVR's own disk under whatever retention policy you already have. That's the actual definition of 'local' that local-NVR buyers were asking for: storage local, inference local, and a choice about whether the event row syncs for cross-site search.
What should a Redditor with a 16-camera Hikvision or Lorex NVR do this week?
Three checks, then a decision. Check 1: confirm the NVR has an HDMI output that's currently driving a monitor (almost certainly yes). Check 2: walk the NVR through the layouts you actually use day to day (fullscreen, 4x4, 3x3) and write them down; those become the layout_ids the overlay mask has to cover. Check 3: count cameras; if you're over 25 you need a second HDMI output or a second unit. If all three check out, the install itself is under two minutes of physical work: HDMI out of the NVR into Cyrano HDMI-in, HDMI out of Cyrano into the monitor, one ethernet, one power. Everything else (overlay mask calibration, tile label OCR, event classifier loading) happens on the box without touching the NVR.