Matthew Diakonov, Written with AI

Published April 19, 2026 · 12 min read

Architecture guide

The security video monitoring system you already own is the tiled HDMI grid on the guard wall.

Search results for this phrase will sell you new cameras, a new recorder, or a remote monitoring contract. None of them mention the obvious thing already in your closet. Your DVR already produces a perfectly mosaiced grid of every camera over HDMI. That signal is, physically, your monitoring system. It exists whether anyone is watching the wall or not. This guide is about treating that one HDMI cable as a programmable input, not a passive output.

See it tap a live DVR multiview

4.9 from 50+ properties

One HDMI tap covers up to 25 camera tiles

Capture to WhatsApp under 15 seconds

Layout-aware overlay masking on every event

Installs on a running recorder in under 2 minutes

Your monitoring system already exists. It just isn't watched.

The DVR HDMI multiview is the system. We make the system inferable.

DVR mosaics every camera into one HDMI signal

Wall monitor displays the grid. Nobody watches.

Our system taps the same signal in pass-through

One inference engine sees all 25 tiles per unit

Events ship to WhatsApp in 5 to 15 seconds


What the top search results miss

The first page for this phrase is split into three categories. Camera-and-recorder bundles from Lorex, GW Security, and CCTV Camera Pros. Generic remote monitoring services from Eagle Eye, Deep Sentinel, and AMAROK. Multiview hardware from VigilLink and similar AV processors. Each treats the question differently, and each makes the same omission.

The bundles assume you do not have a system yet. The monitoring services assume you want a human in a chair watching your tiles for $200 to $1500 a month per site. The multiview hardware assumes the goal is a bigger or sharper wall display. None of them name the simple fact that the multiview signal driving any existing guard monitor is already a tiled, synchronized, decoded, time-stamped composite of every camera at the property. That signal is the monitoring system. It is already produced. It is mostly unwatched. The unanswered question is how to read it.

The answer in this guide is to treat the wall as the input. One HDMI cable in, the same HDMI cable back out to the monitor, and inference running in parallel against the same composite the guard would have stared at if you had paid one to.

The three components that already exist on every site

Walk into any apartment building, jobsite trailer, warehouse, or storefront with cameras and you will find a version of this same triangle. The names on the boxes change. The shape does not.

Cameras to recorder to wall, the path you already paid for

Cameras

Mix of analog over BNC and IP over PoE. Count varies. The recorder normalizes them.

Recorder (DVR or NVR)

Decodes every feed, writes to disk, mosaics tiles into a single HDMI multiview, stamps a clock and a name strip on each tile.

Wall monitor

Plugged into the recorder's HDMI output. Continuously displays the tiled grid. Usually mounted in a back office and ignored.

Edge AI unit

Sits between the recorder and the wall monitor. Captures the multiview, masks overlays, runs inference, emits per-tile events. Pass-through to the wall is preserved.

How the multiview signal becomes structured events

The pipeline is short on purpose. Most of the engineering effort is in handling the recorder's idiosyncrasies (layout selection, overlay positions, name strip fonts) so the inference layer can treat each tile as a clean independent scene.
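As a rough illustration of the tile-splitting step, the sketch below cuts one captured frame into per-tile rectangles. It assumes a uniform grid, a 1920x1080 capture, 0-based row-major tile numbering, and that a layout id such as 4x4-std encodes columns and rows. None of those details are confirmed by the recorder or the product, and `tile_rects` and `TileRect` are hypothetical names.

```python
import re
from typing import NamedTuple


class TileRect(NamedTuple):
    index: int  # 0-based, row-major (assumed ordering)
    x: int
    y: int
    w: int
    h: int


def tile_rects(layout_id: str, frame_w: int = 1920, frame_h: int = 1080) -> list[TileRect]:
    """Split a frame into the uniform grid named by an id such as '4x4-std'."""
    match = re.fullmatch(r"(\d+)x(\d+)-\w+", layout_id)
    if match is None:
        raise ValueError(f"unrecognized layout id: {layout_id!r}")
    cols, rows = int(match.group(1)), int(match.group(2))
    rects = []
    for r in range(rows):
        for c in range(cols):
            # Edges are derived from the full frame size so integer
            # rounding never leaves a gap or overlap between tiles.
            x0, x1 = c * frame_w // cols, (c + 1) * frame_w // cols
            y0, y1 = r * frame_h // rows, (r + 1) * frame_h // rows
            rects.append(TileRect(r * cols + c, x0, y0, x1 - x0, y1 - y0))
    return rects


rects = tile_rects("4x4-std")
print(len(rects), rects[5])
```

Real recorders often use non-uniform layouts (one large tile plus small ones), which is where a per-model layout table would replace the simple grid arithmetic here.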

Inputs into our system, outputs out of our system

The actual event payload, anchored to a layout id

A representative event from a property running a 4x4 multiview on a Hikvision DS-7716 recorder. The fields specific to the monitoring-system framing are layout_id, tile.coords, and overlay_mask. Together they make the event traceable back to a specific pixel region of a specific recorder configuration.

Tile event off a 4x4 DVR multiview

Read the layout_id, tile.index, and tile.coords together. The event is anchored to a specific pixel rectangle of a specific recorder layout, which means you can audit it back to the source frame even months later. The overlay_mask line records that the recorder's clock, name strip, and channel bug were excluded from inference, so the trigger is provably scene content.
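For readers who want something concrete to look at, here is a hedged sketch of what such an event could look like when assembled in code. The field names layout_id, tile.index, tile.label, tile.coords, overlay_mask, and latency_ms come from the text. The exact key spelling, nesting, 0-based tile numbering, and every value below are illustrative assumptions, not the product's actual schema.

```python
import json
from datetime import datetime, timezone


def build_tile_event(tile_index, tile_label, coords, captured_at, latency_ms):
    """Assemble an illustrative tile event; key names and nesting are assumptions."""
    x, y, w, h = coords
    return {
        "event_class": "person_in_zone",
        "layout_id": "4x4-std",
        "tile": {
            "index": tile_index,
            "label": tile_label,
            # Pixel rectangle of the tile inside the captured multiview frame.
            "coords": {"x": x, "y": y, "w": w, "h": h},
        },
        # Overlays excluded from inference before the detector ran.
        "overlay_mask": ["clock", "cam_name_strip", "channel_bug"],
        "timestamp": captured_at.isoformat(),  # ISO 8601
        "latency_ms": latency_ms,
    }


event = build_tile_event(
    tile_index=6,                      # hypothetical: 0-based, row-major
    tile_label="mailroom",
    coords=(960, 270, 480, 270),       # row 1, column 2 of a 1920x1080 4x4 grid
    captured_at=datetime(2026, 4, 19, 2, 14, 7, tzinfo=timezone.utc),
    latency_ms=7400,
)
print(json.dumps(event, indent=2))
```

Keeping the layout id and the pixel rectangle in the same record is what makes the later audit possible: either one alone does not identify the source region.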

Naive RTSP per camera vs. one HDMI multiview tap

Most engineering teams that try to add intelligence to an existing camera fleet start with the wrong primitive. Per-camera RTSP feels obvious because each camera looks like an independent stream. In practice, it scales badly and breaks at install. The multiview signal collapses the same 25 streams into one capture.

Two ways to get inference onto an existing fleet

Open ports, rotate credentials per camera, decode 25 streams, deal with cameras dropping off the network individually, hope every camera is ONVIF-friendly, hope nothing changes when the property's IT contractor swaps the switch. One model per stream or careful batching. Failure mode: a camera goes offline and its tile is silently empty until someone notices.

25 negotiations, 25 decoders
Per-camera credential drift
ONVIF and codec edge cases
Silent failure when a camera drops

Numbers that actually decide the architecture

These are the constants on our side of the system. Each one shapes whether the monitoring grid you already own can become a watched system without a rebuild.

25 Camera tiles handled per unit, off one HDMI

2 min Install on a running recorder, end to end

15 s Upper bound for capture-to-WhatsApp event latency

0 Camera firmware changes required

One HDMI

“The recorder already decoded every camera and laid them out in a synchronized grid. That work is done. The only thing missing is something on the other end of the cable that can read it.”

Our system architecture notes, 2026

The install, second by second, on a live recorder

Properties resist new security infrastructure because past installs meant pulled cable, IT escalations, and a weekend outage. None of that applies here. The wall monitor goes dark only for the minute or so it takes to re-route the HDMI cable. Cameras stay live throughout. The recorder never reboots.

Install on a running DVR

0:00. Confirm wall monitor is on the recorder's HDMI

Trace the existing HDMI cable from the DVR to the wall monitor. That is the multiview signal our system will tap.

0:20. Unplug HDMI from the wall monitor

Disconnect the cable on the monitor end only. Keep the recorder side connected. Wall monitor goes dark for the next 60 seconds.

0:30. Plug DVR-side HDMI into our system input

Our system accepts the same HDMI signal the wall monitor was reading. Inference does not begin until the device boots.

0:50. Run a second HDMI from our system output to the wall monitor

Pass-through restores the wall display. The guard wall now shows the same grid it always did.

1:10. Plug network and power into our system

Wired ethernet preferred. Device boots, detects layout id from the multiview, and registers with the property dashboard.

1:50. Configure zones and armed hours from a laptop

Draw polygons over the tiles that need attention (mailroom shelves, parking entries, dumpster cages). Set armed windows. Save.

2:00. Test event fires on a walk-through

Walk through the camera's field of view. Our system emits a person_in_zone event with tile.label and a thumbnail to the configured WhatsApp group. The latency_ms field is recorded so you can audit it later.
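The zone test behind a person_in_zone event can be sketched with a standard ray-casting point-in-polygon check. This is a minimal sketch, assuming the detector reports a single reference point for a person (for example the bottom center of a bounding box) in the same tile-pixel coordinates as the drawn polygon. The function name and the sample zone are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical zone drawn over a mailroom shelf, in tile pixel coordinates.
mailroom_zone = [(100, 100), (300, 100), (300, 250), (100, 250)]
print(point_in_polygon(200, 200, mailroom_zone))
print(point_in_polygon(50, 200, mailroom_zone))
```

Points that fall exactly on an edge are a boundary case this sketch does not try to settle; a production check would pick a convention and test for it.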

Where this architecture earns its keep

The HDMI multiview tap is not a niche pattern. It fits any site where the recorder is already in place and the cameras are already live. The reason it gets adopted fastest in the following segments is that they share two traits: a fixed recorder, and a small staff who cannot stare at the wall.

Class B and C multifamily, 50 to 200 units

On-site teams of 2 to 4 people. Recorder is in the leasing office or maintenance closet. Wall monitor exists but only the maintenance lead glances at it. One unit per recorder turns the same grid into alerts the leasing team can act on.

Construction trailer CCTV systems

DVR sits in the trailer next to the foreman's desk. After-hours coverage was the gap. HDMI tap converts it into push alerts during armed windows.

Storefront chains and franchise locations

One DVR per store, regional manager covers many. Per-store HDMI taps roll up into one portfolio dashboard.

Self-storage and bike cages

Manager on duty is part-time. The recorder runs anyway. Tile-level alerts flag after-hours entry without requiring eyes on the wall.

Logistics yards and laydown areas

DVR mounted in a guard shack. Multi-tile mosaic of perimeter cameras. Our system flags zone entries at fence breach lines.

Property management portfolios with mixed recorders

Sites have different DVR brands from years of acquisitions. Layout id detection handles each model without a per-site integration.

Schools and small institutions on legacy CCTV

Recorder predates the current admin team. Replacement budget is years away. HDMI tap adds intelligence without a capital project.

Three ways to get a watched system, compared

Same camera fleet, three architectures. The HDMI multiview tap is the only one that does not require ripping out the recorder or paying a per-camera per-month service.

Feature	Rip-and-replace smart cameras	HDMI multiview tap
Camera replacement	Full replacement. $50k to $100k+ per site.	None. Use what is on the wall.
Recorder replacement	New cloud recorder, new license.	None. Use the existing DVR or NVR.
Cabling	Pulled cable per camera, weeks of work.	One HDMI in, one HDMI out.
Install time	Multi-week project per site.	Under 2 minutes per site.
Cameras supported per device	One. Each smart camera is its own device.	Up to 25 tiles per unit.
Inference location	Cloud, video uploaded.	On-device, video stays on site.
Event latency	Variable, gated by upload speed.	5 to 15 seconds capture to WhatsApp.
Layout drift handling	N/A, smart cameras own their own layout.	layout_id + overlay_mask on every event.
Compatibility risk	New cameras, new recorder, new failure modes.	Defined at the recorder, not per camera.
Up-front cost per site	$50k to $100k+ project.	$450 hardware + $200/month software.

What gets configured, not built

Once the HDMI signal is tapped, the work shifts from infrastructure to configuration. The configuration surface is small on purpose. Each item below either changes when an alert fires or changes where it goes.

The eight things you actually configure

Confirm the recorder layout id (4x4-std, 5x5-std, 3x3-std, etc.) so tile boundaries are accurate.
Verify overlay_mask covers the recorder's clock, name strip, and channel bug.
Map tile.index to a human tile.label per camera (mailroom, west-gate, parcel-shelf).
Draw zones (polygons) over the tiles where attention is needed.
Set the armed time window per zone (after hours, shift-change, weekend, always).
Pick the event class per zone (person_in_zone, loitering, vehicle_arrival, tile_blank).
Choose a delivery channel per zone (WhatsApp group, webhook, phone call).
Run a walk-through and confirm latency_ms lands inside your target.
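The per-zone settings in the list above could be represented roughly as follows. This is a sketch under stated assumptions: `ZoneConfig` and `is_armed` are hypothetical names, the armed window is modeled as a single start and end time per zone, and equal start and end times are taken to mean "always". The part worth getting right is the window that wraps past midnight.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class ZoneConfig:
    tile_label: str      # human label mapped from tile.index
    polygon: list        # zone vertices in tile pixel coordinates
    armed_start: time
    armed_end: time
    event_class: str     # e.g. person_in_zone, loitering, vehicle_arrival, tile_blank
    channel: str         # e.g. WhatsApp group, webhook, phone call


def is_armed(zone: ZoneConfig, now: time) -> bool:
    """Return True if `now` falls inside the zone's armed window."""
    start, end = zone.armed_start, zone.armed_end
    if start == end:
        return True                      # assumed convention for "always"
    if start < end:
        return start <= now < end        # same-day window
    return now >= start or now < end     # window wraps past midnight


after_hours = ZoneConfig(
    tile_label="mailroom",
    polygon=[(100, 100), (300, 100), (300, 250), (100, 250)],
    armed_start=time(22, 0),
    armed_end=time(6, 0),
    event_class="person_in_zone",
    channel="whatsapp:leasing-team",     # hypothetical channel identifier
)
print(is_armed(after_hours, time(23, 30)), is_armed(after_hours, time(12, 0)))
```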

DVR and NVR brands the multiview tap works on

Compatibility is at the recorder, not the camera. If the recorder has an HDMI port and you have a guard monitor connected to it, the multiview signal already exists.

Hikvision DS-7xxx

Dahua XVR / NVR

Lorex

Amcrest

Reolink NVR

Uniview

Swann

Night Owl

Q-See

ANNKE

EZVIZ

Honeywell Performance

Bosch DIVAR

Panasonic WJ-NX

Any DVR with HDMI multiview

The three failure modes worth pre-empting

The HDMI multiview tap has its own failure modes. Each one is preventable, and each one has a single corresponding field in the event payload that catches it.

Failure 1

DVR layout switched and tile.index shifted

Someone changes the recorder from 4x4 to 3x3 to enlarge a camera. After the switch, tile.index 6 is no longer the mailroom. The layout_id on the event payload is the audit trail. Our system re-detects layout on signal change and remaps labels.
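One way to picture the remapping step: key the human labels by the recorder's name-strip text rather than by tile position, then rebuild the index map whenever the layout changes. The sketch below assumes the name-strip text for each tile is already available as a string; how it is read from the frame is out of scope here. `remap_labels` is a hypothetical helper, not the product's actual mechanism.

```python
def remap_labels(strip_to_label, strips_by_index):
    """Rebuild tile.index -> tile.label after a layout change.

    strip_to_label:  recorder name-strip text -> human label (stable across layouts)
    strips_by_index: tile index in the NEW layout -> name-strip text seen on that tile
    """
    remapped, unmatched = {}, []
    for index, strip in sorted(strips_by_index.items()):
        label = strip_to_label.get(strip)
        if label is None:
            unmatched.append(index)   # surface for review instead of guessing
        else:
            remapped[index] = label
    return remapped, unmatched


strip_to_label = {"CAM 07": "mailroom", "CAM 02": "west-gate"}
# After a 4x4 -> 3x3 switch the mailroom camera shows up at a different index.
new_layout = {0: "CAM 02", 1: "CAM 07", 2: "CAM 11"}
print(remap_labels(strip_to_label, new_layout))
```

Returning the unmatched indices separately keeps a layout change from silently attaching an old label to the wrong camera.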

Failure 2

Stuck DVR clock generates noise

A frozen clock pixel can look like motion to a naive detector. The overlay_mask on every event records that the clock, name strip, and channel bug were excluded before inference. No frozen-clock false alerts.

Failure 3

A camera drops and its tile goes blue

Per-camera RTSP fails silently. The HDMI tap sees the blue or black tile and emits a tile_blank event with tile.index and tile.label, so a missing camera becomes an alert instead of a hole.
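A blank-tile check can be as simple as asking whether the tile is close to one flat color. The sketch below assumes the tile is a small grid of (r, g, b) tuples with overlays already masked out, and that "blank" means every channel varies by no more than a small tolerance. The function name and the tolerance value are assumptions, and a real HDMI capture would need a threshold tuned to its noise.

```python
def is_blank_tile(pixels, tolerance=8):
    """Return True if every color channel is nearly constant across the tile."""
    flat = [px for row in pixels for px in row]
    if not flat:
        raise ValueError("tile has no pixels")
    for channel in range(3):
        values = [px[channel] for px in flat]
        # A live scene has texture; a "no signal" tile is one flat color.
        if max(values) - min(values) > tolerance:
            return False
    return True


blue_tile = [[(0, 0, 255)] * 4 for _ in range(4)]
scene_tile = [[(10 * x, 20 * y, 128) for x in range(4)] for y in range(4)]
print(is_blank_tile(blue_tile), is_blank_tile(scene_tile))
```

Because the test is color-agnostic, it covers both the blue and the black no-signal screens without a per-brand rule.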

Want to see your existing DVR multiview turned into events?

15-minute call. We connect to a live recorder over screenshare or in person, identify the layout id, draw a polygon over a tile, and watch a person_in_zone event with a thumbnail land in WhatsApp under 15 seconds. No camera changes, no recorder changes.

Book a demo →

Frequently asked questions

What does a security video monitoring system actually consist of in most buildings today?

Three pieces of hardware that already exist: cameras (analog or IP), a recorder (DVR or NVR) that ingests every camera and writes it to disk, and a wall monitor connected to the recorder's HDMI multiview output that mosaics every camera into one tiled grid (commonly 2x2, 3x3, 4x4, or 5x5 layouts). That tiled HDMI signal is the security video monitoring system's actual visual artifact. It exists 24x7 whether anyone looks at it or not. Adding intelligence does not require new cameras, new wiring, or a new recorder. It requires reading the signal that the recorder already produces.

How is tapping the HDMI multiview different from pulling RTSP streams from each camera?

RTSP requires per-camera credentials, ONVIF compatibility, network reachability for every camera, and a per-stream decode pipeline. On a 25-camera site that is 25 separate negotiations and 25 decoders running. The DVR HDMI multiview is one signal. The recorder has already done the decoding, layout, and synchronization. One HDMI capture brings every camera into the inference engine in one frame, with the recorder's clock embedded. There is no firmware to flash on the cameras, no port forwards to open, and no credential rotation to coordinate. Our system captures the multiview, parses the layout id (for example 4x4-std or 5x5-std), and indexes each tile back to the camera name strip the recorder already overlays.

What does our system do about the DVR overlays baked into the multiview signal?

Every DVR adds three overlays onto the HDMI tiles: a clock readout, a per-tile camera name strip, and a channel bug or mode badge. If you ran detection naively, a motion-sensitive model could fire on those pixels, because the clock changes every second. Our system emits an overlay_mask field with every event listing which overlays were masked before inference, tied to the recorder's layout_id. A typical event reads overlay_mask = clock + cam_name_strip + channel_bug, layout_id = 4x4-std. This pre-empts any review challenge that the model fired on baked-in pixels rather than scene content. It also keeps a stuck DVR clock from generating a parade of motion alerts.
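The masking step can be pictured as blanking known rectangles in each tile before the detector sees it, and recording which ones were blanked. This is a minimal sketch assuming a grayscale tile stored as a list of rows and overlay rectangles given as (x, y, w, h) relative to the tile. The rectangle positions are made up for illustration, since real positions depend on the recorder model and layout.

```python
def apply_overlay_mask(tile, overlays):
    """Zero out overlay rectangles; return the masked copy and the overlay names used."""
    masked = [row[:] for row in tile]          # leave the captured tile untouched
    applied = []
    for name, (x, y, w, h) in overlays.items():
        for yy in range(max(y, 0), min(y + h, len(masked))):
            row = masked[yy]
            for xx in range(max(x, 0), min(x + w, len(row))):
                row[xx] = 0
        applied.append(name)
    return masked, applied


tile = [[9] * 6 for _ in range(4)]             # tiny 6x4 stand-in for a real tile
overlays = {
    "clock": (0, 0, 3, 1),                     # hypothetical positions
    "cam_name_strip": (0, 3, 6, 1),
    "channel_bug": (5, 0, 1, 1),
}
masked, overlay_mask = apply_overlay_mask(tile, overlays)
print(overlay_mask)
for row in masked:
    print(row)
```

The returned list of names is the piece that would travel with the event as overlay_mask, so a reviewer can see what was excluded without re-running anything.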

How many cameras can one unit cover?

Up to 25 tiles per unit, off a single DVR HDMI multiview. That maps to recorders running 4x4 (16 tiles), 5x5 (25 tiles), or smaller layouts. Properties with more than 25 cameras typically have multiple recorders already, and add a unit per recorder. The math here matters: 25 simultaneous inference streams off one HDMI input run at a fraction of the cost and complexity of 25 RTSP decoders.

Where does the inference actually run? Does footage leave the property?

Inference runs on the edge AI unit, sitting next to the recorder. Video frames stay on the device. What leaves the building is structured event metadata: tile.label (camera name), zone name, event class, dwell seconds, ISO 8601 timestamp, a 480x270 thumbnail crop of the relevant tile, overlay_mask, layout_id, and capture-to-delivery latency in milliseconds. Delivery is a WhatsApp message, a phone call, or a webhook. The recorder continues to handle long-form storage. Our system does not duplicate the archive.

How long does install take on an active recorder?

Under 2 minutes on a recorder that already has a guard monitor connected. The HDMI cable that runs from the DVR to the wall monitor unplugs from the monitor, plugs into our system's HDMI input, and a second HDMI cable from our system's pass-through output goes back into the wall monitor. Network in, power in. The guard wall continues to display the same multiview because our system passes the signal through. Inference begins on the captured copy within seconds of boot.

Which DVR and NVR brands are supported?

Compatibility is defined at the recorder, not the camera. Any DVR or NVR with an HDMI multiview output works, which includes Hikvision DS-7xxx, Dahua XVR and NVR, Lorex, Amcrest, Reolink NVR, Uniview, Swann, Night Owl, Q-See, ANNKE, EZVIZ, Bosch DIVAR, Honeywell Performance, Panasonic WJ-NX, and most rebranded white-label recorders. Camera mix can be analog over BNC, IP over PoE, or both, because the recorder has already normalized them into the multiview signal by the time our system sees the frame.

What is the typical latency from a tile event to a staff alert?

5 to 15 seconds end to end. The path is: capture the frame, run inference, evaluate against the configured zones and event classes, build the event payload with the tile thumbnail, and deliver to WhatsApp or a webhook. The latency_ms field is recorded with every event so you can audit it. On a properly configured deployment, the median lands near 7 to 8 seconds. Anything above 15 seconds usually points to a network egress problem on the property's uplink rather than the inference path.
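Auditing the recorded latency_ms values against the 15-second bound takes only a few lines. The sketch below assumes you have exported the latency_ms field from a batch of events into a plain list. The function name and the sample numbers are made up for illustration.

```python
from statistics import median


def latency_report(latencies_ms, budget_ms=15_000):
    """Summarize capture-to-delivery latency against a budget in milliseconds."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    over = [v for v in latencies_ms if v > budget_ms]
    return {
        "median_ms": median(latencies_ms),
        "worst_ms": max(latencies_ms),
        "over_budget": len(over),
    }


# Illustrative samples only, not measured data.
samples = [6200, 7400, 8100, 7900, 16500]
print(latency_report(samples))
```

A non-zero over_budget count is the cue to look at the property's uplink first, per the guidance above.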

Does our system replace a remote monitoring service?

It can, and it can also feed one. The cheapest configuration is direct WhatsApp delivery to a property team that already responds to maintenance pings on the same channel. The next configuration is a hybrid: alerts route to an in-house operator during business hours and to an outsourced monitoring agency overnight, both consuming the same event stream. The most stable configuration we see at scale is a property staff group during business hours plus a paid central station after midnight, with the same events feeding both endpoints. The system itself is the alert producer; the choice of who acts on it is operational.

What does our system not do?

Our system does not store long-form video (the recorder already does). Our system does not replace cameras. Our system does not access the cloud for inference. Our system does not police access control directly (it can hand off events to access control systems via webhook). Our system does not coach staff on what to do once they receive an alert. The product is intentionally narrow: turn the multiview signal that already exists into structured events, fast enough to act on.

Worth saying plainly

A security video monitoring system is, in most buildings, not something to buy. It is something already produced by hardware that has been running in a closet for years. The cameras are connected. The recorder is mosaicing them. The wall monitor is showing the grid. The thing missing is not another product. It is something on the other end of the HDMI cable that can read what the recorder is already showing.

Our system is a single piece of hardware that sits in that position. The DVR HDMI cable plugs into one side. A second HDMI cable continues to the wall monitor on the other side, so the guard wall keeps working. In between, inference runs on every tile in parallel, layout-aware, with overlays masked. The output is structured events with tile labels, thumbnails, ISO timestamps, and capture-to-delivery latency under 15 seconds.

The phrase security video monitoring system describes something most properties already own. This guide describes how to make the system actually monitor.

25 tiles, one HDMI

Maximum camera tiles inferred per unit off a single DVR HDMI multiview signal. Same cable that has been driving the guard monitor.