Matthew Diakonov, Written with AI

Published April 19, 2026 · 14 min read

The gap between the firewall and the whiteboard

Your intellectual property cyber security stack protects the bytes on the wire. The camera watching the server room is the gap.

DLP watches files. EDR watches endpoints. CASB watches SaaS. None of them see the phone pointed at the whiteboard, the contractor photographing a prototype, or the after-hours walk through the rack aisle. That is why an AI surveillance layer gets added to IP-bearing rooms. The trap is that most of those deployments are themselves a new network-attached, credential-holding, cloud-connected Linux box sitting one hop from the IP they were supposed to protect. This page is about the architecture that closes the physical gap without reopening the network one.



LAN IPs added by the AI surveillance unit: 0

Raw multiview footage that leaves the building: 0 bytes

Event payload size: about 240 KB per delivered HIGH event

Egress path: cellular eSIM only, no property Wi-Fi or Ethernet

Intellectual property cyber security, past the perimeter

The camera stack inside the perimeter is often the softest target.

DLP, EDR, CASB stop at the wire

IP cameras + NVR + cloud AI sit next to the IP

HDMI tap: zero LAN IPs, zero RTSP, zero cloud video

Event egress: ~240 KB per event over cellular

Raw multiview never leaves the building


The IP cyber security stack as usually drawn, and the layer it skips

Open any competent intellectual property cyber security deck and the diagram is familiar. Network perimeter at the edge. DLP and CASB watching outbound traffic. EDR on every endpoint. Identity at the front door. Secrets management for build systems. Source control with signed commits. Data classification tied to access control. The whole program is built to keep the IP from escaping as bits.

What the diagram almost never shows is the physical layer, which is where a phone camera in an aisle, a USB stick in a workstation, a photo of a wall of sticky notes, or a laptop walked out in a bag defeats every upstream control. A program that stops at the wire concedes everything that happens in the room.

The fix is obvious in the abstract (cameras plus AI looking for things that should not happen in rooms that hold IP) and dangerous in the concrete, because the default way to add AI to cameras in 2026 is to put a new Linux box on a network segment with routes to every camera, a TLS tunnel to a vendor cloud, and credentials for the DVR on its disk. That is a new attack surface, one network hop from the source code, exactly where the IP cyber security program least wants one.

0 LAN IPs added by an HDMI-tap AI unit

0 bytes of raw multiview footage leave the building

240 KB event payload per delivered HIGH event

Up to 25 camera tiles covered per unit, off the LAN

The rooms that actually hold your IP

Every one of these rooms is covered by existing cameras on most mid-market properties, and every one of them is where a physical IP exfiltration attempt has to happen. The surveillance layer that watches them is part of the IP cyber security program whether it is named that way or not.

The server room and MDF closet

Racks, patch panels, and the hardware-encryption appliances that back the whole identity and secrets program. Access is usually badge-gated, which makes an actual intrusion a badge event worth recording and a tail-gater worth flagging. The question 'who was in here at 2:14 on Saturday' is the IP-cyber question that only the camera layer can answer.
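A minimal sketch of the tailgating check, assuming the badge system can export swipe timestamps for the server-room door and the surveillance unit emits a timestamp for each person entering the zone. Both input shapes, the `flag_tailgates` name, and the 10-second `BADGE_WINDOW_S` tolerance are hypothetical; neither system is described here in that detail.

```python
from datetime import datetime, timedelta

# Hypothetical tolerance: a person entering within this many seconds
# after a badge swipe is treated as covered by that swipe.
BADGE_WINDOW_S = 10


def flag_tailgates(swipes: list[datetime], entries: list[datetime]) -> list[datetime]:
    """Return person-entry times that no prior badge swipe accounts for.

    Each swipe covers at most one entry, so two people walking in on a
    single swipe produce exactly one flagged entry.
    """
    window = timedelta(seconds=BADGE_WINDOW_S)
    unused = sorted(swipes)
    flagged = []
    for entry in sorted(entries):
        # Earliest unused swipe that precedes this entry within the window.
        match = next((s for s in unused if s <= entry <= s + window), None)
        if match is None:
            flagged.append(entry)
        else:
            unused.remove(match)
    return flagged


if __name__ == "__main__":
    t0 = datetime(2026, 4, 18, 2, 14, 0)
    swipes = [t0]
    entries = [t0 + timedelta(seconds=2), t0 + timedelta(seconds=4)]
    print(flag_tailgates(swipes, entries))
```

With one swipe at 02:14:00 and two people entering at 02:14:02 and 02:14:04, only the second entry comes back flagged. Letting each swipe cover a single entry is the design choice that makes two-people-on-one-badge visible.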

The R&D lab

Prototype benches, test fixtures, whiteboards covered in block diagrams. The single richest room of unshipped IP in the building. A phone pointed at the whiteboard is a full product roadmap in 200 KB.

The prototype machine shop

CNC jobs, 3D prints, first-article parts. A photo of a part on the bench is the geometry of the next generation. DLP does not see a phone camera.

The exec office and boardroom

M&A materials, unfiled patent disclosures, forecast decks. The content on that table is IP before it is classified anywhere else.

Document vault / law firm IP room

Patent files, sealed discovery, licensing drafts. Physical security policy already requires cameras. What is rarely asked is whether the cameras themselves introduce a cyber path to that material.

Loading dock and mailroom

The path your prototypes and drives take in and out of the building. The only point at which a physical exfiltration of bulky IP has to cross a controlled threshold.

Where the IP actually flows, and where the AI surveillance sits next to it

On a camera-first AI deployment, the inference box lives on the same physical floor as the IP, talks to every camera on the camera VLAN, holds DVR credentials on disk, and opens a long-lived TLS tunnel to a vendor cloud. It is a new ingress path, a new egress path, and a new credential store, all colocated with the IP it was supposed to watch. An HDMI-tap unit sits on the HDMI cable between the DVR and the monitor. The network never sees it.

Where the AI sits in your network, or does not

Two shapes of AI surveillance, from the IP cyber security desk

The same detection workload (person, vehicle, package, zone violation) can be delivered by two radically different architectures. From a model-quality perspective they can look identical on a datasheet. From an IP cyber security perspective, one of them is a new attacker-addressable endpoint inside the perimeter and the other is not on any network you own.

Camera-first AI versus HDMI-tap AI, as a cyber risk surface

The rows that actually matter to a CISO evaluating surveillance of IP-bearing rooms.

Feature	Camera-first AI surveillance (RTSP / cloud)	HDMI-tap (our system, monitor-first)
New LAN IPs introduced	1 to N (inference box, switch, sometimes management VLAN)	0
Credentials held on the AI device	DVR admin, per-camera credentials	none (no RTSP, no DVR login)
Long-lived cloud TLS tunnel from inside the LAN	yes, persistent outbound to vendor	no (only cellular uplink, not on your network)
Inbound RTSP ports opened from camera VLAN	up to 25 sustained connections	0
Raw multiview crossing the property boundary	often continuous cloud upload	never
Daily egress per property	1 to 3 GB per camera per day typical	roughly 1 to 10 MB (event payloads only)
Blast radius of an inference-box compromise	attacker gets camera VLAN + DVR credentials	attacker sees ~240 KB event payloads
Rollback if the AI is compromised	re-image inference box, rotate camera credentials, audit DVR	unplug the HDMI unit, monitor goes back to direct
Fits behind strict egress policy (air-gapped sites)	usually no (cloud is load-bearing)	yes (on-prem event sink over cellular supported)
New CVE exposure per deployed property	Linux + GPU driver + RTSP parser + vendor agent	none on your LAN

The attack path that a networked AI box opens, drawn in order

A camera-first AI surveillance stack, from the perspective of an attacker who has already established a foothold on the corporate LAN. Each arrow below is a step the IP cyber security program would prefer not to exist at all. The HDMI-tap architecture does not remove the first two arrows (the attacker is already in). It removes every arrow that comes after, because the AI unit is not reachable from the LAN.

Attacker path to IP-room video, camera-first stack
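The argument here is a graph-reachability claim, so it can be checked mechanically. The sketch below models each architecture as a small, hypothetical network graph (node names are illustrative, not an inventory of any real site) and runs a breadth-first search from an attacker foothold on the corporate LAN. It assumes, as the article does, that the camera VLAN has no direct route from the corporate LAN in the HDMI-tap case.

```python
from collections import deque

# Hypothetical topologies: an edge means "can open a network connection to".
# Edges are listed once and treated as bidirectional for simplicity.
CAMERA_FIRST = {
    ("corp_lan", "ai_box"),      # route for SSO / dashboard access
    ("ai_box", "camera_vlan"),   # RTSP pulls from every camera
    ("ai_box", "vendor_cloud"),  # long-lived outbound TLS tunnel
    ("camera_vlan", "dvr"),
}

# The DVR-to-unit HDMI cable carries video one way and is not a network
# path, so it is deliberately not modeled as an edge.
HDMI_TAP = {
    ("camera_vlan", "dvr"),
    ("ai_unit", "cellular"),     # only uplink, not on any owned LAN
    ("cellular", "event_sink"),
}


def reachable(edges: set[tuple[str, str]], start: str) -> set[str]:
    """Breadth-first search over an undirected edge set."""
    adjacency: dict[str, set[str]] = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adjacency.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    print(sorted(reachable(CAMERA_FIRST, "corp_lan")))
    print(sorted(reachable(HDMI_TAP, "corp_lan")))
```

From the same foothold, the camera-first graph yields the AI box, the camera VLAN, the DVR, and the vendor cloud; the HDMI-tap graph yields only the corporate LAN itself, because the AI unit has no network edge to follow.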

Before and after: the IP cyber security posture with and without HDMI-tap AI

The comparison that matters is not AI versus no AI. It is the shape of the risk surface when AI gets added to the IP-bearing camera coverage. Two states, drawn as an auditor would draw them.

Adding AI surveillance to IP-bearing rooms

A new Linux + GPU box appears inside the perimeter, close to the IP. It holds DVR and per-camera credentials on disk. It opens 25 sustained RTSP connections from the camera VLAN. It maintains an outbound TLS tunnel to a vendor cloud. It uploads up to several GB of video per camera per day. It introduces new CVEs every patch cycle. If compromised, the attacker gains video coverage of the rooms the AI was installed to protect, plus a foothold on the camera VLAN.

New LAN IP, new CVE surface
DVR and camera creds on disk
Persistent outbound TLS tunnel
Continuous cloud video upload
Blast radius: full camera VLAN + IP-room video

What actually leaves the building, per day, on a live IP-bearing floor

Below is a real egress log shape from a deployed unit on an R&D floor. The numbers are what lands on the cellular uplink over the course of a quiet weekday. No raw multiview, no clip from a non-event, no telemetry larger than a few kilobytes. The comparison line at the end is a typical cloud AI camera platform at the same coverage.

cyrano:egress-audit property=rd-floor-b3 window=24h

The deployment path, from the security team's perspective

What the rollout actually looks like when the IP cyber security team runs the change control, not just facilities. Every step is scoped to preserve the property that the new device does not enter the LAN.

Rollout on an IP-bearing floor

Classify the covered rooms

Identify the cameras whose field of view contains IP (server room, R&D lab, prototype bench, mailroom). Tag those camera channels in the DVR. Any policy difference (retention, alert routing) applies to these tiles.

Confirm the HDMI path

Verify the DVR has an HDMI-out already going to a guard monitor, confirm no HDCP on the path (security DVR outputs are rarely HDCP-protected), and confirm the grid (4x4 or 5x5 is typical for these floors).
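Confirming the grid matters because each camera is identified by its position in the multiview. A minimal sketch of the tile arithmetic, assuming a 1920x1080 output and a uniform N-by-N grid with tiles numbered row-major from zero. Many DVRs also render uneven layouts (one large tile plus small ones), which this does not handle, and the `tile_rect` name is hypothetical.

```python
def tile_rect(index: int, grid: int, width: int = 1920, height: int = 1080) -> tuple[int, int, int, int]:
    """Return (x, y, w, h) in pixels for tile `index` of a grid x grid multiview.

    Tiles are numbered row-major from 0. Edges are computed with integer
    division from the full frame size, so any leftover pixels are spread
    across tiles and adjacent tiles share edges with no gaps.
    """
    if not 0 <= index < grid * grid:
        raise ValueError(f"tile {index} outside a {grid}x{grid} grid")
    row, col = divmod(index, grid)
    x0 = col * width // grid
    x1 = (col + 1) * width // grid
    y0 = row * height // grid
    y1 = (row + 1) * height // grid
    return x0, y0, x1 - x0, y1 - y0


if __name__ == "__main__":
    print(tile_rect(0, 5))
    print(tile_rect(24, 5))
```

On a 5x5 grid, `tile_rect(24, 5)` gives the bottom-right tile at (1536, 864, 384, 216); on a 4x4 grid each tile is 480x270.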

Insert the unit on the HDMI cable

Unplug the DVR-to-monitor HDMI cable, run it through the unit's HDMI in and HDMI out. Power the unit. No port opened, no LAN cable connected, no DVR credential requested. This step is a physical install, not a network change.

Calibrate tiles to rooms

In the dashboard (reached over the cellular uplink, not via the property LAN), tag each tile with its camera name, and mark which tiles are IP-bearing. Draw zones and after-hours windows. The unit begins scoring events immediately.
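After-hours windows usually cross midnight, which is the one detail that trips up a naive `start <= t < end` check. A minimal sketch of the per-tile rule, assuming each IP-bearing tile carries a single daily window in local time. The `TileRule` type and its fields are hypothetical, not the dashboard's actual data model, and weekend-only schedules are not modeled.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class TileRule:
    camera_name: str
    ip_bearing: bool
    window_start: time  # start of the after-hours window (local time)
    window_end: time    # end of the window; may be earlier than the start


def in_window(t: time, start: time, end: time) -> bool:
    """True if t falls in [start, end), handling windows that wrap midnight."""
    if start <= end:
        return start <= t < end
    return t >= start or t < end


def should_alert(rule: TileRule, person_seen_at: datetime) -> bool:
    """Alert on a person in an IP-bearing tile during its after-hours window."""
    return rule.ip_bearing and in_window(person_seen_at.time(), rule.window_start, rule.window_end)


if __name__ == "__main__":
    server_room = TileRule("MDF-01", True, time(19, 0), time(7, 0))
    print(should_alert(server_room, datetime(2026, 4, 18, 2, 14)))
    print(should_alert(server_room, datetime(2026, 4, 18, 10, 30)))
```

For a 19:00 to 07:00 window, a person at 02:14 triggers an alert and a person at 10:30 does not; tiles not marked IP-bearing never alert under this rule.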

Route event egress per policy

For cloud-allowed environments, events go to our system's event cloud. For regulated environments, point the cellular-side HTTPS POST at an on-premise event sink. The multiview never leaves the building either way.

Log and audit

The unit exposes a signed event log queryable by event ID. Each record carries the tile, the zone, the timestamp, and the SHA-256 of the clip. Useful for incident response against physical IP events.
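During incident response, the useful check is that a retrieved clip is the clip the log recorded and that the record itself was not altered. A minimal sketch, assuming a JSON-shaped record with `event_id`, `tile`, `zone`, `timestamp`, and `clip_sha256` fields and an HMAC-SHA256 signature over the canonicalized record. The field names, key handling, and choice of HMAC are all assumptions for illustration; the log's actual signing scheme is not described here and may well be asymmetric.

```python
import hashlib
import hmac
import json


def canonical(record: dict) -> bytes:
    """Serialize a record deterministically so the signature is reproducible."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def sign_record(record: dict, key: bytes) -> str:
    # Assumption: HMAC-SHA256 with a shared key stands in for the real scheme.
    return hmac.new(key, canonical(record), hashlib.sha256).hexdigest()


def verify_event(record: dict, signature: str, clip: bytes, key: bytes) -> bool:
    """Check the record signature, then check the clip against its recorded hash."""
    if not hmac.compare_digest(sign_record(record, key), signature):
        return False  # record altered, or signed with a different key
    return hashlib.sha256(clip).hexdigest() == record["clip_sha256"]


if __name__ == "__main__":
    key = b"hypothetical-shared-key"
    clip = b"\x00\x00\x00\x01"  # stand-in bytes for an H.264 clip
    record = {
        "event_id": "evt-0001",
        "tile": 7,
        "zone": "rack-aisle",
        "timestamp": "2026-04-18T02:14:00Z",
        "clip_sha256": hashlib.sha256(clip).hexdigest(),
    }
    sig = sign_record(record, key)
    print(verify_event(record, sig, clip, key))
    print(verify_event(record, sig, clip + b"x", key))
```

Changing any field of the record, the clip bytes, or the key makes verification fail. Sorting keys before signing matters because two JSON serializations of the same record can otherwise differ byte for byte.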

“Zero LAN IPs, zero RTSP connections, zero raw multiview crossing the property boundary. The AI watching the server room shares no network segment with the servers it is watching. The only uplink is a cellular eSIM that ships about 240 KB per delivered HIGH event. Raw footage of IP-bearing rooms does not leave the building, under any condition, by design.”

Our system egress contract, deployed architecture

Questions to ask any AI surveillance vendor before it watches IP-bearing rooms

These eight questions separate a camera-first stack from an HDMI-tap stack in a procurement call. The answers decide whether you are hardening the physical layer of your IP cyber security program or quietly expanding its attack surface.

IP-cyber checklist for AI surveillance vendors

How many new LAN IPs does your unit introduce to the property network? If the answer is anything other than zero, the unit is part of the IP cyber security scope from day one.
Does the unit require the DVR admin password or per-camera credentials? Every credential the AI holds is a credential an attacker can lift if they compromise the AI.
Does the unit open inbound RTSP or ONVIF connections from the camera VLAN? Each one is a port on a device sitting between your cameras and your corporate network.
Is there a long-lived outbound TLS tunnel from inside our LAN to your cloud? If yes, it is a persistent data path that needs to be in your egress policy and your incident plan.
What leaves the property per camera per day, in bytes? Distinguish between continuous cloud upload, motion-triggered clips, and event-only payloads. Orders of magnitude matter.
Can event egress be routed to an on-premise sink rather than your cloud? For regulated or air-gapped environments, this is often a hard requirement.
If your unit is compromised, what is the blast radius on our network? The honest answer for a camera-first box is 'camera VLAN plus DVR credentials.' The honest answer for an HDMI-tap box is 'the unit's own event cache.'
Can we roll back in under five minutes without touching the camera network? Unplugging one HDMI unit counts. Re-imaging a Linux inference box and rotating camera credentials does not.
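The same checklist can be kept in the procurement file as structured answers rather than call notes. A minimal sketch, assuming each question is reduced to a field on a hypothetical `VendorAnswers` record; the pass conditions are the answers the checklist treats as acceptable, and the five-minute rollback limit comes from the last question. Daily egress volume is deliberately left out because the checklist asks for an order-of-magnitude comparison, not a pass/fail threshold.

```python
from dataclasses import dataclass


@dataclass
class VendorAnswers:
    new_lan_ips: int
    holds_camera_credentials: bool
    opens_rtsp_or_onvif: bool
    lan_cloud_tunnel: bool
    on_prem_sink_supported: bool
    blast_radius_includes_camera_vlan: bool
    rollback_minutes: float


def findings(a: VendorAnswers) -> list[str]:
    """List every answer that pulls the unit into the IP cyber security scope."""
    issues = []
    if a.new_lan_ips > 0:
        issues.append(f"{a.new_lan_ips} new LAN IP(s) enter scope")
    if a.holds_camera_credentials:
        issues.append("holds DVR or camera credentials")
    if a.opens_rtsp_or_onvif:
        issues.append("opens RTSP/ONVIF connections on the camera VLAN")
    if a.lan_cloud_tunnel:
        issues.append("persistent cloud tunnel from inside the LAN")
    if not a.on_prem_sink_supported:
        issues.append("no on-premise event sink")
    if a.blast_radius_includes_camera_vlan:
        issues.append("compromise reaches the camera VLAN")
    if a.rollback_minutes > 5:
        issues.append("rollback exceeds five minutes")
    return issues


if __name__ == "__main__":
    camera_first = VendorAnswers(
        new_lan_ips=2, holds_camera_credentials=True, opens_rtsp_or_onvif=True,
        lan_cloud_tunnel=True, on_prem_sink_supported=False,
        blast_radius_includes_camera_vlan=True, rollback_minutes=60,
    )
    for issue in findings(camera_first):
        print("-", issue)
```

An empty list means the vendor's answers keep the unit out of scope; anything else is a line item for the security review rather than a facilities sign-off.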

DVR brands that the monitor-first path has been verified against

The HDMI tap works on anything that emits a standard multiview to a monitor. For an IP cyber security rollout that cannot tolerate a mixed-brand gap, this matters: you do not have to replace the camera plant to add the surveillance AI layer on top.

Hikvision

Dahua

Lorex

Swann

Uniview

Amcrest

Reolink

Q-See

Annke

ZOSI

Night Owl

Defender

LaView

Samsung SDH

Flir DNR

Honeywell rebadges

LTS (Hikua)

Generic NVR w/ HDMI out

Three failure modes the HDMI-tap architecture does, and does not, prevent

An honest IP cyber security posture names what a control covers and what it does not. Here is what the architecture does and does not cover, mapped to the specific threats the rest of the program worries about.

Covered

Network pivot from the AI box to the cameras

An attacker who owns the AI surveillance box cannot use it to reach the camera VLAN or the DVR, because the box is not on any LAN you own. The canonical camera-first attack path (compromise inference box, pivot to cameras, exfil video) requires a network path that does not exist.

Covered

Cloud copy of raw IP-room video

Because the egress contract is event-only and capped at about 240 KB per event, the continuous cloud copy of multiview footage that camera-first stacks rely on does not exist. There is no bucket of server-room video sitting on a vendor cloud.

Not covered

Compromise of the DVR itself over its own interface

An attacker who already has a path to the DVR (a shared VLAN, a reused admin password, a publicly exposed interface) still owns the camera footage through that path. HDMI-tap does not create this risk and does not solve it. DVR hardening remains a separate line item.

See the air-gapped surveillance path on your own DVR.

In a 15-minute demo we bring a unit, intercept the monitor HDMI on your existing DVR in the server room or on the R&D floor, and show the zero-LAN install, the per-event payload shape, and the event routing to either cloud or on-prem. No RTSP credentials requested. No ports opened. Nothing added to your network.


Why this architecture exists at all

The monitor-first path was designed for a property manager install on a multifamily building. The HDMI is already there going to the guard monitor. The DVR password is lost. The building IT stack cannot tolerate 25 new RTSP streams. Two minutes with a screwdriver and the AI is on.

The same properties turn out to apply, one-for-one, to an IP cyber security problem. The HDMI is already there. The credentials belong to the DVR, not the AI. The corporate LAN cannot tolerate a new Linux box with GPU drivers and cloud tunnels sitting next to the source repos. The architectural answer that makes multifamily install painless is the same one that makes the IP-bearing floor install auditable. Zero LAN IPs, zero new credentials, zero raw video off-property, by construction.

Frequently asked questions

Why is physical surveillance treated as part of intellectual property cyber security at all? Isn't it a facilities problem?

Because modern surveillance is no longer a set of dumb coax cameras and a DVR in a locked closet. A contemporary deployment adds IP cameras with their own ARM SoC and Linux stack, a VMS or NVR with an RTSP listener, a cloud relay for remote viewing, and increasingly an AI inference box that opens long-lived connections to every camera on the property. Each of those is a networked computer, each has firmware with CVEs, and each sits on or near the same network segments that carry your source repositories, your CAD exports, and your build artifacts. The moment a surveillance device speaks TCP, it enters the IP cyber security scope. Treating it as a facilities problem is how IP-bearing rooms end up guarded by the weakest-patched Linux box on the floor.

What is the attack surface specifically introduced by a networked AI surveillance box watching IP-bearing rooms?

On a typical camera-first deployment you add a new Linux device with a GPU, a web dashboard on port 443, an SSH or similar admin interface, one outbound TLS tunnel to the vendor cloud, and up to 25 persistent inbound RTSP connections from the camera VLAN. That device is often racked in the same closet as the DVR, sometimes on the same VLAN, and almost always with a route to the corporate network for single sign-on. The surface includes: the inference box's OS and GPU driver stack, the RTSP parser, the vendor cloud relay, the cached video stored locally for review, and the credentials the box holds for the cameras. A compromise of any one of those turns the camera watching your R&D lab into a live feed available to whoever owns the box. For an IP cyber security posture, that single device is often the softest Linux target inside the perimeter.

What exactly does an HDMI-tap AI surveillance architecture remove from that attack surface?

It removes the network entirely. The unit reads the composite HDMI multiview the DVR already renders for the guard monitor, so it never opens an RTSP connection, never asks for a camera password, and never joins the property LAN. It has zero IPs on your network. Its only uplink is a cellular eSIM. The DVR-to-monitor HDMI cable passes through the unit on its way to the screen, which means from the camera network's perspective, nothing changed. For an IP cyber security program, that means adding AI to watch a server room does not add a single attacker-addressable endpoint to the LAN that carries your IP.

What is the exact egress contract? What leaves the building?

Per delivered HIGH event, roughly 240 KB: an 18 KB JPEG thumbnail, a 220 KB H.264 clip of about six seconds bracketing the detection, and a 612-byte JSON metadata object. Nothing else. No continuous upload of the multiview frame, no face embedding, no plate string, no audio, no live stream. On an average IP-bearing floor with tuned zones, that is between 5 and 40 events per day, or on the order of 1 to 10 MB per day off-property. For comparison, a typical cloud AI camera platform uploads 1 to 3 GB per camera per day of continuous or motion-triggered clips. Three orders of magnitude less video crosses the property boundary, all of it over cellular, none of it on your LAN.
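The arithmetic behind those figures, as a quick check. It uses only the numbers stated above (payload components, 5 to 40 events per day, 1 to 3 GB per camera per day) and assumes decimal units (1 KB = 1,000 bytes). The 25-camera floor used for the per-property comparison is an assumption chosen to match a 5x5 multiview.

```python
import math

KB = 1_000
MB = 1_000_000
GB = 1_000_000_000

# Per-event payload components, as stated in the egress contract.
THUMBNAIL = 18 * KB
CLIP = 220 * KB
METADATA = 612  # bytes

event_bytes = THUMBNAIL + CLIP + METADATA  # "about 240 KB"

# Daily egress range at 5 to 40 delivered events per day.
low_day = 5 * event_bytes
high_day = 40 * event_bytes

# Camera-first comparison: 1 to 3 GB per camera per day (25 cameras assumed).
cameras = 25
cloud_low = 1 * GB * cameras
cloud_high = 3 * GB * cameras

if __name__ == "__main__":
    print(f"per event: {event_bytes / KB:.1f} KB")
    print(f"per day:   {low_day / MB:.1f} to {high_day / MB:.1f} MB")
    # Most conservative ratio: lowest cloud volume vs highest event volume.
    print(f"ratio:     ~10^{math.log10(cloud_low / high_day):.1f} or more")
```

Run as-is, it prints 238.6 KB per event, 1.2 to 9.5 MB per day, and a conservative ratio of about 10^3.4. The three-orders figure holds when a whole floor of cameras is compared against the unit's total event egress; a single camera's 1 GB per day against the same 9.5 MB is closer to two orders.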

What about the DVR itself? That is a networked device too.

It is, and an IP cyber security program still has to harden it. The difference is that the DVR is already in your scope and usually already on an isolated camera VLAN with no inbound route. Adding a monitor-first AI unit does not change that posture. It does not require the DVR admin password, it does not require opening a port, it does not require putting a new device on the camera VLAN. The control plane for the AI is the cellular modem, not the property LAN, so the blast radius of a compromise of our system ends at 'an attacker sees ~240 KB event payloads' rather than 'an attacker has a shell on a Linux box with credentials to every camera.'

Where does this fit alongside DLP, EDR, and the rest of an IP cyber security program?

It fills a gap the data-centric tools cannot reach. DLP watches file movement, EDR watches endpoint behavior, CASB watches SaaS egress. None of them see a phone pointed at a whiteboard, a contractor photographing a prototype, or a person with a USB stick in a rack aisle. Physical surveillance of IP-bearing rooms covers that layer. The trap is that most AI surveillance stacks added to do this job re-import the exact risk the rest of the program spent years reducing: a networked, cloud-connected, credential-holding Linux device inside the perimeter. An HDMI-tap architecture closes the physical-layer gap without reopening the network-layer one.

What IP-bearing spaces is this actually deployed in today?

Small and mid-market R&D floors, prototype machine shops, biotech wet labs, secure data closets on multi-tenant office floors, engineering mock-up bays, and law firm document rooms. The common shape: a DVR or NVR already exists for compliance or insurance reasons, the monitor already lives in a back office, and the company wants AI on the footage without reworking the camera plant. The architecture also fits well with regulated environments where camera footage of the interior cannot legally leave the country or the building: the raw multiview never leaves, only event-level payloads do, and those can be routed to an on-premise event sink over cellular backhaul if a cloud receiver is not allowed.

How does this handle the insider threat scenario, where the attacker is already inside with access?

The same way any physical surveillance does: it records and alerts on presence in zones that should be empty at a given hour. The difference an HDMI-tap unit makes for the insider case is that the insider's network access, even if it is broad, does not extend to the surveillance stack. The AI unit is not on any LAN the insider can reach. They cannot disable the inference by SSH'ing into an exposed dashboard, cannot rewrite zone definitions from their laptop, cannot pause alerts from inside the building. Control is through the cellular-backed dashboard only. That is a meaningful hardening against the motivated insider who would otherwise target the camera network first.

What is the realistic failure mode? What could still go wrong from an IP cyber security perspective?

Three concrete ones, all small. First, an attacker with physical access to the gatehouse could unplug the unit; detection coverage stops, but no data leaks. Second, a supply chain compromise of the cellular eSIM vendor could affect egress; the mitigation is that the only thing on that path is event payloads, so the blast radius is bounded. Third, the DVR itself could be compromised via its own network interface, which is a pre-existing risk the monitor-first architecture neither creates nor solves. None of these introduce a new attacker path into the corporate LAN or to the IP traffic on it. That is the specific property the architecture is designed to preserve.

Can the event payloads be routed to an on-premise sink for regulated environments, or is cloud mandatory?

They can. The default path is our system's cloud event receiver, but the unit can be configured to POST the 240 KB event payload to an on-premise HTTPS endpoint over the cellular uplink, and the uplink is cellular specifically so it does not require any path through the corporate network. For regulated environments that require event data to land on an owned server, this preserves the property that the multiview never leaves the building while still letting the event metadata and clips be ingested by an internal SIEM or video audit system.
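On the receiving side, an on-premise sink can enforce the egress contract instead of trusting it. A minimal sketch of the validation step, assuming the payload arrives as a thumbnail, a clip, and a JSON metadata object. The size caps, required keys, and forbidden key names are hypothetical values derived from the sizes and exclusions stated above, not a documented wire format.

```python
import json

# Hypothetical per-part caps: the stated sizes (18 KB, 220 KB, 612 bytes)
# plus headroom. Tune to the contract your deployment actually signs.
MAX_THUMBNAIL = 32_000
MAX_CLIP = 256_000
MAX_METADATA = 2_000

REQUIRED_KEYS = {"event_id", "tile", "zone", "timestamp"}
# Content the contract says never leaves the unit (hypothetical key names).
FORBIDDEN_KEYS = {"face_embedding", "plate", "audio"}


def validate_event(thumbnail: bytes, clip: bytes, metadata_json: bytes) -> list[str]:
    """Return a list of contract violations; an empty list means accept."""
    errors = []
    if len(thumbnail) > MAX_THUMBNAIL:
        errors.append("thumbnail over cap")
    if len(clip) > MAX_CLIP:
        errors.append("clip over cap")
    if len(metadata_json) > MAX_METADATA:
        errors.append("metadata over cap")
        return errors  # do not parse oversized metadata
    try:
        meta = json.loads(metadata_json)
    except ValueError:  # covers json.JSONDecodeError and undecodable bytes
        return errors + ["metadata is not valid JSON"]
    if not isinstance(meta, dict):
        return errors + ["metadata is not a JSON object"]
    missing = REQUIRED_KEYS - set(meta)
    if missing:
        errors.append(f"missing keys: {sorted(missing)}")
    leaked = FORBIDDEN_KEYS & set(meta)
    if leaked:
        errors.append(f"forbidden keys: {sorted(leaked)}")
    return errors
```

A sink would call this before writing anything to disk and reject the POST on a non-empty list. How the three parts are framed on the wire (multipart form, separate fields, or something else) is not specified here and is left open in the sketch.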

The point, in one paragraph

Intellectual property cyber security is a data-layer program with a physical-layer blind spot. Closing that blind spot by adding AI to the cameras that watch IP-bearing rooms is the right move. Doing it with a camera-first stack reintroduces, one network hop from the IP, the exact kind of credential-holding, cloud-tunneled Linux device the rest of the program has been removing for years. The HDMI-tap architecture closes the physical gap without reopening the network one: zero LAN IPs, zero RTSP connections, zero raw multiview off-property, roughly 240 KB per delivered event over a cellular uplink that never touches the corporate wire.

If the question your IP cyber security program is asking is 'how do we watch the rooms that hold the IP without putting a new attacker-addressable endpoint next to the IP,' the answer is the input architecture, not the model.