Cyrano Security
What Is Edge AI, Security Edition

Edge AI is the compute that sits where the guard was supposed to stand, which in practice is a few inches of HDMI cable between the DVR and the wall monitor in the leasing office.

Every top search result for this question opens with the same paragraph. Edge AI pushes inference out of the cloud. Edge AI processes data close to the sensor. Edge AI is great for privacy, bandwidth, and latency. All true. None of it tells you where the edge physically is. This guide gives you a coordinate: on an existing commercial property, the edge is the HDMI cable between the recorder and the monitor that a human was supposed to watch. That definition is narrower than the IoT one, and it is the definition that makes retrofit edge AI actually work.

See edge AI running on a real DVR HDMI feed
Rated 4.9 from 50+ properties

  • One HDMI input, up to 25 camera tiles, under 15W
  • 2-minute install, no camera replacement, no RTSP
  • 20 incidents caught in first 30 days at one Fort Worth property
  • Edge means on the HDMI cable, not in the cloud

The textbook answer, and why it is not enough

Read NVIDIA, IBM, Red Hat, Cisco, Arm, TechTarget, HPE, Splunk, Scale Computing, and Synopsys on this exact phrase and you get the same paragraph. Edge AI is the deployment of machine learning models on devices at the edge of the network, close to where the data is generated, rather than in a centralized cloud data center. Benefits are latency, privacy, bandwidth, and availability during disconnects. The examples given are autonomous cars, smart speakers, factory line sensors, retail-shelf cameras, and telemedicine kits.

All of that is correct. The problem is that the examples share a hidden assumption: the sensor and the compute live on the same device, bought together, in a greenfield install. A smart speaker is the microphone plus the wake-word model. A smart camera is the lens plus the person detector. In that framing, adding edge AI to a scene is the same act as buying the sensor.

Physical security breaks that assumption. An existing multifamily or commercial property has already bought its sensors, often a decade ago. Eight to twenty-four cameras, mostly analog HD-CVI or HD-TVI, wired to a hybrid DVR, feeding a monitor on a wall that nobody watches. The question the generic articles do not answer is: where does edge AI go in that scene? The answer is the spine of this page.

The physical location of the edge, in a real building

Trace the video path on any real commercial property. Cameras feed the DVR. The DVR feeds a monitor. A person, in theory, watches the monitor. The edge AI version of this story inserts one device on the cable that connects the DVR to the monitor, and runs inference on the composite frame that would have hit a human eyeball.

The edge is a coordinate on the HDMI cable

16 existing cameras
Hybrid DVR from 2019
HDMI output
Wall monitor
Cyrano edge unit
Real-time alerts
Thumbnail plus metadata
WhatsApp or SMS
Portfolio dashboard

The anchor numbers that define the edge

25 camera tiles per HDMI input, per unit
15 W drawn by the edge unit
2 min to install, HDMI plus network plus power
30 fps on the composite frame

None of those numbers scale with camera count. A 25-camera property and an 8-camera property run on the same edge unit, the same watts, the same install window. That is a different cost curve than the one a smart-camera deployment produces.

Generic edge AI definition vs. the security edition

The generic IoT framing is fine for IoT products. It is wrong for brownfield security. Here is the side-by-side.

| Feature | Generic IoT | Security (HDMI edge) |
| --- | --- | --- |
| Where the edge lives | On the sensor device itself | On the HDMI cable between DVR and monitor |
| Install assumption | Greenfield, new sensor purchase | Brownfield, existing cameras stay |
| Sensor count per edge unit | 1, by definition | Up to 25 tile polygons per HDMI input |
| Integration surface | Vendor SDK, firmware rev | Video signal that already drives a monitor |
| Decode cost | Per-stream, per-device | One composite frame, decoded once |
| Deployment time | Hardware replacement cycle | Under 2 minutes per property |
| Failure mode during uplink loss | Depends on sensor firmware | Keeps inferring, queues alerts locally |

The boot trace: what edge AI looks like in its first 2 seconds

A real edge AI definition should survive contact with a field install. Here is what the first two seconds look like on a Cyrano unit the first time the HDMI cable is plugged into a 16-channel hybrid DVR on a real multifamily property.

[terminal trace not reproduced: cyrano boot trace, HDMI signal lock to first inference]

Under two seconds from HDMI plug to per-tile inference. The generic edge AI articles cannot describe this part because the generic edge AI device is a sensor you already own, not a box you just plugged into the back of the DVR.

What the edge actually does, per frame

On each composite frame the edge unit pulls off HDMI, four things happen. They happen on the device, under 15 watts, with no outbound cloud call in the critical path.

Decode once

A single 1920x1080 composite frame is decoded into the shared frame buffer. Not 16 decoders. Not 25. One. The decoder budget stops scaling with camera count.
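The decode-once claim is checkable with straight pixel-rate arithmetic. The figures below are illustrative, not Cyrano's code; the 720p sub-stream used for the per-stream comparison is an assumption, not a spec from this page.

```python
# Illustrative arithmetic: decode budget for one composite HDMI frame
# versus decoding every camera separately (assumed 720p sub-streams).
WIDTH, HEIGHT, FPS = 1920, 1080, 30

# One decode per frame, regardless of camera count.
composite_pixels_per_sec = WIDTH * HEIGHT * FPS   # 62,208,000, ~62M px/s

# A per-stream design pays per camera. Even modest 720p sub-streams
# for 16 cameras cost several times the composite budget.
per_stream = 16 * 1280 * 720 * 30

print(composite_pixels_per_sec)
print(round(per_stream / composite_pixels_per_sec, 1))
```

The ratio is the point: the composite budget is flat at roughly 62 million pixels per second whether the DVR shows 4 tiles or 25.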

Tile polygon map

The auto-detected grid becomes polygon coordinates. Per-tile inference operates on those polygons in the shared buffer.
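For a uniform grid, the polygon map reduces to rectangle arithmetic. A minimal sketch, assuming a uniform rows x cols layout on a 1080p composite frame; real DVR multiviews can be non-uniform, and the function name is hypothetical, not Cyrano's API.

```python
def tile_rects(rows, cols, width=1920, height=1080):
    """Return (x, y, w, h) pixel rects for each tile, row-major order.

    Assumes an even grid; custom DVR layouts would need per-tile polygons.
    """
    w, h = width // cols, height // rows
    return [(c * w, r * h, w, h) for r in range(rows) for c in range(cols)]

# A 4x4 grid on 1080p yields 16 tiles of 480 x 270 pixels each.
rects = tile_rects(4, 4)
print(len(rects), rects[0])
```

Per-tile inference then crops each rect out of the shared frame buffer instead of opening a second decoder.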

Per-tile classifiers

Person, vehicle, loitering, tailgating, package, restricted-area entry. Each classifier runs per tile, scheduled against the display refresh rate.

Event rollup and alert

Detections are rolled up into track ids, dwell seconds, and zone events. A thumbnail plus metadata is emitted over WhatsApp or SMS within seconds of the real-world event. No video leaves the property.

Offline continuity

If the internet uplink drops, inference keeps running, alerts queue on the device, and the queue drains when connectivity returns. The edge stays awake even when the property is cut off.
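The queue-and-drain behavior can be sketched in a few lines. This is a minimal illustration of the described behavior, not Cyrano's implementation; `send` and `uplink_up` are stand-in callables.

```python
from collections import deque

class AlertQueue:
    """Queue alerts locally; drain in order whenever the uplink is up."""

    def __init__(self, send, uplink_up):
        self.pending = deque()          # survives uplink outages
        self.send = send                # stand-in: delivers one alert
        self.uplink_up = uplink_up      # stand-in: connectivity check

    def emit(self, alert):
        self.pending.append(alert)      # enqueue first, so nothing is lost
        self.drain()

    def drain(self):
        # FIFO drain: oldest alert goes out first once connectivity returns.
        while self.pending and self.uplink_up():
            self.send(self.pending.popleft())
```

While the uplink is down, `emit` only grows the queue; the next `drain` after reconnect flushes it in order.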

Before: unwatched monitor. After: edge AI at the coordinate.

The clearest way to see what edge AI means for a specific building is to compare the same cable before and after the device is plugged in.

A DVR in a closet drives a wall monitor in the leasing office. After 5pm the office is empty. The monitor shows 16 cameras in a 4x4 grid that no one is watching. Incidents happen. A tenant complains the next morning. An operator scrubs through 14 hours of footage to find the 30 seconds that matter.

  • HDMI cable driving nothing but photons
  • Zero eyeballs on the feed from 5pm to 8am
  • Post-hoc scrub, hours per incident
  • No alert, no real-time response

The five things every edge AI device has to do here

Minimum viable edge AI for a retrofit security install

1. Accept the signal that already exists

No RTSP, no ONVIF, no new VLAN. The only signal guaranteed to exist on any property with cameras is the HDMI feed to the wall monitor. The edge device must accept it.

2. Self-detect layout

Every DVR has its own multiview preference. The edge device must detect 2x2, 3x3, 4x4, 5x5 or custom layouts automatically and adapt when the operator switches views.

3. Map tiles to camera identity

A person detected on tile[02] is meaningless unless tile[02] is labeled 'front-entry.' The device must OCR the DVR's on-screen camera labels, or let the installer map them in the UI in under a minute.

4. Run inference per tile at display refresh

The device schedules classifiers (person, vehicle, loitering, tailgating, package, restricted-area entry) across all active tiles at the 30 fps budget of the composite frame.

5. Emit alerts that a human can act on in seconds

The output is a thumbnail plus compact metadata over WhatsApp or SMS, not a stream. The latency from real-world event to operator notification is seconds, not minutes.
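The 30 fps budget in step 4 translates into a hard per-tile time budget. Back-of-envelope arithmetic, assuming a 16-tile grid and an even split of the frame budget (the real scheduler need not split evenly):

```python
# Frame budget at display refresh: the composite frame must be fully
# processed before the next one arrives.
FPS = 30
frame_budget_ms = 1000 / FPS                  # ~33.3 ms per composite frame

# Shared across every active tile in a 4x4 multiview.
tiles = 16
per_tile_budget_ms = frame_budget_ms / tiles  # ~2.1 ms per tile per frame

print(round(frame_budget_ms, 1), round(per_tile_budget_ms, 1))
```

That roughly 2 ms per tile is why the classifiers are scheduled against the refresh rate rather than run ad hoc.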

The anchor fact: 20 incidents in 30 days at one Fort Worth property

One Class C multifamily property in Fort Worth had 16 existing analog HD-CVI cameras, a hybrid DVR from 2019, a wall monitor in the leasing office, and no on-site guard. The only change was a single Cyrano unit on the HDMI cable. In the first 30 days the edge unit caught 20 distinct incidents, including a break-in attempt, and the property renewed after the month was up. That is the concrete shape of edge AI when you stop defining it abstractly and start defining it by its position on an actual cable.

Anchor deployment

Fort Worth, TX, Class C multifamily, 125 units

  • 16 existing cameras, unchanged
  • Hybrid DVR from 2019, unchanged
  • One Cyrano unit inserted on the HDMI cable to the leasing-office monitor
  • Install time: under 2 minutes (HDMI, network, power)
  • First month: 20 incidents caught, including a break-in attempt
  • Outcome: renewed after 30 days

What this form of edge AI gives you, in a checklist

If the definition of edge AI is 'compute where the guard was supposed to stand,' you get...

  • Real-time alerts on existing cameras, no camera replacement
  • No RTSP negotiation, no ONVIF firmware updates, no VLAN work
  • Inference that survives internet outages, queuing alerts locally
  • Forensic video still lives on the existing DVR, untouched
  • Privacy: no video leaves the property, only thumbnails plus metadata
  • One edge unit covers up to 25 tiles, the same cost from 4 cameras to 25
  • Install under 2 minutes, no contractor, no IT team
  • $450 hardware, $200/month, measurable against one month of a guard salary

Where generic edge AI literature came from, and why it stopped short

The canonical edge AI literature was written by chip vendors and cloud vendors for an audience of IoT product teams. NVIDIA sells Jetson modules. Arm sells NPU IP. IBM and Red Hat sell edge orchestration platforms. Cisco sells gear that moves data between edge and cloud. Each of them has an interest in framing edge AI as a class of new greenfield devices that their customers will buy, deploy, and manage. The framing is correct for that market. It simply stops one layer short of the buildings where the largest number of cameras in the world are already installed.

Those buildings do not need a new edge device to replace the camera. They need a new edge device to replace the human who was supposed to watch the monitor. That is a different product, with a different installation surface, aimed at a different buyer. It is still edge AI by every technical definition. It is just edge AI positioned at a coordinate the generic articles never mention.

The generic sources that define edge AI, all of which skip the HDMI coordinate

NVIDIA, IBM, Red Hat, Cisco, Arm, HPE, Splunk, Synopsys, Scale Computing, TechTarget

Want to see edge AI at this exact coordinate?

Book a 15-minute walkthrough. We plug a Cyrano unit into a DVR HDMI output on a live property and show what per-tile inference looks like at 30 fps, under 15 watts, in real time.

Book a 15-minute demo

Frequently asked questions

What is edge AI, in one sentence that is not from a glossary?

Edge AI is machine learning inference that runs on hardware sitting at the same physical location as the data source, not in a cloud region. The textbook definition (NVIDIA, IBM, Red Hat, Cisco) stops there and then pivots to autonomous cars and smart speakers. In the security context the more useful framing is that the edge is wherever an attentive human would have watched. For a multifamily or commercial property that place is a wall monitor in the leasing office or lobby, driven by a DVR or NVR over HDMI. Edge AI in security is the compute that replaces the stare of the guard who was never actually going to watch that monitor for eight hours straight.

How is this definition different from the NVIDIA, IBM, or Red Hat definition of edge AI?

The canonical definition assumes the sensor and the inference live on the same device. A smart speaker is the microphone and the wake-word model. A smart camera is the lens and the person detector. That framing works for greenfield IoT deployments where you buy new hardware. It does not describe the existing installed base of physical security, which is tens of millions of analog HD-CVI or HD-TVI cameras wired to DVRs that know nothing about AI. In that world the edge is not the camera, it is a few inches of HDMI cable between the DVR and the monitor. Cyrano sits exactly there, which is why one $450 unit running at about 15 watts can make an entire existing camera system intelligent without touching the cameras or the network.

What are the concrete specs of the edge in this definition?

One HDMI input at 1920 by 1080 resolution, 30 frames per second. That is roughly 62 million pixels per second that the edge unit decodes, once, into a single frame buffer. Inside that frame the DVR has laid out up to 25 camera tiles in a multiview grid. The Cyrano unit auto-detects the grid (2x2, 3x3, 4x4, 5x5, or a custom layout) within about 800 milliseconds of HDMI signal lock, maps tile polygons, OCRs the per-tile camera labels from the DVR's on-screen display, and begins per-tile inference. Power draw is roughly 15 watts. Install is under two minutes. Cost is $450 hardware, $200 per month software. The edge is that unit.

Why is the HDMI cable the useful coordinate, and not the RTSP stream from each camera?

Because on a real commercial property the RTSP streams are usually not reachable. Half the cameras are analog BNC on a hybrid DVR with no IP address. The IP cameras that exist are on a VLAN the property manager cannot change without calling the installer who set them up six years ago and is no longer in business. The NVR's ONVIF profile is from 2018 and authenticates in a way a modern edge device does not speak. The one place every camera feed reliably appears is the HDMI output driving the wall monitor. That signal is guaranteed to exist because a human was originally supposed to watch it. Defining the edge as that HDMI point sidesteps every RTSP and ONVIF integration problem at once.

Does this mean edge AI has to happen on the monitor cable, or can it still happen on the camera?

Both are valid forms of edge AI. A smart IP camera with on-board analytics is edge AI in the strict sense, its inference runs on the camera itself. The problem with that form is that it requires buying new cameras. A Class B or Class C multifamily property already has 8 to 24 cameras installed, most bought between 2015 and 2022. Replacing them costs $50,000 to $150,000 per property. The HDMI edge answer brings edge AI to that existing system for $450 hardware per property and a 2-minute install. The two forms are not mutually exclusive. They target different buyers: smart-camera edge AI targets greenfield installs, HDMI edge AI targets brownfield retrofits.

Is it actually accurate to call the Cyrano unit edge AI, or is that marketing?

It is accurate. The unit meets all of the technical criteria the canonical definitions list. Inference runs locally on silicon inside the device, on the same LAN as the source. Video never leaves the property. Alerts are produced within seconds of the event, without the added latency of a cloud upload round-trip. When the internet uplink fails the device keeps running inference, queues alerts on disk, and flushes when connectivity returns. Nothing is routed through a cloud GPU. The only outbound payload is a short thumbnail plus event metadata. By the NVIDIA definition, by the IBM definition, by the Arm definition, it is edge AI. It is also a different physical placement of that edge than those definitions illustrate.

How does the first 90 seconds of a real edge AI boot actually look on this device?

HDMI signal lock at about 800 milliseconds, EDID read from the source DVR, layout detection within the first second, tile polygon map produced, per-tile OCR of the DVR's on-screen camera labels in the next two to three seconds, model warm-start in parallel, first per-tile inference run complete by about five seconds, first real events surfaced to the dashboard within 10 to 20 seconds depending on motion in the scene. Within 90 seconds the unit has collected enough scene statistics to classify normal versus unusual motion for the loitering and restricted-area detectors. Within an hour the false-positive rate is at steady state. All on-device. No scene data uploaded to any training pipeline.

What does the bandwidth profile look like in this form of edge AI?

The uplink from the property carries only alert payloads. A typical alert is a sub-100KB JPEG thumbnail plus a small metadata blob (timestamp, tile label, event class, dwell seconds, track id). A busy multifamily property might generate 40 to 80 alerts per day, peaking during move-in and move-out. The total outbound data per property per day is typically under 10MB. Compare that to cloud-recorded video surveillance, where 16 cameras at 2Mbps each is 32Mbps sustained outbound, about 345GB per day. The bandwidth story alone is why properties with flaky uplinks can deploy edge AI and cannot deploy cloud recording.
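The comparison in that answer is plain arithmetic and can be checked directly. The 80-alerts-per-day and 100KB-per-alert inputs are the upper ends of the figures above:

```python
# Alert path: 80 alerts/day at up to 100 KB each.
alert_bytes_per_day = 80 * 100 * 1024        # ~8 MB, under the 10 MB claim

# Cloud-recording path: 16 cameras streaming at 2 Mbps, sustained.
mbps = 16 * 2                                 # 32 Mbps outbound
gb_per_day = mbps / 8 * 86_400 / 1000         # Mbps -> MB/s -> GB/day

print(alert_bytes_per_day, mbps, round(gb_per_day, 1))
```

Roughly 345.6 GB per day of sustained upload versus under 10 MB of alert payloads is a four-orders-of-magnitude gap.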

What happens when the HDMI source changes its layout, for example when the operator clicks a camera to fullscreen?

The unit detects the layout change from the next frame, remaps the tile polygons, and keeps running per-tile inference against the new geometry without a reinitialization. During a fullscreen drill-in the entire frame becomes one tile and inference runs on that one tile at higher effective resolution. When the operator returns to multiview, the grid snaps back and polygons are re-mapped. No frames are lost to the swap. This is a real field behavior the generic edge AI articles cannot describe because they do not model the human operator who is still occasionally touching the DVR UI.

Where has this definition of edge AI actually been deployed, and what did it catch?

At one Class C multifamily property in Fort Worth, a single Cyrano edge unit caught 20 security incidents including a break-in attempt within the first 30 days of install. The property had 16 existing analog HD-CVI cameras wired to a hybrid DVR from 2019, a wall monitor in the leasing office that nobody watched in the evenings, and no on-site guard. The HDMI edge unit was the only change. Everything else (cameras, cabling, DVR, network) was already in place. That is the concrete shape of edge AI when the definition is 'compute where the guard was supposed to stand.'

🛡️CyranoEdge AI Security for Apartments
© 2026 Cyrano. All rights reserved.
