Cyrano Security
13 min read
Edge AI Computing Architecture

Every edge AI computing guide treats N cameras as N decoders feeding N pipelines. For a property with 25 cameras, the unit of compute is a tile, not a stream.

Cisco, IBM, NVIDIA, Red Hat, and Flexential all describe edge AI as an IoT endpoint problem: one sensor, one model, one inference pipeline. That framing works for a retail shelf or a factory line. It does not describe the real shape of edge AI computing for multifamily security, which starts at 8 cameras and scales to 48. This guide is about the architectural choice that makes the difference: decode a single composite HDMI frame once, partition it into tile polygons, and schedule inference per tile. The compute ceiling stops being per-camera and becomes per-display. The unit cost at the property collapses.

See tile-scheduled edge AI running on a live DVR
4.9 rating from 50+ properties
Tile as the compute unit, not the stream
1 HDMI frame buffer covers up to 25 camera tiles
Compute ceiling bounded by display refresh, not camera count
$450 hardware, $200/month, 2-minute install

The framing every edge AI computing article is missing

Open the top search results for the phrase edge ai computing and you get the same paragraphs. Edge AI pushes inference out of the cloud. Quantize your model. Prune redundant weights. Distill into a smaller network. Reduce latency. Protect privacy. Save bandwidth. All true. All necessary. None of it explains how a property with 16 cameras schedules its compute.

The IoT framing those articles inherit assumes the sensor count is 1. One camera, one microphone, one accelerometer. The edge device runs one inference pipeline on one stream. The field that uses edge AI computing most aggressively in 2026 is physical security, where the sensor count is 8 to 48 and the data is always video. The architecture a security property needs is not a smaller 5G endpoint. It is a scheduler that decodes one composite frame and runs inference on polygons inside it.

That architectural choice is the spine of this guide. The rest of the page lays out the math, the tile geometry, the compute ceiling, and what it means for the unit economics of an edge AI deployment across a real property portfolio.
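As a concrete sketch of the partition step, here is the grid geometry under the simplest assumption (an even grid, no border or label adjustments); the function name `tile_rects` is illustrative, not Cyrano's API:

```python
def tile_rects(frame_w=1920, frame_h=1080, grid=4):
    """Partition one composite frame into grid x grid tile rectangles.

    Returns (tile_index, x, y, w, h) tuples. A real mapper would also
    trim for on-screen borders, timestamps, and camera labels.
    """
    tw, th = frame_w // grid, frame_h // grid
    return [
        (row * grid + col, col * tw, row * th, tw, th)
        for row in range(grid)
        for col in range(grid)
    ]

# 4x4 layout on a 1080p composite: 16 tiles of 480x270 each
tiles = tile_rects(grid=4)
assert len(tiles) == 16
assert tiles[0] == (0, 0, 0, 480, 270)
```

One decode fills the frame buffer; inference then iterates over these rectangles instead of over stream sessions.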

The compute shape, from HDMI input to per-tile inference

One decode, tile polygons, shared inference loop

DVR HDMI output
Multiview layout
On-screen labels
Layout change events
Cyrano edge unit
Tile 1: parking north
Tile 7: front entry
Tile 13: hallway 3
Tile 22: mailroom

The numbers, on one 1080p HDMI input

25 tile polygons per unit, per HDMI input
62M pixels per second, decoded once
30 fps on the composite
15 W drawn by the edge unit at full load

Those four numbers do not scale with camera count. A 25-camera property and an 8-camera property on the same DVR consume the same pixel budget, the same decoder slot, the same wattage. Cost per tile falls as the property fills up the multiview, because a fixed budget is divided across more tiles.
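Those fixed numbers can be checked with a few lines of arithmetic, assuming the 1080p 30fps composite this page describes:

```python
# A fixed decode budget, set by the HDMI input rather than the camera count.
frame_w, frame_h, fps = 1920, 1080, 30
pixels_per_second = frame_w * frame_h * fps
assert pixels_per_second == 62_208_000   # ~62M px/s, decoded once

# The same budget serves any fill level; the per-tile share shrinks as
# the multiview fills up, which is what drives cost per tile down.
share = {tiles: pixels_per_second // tiles for tiles in (8, 16, 25)}
assert share[25] == 2_488_320            # each 5x5 tile's slice of the budget
```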

Tile geometry, layout by layout

The multiview layouts a DVR actually ships, and what the tile dimensions are

1. 2x2 layout, 4 cameras

Each tile is about 960x540. Subjects at typical parking-lot distances (6 to 12 meters) take up 120 to 220 pixels of the tile height. Person, vehicle, and loitering detectors run with margin to spare. The scheduler is almost idle.

2. 3x3 layout, 9 cameras

Each tile is about 640x360. Subjects take up 80 to 150 pixels of tile height. Still well above the detector's minimum subject size. This is the layout most 8-to-12-channel DVRs default to on installations.

3. 4x4 layout, 16 cameras

Each tile is about 480x270. Subjects take up 60 to 110 pixels. The detection margin narrows but remains above threshold for person and vehicle. This is the most common layout on 16-channel hybrid DVRs in Class B and Class C multifamily.

4. 5x5 layout, 25 cameras

Each tile is about 384x216. Subjects take up 50 to 90 pixels. This is the practical ceiling for person detection at typical property distances. Past this, tiles get too small for reliable classification without sacrificing recall at a rate the operator notices.

5. Fullscreen drill-in, layout_id shift

When the operator clicks a camera to fullscreen, the layout_id changes. The unit detects the new layout, remaps polygons, and keeps running per-tile inference on the one active tile until the view changes back. No re-initialization, no lost frames.
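That remap step can be sketched as pure geometry. `remap` and `LAYOUT_GRIDS` are hypothetical names for illustration, and real units also compensate for on-screen borders and labels:

```python
# Hypothetical sketch: regenerate tile polygons when the DVR layout_id
# changes, e.g. on a fullscreen drill-in and back.
LAYOUT_GRIDS = {"2x2": 2, "3x3": 3, "4x4": 4, "5x5": 5}

def remap(layout_id, frame_w=1920, frame_h=1080, active_tile=None):
    if layout_id == "fullscreen":
        # One active tile covers the whole frame; tile identity is preserved.
        return {active_tile: (0, 0, frame_w, frame_h)}
    n = LAYOUT_GRIDS[layout_id]
    tw, th = frame_w // n, frame_h // n
    return {r * n + c: (c * tw, r * th, tw, th)
            for r in range(n) for c in range(n)}

polys = remap("4x4")                       # 16 polygons mapped
drill = remap("fullscreen", active_tile=7)  # operator clicks camera 7
assert len(polys) == 16
assert drill == {7: (0, 0, 1920, 1080)}
back = remap("4x4")                        # view returns: same map, no re-init
assert back == polys
```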

A trace of the first two seconds on a freshly plugged-in unit

Below is a redacted boot trace from a Cyrano unit attached to a 16-channel hybrid DVR on a 125-unit property, captured the first time the HDMI cable was connected. It shows layout detection, tile mapping, and the transition into per-tile inference.

cyrano boot, first 2s after HDMI signal lock

Two compute shapes, side by side

Per-stream edge AI vs tile-scheduled edge AI on the same 16-camera property

Box ingests 16 RTSP streams. Hardware decoder allocates 16 slots (or time-slices one across 16 streams at reduced fps). Memory carries 16 ring buffers. The install begins with 16 RTSP URLs, 16 credentials, a VLAN request, and a firmware-update plan. Every stream has an independent reconnect timer. Per-camera retry loops collide with the inference scheduler. First event ready 20 to 90 minutes after arrival on site, if the network plan holds.

  • 16 decoders, 16 ring buffers, 16 reconnect timers
  • RTSP + ONVIF + credential rotation per camera
  • VLAN or firmware change often required
  • First event 20 to 90 minutes after arrival
  • Per-camera failure modes in production

What the tile-scheduled compute model buys you, field by field

One decoder slot, 25 tiles, zero stream sessions

A bounded decode budget on Jetson-class silicon (NVDEC or equivalent) covers the full multiview in one pass. No time-slicing, no dropped frames, no stream session lifecycle. The decode cost is O(1) in camera count.

Compute ceiling bounded by display refresh

30 fps on the composite is the ceiling, whether the DVR has 4 or 25 cameras in the multiview. The inference scheduler never has to decide between streams.

No credential surface to manage

RTSP URLs, ONVIF usernames, camera firmware, VLAN plans, all outside the Cyrano unit's dependency graph. The one input is HDMI. The one output is an event payload.

Auto-detected layout, auto-mapped polygons

2x2, 3x3, 4x4, 5x5, and the fullscreen drill-in are all recognized within a frame of the layout change. Tile polygons remap without reinitialization.

OCR on the DVR's on-screen text gives tile identity

Most DVRs burn camera labels into the multiview. The unit reads those labels during boot to assign tile-to-camera identity. Manual override is available from the install UI.

Per-tile event payload, ready for retrieval

Each event carries tile index, zone label, dwell seconds, event class, track id, and a cropped thumbnail. The payload is keyed on tile, not stream, which is why it lives on one device and survives a layout change.
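A minimal sketch of that payload as a data structure; the field names mirror the list above, but the exact wire format is an assumption for illustration:

```python
from dataclasses import dataclass, asdict

@dataclass
class TileEvent:
    # Fields come from the page above; names and types are illustrative.
    tile_index: int
    zone_label: str
    dwell_seconds: float
    event_class: str
    track_id: int
    thumbnail_jpeg: bytes   # cropped thumbnail, encoded bytes

# An event keyed on tile 22 ("mailroom" in the multiview above) survives a
# layout change because tile identity, not a stream session, is the key.
evt = TileEvent(22, "mailroom", 94.0, "loitering", 1103, b"")
payload = asdict(evt)
assert payload["tile_index"] == 22
assert payload["event_class"] == "loitering"
```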

Constant watt budget, 15W under full load

Inference on 25 tiles in one shared buffer is cheaper in watts than 25 concurrent stream decoders. The unit runs off PoE or a small brick. The thermal envelope is small enough for a closet next to the DVR.

One BOM for a 4-camera site and a 25-camera site

Because the compute ceiling is the HDMI input, the hardware is identical across site sizes up to 25 tiles. A portfolio-wide rollout stockpiles one SKU.

What happens during a layout change, scheduled frame by frame

Operator drills into camera 7, fullscreen, and back


Frame 0: 4x4 multiview, 16 tile polygons mapped

Inference is running on all 16 tiles. Per-tile events are flowing. Memory footprint is one frame buffer, one model working set, 16 polygon masks.

Edge AI computing approaches, side by side

| Feature | Per-stream edge box or smart-camera replacement | Cyrano tile-scheduled edge |
|---|---|---|
| Unit of compute | One stream per inference pipeline | One tile per inference pipeline, one decode for all tiles |
| Compute ceiling | Camera count | Display refresh on one HDMI input |
| Install time | 2 to 6 hours per site (RTSP + ONVIF + VLAN) | Under 2 minutes per site |
| Credential surface | Per-camera username, password, firmware | None |
| Works on 2012 analog DVR | No | Yes |
| Works on 2024 Hikvision NVR | Sometimes, after ONVIF auth alignment | Yes |
| Tiles covered per unit | Typically 4 to 16 | Up to 25 |
| Watts at full load | 25 to 60W per box | Under 15W |
| Unit hardware price | $600 to $2,500 | $450 |
| Per-camera monthly software | $10 to $30 | ~$8 at 25 tiles |
| First event ready after arrival | 20 to 90 minutes | Under 2 seconds from signal lock |

Two assumptions about edge AI computing that tile scheduling breaks

Assumption: the edge is about smaller models on more devices

The canonical edge story is quantize a big model down, ship it to thousands of endpoints. In multifamily security, the property is the endpoint, and it already has 16 video inputs on one display. One box that schedules 16 tile polygons on one frame buffer is a better fit than 16 tiny boxes on 16 cameras that still need to be installed. The model does not get smaller; it gets scheduled against polygons inside the existing frame.

Assumption: more TOPS on a smart camera equals better edge AI

A camera with 4 TOPS on-board is doing its own job. When the property has 16 of them, those 16 workloads are never scheduled together. No common frame, no cross-camera track correlation, no layout awareness. Tile-scheduled edge AI computing on the DVR's composite is the only architecture that gives you track_id continuity from front-entry to elevator to hallway in a single compute session.

What an installer actually does, end to end

From Cyrano unit out of the box to first per-tile event

  • Locate the DVR or NVR HDMI output driving the leasing-office monitor
  • Insert HDMI splitter; one leg continues to the monitor, one leg goes to the Cyrano unit
  • Plug the Cyrano unit into the wall or into PoE
  • Connect the unit to the property LAN or use the cellular fallback
  • Wait under a second for HDMI signal lock and automatic layout detection
  • Confirm the unit has read the on-screen tile labels from the DVR multiview
  • Override any tile labels that the DVR does not expose in OCR-able text
  • Draw zone polygons on tiles where loitering, tailgating, or restricted-area rules apply
  • Confirm WhatsApp or SMS alert destination and alert sensitivity per zone
  • First per-tile event ready; the install is done in under 2 minutes

DVRs and NVRs the tile-scheduled model already runs against

Hikvision DS series
Dahua XVR and NVR
Lorex LNR and LHD
Swann DVR and NVR
Uniview NVR
Night Owl DVR
Annke NVR
Reolink NVR
Amcrest NVR
Q-See and rebrands

The tile-scheduled compute model is device-agnostic. Any DVR or NVR driving a multiview over HDMI is a supported source, which covers most consumer and property-grade recorders installed in Class B and Class C multifamily in the last decade.

Who wrote this

A property-side operator and an embedded-systems reviewer checked this page. Cyrano pays neither.


Cyrano Security

Product team, edge AI for property DVRs

Ships the tile-scheduled edge compute described on this page. 50+ multifamily property deployments against existing DVR HDMI out.


Operations reviewer

Portfolio director, 12-property Class B portfolio

Reviewed the install and BOM claims on this page against actual site rollouts, including the claim that a 4-camera site and a 25-camera site have the same hardware SKU.


Embedded systems reviewer

Edge inference on Jetson-class hardware

Reviewed the decode-once, tile-schedule architecture claims against NVDEC pixel budgets and on-device inference latency on Orin-class silicon.

1 decode

The insight that flipped the architecture was not a new model and not a new accelerator. It was realizing the DVR was already decoding every camera into one composite frame for the wall monitor. If the edge device ingested that composite instead of re-decoding 16 streams from RTSP, the compute shape changed from per-stream to per-tile, the install time changed from hours to minutes, and the BOM changed from 16 SKUs to one. That is what edge AI computing for security looks like when you stop treating the problem like 5G IoT.

Cyrano field notes, multifamily portfolio, 2025 to 2026

See tile-scheduled edge AI compute on a live DVR

Book a 15-minute demo. We will plug a Cyrano unit into a production DVR, walk the layout detection, show per-tile inference running against a 16-tile 4x4 multiview, and pull an event payload with tile id, zone label, and track id attached.

Book a demo

Edge AI Computing for Security: Frequently Asked Questions

What does edge AI computing actually mean in a security camera context?

Edge AI computing means the machine learning inference happens on a device at the property, not in a cloud region. For security cameras the inference is typically computer vision: person, vehicle, loitering, tailgating, package, restricted-area entry. The model was trained in a data center, but the matrix multiplies that produce detections run on local silicon (a Jetson-class NPU, a Coral TPU, or an embedded GPU) inside a box on the same LAN as the DVR. The generic edge AI framing you see on Cisco, IBM, NVIDIA, and Red Hat pages is correct as far as it goes; it just stops before the interesting engineering question in security, which is what shape the compute workload takes. A camera system is not a 5G IoT endpoint streaming one sensor. It is 4 to 25 high-bitrate video feeds that all have to be watched at once. The architectural choice of how to schedule that compute is where the real decisions get made.

Why does the tile-vs-stream distinction matter for edge AI cost and performance?

Because the dominant cost item in real-time video AI is not inference, it is decoding. Each H.264 or H.265 stream an edge box ingests consumes a hardware decoder slot and a ring buffer of memory. On a Jetson Orin Nano the NVDEC has a bounded decode budget, measured in pixels per second. A 16-camera RTSP ingest at 1080p 15fps puts you against that budget immediately, then you pay the cost of managing 16 asynchronous decoders, 16 stream reconnect timers, 16 credential rotations, and 16 per-stream failure modes. Cyrano's architecture takes one HDMI frame buffer at 1080p 30fps, decodes it once, and treats the tile polygons inside it as the inference workload. The per-camera decoder cost collapses to one. The stream-management complexity collapses to zero. The total pixels-per-second of inference is bounded by the display refresh of the wall monitor, not by the camera count. That is a different compute shape from what the generic edge AI literature describes.
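A sketch of that decode-budget comparison, using the stream counts and frame rates from the answer above:

```python
# Per-stream vs composite decode load, in pixels per second.
per_stream = 16 * 1920 * 1080 * 15      # 16 RTSP pulls at 1080p 15fps
composite = 1 * 1920 * 1080 * 30        # one HDMI frame buffer at 1080p 30fps

assert per_stream == 497_664_000        # ~498M px/s against the decoder budget
assert composite == 62_208_000          # ~62M px/s, fixed regardless of cameras
assert per_stream // composite == 8     # the composite path is 8x lighter
```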

What are the concrete dimensions of a tile on a Cyrano device?

On a 1080p HDMI input (1920x1080) in a 5x5 multiview layout, each tile is about 384x216 pixels before adjustments for on-screen borders, timestamps, and camera labels. In a 4x4 layout, tiles are 480x270. In a 3x3 layout, tiles are 640x360. The Cyrano unit auto-detects the layout from the HDMI frame and maps tile polygons accordingly. Person and vehicle detection at typical parking-lot and entry distances work well at these tile sizes because the targets are already large relative to the tile, not small relative to a full 1080p scene. This differs from direct-stream AI, where each camera is analyzed at native resolution and the model has to handle a much wider subject-size distribution. In the tile model, the range narrows, which is one reason inference cost is stable across layouts.

If the edge compute is bounded by the HDMI input rather than the camera count, what is the ceiling?

The ceiling on a single unit is tiles per HDMI input rather than cameras per device. The current limit is 25 tiles per HDMI input per unit. Past that, a property plugs in a second unit to a second DVR HDMI output or to a second DVR altogether. The unit itself consumes under 15 watts, takes under 2 minutes to install, runs inference at about 30 frames per second against the composite input, and emits alerts with thumbnails over WhatsApp or SMS within seconds of the event. The unit price is $450 hardware, $200 per month software. No RTSP, no ONVIF, no network reconfiguration, no camera replacement. The compute ceiling and the install ceiling are both tied to HDMI inputs, not camera count, which is how a portfolio of 4-camera and 20-camera properties ends up with the same per-site BOM.

How is model accuracy on a tile different from model accuracy on a direct stream?

The subjects are larger relative to the frame, which is an accuracy advantage for small-object detection in low light. The tile has less absolute pixel detail than a 4K RTSP pull, which is an accuracy cost for fine-grained tasks like license plate OCR or face recognition at distance. For the detection classes Cyrano ships (person, vehicle, loitering, tailgating, package, restricted-area entry) the tile-sized subject is well within the resolution the model needs. The relevant tradeoff is that tile-based edge AI computing is optimized for the security events a property manager acts on (was someone in the back parking lot for 90 seconds at 2am) rather than for forensic reconstruction that demands native-resolution footage. The native-resolution footage is already preserved on the DVR; the edge compute adds the real-time awareness layer on top of it.

Why do the usual edge AI guides not talk about this architecture?

Because the usual edge AI guides are written for IoT vendors, not for security integrators. The canonical edge AI customer in the Cisco or IBM framing is a factory line with one sensor, a retail shelf with one camera, or a telemedicine kit with one medical feed. One sensor, one model, one endpoint. That framing predates the real-world multifamily security camera problem, which starts at 8 cameras and scales to 48. Edge AI computing for security camera systems is a multi-tile scheduling problem dressed up as an IoT problem. The right architecture answer is not more TOPS on a smarter camera, it is a different decomposition of the workload: decode once, partition the frame, run inference per tile. That is the gap between the generic edge AI literature and what actually runs on a real property.

What about newer IP cameras with on-board AI, aren't those already the edge?

Yes, an IP camera with built-in analytics is an edge AI device in the strict sense. The problem is that those cameras are a greenfield install. A Class B or Class C multifamily property already has 8 to 24 cameras wired, most of them analog HD-CVI or HD-TVI going to a hybrid DVR, bought between 2015 and 2022. Ripping all of those to install new IP cameras with on-board AI costs $50k to $150k per property. A Cyrano unit brings edge AI computing to that existing system without the rip-and-replace, for $450 hardware per property and a 2-minute install. The strategic difference is that the industry's preferred answer is a new camera, and the operator's need is AI on the system they already paid for.

Where does the inference actually run, and what happens when the internet goes out?

Inference runs on silicon inside the Cyrano unit on the same LAN as the DVR. The HDMI frame buffer never leaves the device. The only outbound traffic is the alert payload, which is a short thumbnail plus metadata (timestamp, zone label, event class, track id, dwell seconds). When the internet goes out, the device keeps running inference, keeps writing events to the on-device index, and queues alerts. When connectivity restores, the queue drains. This matters because the generic cloud-AI architecture fails hard when the property's uplink drops, which happens often on older properties with a single ISP. Edge AI computing in the tile-scheduling model keeps the safety layer live through outages.
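The store-and-forward behavior described above can be sketched as a small queue; `AlertQueue` and its `send` callback are illustrative stand-ins, since this page does not specify the real delivery path:

```python
from collections import deque

class AlertQueue:
    """Minimal sketch of store-and-forward alerting through an uplink outage.

    `send` stands in for the real WhatsApp/SMS delivery call.
    """
    def __init__(self, send):
        self.send = send
        self.pending = deque()

    def emit(self, payload, online):
        if online:
            self.drain()                 # flush anything queued during the outage
            self.send(payload)
        else:
            self.pending.append(payload)  # inference keeps running; alert waits

    def drain(self):
        while self.pending:
            self.send(self.pending.popleft())

sent = []
q = AlertQueue(sent.append)
q.emit({"tile": 7, "class": "person"}, online=False)  # uplink down: queued
q.emit({"tile": 7, "class": "loiter"}, online=True)   # restored: queue drains first
assert sent == [{"tile": 7, "class": "person"},
                {"tile": 7, "class": "loiter"}]
```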

What is the practical upper bound on cameras per property with this model?

One Cyrano unit covers up to 25 tiles from one HDMI output. Most multifamily DVRs expose a single HDMI output driving a wall monitor in the leasing office, so one unit is sufficient through 25 cameras. For larger sites, an HDMI splitter, a second DVR HDMI output, or a second DVR handles the 26th camera and beyond with a second unit. The relevant math for a property manager is not TOPS per dollar, it is tiles-covered per dollar per month: one unit at $450 hardware plus $200 per month software, divided across up to 25 tiles, is $18 per tile hardware and $8 per tile per month in steady state. No comparable direct-stream edge AI device is even close on that axis at that install complexity.
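The per-tile arithmetic from this answer, checked directly:

```python
# Per-tile unit economics at a full 5x5 multiview, from the figures above.
hardware_usd, monthly_usd, tiles = 450, 200, 25
assert hardware_usd / tiles == 18.0   # one-time hardware cost per tile
assert monthly_usd / tiles == 8.0     # software cost per tile per month
```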

What does the first hour of compute look like on a freshly installed Cyrano unit?

Within about 90 seconds of the unit receiving its first HDMI signal, it has detected the multiview layout, mapped tile polygons, registered per-tile camera identifiers (read from the DVR's on-screen-display text or assigned by the installer in the UI), and begun publishing events. Within the first 10 minutes it has collected enough scene statistics to run the loitering and restricted-area detectors cleanly. Within the first hour it has resolved normal vs unusual motion patterns well enough that the false-positive rate has dropped to steady state. All of this happens on the device. No scene data is uploaded to any training pipeline. The model that runs on the device was trained against a general multifamily dataset in advance, not against this property's specific feed.

🛡️ Cyrano | Edge AI Security for Apartments
© 2026 Cyrano. All rights reserved.
