Every edge AI computing guide maps N cameras to N decoders to N inference pipelines. For a property with 25 cameras, the unit of compute is a tile, not a stream.
Cisco, IBM, NVIDIA, Red Hat, and Flexential all describe edge AI as an IoT endpoint problem: one sensor, one model, one inference pipeline. That framing works for a retail shelf or a factory line. It does not describe the real shape of edge AI computing for multifamily security, which starts at 8 cameras and scales to 48. This guide is about the architectural choice that makes the difference: decode a single composite HDMI frame once, partition it into tile polygons, and schedule inference per tile. The compute ceiling stops being per-camera and becomes per-display. The unit cost at the property collapses.
The framing every edge AI computing article is missing
Open the top search results for the phrase edge ai computing and you get the same paragraphs. Edge AI pushes inference out of the cloud. Quantize your model. Prune redundant weights. Distill into a smaller network. Reduce latency. Protect privacy. Save bandwidth. All true. All necessary. None of it explains how a property with 16 cameras schedules its compute.
The IoT framing those articles inherit assumes the sensor count is 1. One camera, one microphone, one accelerometer. The edge device runs one inference pipeline on one stream. The field that uses edge AI computing most aggressively in 2026 is physical security, where the sensor count is 8 to 48 and the data is always video. The architecture a security property needs is not a smaller 5G endpoint. It is a scheduler that decodes one composite frame and runs inference on polygons inside it.
That architectural choice is the spine of this guide. The rest of the page lays out the math, the tile geometry, the compute ceiling, and what it means for the unit economics of an edge AI deployment across a real property portfolio.
The compute shape, from HDMI input to per-tile inference
One decode, tile polygons, shared inference loop
The numbers, on one 1080p HDMI input
- One 1920x1080 composite frame buffer
- One hardware decoder slot
- 30 fps display refresh
- Under 15 watts at full load

Those four numbers do not scale with camera count. A 25-camera property and an 8-camera property on the same DVR consume the same pixel budget, the same decoder slot, the same wattage. Cost per tile falls linearly as the property fills up the multiview.
Tile geometry, layout by layout
The multiview layouts a DVR actually ships, and what the tile dimensions are
2x2 layout, 4 cameras
Each tile is about 960x540. Subjects at typical parking-lot distances (6 to 12 meters) take up 120 to 220 pixels of the tile height. Person, vehicle, and loitering detectors run with margin to spare. The scheduler is almost idle.
3x3 layout, 9 cameras
Each tile is about 640x360. Subjects take up 80 to 150 pixels of tile height. Still well above the detector's minimum subject size. This is the layout most 8-to-12-channel DVRs default to on a fresh install.
4x4 layout, 16 cameras
Each tile is about 480x270. Subjects take up 60 to 110 pixels. The detection margin narrows but remains above threshold for person and vehicle. This is the most common layout on 16-channel hybrid DVRs in Class B and Class C multifamily.
5x5 layout, 25 cameras
Each tile is about 384x216. Subjects take up 50 to 90 pixels. This is the practical ceiling for person detection at typical property distances. Past this, tiles get too small for reliable classification without sacrificing recall at a rate the operator notices.
Fullscreen drill-in, layout_id shift
When the operator clicks a camera to fullscreen, the layout_id changes. The unit detects the new layout, remaps polygons, and keeps running per-tile inference on the one active tile until the view changes back. No re-initialization, no lost frames.
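The layout detection and remapping above can be sketched in a few lines. This is an illustrative sketch, not Cyrano's actual implementation; the function names and the layout-id strings are hypothetical, and the fullscreen drill-in is modeled as a 1x1 grid.

```python
# Minimal sketch of tile-polygon mapping for an NxN multiview on a
# 1920x1080 composite. Names are hypothetical; borders, timestamps,
# and label strips are ignored for simplicity.

FRAME_W, FRAME_H = 1920, 1080

def tile_rects(grid):
    """Return (x, y, w, h) for each tile in an NxN multiview grid.

    grid=1 models the fullscreen drill-in: one tile covering the frame.
    """
    w, h = FRAME_W // grid, FRAME_H // grid
    return [(col * w, row * h, w, h)
            for row in range(grid) for col in range(grid)]

def remap(layout_id):
    """Remap tile polygons on a layout_id shift; no decoder re-init."""
    grids = {"2x2": 2, "3x3": 3, "4x4": 4, "5x5": 5, "fullscreen": 1}
    return tile_rects(grids[layout_id])

# A 5x5 layout yields 25 tiles of roughly 384x216 each,
# matching the geometry described above.
assert len(remap("5x5")) == 25
assert remap("5x5")[0][2:] == (384, 216)
```

Because remapping is pure arithmetic on the one frame buffer, a layout change costs a table lookup, not a decoder restart.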
A trace of the first two seconds on a freshly plugged-in unit
Below is a redacted boot trace from a Cyrano unit attached to a 16-channel hybrid DVR on a 125-unit property, captured the first time the HDMI cable was connected. It shows layout detection, tile mapping, and the transition into per-tile inference.
Two compute shapes, side by side
Per-stream edge AI vs tile-scheduled edge AI on the same 16-camera property
Box ingests 16 RTSP streams. Hardware decoder allocates 16 slots (or time-slices one across 16 streams at reduced fps). Memory carries 16 ring buffers. The install begins with 16 RTSP URLs, 16 credentials, a VLAN request, and a firmware-update plan. Every stream has an independent reconnect timer. Per-camera retry loops collide with the inference scheduler. First event ready 20 to 90 minutes after arrival on site, if the network plan holds.
- 16 decoders, 16 ring buffers, 16 reconnect timers
- RTSP + ONVIF + credential rotation per camera
- VLAN or firmware change often required
- First event 20 to 90 minutes after arrival
- Per-camera failure modes in production
What the tile-scheduled compute model buys you, field by field
One decoder slot, 25 tiles, zero stream sessions
A bounded decode budget on Jetson-class silicon (NVDEC or equivalent) covers the full multiview in one pass. No time-slicing, no dropped frames, no stream session lifecycle. The decode cost is O(1) in camera count.
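The O(1) claim reduces to pixels-per-second arithmetic. The sketch below uses the figures from this page (16 RTSP streams at 1080p 15fps vs one HDMI composite at 1080p 30fps); actual decoder budgets vary by silicon, so treat it as a back-of-envelope comparison, not an NVDEC spec.

```python
# Back-of-envelope decode load: per-stream ingest vs one composite decode.
PX_1080P = 1920 * 1080

per_stream = 16 * PX_1080P * 15   # 16 decoders at 1080p, 15 fps each
composite  = 1 * PX_1080P * 30    # one decode of the multiview at 30 fps

print(f"per-stream: {per_stream:,} px/s")            # 497,664,000
print(f"composite:  {composite:,} px/s")             # 62,208,000
print(f"ratio:      {per_stream / composite:.0f}x")  # 8x
```

The composite load is fixed at the display refresh, so adding cameras to the multiview changes the tile geometry, not the decode bill.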
Compute ceiling bounded by display refresh
30 fps on the composite is the ceiling, whether the DVR has 4 or 25 cameras in the multiview. The inference scheduler never has to decide between streams.
No credential surface to manage
RTSP URLs, ONVIF usernames, camera firmware, VLAN plans, all outside the Cyrano unit's dependency graph. The one input is HDMI. The one output is an event payload.
Auto-detected layout, auto-mapped polygons
2x2, 3x3, 4x4, 5x5, and the fullscreen drill-in are all recognized within a frame of the layout change. Tile polygons remap without reinitialization.
OCR on the DVR's on-screen text gives tile identity
Most DVRs burn camera labels into the multiview. The unit reads those labels during boot to assign tile-to-camera identity. Manual override is available from the install UI.
Per-tile event payload, ready for retrieval
Each event carries tile index, zone label, dwell seconds, event class, track id, and a cropped thumbnail. The payload is keyed on tile, not stream, which is why it lives on one device and survives a layout change.
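The payload fields listed above can be captured as a plain record. Field names below are illustrative; the actual Cyrano schema may differ.

```python
# Sketch of the per-tile event payload: keyed on tile, not stream.
from dataclasses import dataclass, asdict

@dataclass
class TileEvent:
    tile_index: int        # position in the multiview, survives layout changes
    zone_label: str
    event_class: str       # person, vehicle, loitering, tailgating, ...
    track_id: int
    dwell_seconds: float
    thumbnail_jpeg: bytes  # cropped from the tile polygon

event = TileEvent(tile_index=6, zone_label="back-lot",
                  event_class="loitering", track_id=42,
                  dwell_seconds=91.0, thumbnail_jpeg=b"")
assert asdict(event)["tile_index"] == 6
```

Keying on tile index rather than a stream session is what lets the payload outlive a layout change: the tile is a stable coordinate in the composite, while a stream session would have to be re-established.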
Constant watt budget, 15W under full load
Inference on 25 tiles in one shared buffer is cheaper in watts than 25 concurrent stream decoders. The unit runs off PoE or a small brick. The thermal envelope is small enough for a closet next to the DVR.
One BOM for a 4-camera site and a 25-camera site
Because the compute ceiling is the HDMI input, the hardware is identical across site sizes up to 25 tiles. A portfolio-wide rollout stockpiles one SKU.
What happens during a layout change, scheduled frame by frame
Operator drills into camera 7, fullscreen, and back
Frame 0: 4x4 multiview, 16 tile polygons mapped
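The drill-in sequence can be simulated as a tiny state machine over per-frame layout detections. This is a hypothetical stand-in for the real detector and scheduler, kept only to show that a layout change is a remap, not a restart.

```python
# Illustrative frame-by-frame simulation of the drill-in sequence:
# 4x4 multiview -> fullscreen on one camera -> back to 4x4.

def schedule(frames):
    """Yield (frame_no, layout, active_tiles); remap only on a change."""
    current, tiles = None, 0
    for n, layout in enumerate(frames):
        if layout != current:            # layout_id shift detected
            current = layout
            tiles = {"4x4": 16, "fullscreen": 1}[layout]  # remap polygons
        yield n, current, tiles

trace = list(schedule(["4x4", "4x4", "fullscreen", "fullscreen", "4x4"]))
assert trace[0] == (0, "4x4", 16)        # frame 0: 16 tile polygons mapped
assert trace[2] == (2, "fullscreen", 1)  # drill-in: one active tile
assert trace[4] == (4, "4x4", 16)        # back to multiview, no re-init
```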
Edge AI computing approaches, side by side
| Feature | Per-stream edge box or smart-camera replace | Cyrano tile-scheduled edge |
|---|---|---|
| Unit of compute | One stream per inference pipeline | One tile per inference pipeline, one decode for all tiles |
| Compute ceiling | Camera count | Display refresh on one HDMI input |
| Install time | 2 to 6 hours per site (RTSP + ONVIF + VLAN) | Under 2 minutes per site |
| Credential surface | Per-camera username, password, firmware | None |
| Works on 2012 analog DVR | No | Yes |
| Works on 2024 Hikvision NVR | Sometimes, after ONVIF auth alignment | Yes |
| Tiles covered per unit | Typically 4 to 16 | Up to 25 |
| Watts at full load | 25 to 60W per box | Under 15W |
| Unit hardware price | $600 to $2,500 | $450 |
| Per-camera monthly software | $10 to $30 | ~$8 at 25 tiles |
| First event ready after arrival | 20 to 90 minutes | Under 2 seconds from signal lock |
Two assumptions about edge AI computing that tile scheduling breaks
Assumption: the edge is about smaller models on more devices
The canonical edge story is quantize a big model down, ship it to thousands of endpoints. In multifamily security, the property is the endpoint, and it already has 16 video inputs on one display. One box that schedules 16 tile polygons on one frame buffer is a better fit than 16 tiny boxes on 16 cameras that still need to be installed. The model does not get smaller, it gets scheduled against polygons inside the existing frame.
Assumption: more TOPS on a smart camera equals better edge AI
A camera with 4 TOPS on-board is doing its own job. When the property has 16 of them, those 16 workloads are never scheduled together. No common frame, no cross-camera track correlation, no layout awareness. Tile-scheduled edge AI computing on the DVR's composite is the only architecture that gives you track_id continuity from front-entry to elevator to hallway in a single compute session.
What an installer actually does, end to end
From Cyrano unit out of the box to first per-tile event
- Locate the DVR or NVR HDMI output driving the leasing-office monitor
- Insert HDMI splitter; one leg continues to the monitor, one leg goes to the Cyrano unit
- Plug the Cyrano unit into the wall or into PoE
- Connect the unit to the property LAN or use the cellular fallback
- Wait under a second for HDMI signal lock and automatic layout detection
- Confirm the unit has read the on-screen tile labels from the DVR multiview
- Override any tile labels that the DVR does not expose in OCR-able text
- Draw zone polygons on tiles where loitering, tailgating, or restricted-area rules apply
- Confirm WhatsApp or SMS alert destination and alert sensitivity per zone
- First per-tile event ready; the install is done in under 2 minutes
DVRs and NVRs the tile-scheduled model already runs against
The tile-scheduled compute model is device-agnostic. Any DVR or NVR driving a multiview over HDMI is a supported source, which covers most consumer and property-grade recorders installed in Class B and Class C multifamily in the last decade.
This page was reviewed by a property-side operator and an embedded-systems reviewer. Cyrano pays neither.
Who wrote this
Cyrano Security
Product team, edge AI for property DVRs
Ships the tile-scheduled edge compute described on this page. 50+ multifamily property deployments against existing DVR HDMI out.
Operations reviewer
Portfolio director, 12-property Class B portfolio
Reviewed the install and BOM claims on this page against actual site rollouts, including the claim that a 4-camera site and a 25-camera site have the same hardware SKU.
Embedded systems reviewer
Edge inference on Jetson-class hardware
Reviewed the decode-once, tile-schedule architecture claims against NVDEC pixel budgets and on-device inference latency on Orin-class silicon.
“The insight that flipped the architecture was not a new model and not a new accelerator. It was realizing the DVR was already decoding every camera into one composite frame for the wall monitor. If the edge device ingested that composite instead of re-decoding 16 streams from RTSP, the compute shape changed from per-stream to per-tile, the install time changed from hours to minutes, and the BOM changed from 16 SKUs to one. That is what edge AI computing for security looks like when you stop treating the problem like 5G IoT.”
Cyrano field notes, multifamily portfolio, 2025 to 2026
See tile-scheduled edge AI compute on a live DVR
Book a 15-minute demo. We will plug a Cyrano unit into a production DVR, walk the layout detection, show per-tile inference running against a 16-tile 4x4 multiview, and pull an event payload with tile id, zone label, and track id attached.
Book a demo →

Edge AI Computing for Security: Frequently Asked Questions
What does edge AI computing actually mean in a security camera context?
Edge AI computing means the machine learning inference happens on a device at the property, not in a cloud region. For security cameras the inference is typically computer vision: person, vehicle, loitering, tailgating, package, restricted-area entry. The model was trained in a data center but the matrix multiplies that produce detections run on local silicon (a Jetson-class NPU, a Coral TPU, or an embedded GPU) inside a box on the same LAN as the DVR. The generic edge AI framing you see on Cisco, IBM, NVIDIA, and Red Hat pages is correct as far as it goes; it just stops before the interesting engineering question in security, which is what shape the compute workload takes. A camera system is not a 5G IoT endpoint streaming one sensor. It is 4 to 25 high-bitrate video feeds that all have to be analyzed at once. The architecture choice for how to schedule that compute is where the real decisions get made.
Why does the tile-vs-stream distinction matter for edge AI cost and performance?
Because the dominant cost item in real-time video AI is not inference, it is decoding. Each H.264 or H.265 stream an edge box ingests consumes a hardware decoder slot and a ring buffer of memory. On a Jetson Orin Nano the NVDEC has a bounded decode budget, measured in pixels per second. A 16-camera RTSP ingest at 1080p 15fps puts you up against that budget immediately, then you pay the cost of managing 16 asynchronous decoders, 16 stream reconnect timers, 16 credential rotations, and 16 per-stream failure modes. Cyrano's architecture takes one HDMI frame buffer at 1080p 30fps, decodes it once, and treats the tile polygons inside it as the inference workload. The per-camera decoder cost collapses to one. The stream-management complexity collapses to zero. The total pixels-per-second of inference is bounded by the display refresh of the wall monitor, not by the camera count. That is a different compute shape from what the generic edge AI literature describes.
What are the concrete dimensions of a tile on a Cyrano device?
On a 1080p HDMI input (1920x1080) in a 5x5 multiview layout, each tile is about 384x216 pixels before adjustments for on-screen borders, timestamps, and camera labels. In a 4x4 layout, tiles are 480x270. In a 3x3 layout, tiles are 640x360. The Cyrano auto-detects layout from the HDMI frame and maps tile polygons accordingly. Person and vehicle detection at typical parking-lot and entry distances work well at these tile sizes because the targets are already large relative to the tile, not small relative to a full 1080p scene. This differs from direct-stream AI where each camera is analyzed at native resolution and the model has to handle a much wider subject-size distribution. In the tile model, the range narrows, which is one reason inference cost is stable across layouts.
If the edge compute is bounded by the HDMI input rather than the camera count, what is the ceiling?
The ceiling on a single unit is tiles per HDMI input rather than cameras per device. The current limit is 25 tiles per HDMI input per unit. Past that, a property plugs in a second unit to a second DVR HDMI output or to a second DVR altogether. The unit itself consumes under 15 watts, takes under 2 minutes to install, runs inference at about 30 frames per second against the composite input, and emits alerts with thumbnails over WhatsApp or SMS within seconds of the event. The unit price is $450 hardware, $200 per month software. No RTSP, no ONVIF, no network reconfiguration, no camera replacement. The compute ceiling and the install ceiling are both tied to HDMI inputs, not camera count, which is how a portfolio of 4-camera and 20-camera properties ends up with the same per-site BOM.
How is model accuracy on a tile different from model accuracy on a direct stream?
The subjects are larger relative to the frame, which is an accuracy advantage for small-object detection in low light. The tile has less absolute pixel detail than a 4K RTSP pull, which is an accuracy cost for fine-grained tasks like license plate OCR or face recognition at distance. For the detection classes Cyrano ships (person, vehicle, loitering, tailgating, package, restricted-area entry) the tile-sized subject is well within the resolution the model needs. The relevant tradeoff is that tile-based edge AI computing is optimized for the security events a property manager acts on (was someone in the back parking lot for 90 seconds at 2am) rather than for forensic reconstruction that demands native-resolution footage. The native-resolution footage is already preserved on the DVR; the edge compute adds the real-time awareness layer on top of it.
Why do the usual edge AI guides not talk about this architecture?
Because the usual edge AI guides are written for IoT vendors, not for security integrators. The canonical edge AI customer in the Cisco or IBM framing is a factory line with one sensor, a retail shelf with one camera, or a telemedicine kit with one medical feed. One sensor, one model, one endpoint. That framing predates the real-world multifamily security camera problem, which starts at 8 cameras and scales to 48. Edge AI computing for security camera systems is a multi-tile scheduling problem dressed up as an IoT problem. The right architecture answer is not more TOPS on a smarter camera, it is a different decomposition of the workload: decode once, partition the frame, run inference per tile. That is the gap between the generic edge AI literature and what actually runs on a real property.
What about newer IP cameras with on-board AI, aren't those already the edge?
Yes, an IP camera with built-in analytics is an edge AI device in the strict sense. The problem is that those cameras are a greenfield install. A Class B or Class C multifamily property already has 8 to 24 cameras wired, most of them analog HD-CVI or HD-TVI going to a hybrid DVR, bought between 2015 and 2022. Ripping all of those to install new IP cameras with on-board AI costs $50k to $150k per property. A Cyrano unit brings edge AI computing to that existing system without the rip-and-replace, for $450 hardware per property and a 2-minute install. The strategic difference is that the industry's preferred answer is a new camera, and the operator's need is AI on the system they already paid for.
Where does the inference actually run, and what happens when the internet goes out?
Inference runs on silicon inside the Cyrano unit on the same LAN as the DVR. The HDMI frame buffer never leaves the device. The only outbound traffic is the alert payload, which is a short thumbnail plus metadata (timestamp, zone label, event class, track id, dwell seconds). When the internet goes out, the device keeps running inference, keeps writing events to the on-device index, and queues alerts. When connectivity restores, the queue drains. This matters because the generic cloud-AI architecture fails hard when the property's uplink drops, which happens often on older properties with a single ISP. Edge AI computing in the tile-scheduling model keeps the safety layer live through outages.
What is the practical upper bound on cameras per property with this model?
One Cyrano unit covers up to 25 tiles from one HDMI output. Most multifamily DVRs expose a single HDMI output driving a wall monitor in the leasing office, so one unit is sufficient through 25 cameras. For larger sites, an HDMI splitter, a second DVR HDMI output, or a second DVR handles the 26th camera and beyond with a second unit. The relevant math for a property manager is not TOPS per dollar, it is tiles-covered per dollar per month: one unit at $450 hardware plus $200 per month software, divided across up to 25 tiles, is $18 per tile hardware and $8 per tile per month in steady state. No comparable direct-stream edge AI device is even close on that axis at that install complexity.
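The tiles-covered-per-dollar math in the answer above is a one-liner each way, shown here with the page's own figures ($450 hardware, $200 per month software, 25 tiles).

```python
# Per-tile cost at full 25-tile occupancy, using the figures on this page.
hardware, monthly, tiles = 450, 200, 25

print(hardware / tiles)  # 18.0 dollars of hardware per tile, one-time
print(monthly / tiles)   # 8.0 dollars per tile per month, steady state
```

A site with fewer cameras pays the same fixed cost, which is why cost per tile falls linearly as the multiview fills up.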
What does the first hour of compute look like on a freshly installed Cyrano unit?
Within about 90 seconds of the unit receiving its first HDMI signal, it has detected the multiview layout, mapped tile polygons, registered per-tile camera identifiers (read from the DVR's on-screen-display text or assigned by the installer in the UI), and begun publishing events. Within the first 10 minutes it has collected enough scene statistics to run the loitering and restricted-area detectors cleanly. Within the first hour it has resolved normal vs unusual motion patterns well enough that the false-positive rate has dropped to steady state. All of this happens on the device. No scene data is uploaded to any training pipeline. The model that runs on the device was trained against a general multifamily dataset in advance, not against this property's specific feed.
Adjacent edge compute and property-security topics that use the same tile-scheduled architecture.
Related guides
Edge AI Device for Security Cameras: The HDMI Integration Pattern
The physical install pattern that makes the tile-scheduled compute model work: one HDMI splitter, one edge unit, no camera replacement.
Edge AI vs Cloud AI for Security Cameras: Bandwidth, Latency, Cost
Why cloud AI fails at 16 cameras and what the bandwidth and latency math looks like at a real multifamily property.
Edge AI Solutions for Physical Security: A Practical Buyer's Guide
The buyer-side view: what to look for, what to skip, and the three realistic deployment architectures in 2026.