Cyrano Security
14 min read
Reframe: latency, not accuracy

Power theft detection is a 60-second intervention window. Smart-meter ML runs at 15-minute reads. The window closes before the model sees anything.

Every top result for this keyword is a paper scoring an ML classifier on smart-meter time series. Nature, Frontiers, ScienceDirect, IEEE Xplore, patent filings. All of them measure accuracy and precision over consumption data, and none of them report end-to-end latency, because smart-meter data physically cannot arrive inside the window where the act is still interruptible. On the ground, power theft plays out in a 60 to 180 second approach phase between a person reaching the pedestal and the current flowing, and the alert only matters if it lands inside that window. This guide walks through the six-stage latency budget Cyrano actually holds on a running DVR HDMI multiview, with the pre_action_zone_entry event as the deliverable.

See a live pre-action zone entry land on WhatsApp
4.9 from 50+ properties
Under 60 seconds tile to WhatsApp
pre_action_zone_entry fires during approach
overlay_mask defeats the DVR-watermark argument
25 camera tiles per Cyrano unit on one HDMI

The SERP measures accuracy. The job measures latency.

Run a search for power theft detection. The first screen is uniform: a Nature Scientific Reports piece on deep learning for electricity theft, a Frontiers in Energy Research review of ML models, an IEEE Xplore entry on a smart power theft detection system, a ScienceDirect survey, a Google Patents filing, a Springer Nature Link article on double-connected data capture, and utility vendor blogs. Every one of them frames the problem the same way: classify a consumption time series as anomalous or not, using random forests, gradient boosting, deep CNNs, LSTMs, or autoencoders on historical kWh data.

That framing is correct for a utility trying to identify customers whose consumption does not reconcile with upstream measurements. It is wrong for anyone who needs to stop a specific act at a specific place on a specific night. The classifier is not the bottleneck. The bottleneck is that smart-meter telemetry is aggregated in 15-minute AMI intervals, the anomaly does not become visible until a few buckets later, and the ML is batch, not streaming. By the time the dashboard lights up, the copper is in the thief's van and the kWh is already billed.

The only way to fire inside the approach window is to observe the approach, not the consumption. That is what a camera zone does, and that is the gap this page covers.

The six stages of the latency budget

From the moment a person's body enters the armed polygon to the moment the ops-thread phone buzzes, there are six discrete stages, each with a budget. The total must land under 60 seconds for the intervention fork to have any remaining window. This is the path every pre_action_zone_entry event follows on the deployed Cyrano build.

Tile to WhatsApp, stage by stage

  1. Frame capture: the camera exposes a frame and ships it to the DVR. Typically inside one frame period at 15 to 30 fps.

  2. HDMI multiview: the DVR composites up to 25 tiles into a single HDMI frame on the guard-monitor output. No re-encoding; pass-through level.

  3. Edge inference: Cyrano taps the HDMI, masks DVR chrome via overlay_mask, and runs per-tile person detection locally on the unit.

  4. Zone plus dwell: the detection is tested against the armed polygon. dwell_seconds increments every inference tick the subject remains in zone.

  5. Event queue: the payload is serialised (thumbnail, zone, dwell, ISO timestamp, overlay_mask, latency_ms) and queued for delivery.

  6. WhatsApp delivery: the payload arrives on the property ops thread. End-to-end from stage one to stage six is under 60 seconds.
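The six stages above can be sketched as a simple budget check. Stage names follow this page; the per-stage millisecond figures are illustrative assumptions, not measured Cyrano numbers.

```python
# Illustrative latency budget for the six stages described above.
# Stage names come from the article; the millisecond values are
# assumed placeholders, not shipped figures.
STAGE_BUDGET_MS = {
    "frame_capture": 70,         # one frame period at ~15 fps
    "hdmi_multiview": 50,        # DVR compositing, pass-through level
    "edge_inference": 400,       # mask overlays, per-tile person detect
    "zone_plus_dwell": 10_000,   # dwell threshold dominates the budget
    "event_queue": 200,          # serialise payload, enqueue
    "whatsapp_delivery": 5_000,  # network plus messaging platform
}

def total_latency_ms(budget: dict) -> int:
    """Sum the per-stage budgets into one end-to-end figure."""
    return sum(budget.values())

def within_window(budget: dict, window_ms: int = 60_000) -> bool:
    """True if the end-to-end path fits the 60-second budget."""
    return total_latency_ms(budget) <= window_ms
```

The point of the exercise: even with a deliberately generous dwell threshold, the assumed stage costs leave most of the 60-second budget intact.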

What the pre_action_zone_entry payload actually looks like

The anchor of this page: the exact payload that lands on the ops thread, anonymised from a live deployment. Every field has a purpose, and the combination is what makes the event both intervenable in real time and admissible in a later case file.

pre_action_zone_entry · exterior power pedestal at 02:47 local
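The payload card itself does not survive in this text version of the page, so the sketch below is an illustrative reconstruction built only from the fields this article names (event class, tile label, zone, armed window, dwell, ISO 8601 timestamp, thumbnail, overlay_mask, delivery channel, latency_ms). Every concrete value is invented for the example.

```python
import json
from datetime import datetime, timezone, timedelta

# Illustrative pre_action_zone_entry payload. Field names follow the
# article's description; all values here are invented, not a real event.
payload = {
    "event": {"class": "pre_action_zone_entry"},
    "property": "example_property_001",          # anonymised identifier
    "tile": {"label": "cam_17_pool_pedestal"},   # DVR label strip text
    "zone": {"name": "exterior_pedestal_bay_17",
             "armed_window": "22:00-06:00"},
    "dwell_seconds": 14,
    "timestamp": datetime(2026, 2, 3, 2, 47, 12,
                          tzinfo=timezone(timedelta(hours=-5))).isoformat(),
    "thumbnail_url": "https://example.invalid/thumb/480x270.jpg",
    "overlay_mask": "hikvision_4x4_default",
    "delivery": "whatsapp_group",
    "latency_ms": 41_250,
}

# Machine-parseable on the wire, so it drops into a case file unchanged.
wire = json.dumps(payload)
```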

The overlay_mask line is the one that pre-empts the “your model just detected its own DVR watermark” argument. DVR multiview output carries persistent overlays, a digital clock, a channel bug, a camera-label strip, occasionally a timecode ribbon, and each is masked out of the tile before inference runs. The mask list goes into the payload so the record shows which overlays were not contributing to the detection.
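The masking step itself is mechanically simple. A minimal sketch, assuming the tile is a 2D grid of pixel values and each mask entry is a (top, left, height, width) rectangle; the region names and coordinates are invented for illustration, and a real frame would be full-resolution pixels rather than a toy grid.

```python
# Sketch of masking DVR chrome out of a tile before inference.
# Region names and rectangle coordinates are illustrative assumptions.
OVERLAY_MASK = {
    "clock": (0, 0, 2, 6),         # on-screen clock, top-left
    "channel_bug": (0, 28, 2, 4),  # channel identifier, top-right
    "label_strip": (16, 0, 2, 32), # camera-label strip, bottom
}

def apply_overlay_mask(frame, mask=OVERLAY_MASK):
    """Zero out every masked region so DVR chrome cannot trip the detector.

    Returns a new grid; the input frame is left untouched.
    """
    out = [row[:] for row in frame]
    for top, left, h, w in mask.values():
        for r in range(top, min(top + h, len(out))):
            for c in range(left, min(left + w, len(out[r]))):
                out[r][c] = 0
    return out
```

Because the mask dict is what gets recorded in the payload, the event record and the inference input are guaranteed to describe the same masked regions.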

Where the approach actually happens: six zone archetypes

The SERP is full of abstractions about feeder-level energy balance and non-technical losses. On the ground, power theft lands at six specific locations. Point an existing camera at each, draw a tight polygon, set an armed window, and the pre_action_zone_entry fires during the approach instead of after the consumption.

Exterior 30A and 50A pedestals

Pool equipment bays, courtyard maintenance outlets, RV hookups, irrigation controller pedestals. A cord over a fence into a neighbouring yard is the archetype. Dwell 10 to 20 seconds, armed overnight.

Service panel room

Door into the room with main breakers and unmetered buses. After-hours entry is almost always an attempted tap. Dwell 8 seconds.

Submeter closet

Where per-unit submeters live. Jumpered submeters are the classic interior tap. Tight polygon around the closet door, 10 second dwell.

Transformer pad and meter can

Utility-side target for copper pulls and direct taps. Polygon on the pad side, 8 to 12 second dwell, armed 24 hour on unmanned sites.

House-panel common outlet

Hallway, mech-room, trash-room, and garage outlets wired to the house panel. Heaters, window ACs, mining rigs, welders. Short dwell on the wall.

Laundry or stall 240V outlet

A 240V dryer outlet after hours is a Level 2 EV charging tap waiting to happen. One resident, five neighbours, a bill the building pays.

The latency delta, in numbers

Camera-side zone events versus smart-meter ML is not a small efficiency difference. It is a different observation regime. These are the operating constants on the Cyrano side.

Under 60 s: end-to-end budget, tile to WhatsApp
15 min: typical smart-meter AMI read interval
25: camera tiles per Cyrano unit off one DVR HDMI
Under 2 min: physical install on a running DVR

Inputs, the Cyrano edge unit, outputs

One HDMI tap off the DVR multiview is the entire capture layer. Everything downstream of the tap runs on the edge unit. Nothing leaves the site until the payload is ready for the ops thread.

One HDMI in, one ops-thread payload out, inside the approach window

Inputs: DVR HDMI multiview (up to 25 tiles) · armed polygon + dwell threshold · overlay mask (clock, bug, label strip)
Edge: Cyrano edge unit
Outputs: pre_action_zone_entry on WhatsApp · voice-down or two-way audio fork · dispatch payload with thumbnail and zone

What a zone configuration actually is

A single deployed zone fits on a screen. It is declarative, not a trained model. Once the polygon is drawn and the dwell threshold is set, the whole pipeline runs on the edge unit. Here is the real shape of a zone definition for a single exterior pedestal.

zones/exterior_pedestal_bay_17.yaml
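The file body did not survive in this text version of the page, so the sketch below is an illustrative reconstruction assembled from the fields named elsewhere in this article (polygon, dwell threshold, armed window, overlay mask, delivery fork). The field names and values are assumptions, not the shipped schema.

```yaml
# zones/exterior_pedestal_bay_17.yaml
# Illustrative reconstruction, not the shipped Cyrano schema.
zone: exterior_pedestal_bay_17
tile: cam_17                      # DVR tile the polygon is drawn on
polygon:                          # tight polygon around the pedestal itself
  - [412, 220]
  - [508, 220]
  - [508, 344]
  - [412, 344]
dwell_seconds_threshold: 12       # matched to this tap's real approach length
armed_window: "22:00-06:00"       # overnight for exterior pedestals
overlay_mask: [clock, channel_bug, label_strip]
delivery:
  primary: whatsapp_ops_thread
  fallback: sms
```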

Smart-meter ML vs. pre_action_zone_entry

The academic stack is right about classification. It is silent on everything that makes a power-theft event interruptible.

Meter-side ML and camera-side zone events, line by line

Same keyword, different observable, different job.

Feature | Smart-meter ML | pre_action_zone_entry (Cyrano)
Observable signal | Aggregated kWh time series | Person in polygon with dwell
Earliest signal | Next AMI read cycle, post-consumption | Approach phase, pre-tap
Typical latency | 15 minutes to hours | Under 60 seconds to WhatsApp
Intervention possible | No, consumption already occurred | Yes, voice-down or dispatch
False-fire defence | Threshold tuning per feeder | overlay_mask on DVR chrome
Named actor | None | Tile thumbnail of the person
Zero-consumption events | No signal, no kWh yet | Fires on pre-tap approach
Upstream-of-meter taps | Invisible to the submeter | Detected at the physical point
Hardware requirement | AMI rollout on every meter | Existing DVR with HDMI out

The actor path, end to end

Between the camera exposing a frame and the operator opening a two-way audio channel, six system components exchange messages. This is the same sequence every pre_action_zone_entry follows.

Frame to intervention, a single pre_action_zone_entry

Camera → DVR HDMI: frame at t=0
DVR HDMI → Cyrano edge: HDMI multiview tile
Cyrano edge → classifier: mask chrome, run person detect
Classifier → event queue: polygon + dwell pass
Event queue → WhatsApp: payload + thumbnail (<60 s)
WhatsApp: delivery receipt

The DVRs this latency path works on

Capture is at the HDMI multiview level, not the camera level. If the recorder has a monitor output, the stages above run on it. Cameras can be any age or brand, analog or IP, 1080p or 4K.

Hikvision DS-7xxx
Dahua XVR / NVR
Lorex
Amcrest
Reolink NVR
Uniview
Swann
Night Owl
Q-See
ANNKE
EZVIZ
Honeywell Performance
Bosch DIVAR
Panasonic WJ-NX series
Any DVR with HDMI out

The whole argument is that 60 seconds from tile capture to delivered WhatsApp payload is the difference between a theft you watched happen and a theft you stopped. Every stage of the pipeline is engineered for that budget, because above it the approach window closes and the camera becomes a forensic tool instead of a preventive one.

Cyrano deployment notes, 2026

Configuring the zone side to meet the latency budget

The pipeline only holds budget when the zones are scoped tightly and the dwell thresholds match the tap's real approach length. Loose zones produce noisy events. Over-long dwells push the event out of the window.

Per zone, in order

  • List every physical power access point (pedestals, panels, closets, 240V outlets, transformer pads)
  • Point an existing DVR camera at each access point or add one
  • Draw a tight polygon on the DVR tile around the access point itself, not the approach lane
  • Set dwell threshold between 8 and 30 seconds based on the real tap approach length for that zone
  • Set the armed window (usually overnight for exterior, 24-hour for unmanned interior rooms)
  • Verify DVR overlays (clock, name strip, channel bug, timecode ribbon) are in overlay_mask
  • Confirm end-to-end latency from zone entry to WhatsApp is under 60 seconds on the deployed unit
  • Define the intervention fork per zone: voice-down primary, dispatch fallback
  • Drill the fork with a non-theft test subject (a staff member) and time it to completion
  • Pair each event cluster with the next submeter or billing read for the corresponding circuit
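Part of the checklist above can be mechanised. A hedged sketch, assuming a zone is represented as a plain dict with the fields this page names; the dict shape and the helper are illustrative, not Cyrano's real configuration API.

```python
# Sketch of validating one zone against the checklist above.
# The dict shape is an assumption for illustration only.
def validate_zone(zone: dict) -> list:
    """Return a list of checklist violations; an empty list passes."""
    problems = []
    if len(zone.get("polygon", [])) < 3:
        problems.append("polygon needs at least 3 vertices")
    dwell = zone.get("dwell_seconds_threshold", 0)
    if not 8 <= dwell <= 30:
        problems.append("dwell threshold outside the 8-30 s band")
    if not zone.get("armed_window"):
        problems.append("no armed window set")
    if not zone.get("overlay_mask"):
        problems.append("DVR overlays not masked")
    return problems
```

A check like this catches the two failure modes the section warns about, loose zones and over-long dwells, before the unit ever arms.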

What the operator actually does during the window

The payload does not close the loop by itself. It opens a narrow window for a human, a speaker, or a dispatcher to finish the job. Here are the three forks, played through at scale.

Inside the approach window


T+0 s · frame capture

Person crosses the exterior pedestal polygon. Camera exposes the frame. DVR mosaics it into the multiview.

The three things a defence attorney or a skeptical HOA will question

If an event ever leaves the property office and enters a dispute, three arguments will come up. Each has a clean rebuttal only because the payload is complete.

Challenge 1

DVR clock drift

“The DVR clock could be wrong.” Rebutted by the Cyrano unit's own NTP-synced ISO 8601 timestamp, independent of the masked-out DVR on-screen clock.

Challenge 2

Watermark artefact

“The model just detected its own overlay.” Rebutted by the overlay_mask field listing which DVR elements were masked out before inference, plus the tile thumbnail showing a real human in the polygon.

Challenge 3

One-off, not intent

“Could have been a passer-by.” Rebutted by dwell_seconds on the specific event and by the cluster of events at the same polygon across the month, with consistent dwell and nighttime hours.

If the alert misses the approach window, the camera is a forensic tool, not a preventive one.

15-minute demo. We tap a running DVR's HDMI, draw a zone over a test power access point, time the whole pipeline, and show a real pre_action_zone_entry event hitting WhatsApp with a thumbnail and an ISO timestamp. End-to-end under 60 seconds, on the cameras you already own.

Book a demo

Frequently asked questions

Why is power theft detection a latency problem, not an accuracy problem?

Because the window in which the theft is still interruptible is 60 to 180 seconds long. Watch any surveillance recording of power theft, from exterior pedestal taps to transformer-pad copper cuts to cord drops through a garage wall, and the segment from vehicle stop to tools touching power is almost always inside that band. If the detection arrives 30 seconds after the first frame of a person in the zone, a voice-down, a dispatched patrol, or a live operator call can still prevent the act. If it arrives 15 minutes later (the typical AMI smart-meter aggregation window) it cannot. The top-ranked academic papers on this keyword all score themselves on classification accuracy over smart-meter time series. None of them even report end-to-end latency, because smart-meter data physically cannot arrive inside the approach window. That is the unbridgeable gap between meter-side ML and camera-side zone events.

What exactly are Cyrano's six latency stages and what does each cost?

Stage 1, camera frame capture: a frame is exposed and delivered to the DVR. Stage 2, DVR HDMI multiview: the DVR mosaics up to 25 tiles into one HDMI frame and drives the guard monitor. Stage 3, edge inference: Cyrano taps the HDMI output, masks DVR chrome (clock digits, channel bug, camera-label strip) via the overlay_mask, and runs per-tile person detection locally. Stage 4, zone plus dwell pass: the detection is checked against the armed polygon and the dwell counter is incremented per inference tick while the subject remains in the zone. Stage 5, event queue: once dwell threshold trips, the payload is serialised (tile thumbnail, zone name, dwell_seconds, ISO 8601 timestamp, overlay_mask, latency_ms) and pushed to the event queue. Stage 6, WhatsApp delivery: the payload lands in the property's WhatsApp group or SMS fallback. The combined end-to-end path from tile capture to delivered message is under 60 seconds on properties running the shipped Cyrano build.

Why can smart-meter ML not match this latency?

Three structural reasons. First, Advanced Metering Infrastructure (AMI) reads at 15-minute or hourly intervals. A consumption anomaly cannot appear on the dashboard until the next read cycle closes, which is typically multiple 15-minute buckets after the event so the averaging does not mute the signature. Second, the ML itself runs in batch over rolling windows of telemetry, not streaming. Third, and most importantly, the approach itself, the pre-tap phase in which the subject is standing at the pedestal or panel with tools in hand but has not yet drawn load, produces zero signal in any meter. No kWh has been consumed yet. There is nothing for the ML to classify. The camera zone event fires on the approach. The meter ML cannot fire until after the load has already been drawn.

What is pre_action_zone_entry and how is it different from motion detection?

pre_action_zone_entry is the event class Cyrano emits when a human is detected inside an armed polygon and has dwelled past the configured threshold. Motion detection fires on any change in pixels anywhere in the frame and fires on wildlife, weather, light changes, and delivery vehicles. pre_action_zone_entry fires only when a person-class detection lands inside the polygon and persists for the dwell threshold, with DVR chrome masked out of the inference so clock digits and camera labels cannot produce false fires. The dwell_seconds integer is included in the payload so the operator receives not just the existence of an event but the elapsed standstill time, which is the single best indicator that the person is about to act rather than walking through.
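The dwell gate that separates a pre_action_zone_entry from plain motion detection can be sketched in a few lines. The tick cadence and reset-on-exit behaviour below are assumptions consistent with this FAQ's description, not the shipped implementation.

```python
# Sketch of the dwell gate described above: a per-tick stream of
# "is a person inside the polygon" booleans, with dwell resetting
# whenever the person leaves. Tick cadence is an assumed 1 s.
def dwell_gate(in_zone_ticks, tick_seconds=1.0, threshold_seconds=12.0):
    """Return the tick index at which dwell crosses the threshold,
    or None if the subject never dwells long enough to fire."""
    dwell = 0.0
    for i, inside in enumerate(in_zone_ticks):
        dwell = dwell + tick_seconds if inside else 0.0
        if dwell >= threshold_seconds:
            return i
    return None
```

A pass-through (a cleaner walking past the closet) breaks the streak and never fires; a person standing at the cabinet accumulates dwell until the gate trips.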

What is the overlay_mask field and why is it in every payload?

DVR multiview output has several persistent overlays: a running digital clock, a channel-bug identifier in the corner, a camera-label text strip at the bottom of each tile, often a recording-active icon, and on some Hikvision and Dahua layouts a timecode ribbon across the top. Those overlays contain small, structured, high-contrast regions that general person-detection models can latch onto in unexpected ways. Cyrano masks each one out of the tile before running inference, and it records which overlays were masked in the overlay_mask field of every event payload. This serves two purposes: it prevents the detection from being triggered by its own watermark, and it pre-empts the courtroom question of whether the model was fooled by on-screen text. The same mask applies on Hikvision DS-7xxx, Dahua XVR, Lorex, Amcrest, Reolink, Uniview, Swann, and most rebranded recorders.

Can one Cyrano unit really cover 25 cameras?

Yes, because the capture point is the DVR HDMI multiview output, which is already a mosaic of up to 25 tiles in 4x4, 5x5, 6x6, or 9x9 standard layouts. That single HDMI frame is the input. The per-tile inference runs on every tile in every frame, and every tile carries its own armed polygon and dwell counter. One Cyrano unit tapped off one DVR covers the full camera count of that recorder. A 400-unit garden property with two 16-channel recorders uses two Cyrano units and gets 32 independent zone contexts. There is no camera-firmware coupling and no ONVIF negotiation because the HDMI multiview is already composited by the DVR itself.
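The per-tile fan-out is plain grid arithmetic over the single HDMI frame. A sketch, assuming a uniform N×N mosaic and illustrative 1920×1080 geometry; real layouts may carry borders or mixed tile sizes.

```python
# Sketch of slicing per-camera tile rectangles out of one HDMI
# multiview frame, assuming a uniform N x N grid (4x4, 5x5, ...).
def tile_bounds(frame_w, frame_h, grid):
    """Yield (tile_index, x, y, w, h) for every tile in the mosaic."""
    tw, th = frame_w // grid, frame_h // grid
    for row in range(grid):
        for col in range(grid):
            yield (row * grid + col, col * tw, row * th, tw, th)
```

At 1920×1080 a 5×5 layout yields 25 tiles of 384×216 each, which is why a single HDMI tap covers the recorder's full camera count.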

What does a real pre_action_zone_entry payload look like?

event.class = pre_action_zone_entry, property identifier, tile.label (camera name from DVR strip, also masked from inference), zone name (for power theft, typically exterior_pedestal_bay_N, service_panel_room, transformer_pad_side, genset_fuel_cap, ev_stall_240v, submeter_closet), zone.armed_window, dwell_seconds (integer, increments per inference tick), timestamp in ISO 8601 with timezone offset, tile thumbnail URL (480x270 crop), overlay_mask identifier (which DVR layout was applied), delivery channel (WhatsApp group or SMS), and latency_ms (capture to delivery). The payload is deliberately minimal and machine-parseable so it can drop into a case file, a utility investigation, or an ops-automation pipeline without re-parsing.

What is the intervention loop once the payload arrives?

Three forks, each designed to complete inside the approach window. Fork one, live voice-down: the operator opens a two-way audio channel through the camera or a nearby speaker and speaks to the subject (many thefts end here). Fork two, dispatch: the operator calls the on-duty responder or the local non-emergency police line with the tile thumbnail and zone name. Fork three, preventive signalling: lights come on, a siren arms, the front-gate access is flagged. The key timing constraint: each fork has to start inside the remaining portion of the 60 to 180 second window. With a 60-second detection budget, that leaves anywhere from zero to two minutes of intervention time before the physical act begins.

Does this work on analog cameras and older DVRs?

Yes. The HDMI multiview is the standardising layer. Whether the recorder is a 2018 Hikvision DS-7216 driving sixteen analog 1080p cameras or a 2024 Reolink NVR driving 4K IP cameras, the HDMI output to the guard monitor is a uniform composite that Cyrano taps into. Physical install is under 2 minutes: HDMI in from the DVR, HDMI out to the monitor (passthrough), network, power. No camera firmware work, no credential exchange, no ONVIF plumbing.

Where do the zones actually go for power theft specifically?

Six archetypes cover most deployments. Exterior 30A or 50A pedestals for pool or courtyard or RV hookups. Service panel room doors. Submeter closet doors. House-panel common-area outlets in hallways, mechanical rooms, trash rooms, and garages. Laundry 240V outlets (used after-hours as ad-hoc Level 2 EV charging). Transformer-pad and meter-can exteriors for utility-side taps and copper pulls. Each zone is a tight polygon drawn on the DVR tile, with an armed window (often overnight, sometimes 24-hour for interior rooms), and a dwell threshold typically between 8 and 30 seconds depending on how long the approach phase of that specific tap takes.

Can the ML on smart meters contribute anything at all?

Yes, as the second half of the evidence pair. After the camera fires and the operator intervenes or records, the meter eventually closes a read cycle and reports its kWh delta. That delta, paired with the timestamped zone event, becomes the two-source chain that converts an attempted or completed theft into a recoverable and prosecutable case. Neither half alone is enough: the meter delta does not name the actor, and the camera event does not quantify the loss. Together they form a case file. The camera fires first in time, the meter closes the record second. Think of meter-side ML as the accountant and camera-side zone events as the dispatcher: both are useful, but only the dispatcher can stop the act.

What about false positives from legitimate workers?

Armed windows and zone specificity cut this to near-zero in practice. A submeter closet at 02:14 on a Tuesday is not a legitimate-worker context. A service panel room during business hours on a weekday is. Properties configure the armed_window on each zone individually: after-hours only for exterior pedestals, 24-hour for an unmanned transformer pad, business-hours masked for the electrical room if maintenance is routine there. The dwell threshold also filters out pass-throughs. A cleaner walking past the submeter closet to the mop sink does not dwell long enough to fire. A person standing in front of the submeter cabinet for 14 seconds does.
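The armed-window test reduces to a single predicate. A minimal sketch, assuming an "HH:MM-HH:MM" window string like the overnight examples in this article; the format is an assumption, not the shipped configuration syntax.

```python
from datetime import time

# Sketch of the armed_window filter that suppresses legitimate-worker
# hours. The "HH:MM-HH:MM" string format is an assumed convention.
def is_armed(window: str, now: time) -> bool:
    """True when `now` falls inside the window, including windows
    that wrap past midnight (e.g. 22:00-06:00)."""
    start_s, end_s = window.split("-")
    start = time(*(int(p) for p in start_s.split(":")))
    end = time(*(int(p) for p in end_s.split(":")))
    if start <= end:                       # same-day window, e.g. 09:00-17:00
        return start <= now < end
    return now >= start or now < end       # wraps midnight
```

A submeter-closet entry at 02:14 passes the gate; the same entry during a business-hours maintenance window does not.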

Said plainly

The academic literature on power theft detection is a real, productive field. Smart-meter ML models work. Random forests, gradient boosting, deep CNNs, LSTMs over AMI time series all produce useful classifications for a utility trying to reconcile billed consumption with measured consumption.

None of that helps when the job is stopping a specific act at a specific place on a specific night. The 15-minute AMI interval is a structural floor on meter-side latency. The approach window is 60 to 180 seconds long. The numbers do not line up.

The camera-side event is what fits. A pre_action_zone_entry payload with a tile thumbnail, a zone name, an integer dwell counter, an ISO 8601 timestamp, and an overlay_mask line is the record that arrives inside the window, names the actor, and survives the later dispute. The meter closes the loop afterward. The camera is what opens it.

Under 60 s

Tile capture to WhatsApp, the latency budget the entire pipeline is built to hold.

🛡️ Cyrano · Edge AI Security for Apartments
© 2026 Cyrano. All rights reserved.
