Matthew Diakonov, Written with AI

Published April 18, 2026 · 13 min read

Architecture, not rhetoric

AI mass surveillance is an architectural outcome, not a feature. Four switches decide which side of the line your cameras sit on.

Every top result for this keyword is a civil-liberties critique. They are not wrong. They are also not useful to a property owner who is about to buy AI cameras and wants to know, concretely, how to stay on the non-surveillance side of the problem they describe. This page is the concrete version. Four architectural switches, a capture-point test you can apply before you sign anything, and one vendor shape that refuses to generate the data mass surveillance is built on.

See the non-surveillance shape on a live DVR

4.9 from 50+ properties

Inference runs on an edge device at the property, not in a shared cloud

Only event thumbnails and metadata leave the building

No face recognition, no license plate OCR, no biometric index

No cross-property data pool; each site is its own island

AI mass surveillance, defined architecturally

Four switches decide it. Not the label on the box.

Switch 1: centralized capture across sites

Switch 2: biometric identity indexing

Switch 3: cross-site correlation join

Switch 4: indefinite default retention

Flip all four: you are in the pipeline.

Flip none: you have AI on your cameras, nothing more.

Why the SERP for this keyword is not helpful to a buyer

The first page of Google for “AI mass surveillance” is a civil-liberties reading list. The ACLU describes how large AI models supercharge machine surveillance. Brookings explains the public-sector implications. Axios covers Anthropic drawing a red line with the Pentagon. Live Science details that no federal law meaningfully limits the data flow. EPIC walks through government data collection.

Each of those pieces is correct about the policy problem. None of them answer the question a property manager is actually asking when they type this phrase into a search bar, which is: “I am about to buy AI cameras. How do I know whether the thing I am buying is part of the pipeline those articles are describing?”

The gap is not political. It is technical. The difference between a camera system that is mass-surveillance capable and one that is not is four specific architectural decisions the vendor either made or did not make, and the buyer can test for all four in a single phone call. The rest of this page is that test, and the shape of deployment that passes it.

The four switches

An AI camera system is mass-surveillance capable to the extent that it flips all four of these switches on. A system that flips one or two is narrowly invasive but not at pipeline scale. A system that flips none of them is, architecturally, a local detector. Mass surveillance is the combination, not any single ingredient.

Switch 1. Centralized capture across sites

Video, frames, or feature embeddings from many properties route into one storage pool a single operator can query. Any cloud-AI product that uploads every camera's stream 24/7 meets this switch by construction. An edge-AI product that runs inference locally and emits only events does not.

Switch 2. Biometric identity indexing

The system derives a searchable identifier per person or vehicle (face embedding, plate string, gait signature, voice vector). Without an identity index there is no one to surveil; there is just motion and classes.

Switch 3. Cross-site correlation join

Identity records from one property can be joined against identity records from another, producing a trajectory for an individual across sites. This is the switch that turns per-property AI into mass surveillance.

Switch 4. Indefinite default retention

Video, events, and identity records persist by default with no age-out. Even a small indexed dataset becomes a mass-surveillance asset if nothing is deleted. Bounded retention is a structural bound on reach.

Optional switch. Third-party access

The vendor exposes an API, data-sharing agreement, or law-enforcement portal that lets a non-customer query the pool. This is not required for mass surveillance but is how small pools become larger ones.
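
The switch framework above can be written down as a small scoring sketch. This is an illustration, not a vendor tool: the `SwitchProfile` class, its field names, and the category strings are hypothetical. The text names three cases (zero switches, one or two, all four), so treating three flipped switches the same as "one or two" is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SwitchProfile:
    """Hypothetical record of which of the four switches a system flips."""
    centralized_capture: bool    # Switch 1
    identity_indexing: bool      # Switch 2
    cross_site_join: bool        # Switch 3
    indefinite_retention: bool   # Switch 4

    def flipped(self) -> int:
        """Count the switches that are on (True counts as 1)."""
        return sum([
            self.centralized_capture,
            self.identity_indexing,
            self.cross_site_join,
            self.indefinite_retention,
        ])


def classify(profile: SwitchProfile) -> str:
    """Map a switch count to the categories used in the text.

    Assumption: a count of 3 is grouped with "one or two", because the
    text does not name that case separately.
    """
    count = profile.flipped()
    if count == 0:
        return "local detector"
    if count == 4:
        return "mass-surveillance capable"
    return "narrowly invasive"


cloud_default = SwitchProfile(True, True, True, True)
edge_events_only = SwitchProfile(False, False, False, False)
print(classify(cloud_default))
print(classify(edge_events_only))
```

The optional third-party-access switch is left out on purpose, since the text treats it as an amplifier rather than one of the four.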

What a surveillance-capable pipeline actually looks like

The picture is the point. A mass-surveillance capable architecture fans cameras into a central ingest, attaches identity, and fans the result out to consumers (operators, partners, agencies). A non-surveillance architecture collapses the middle into a single device at the property and fans out only events, with no identity and no join.

Mass-surveillance capable: many cameras, central hub, many consumers

The shape above is mass-surveillance capable whether the vendor describes it that way or not. The capture is centralized, the hub indexes identity, and the fan-out is to multiple consumers. The labels on the left and right change; the topology is what makes it a pipeline.

The anchor: what actually leaves a property running our system

This is the uncopyable part of the page, because it is a claim about the exit profile of the device. Here is, in order, what happens on an install of our system when a person walks into a restricted zone at 2 AM. Every line either stays on the property or leaves it. Only the lines that leave are addressable by an outside observer.

Exit profile: what stays vs what leaves, per detection

The exit profile is the test. If the bytes that leave the property are an event packet (thumbnail plus clip plus metadata), the system cannot, as deployed, be a mass-surveillance participant. If the bytes that leave are full-frame continuous video or biometric vectors, it can.
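
One way to make the exit-profile test concrete is an allowlist check on outbound payloads. The sketch below is hypothetical: `EventPacket`, its field names, and `is_event_only` are invented for illustration and are not the product's actual schema. The metadata fields mirror the ones this page lists (time, zone, dwell, class, confidence).

```python
from dataclasses import asdict, dataclass

# Keys an event-only payload may carry. Assumed names, for illustration.
ALLOWED_KEYS = frozenset({
    "timestamp", "zone", "dwell_seconds", "detected_class",
    "confidence", "thumbnail_jpeg", "clip_mp4",
})


@dataclass(frozen=True)
class EventPacket:
    """Hypothetical shape of the only thing that leaves the property."""
    timestamp: str         # ISO 8601 time of the detection
    zone: str              # named zone the rule fired in
    dwell_seconds: float   # how long the object stayed in the zone
    detected_class: str    # e.g. "person", "vehicle", "package"
    confidence: float      # detector score between 0 and 1
    thumbnail_jpeg: bytes  # single still image
    clip_mp4: bytes        # short clip around the detection


def is_event_only(payload: dict) -> bool:
    """True when every key in the payload is on the allowlist."""
    return set(payload) <= ALLOWED_KEYS


packet = EventPacket("2026-04-18T02:00:00", "restricted", 14.0,
                     "person", 0.91, b"<jpeg bytes>", b"<mp4 bytes>")
print(is_event_only(asdict(packet)))

# A payload that adds an identity vector fails the check.
leaky = dict(asdict(packet), face_embedding=[0.12, 0.88])
print(is_event_only(leaky))
```

An allowlist is used rather than a blocklist so that any new field, such as a biometric vector added in a later firmware version, fails the check by default.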

The architectural test, as a property-owner checklist

Ask the vendor these exact questions. Write the answers down. The aggregate decides whether you are about to contribute to AI mass surveillance or buy a local detector.

What a property-scoped AI camera system should look like

Inference runs on a device physically at the property, not in a shared cloud region. If the answer is cloud, you are on the centralized-capture side of switch 1 by default.
Only event thumbnails, short clips, and structured metadata leave the property. Full-frame continuous upload is the architectural prerequisite to every downstream surveillance use.
The system does not build face embeddings, plate strings, gait vectors, or voice vectors. No identity index, no one to surveil.
No cross-property data pool; each deployment is independent. If one property's events cannot be queried alongside another's, switch 3 is off.
Retention for frames, events, and identity records is bounded and documented. Bounded retention is a structural cap on reach.
No law-enforcement portal, no data-sharing API, no third-party query path by default. Sharing is the exception, not the architecture.
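
The bounded-retention item in the checklist above comes down to an age-out rule. A minimal sketch, assuming events are stored as dictionaries with a `recorded_at` timestamp; the function name and the 30-day window in the usage lines are illustrative choices, not documented values for any product.

```python
from datetime import datetime, timedelta, timezone


def purge_expired(events: list[dict], retention_days: int,
                  now: datetime) -> list[dict]:
    """Return only the events still inside the retention window.

    Anything recorded before the cutoff is dropped, so the stored
    history can never grow past `retention_days` of data.
    """
    cutoff = now - timedelta(days=retention_days)
    return [event for event in events if event["recorded_at"] >= cutoff]


now = datetime(2026, 4, 18, tzinfo=timezone.utc)
events = [
    {"id": 1, "recorded_at": now - timedelta(days=45)},  # past the window
    {"id": 2, "recorded_at": now - timedelta(days=3)},   # still retained
]
kept = purge_expired(events, retention_days=30, now=now)
print([event["id"] for event in kept])
```

When you ask a vendor about retention, the useful answer is the equivalent of `retention_days` for frames, events, and identity records, plus confirmation that the purge runs by default.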

Mass-surveillance capable vs property-scoped AI, side by side

Two architectures, same word in the marketing copy. This is what the decision looks like when you unpack what the vendor is actually selling.

Same phrase on the homepage. Opposite architectures underneath.

Feature	Mass-surveillance capable cloud AI	Property-scoped edge AI (our system)
Capture point	Per-camera upload to central cloud	DVR HDMI multiview on site
Where inference runs	In a shared cloud region across customers	On the edge device at the property
What leaves the building	Continuous full-frame video, 24/7	Event thumbnails + short clips + metadata
Face recognition	Usually included	Not performed
License plate OCR	Usually included	Not performed
Biometric identity index	Built and searchable	Not built
Cross-property join	Pool spans every customer	None. Each site is an island.
Default retention	Indefinite by default	Bounded, per-property policy
Third-party access path	Law-enforcement portal or partner API common	None by default
Per-camera monthly cost	$20 to $120 per camera per month	~$12.50 at 16 cameras ($200 per month for the whole property)
Install time per property	Days to weeks plus camera replacement	Under 2 minutes
Architectural switch total	4 of 4 flipped	0 of 4 flipped

The capture point, and why this one detail decides everything

Capture point is the single architectural choice that cascades into all four switches. Where a vendor reads pixels from determines whether the system can be surveillance-capable at all. Here is how the common capture points map to the switches they flip.

Capture points, mapped to the surveillance switches they enable

1. Per-camera cloud upload (RTSP to cloud, ONVIF to cloud)

Every camera streams full frames continuously into a central ingest. This flips switch 1 by construction. If the cloud runs face or plate models, it flips switch 2. A pool shared across tenants usually flips switch 3. Retention is usually indefinite by default, which flips switch 4.

2. Rip-and-replace smart cameras into vendor cloud

Same end state as (1) but the cameras are proprietary. Feature embeddings rather than raw frames may leave, but the pool exists and the identity layer is usually built in.

3. Remote guarding plus AI

Adds a human reviewer in a monitoring center. The human step is not the architecture; the architecture is still a cloud ingest, and it flips the same switches as (1) and (2).

4. Local NVR with AI features turned on

Flips switches only to the extent that the NVR exposes an outbound sync or a vendor cloud. Some shapes are local-only; some advertise cloud backup that puts the property back in the pipeline.

5. Edge-AI adapter on the existing DVR (our system)

Capture point is the DVR's HDMI multiview output on the property. Inference runs on the adapter, never builds a biometric index, never opens a cross-property join, and emits only events. Flips 0 of 4 switches. This is the shape the policy critique is not describing, because it is not part of the pipeline.

The adapter shape, in numbers that are architectural, not marketing

Each of these numbers is a constant of the non-surveillance shape. The first is why full-frame upload is unnecessary. The second is why biometric databases never get built. The third is the off-property byte budget per detection. The fourth is the install cost the architecture enables.

Up to 25 camera tiles per unit, read from one HDMI port, inference local

0 face embeddings, plate OCRs, or voice vectors generated per detection

KB-scale average off-property bytes per event (thumbnail + clip + metadata)

Under 2 min physical install time on a running DVR, no camera touched

What the vendors on the SERP actually flip

Not an endorsement and not a ranking. A map. Each vendor is labelled with the number of surveillance switches its default architecture flips, on the four-switch framework above. The list below is the short version of the comparison; the details are in each vendor's own public spec pages.

Verkada · cloud + face + plate · 4/4

Rhombus · cloud + face + plate · 4/4

Coram AI · cloud + identity · 4/4

Lumana · cloud + identity · 4/4

Spot AI · cloud + identity · 4/4

Eagle Eye Networks · cloud + identity · 4/4

Cloudastructure · cloud overlay · 3/4

Scylla · cloud overlay · 3/4

Deep Sentinel · cloud + remote guards · 3/4

Stealth Monitoring · cloud + remote guards · 3/4

Veesion · cloud + retail pose · 3/4

Our system · edge, no identity index · 0/4

Switch counts are taken from public marketing and spec pages as of April 2026. Vendors that offer both cloud and on-prem configurations are scored on their default, since most buyers deploy the default.

“Across 50+ multifamily and commercial deployments, zero units have uploaded a full-frame video stream off the property. Every unit emits event packets only: a thumbnail, a short clip around the detection, and structured metadata. That exit profile is a property of the architecture, not a setting that can be flipped.”

Our system deployment fleet, April 2026

The property owner's problem is not the policy problem

The ACLU, Brookings, Axios, and Anthropic are describing a real risk at the scale of a country. A property owner deploying AI on the 16 cameras they already own is not, by themselves, that risk. They become that risk only if the product they pick flips the four switches. The switches are not hidden; they are architectural and testable.

The useful way to read the policy critique is as a buyer's filter. A vendor whose architecture flips zero switches is not in the debate. A vendor whose architecture flips all four is the debate. Our system is in the first bucket, not because of a promise in a contract, but because the capture point is a single HDMI port at the property and nothing downstream of that device builds the data mass surveillance is made of.

AI on your cameras, not in the pipeline.

15-minute live demo. We plug into a DVR on a call, show per-tile detection, and show you the exact event packet that would leave your building for a real alert. Nothing else leaves.

Book a demo →

Frequently asked questions

What actually is AI mass surveillance, as distinct from AI on security cameras?

AI mass surveillance is the combination of four things, not one. First, centralized capture: video from many properties, cities, or agencies routes into a single pool that one operator can query. Second, identity indexing: faces, license plates, gait, or voice get OCR'd or embedded into a searchable vector so any person becomes a retrievable record. Third, cross-site correlation: the same identity can be traced across properties, cameras, and time. Fourth, indefinite retention: nothing is aged out by default. A property that installs AI on its own cameras does not automatically meet any of these conditions. AI mass surveillance is an architectural outcome of choosing all four switches, not an emergent property of using computer vision at all.

Does a property that installs AI cameras become part of an AI mass surveillance system?

Only if the architecture routes video and identity data into a centralized pool the property does not control. A cloud-AI product that uploads every camera's feed 24/7 to a shared inference backend, applies face or plate recognition, and retains the data indefinitely across customers is, by construction, a mass-surveillance capable deployment. An edge-AI product that runs inference locally, emits only event thumbnails and metadata, does not index biometric identity, and does not pool data across properties is not. The label on the box is not the answer. The capture point, the retention policy, the identity layer, and the cross-site join are the answer.

How do I tell, before signing, whether a vendor is on the mass-surveillance side of the line?

Ask four questions and hold the answers in writing. One: where does inference run, on a device at my property or in a cloud region that also serves other customers? Two: does the system identify specific people or plates by biometric or OCR, or only detect classes like person, vehicle, package? Three: can my video frames, or any derivative of them, be queried alongside another customer's data? Four: what is the default retention period for frames, events, and identity records? If the answers are cloud, yes, yes, indefinite, the vendor is mass-surveillance capable regardless of their marketing. If the answers are edge, no, no, bounded, the vendor is not.

Is on-device AI the same thing as edge AI for this purpose?

Usually yes, but the important property is not the chip location, it is the exit profile. A camera can have a neural accelerator on the sensor board and still upload every frame to a central ingest. The architectural question is what leaves the building and what stays. Our system's exit profile is event thumbnails, short clips associated with a detection, and structured metadata (time, zone, dwell, class, confidence). Full frames, continuous video, and biometric vectors do not exit. That is the sense in which the system is not mass-surveillance capable: the information required to mass-surveil is not generated or emitted in the first place.

It detects people. Isn't that biometric identification?

Class detection is not biometric identification. A model that outputs 'there is a person in this tile' with a bounding box and a confidence score does not identify who that person is. A model that embeds a face into a 512-dimensional vector and compares it to a gallery of 10 million faces does. The difference is the presence or absence of an identity index. Our system's detection pipeline outputs classes (person, vehicle, package) and geometric attributes (position, dwell, zone entry). It does not construct or query a face, plate, gait, or voice index. For most property use cases (trespassing, package theft, tailgating, loitering), class detection plus zone rules is sufficient, and the identity layer is unnecessary.
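
To show what "class detection plus zone rules" means in practice, here is a minimal sketch of a zone rule. Every name in it (`Detection`, `ZoneRule`, the rectangle coordinates, the dwell threshold, the active hours) is hypothetical and chosen for illustration. The point is that the rule needs only a class, a position, a dwell time, and a clock; no identity field appears anywhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Output of a class detector: what and where, never who."""
    detected_class: str   # "person", "vehicle", or "package"
    x: float              # box centre, normalised 0..1 within the tile
    y: float
    dwell_seconds: float  # time spent at roughly this position
    hour: int             # local hour of day, 0..23


@dataclass(frozen=True)
class ZoneRule:
    """Rectangular zone plus the conditions that raise an alert."""
    name: str
    x_min: float
    x_max: float
    y_min: float
    y_max: float
    target_class: str
    min_dwell_seconds: float
    active_hours: frozenset[int]

    def matches(self, detection: Detection) -> bool:
        """True when class, position, dwell, and hour all satisfy the rule."""
        inside = (self.x_min <= detection.x <= self.x_max
                  and self.y_min <= detection.y <= self.y_max)
        return (detection.detected_class == self.target_class
                and inside
                and detection.dwell_seconds >= self.min_dwell_seconds
                and detection.hour in self.active_hours)


# Restricted zone in the upper-right of a tile, armed from midnight to 5 AM.
rule = ZoneRule("restricted", 0.5, 1.0, 0.0, 0.5,
                "person", 5.0, frozenset(range(0, 6)))
print(rule.matches(Detection("person", 0.7, 0.2, 12.0, hour=2)))
print(rule.matches(Detection("person", 0.7, 0.2, 12.0, hour=14)))
```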

If our system does not do face recognition or license plate OCR, doesn't that make it weaker?

It makes it narrower. That is the point. Mass-surveillance capability is a byproduct of the identity layer and the cross-site join. Refusing to build those means the system cannot be repurposed later into mass surveillance, and it also means the product has to be honest about what it cannot do. Our system cannot tell you that the person walking across the parking lot is the same person who walked across a different property's parking lot last month. That is a genuine limitation. It is also the feature that keeps a property deployment from becoming part of a pooled dataset. For the detections a multifamily operator actually has to act on in real time, the narrower product is the better product.

What does our system's capture point actually look like, physically?

One HDMI cable plugged into the DVR or NVR's monitor output. The DVR has been rendering a multiview grid of up to 25 camera tiles on that port since the system was installed, and a guard monitor has been plugged into it. The install procedure is: unplug the monitor from the DVR, plug the unit's HDMI input into the DVR, plug the monitor into the unit's HDMI passthrough, connect ethernet, connect power. The guard still sees the same screen. Inference runs on the composite multiview frame on the device. Total time on a running DVR is under 2 minutes. No camera is touched. No RTSP is probed. No ONVIF profile is negotiated. No cloud stream is opened.

Is this actually legal in my state, given the AI mass surveillance discourse?

The legal exposure in most US states is not the AI layer. It is where cameras are pointed (interior units, common areas, bathrooms), what is disclosed to tenants, how footage is retained, and whether law enforcement can pull it. A property-scoped edge-AI system does not add new legal exposure beyond what the underlying cameras already had. What it does not do is expose the property to the policy debate that is driving the AI mass surveillance conversation: pooled cloud data, biometric indexing, cross-site correlation, and indefinite retention by a third party. If those are the parts of the conversation that concern you or your legal team, the architectural test in the main body of this page is the check to run before you sign a vendor contract.

Where does our system sit on pricing compared to the surveillance-capable vendors?

Our system is $450 one-time for the edge unit plus $200 a month for software, for the whole property, across up to 25 camera tiles on a single HDMI port. The mass-surveillance-capable shapes (cloud-AI overlay products that upload every camera, and cloud-AI smart-camera platforms that replace every camera) are priced per camera per month, typically $20 to $120 each. At a 16-camera property that is $320 to $1,920 a month, against $200. The architectural choice to not generate the data required for mass surveillance is also, not accidentally, the cheaper choice: nothing is being uploaded, indexed, or stored off-property, so the compute bill is an edge-device depreciation cost plus a thin control plane.
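
The arithmetic behind that comparison, as a short sketch. The dollar figures are the ones quoted on this page; the function names are illustrative, and the cloud range covers the monthly per-camera fee only, with no hardware or install cost assumed.

```python
EDGE_UNIT_ONE_TIME_USD = 450     # one-time edge unit price, from the text
EDGE_MONTHLY_USD = 200           # whole-property software fee, from the text
CLOUD_PER_CAMERA_LOW_USD = 20    # low end of per-camera monthly pricing
CLOUD_PER_CAMERA_HIGH_USD = 120  # high end of per-camera monthly pricing


def edge_per_camera(cameras: int) -> float:
    """Effective monthly cost per camera for the whole-property fee."""
    return EDGE_MONTHLY_USD / cameras


def cloud_monthly_range(cameras: int) -> tuple[int, int]:
    """Low and high monthly totals when pricing is per camera."""
    return (CLOUD_PER_CAMERA_LOW_USD * cameras,
            CLOUD_PER_CAMERA_HIGH_USD * cameras)


def edge_first_year_total() -> int:
    """Edge unit plus twelve months of the whole-property fee."""
    return EDGE_UNIT_ONE_TIME_USD + 12 * EDGE_MONTHLY_USD


print(edge_per_camera(16))       # 12.5
print(cloud_monthly_range(16))   # (320, 1920)
print(edge_first_year_total())   # 2850
```

Because the edge fee is flat per property, the per-camera figure falls as tiles are added, down to $8 at the 25-tile limit of one HDMI port.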

Worth saying plainly

AI mass surveillance is a specific architecture: centralized capture, biometric indexing, cross-site correlation, indefinite retention. Every vendor on the SERP either flips those switches or does not, and a buyer can test for all four before signing. The technical question is upstream of the policy question, and it is the one a property owner actually has leverage over.

Our system is the shape that flips zero of the four. An edge-AI box plugs into the DVR's HDMI multiview, runs class detection on up to 25 tiles, and emits only event packets. No face recognition, no license plate OCR, no cross-property pool, no indefinite retention by default. $450 one-time, $200 a month, whole property. If the policy debate is what is making you hesitate, that is the architecture the debate is not about.