Matthew Diakonov
12 min read
For property ops, security integrators, and on-call managers

An intent alert is the second step. Most AI camera pipelines never reach it.

"Person detected at camera 4" is an object alert. It tells you what was seen and stops there. The phone buzzes for the maintenance vendor at 11 AM, for the tenant returning from the gym at 1 AM, for the FedEx driver at 3 PM, and after about two weeks the operator stops looking at it.

An intent alert decides whether what was seen is worth interrupting anyone for. It runs a six-input decision (class, zone, time, dwell, badge, posture), routes the event to LOW THREAT (a daily digest) or HIGH THREAT (a phone call to on-call ops), and silences the rest. This page walks through how that second step is structured, what each of the six inputs costs, and where the failure modes are.

Direct answer, verified 2026-05-06

AI security camera intent alerts classify the inferred threat level of a detected behavior as either LOW THREAT (logged to a daily digest) or HIGH THREAT (paged to on-call ops with a 10-second clip), based on six inputs computed at the frame the detection fires on: object class, zone, time window, dwell duration, badge state, and posture. Pipelines that stop at object class produce alert fatigue by week three. Pipelines that route through the intent classifier fire roughly 50 to 100 times less often, and the events that survive are the ones the operator actually wants to see.

Two versions of the same detection

Here is what the alerting layer emits for the exact same object detection. The object-only version is what most pipelines ship; the intent-classified version is what an operator actually wants on their phone. A sketch of both payloads follows the list below.

Object alert vs. intent alert, on the same person at the same camera

Detection: class person, confidence 0.91, camera mailroom_north. The alerting layer fires every time the detector exceeds the confidence threshold and a person enters frame. The operator sees the same event whether it is a tenant collecting a package at noon, a delivery driver during the configured delivery window, or an unknown person at the parcel shelf at 02:00.

  • One alert per person frame above threshold
  • No zone, no time, no dwell, no badge, no posture
  • Fires for every legitimate visit and every theft attempt at identical priority
  • Operator filters by hand, then stops filtering by week three
  • Recall is fine, signal-to-noise is the problem
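For concreteness, here is a sketch of the two payloads side by side. Field names are illustrative, not the product's wire format; the intent fields mirror the six inputs described in the next section.

```ts
// Object-only alert: class + confidence + camera, and nothing else.
// A tenant at noon and a thief at 02:00 look identical here.
const objectAlert = {
  type: "person_detected",
  camera: "mailroom_north",
  confidence: 0.91,
};

// Intent alert for the same detection, routed HIGH: the six inputs
// plus the clip the page carries.
const intentAlert = {
  type: "high_threat",
  camera: "mailroom_north",
  confidence: 0.91,
  class: "person",
  zone: "parcel_shelf",
  timeWindow: "after_hours",
  dwellSeconds: 18,
  badgeState: "unknown",
  posture: "walking",
  clip: "10-second crop from the triggering camera",
};
```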

The six inputs the intent classifier actually reads

Five of the six are computed for free on top of detection. The sixth costs one extra inference per detected person. None of them require new hardware or new cameras. Here is what each one is and where it comes from.

Input 1, Object class

What was seen.

Person, vehicle, animal, environmental motion (rain, leaves, headlight sweep, shadow shift), or camera artifact. Comes straight from the per-frame detector. Animal and environmental classes get silenced outright at most zones. Person and vehicle continue to the rest of the pipeline.

Input 2, Zone

Where on the property the bounding box centroid landed.

Drawn once at install time as polygons on the multiview tile grid: lobby, mailroom, parcel shelf, rear gate, loading bay, stairwell, parking, perimeter. Cheap to compute (one centroid plus a polygon test). The zone is what turns "a person was here" into "a person was at the parcel shelf".
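A minimal sketch of that test, assuming pixel coordinates in the composite frame and placeholder polygon values:

```ts
type Point = [number, number];

// Centroid of a detector bounding box.
function centroid(box: { x: number; y: number; w: number; h: number }): Point {
  return [box.x + box.w / 2, box.y + box.h / 2];
}

// Ray-casting point-in-polygon test.
function inPolygon([px, py]: Point, poly: Point[]): boolean {
  let inside = false;
  for (let i = 0, j = poly.length - 1; i < poly.length; j = i++) {
    const [xi, yi] = poly[i];
    const [xj, yj] = poly[j];
    if (yi > py !== yj > py && px < ((xj - xi) * (py - yi)) / (yj - yi) + xi) {
      inside = !inside;
    }
  }
  return inside;
}

// The first polygon the centroid lands in names the zone.
const zonePolygons: Record<string, Point[]> = {
  parcel_shelf: [[120, 40], [320, 40], [320, 200], [120, 200]], // placeholder coords
};

function zoneFor(box: { x: number; y: number; w: number; h: number }): string | undefined {
  const c = centroid(box);
  return Object.keys(zonePolygons).find((z) => inPolygon(c, zonePolygons[z]));
}
```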

Input 3, Time window

Which slot of the property's schedule the detection sits in.

Property-level config. Typical windows: business hours, delivery hours, after-hours, weekend, holiday. A person at the parcel shelf inside the delivery window is normal. The same person an hour after that window closes is HIGH THREAT.
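A sketch of the lookup, assuming an ordered rule list where the first match wins. The hours below are placeholders, not recommended config:

```ts
type TimeWindow = "business_hours" | "delivery_hours" | "after_hours" | "weekend" | "holiday";

interface WindowRule { window: TimeWindow; days: number[]; fromHour: number; toHour: number; }

// Ordered rule list, first match wins. Placeholder hours only.
const SCHEDULE: WindowRule[] = [
  { window: "business_hours", days: [1, 2, 3, 4, 5], fromHour: 9, toHour: 17 },
  { window: "delivery_hours", days: [1, 2, 3, 4, 5, 6], fromHour: 8, toHour: 19 },
];

// Local time assumed throughout; real source handles time zones.
function timeWindowFor(ts: Date, holidays: Set<string>): TimeWindow {
  if (holidays.has(ts.toLocaleDateString("en-CA"))) return "holiday"; // YYYY-MM-DD
  const day = ts.getDay(); // 0 = Sunday, 6 = Saturday
  const hour = ts.getHours();
  const hit = SCHEDULE.find((r) => r.days.includes(day) && hour >= r.fromHour && hour < r.toHour);
  if (hit) return hit.window;
  return day === 0 || day === 6 ? "weekend" : "after_hours";
}
```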

Input 4, Dwell duration

How long the tracked object stayed in the zone.

Accumulated across frames against a per-zone threshold (15 seconds in a stairwell, 30 to 60 seconds in a parking area). A person walking through a parking lot in 8 seconds is not loitering. A person dwelling 90 seconds at the rear gate on a 30-second threshold is HIGH. Dwell is what catches the slow approach the DVR motion engine missed.
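A sketch of the accumulation, assuming the tracker supplies a stable track id. The production version keeps a dwell ring buffer per track; this is the idea:

```ts
// Per-zone dwell thresholds in seconds, mirroring the examples above.
const DWELL_THRESHOLDS: Record<string, number> = {
  stairwell: 15,
  parking: 45,
  rear_gate: 30,
};

// First-seen timestamp per (track, zone) pair. A real implementation
// also expires entries when the track leaves the zone.
const firstSeen = new Map<string, number>();

function dwellSeconds(trackId: string, zone: string, nowMs: number): number {
  const key = `${trackId}:${zone}`;
  if (!firstSeen.has(key)) firstSeen.set(key, nowMs);
  return (nowMs - firstSeen.get(key)!) / 1000;
}

// 8 seconds crossing the parking lot: under threshold, not loitering.
// 90 seconds at the rear gate on a 30-second threshold: past 2x, which
// the classifier promotes to HIGH.
```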

Input 5, Badge state

Did the access control system identify a known badge nearby?

Optional, only present if there is an access control feed. Values: tenant, vendor, unknown, n_a. A person at the rear gate with a tenant badge state is dropped. The same person with badge state unknown is LOW or HIGH depending on dwell. If you do not have access control, this input is absent and the classifier proceeds without it.

Input 6, Posture

What the body looks like inside the bounding box.

A small classifier over the cropped detection. Outputs: walking, leaning, crouching, hands-on-door, masked. The expensive input of the six (one extra inference per detected person), but the only one that catches a tampering posture in real time. Hands-on-door at a non-public entry is a HIGH THREAT override regardless of the rest of the pipeline.

The intent classifier as actual code

Below is the shape of the intent decision the production classifier runs for each detection. Real source is more careful about edge cases (track stitching, dwell ring buffers, time zone handling), but the routing logic looks like this. The four sections inside classify() are the only thing that decides whether your phone rings.

intent-classifier.ts
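Zone names, fire-table rows, and thresholds in this sketch are illustrative config, not the shipped defaults.

```ts
type ObjectClass = "person" | "vehicle" | "animal" | "environmental" | "artifact";
type TimeWindow = "business_hours" | "delivery_hours" | "after_hours" | "weekend" | "holiday";
type BadgeState = "tenant" | "vendor" | "unknown" | "n_a";
type Posture = "walking" | "leaning" | "crouching" | "hands_on_door" | "masked";
type Verdict = "drop" | "low" | "high";

interface Detection {
  objectClass: ObjectClass;
  zone: string;           // polygon hit for the bounding-box centroid
  timeWindow: TimeWindow; // slot of the property schedule
  dwellSeconds: number;   // accumulated across frames for this track
  badgeState: BadgeState; // "n_a" when no access control feed is wired in
  posture: Posture;
}

interface ZoneConfig {
  dwellThresholdSeconds: number;
  publicEntry: boolean; // posture overrides apply only at non-public entries
}

// Class + zone + time combinations that page outright.
const FIRE_TABLE: Array<{ cls: ObjectClass; zone: string; windows: TimeWindow[] }> = [
  { cls: "person", zone: "parcel_shelf", windows: ["after_hours", "holiday"] },
  { cls: "vehicle", zone: "loading_bay", windows: ["after_hours", "weekend", "holiday"] },
  { cls: "person", zone: "rear_gate", windows: ["after_hours"] },
];

// Real config scopes postures to zones (crouched at a window, masked at
// the gate); the sketch keeps one flat list.
const ELEVATED_POSTURES: Posture[] = ["hands_on_door", "crouching", "masked"];

export function classify(d: Detection, zones: Record<string, ZoneConfig>): Verdict {
  const zone = zones[d.zone];

  // Section 1: silence noise classes outright.
  if (d.objectClass === "animal" || d.objectClass === "environmental" || d.objectClass === "artifact") {
    return "drop";
  }

  // A recognized badge stands the pipeline down (Input 5).
  if (d.badgeState === "tenant" || d.badgeState === "vendor") return "drop";

  // Section 2: posture override. Hands-on-door at a non-public entry
  // ignores the time window and everything after it.
  if (zone && !zone.publicEntry && ELEVATED_POSTURES.includes(d.posture)) {
    return "high";
  }

  // Section 3: the class + zone + time fire table.
  for (const rule of FIRE_TABLE) {
    if (rule.cls === d.objectClass && rule.zone === d.zone && rule.windows.includes(d.timeWindow)) {
      return "high";
    }
  }

  // Section 4: dwell escalation. Past 2x the zone threshold, LOW is
  // promoted to HIGH automatically.
  if (zone && d.dwellSeconds > 2 * zone.dwellThresholdSeconds) return "high";

  return "low";
}
```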

Section 1 silences noise classes outright. Section 2 lets posture override the rest of the pipeline (a hands-on-door posture at a non-public entry ignores the time window). Section 3 is the class plus zone plus time fire table. Section 4 is the dwell escalation: a person who dwells past 2x the zone's threshold gets promoted from LOW to HIGH automatically. The intent layer is small on purpose. Most of the work is in the configuration (zones, time windows, dwell thresholds), not in the classifier.

How a single detection becomes an intent alert

The pipeline below is one detection from frame to phone. There are seven hops; only the last two are user visible. Latency end to end is sub-second on the inference side, and one to three seconds on the messaging hop.

Intent alert flow, single detection

  1. Camera → Recorder: video over coax / IP
  2. Recorder → Edge unit: HDMI multiview tile grid
  3. Edge unit: object detection per tile
  4. Edge unit: track + dwell + posture inference, emitting Detection {class, zone, time, dwell, badge, posture}
  5. Edge unit → Intent classifier: decide drop / low / high
  6. Intent classifier → Pager: if HIGH, 10s clip + SMS + voice call; if LOW, append to daily digest
  7. Pager → On-call ops: phone rings, thumbnail in chat thread

Note that the recorder's own motion engine is not in this diagram. The intent classifier reads the multiview the recorder draws to its monitor, runs detection on the composite, and emits its own events. The recorder keeps recording on its own retention schedule.

What the intent classifier emits, traced from a real ops session

The trace below is paraphrased from the intent classifier running at a 180-unit Class C multifamily property in Fort Worth: thirty minutes, roughly 40 detections from the per-frame detector. The two HIGH lines are all the operator's phone actually buzzed on. Everything else was silenced or logged to the digest.

cyrano-intent /trace last 30m

The two HIGH lines are the only events the operator's phone rang on. Everything else was either silenced (drop) or logged to LOW for the digest. None of these decisions are expressible on the recorder's motion stream alone, because the recorder does not know zone, time window, dwell, badge state, or posture.

Where the intent classifier can be wrong

An intent classifier is a routing decision over a stream of detections, and like any router it can mis-route. The two failure modes are not symmetric and they hurt in different ways.

  1. False HIGH (page on a normal event). The painful one for the operator. Most common causes: a stale time window (delivery hours got moved but config did not), a missing known-vendor entry, or a posture classifier confused by an unusual carry (a tenant carrying a guitar case looks geometrically similar to hands-on-door). Tunable by reviewing the last 30 days of HIGH pages and adding a known-vendor or known-badge entry for each repeated false-HIGH source. Target: under 1 false-HIGH per camera per week.
  2. False LOW (real event lands in the digest). The dangerous one because it shows up only in the after-the-fact review. Most common causes: a zone polygon drawn too tight (the bounding box centroid falls outside the zone even though the body is inside), a dwell threshold set too high for a stairwell, or a posture classifier that missed a masked or crouched posture in low light. Tunable by reviewing the daily digest each Monday and promoting any incident that should have paged into a tightened rule.

A working intent layer is one where the false-HIGH rate sits below the operator's tolerance for staying engaged and the false-LOW rate is audited weekly. Neither number is zero. Calling either zero is the lie an object-only pipeline tells when it ships without intent classification at all.

Want the intent classifier on your existing recorder?

15-minute call. We will walk through the six-input decision tree on a recording from your own building, show what would have paged versus what would have been silenced, and cover what install looks like (one HDMI cable, two minutes on site).

Frequently asked questions

What is an intent alert on an AI security camera?

An intent alert is a notification that fires only when an observed behavior is classified as either LOW THREAT or HIGH THREAT, where the classification is computed from six inputs at the moment of detection: object class (person, vehicle, animal), the zone the detection landed in (parcel shelf, lobby, rear gate, loading bay, stairwell), the time window (business hours, delivery hours, after-hours), the dwell duration (how long the tracked object stayed), the badge state (known tenant or vendor versus unknown), and the posture (walking through versus tampering at a door). A 'person detected' message is not an intent alert. A 'person at the parcel shelf, after the delivery window, dwell 18 seconds, no badge, posture upright' is. The intent layer is the difference between getting paged and getting paged at the right moment.

Why do most AI camera alert pipelines fail at month two?

Because they stop at object detection. The vendor ships a model that fires every time a person enters frame, the operator gets 200 alerts in the first week, half are tenants going to do laundry, the operator stops checking by week three, and the system silently degrades into a recording archive. Adding an intent layer is what fixes this. Once the same detection runs through a class plus zone plus time plus dwell plus badge plus posture filter, the volume drops by a factor of roughly 50 to 100, and the events that survive are the ones the operator actually wants to see. The product config we run on this site has a single line that summarizes it: alerts that earn the page versus alerts that train the operator to ignore them.

How is intent different from object class?

Object class answers what was seen. Intent answers whether it matters. A vehicle in the loading bay is a class. A vehicle in the loading bay at 03:00 with a 4-minute dwell and no scheduled vendor is intent. A person in the lobby is a class. A person at the rear gate, badge unknown, posture leaning into the lock plate, dwell 12 seconds, is intent. The intent classifier never overrides the object detector; it routes its output. If you have ever seen an AI camera page on a delivery driver dropping a package during the configured delivery window, you have seen what happens when there is no intent layer.

What inputs does the intent classifier actually read at decision time?

Six. (1) The object class from the per-frame detector (person, vehicle, animal, environmental, artifact). (2) The zone the bounding box centroid lands in, computed against zone polygons drawn at install time (lobby, mailroom, parcel shelf, rear gate, loading bay, stairwell, parking, perimeter). (3) The time window, from a property-level config (business hours, delivery hours, after-hours, weekend, holiday). (4) The dwell duration, accumulated across frames against a per-zone dwell threshold (typically 15 seconds for a stairwell, 30 to 60 seconds for a parking area). (5) The badge state from the access control feed if one is wired in (known tenant, known vendor, unknown). (6) The posture, derived from a small classifier over the bounding-box crop (upright walking, leaning, crouching, hands-on-door, masked). Five of the six are zero-cost on top of the existing detection. The sixth (posture) costs one extra inference per detected person.

What does LOW THREAT mean in practice, and where do those events go?

LOW THREAT is the bucket for events that the operator should know happened but does not need to interrupt anything for. Examples: an unknown person walking through the parking lot during a normal commute window, a vehicle in the rear lot at 04:00 that is plausibly a delivery, a tenant returning home through the mailroom at midnight without a badge swipe (badge battery dead). LOW THREAT events land in a daily digest a regional manager scrolls once or twice a day. They do not page. They do not SMS. They get reviewed in batch. If a pattern shows up across the digest (same unknown person, three days running), it gets promoted to a rule by hand. This is where 90 to 95 percent of total event volume sits on a typical multifamily property.

What does HIGH THREAT mean, and what triggers a phone call?

HIGH THREAT is the bucket the operator phone rings on. The trigger pattern is at least one of: (1) class plus zone plus time match a configured fire rule, for example 'person at parcel shelf after the delivery window' or 'vehicle in loading bay outside business hours'; (2) posture is on the elevated list (hands-on-door at a non-public entry, crouched at a window, masked at the gate); (3) dwell exceeds twice the configured threshold for the zone (a person standing at the rear gate for 90 seconds on a 30-second threshold); (4) two or more LOW THREAT events from the same track chain into a HIGH (an unknown person crossed the perimeter then dwelled at the stairwell). The page is an SMS plus a voice call to the on-call manager with a 10-second clip cropped from the triggering camera.
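Trigger (4) is the one not shown in the classifier sketch earlier on this page; a minimal version, assuming stable track ids across zones, looks like this:

```ts
// LOW verdicts seen so far, per track id.
const lowEvents = new Map<string, string[]>();

function chainLow(trackId: string, zone: string): "low" | "high" {
  const seen = lowEvents.get(trackId) ?? [];
  seen.push(zone);
  lowEvents.set(trackId, seen);
  // Two LOWs on one track (e.g. a perimeter crossing, then a stairwell
  // dwell) chain into a HIGH.
  return seen.length >= 2 ? "high" : "low";
}
```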

Can an intent alert be wrong, and what is the failure mode?

Yes, and it is worth being honest about which failure mode is which. False HIGH (a normal event that pages) is the painful one for the operator. The most common cause is a stale time window (delivery hours that changed but the config did not), a missing known-vendor entry, or a posture classifier confused by an unusual carry (a tenant carrying a guitar case looks geometrically similar to a person leaning into a door). False LOW (a real event that landed in the digest instead of the call) is the more dangerous one because it shows up only in the after-the-fact review. The most common cause is a zone polygon that is drawn too tight, so the bounding box centroid falls outside the zone even though the body is in it. Both are tunable. The right metric to track is the seven-day rolling false-HIGH rate; below 1 per camera per week is the tolerance most ops teams hold for staying engaged with the stream.

Does the intent classifier need cloud compute to run?

No. The whole pipeline is small enough to run on the same on-device accelerator that does the per-frame detection. A typical edge unit handles up to 25 camera tiles in parallel, runs object detection at composite frame rate, accumulates dwell across a small ring buffer, looks up zone, time, and badge state from local config, runs posture inference on the bounding-box crops that matter, and emits the classified event in well under a second end to end. The reason this matters for property operators: zero footage leaves the building, the alert latency does not depend on the cellular uplink, and the cost is a one-time hardware spend plus a flat monthly software fee, not a per-camera cloud inference bill. The Cyrano edge unit specifically is $450 one-time and $200 per month from month two, plugs into the recorder over HDMI, and supports up to 25 feeds.

Can I retrofit intent alerts onto an existing DVR or NVR without replacing cameras?

Yes. The recorder is already drawing every camera onto a multiview composite for the wall display. An edge unit reads that HDMI output, splits it into per-tile crops using a fixed tile-grid template captured at install (2x2, 3x3, 4x4, or 5x5), and runs the detection plus intent stack on each tile. Cameras stay. Coax stays. Recorder stays. The only new piece of hardware is the edge box, and it plugs in with one HDMI cable. Install on site is roughly two minutes. The intent layer is purely software on top of the detection layer the box was going to run anyway.
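The tile split itself is simple arithmetic once the grid template is known; a sketch, assuming a uniform grid with no bezel margins:

```ts
interface Rect { x: number; y: number; w: number; h: number; }

// Pixel rect of each camera tile in the composite, for a fixed grid
// template captured at install.
function tileRects(frameW: number, frameH: number, grid: 2 | 3 | 4 | 5): Rect[] {
  const w = Math.floor(frameW / grid);
  const h = Math.floor(frameH / grid);
  const rects: Rect[] = [];
  for (let row = 0; row < grid; row++)
    for (let col = 0; col < grid; col++)
      rects.push({ x: col * w, y: row * h, w, h });
  return rects;
}

// A 1080p composite on a 4x4 grid yields 16 tiles of 480x270 each.
const tiles = tileRects(1920, 1080, 4);
```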

How do I know if my current 'AI alerts' are actually intent alerts or just object alerts?

Three checks. First, look at one alert and ask whether it tells you both what happened and why it was worth interrupting you. If it just names a class (person, vehicle), it is object only. Second, count the events in the last seven days. If the count is above 100 per camera per week and most are tenants doing normal things, the system is firing on class without an intent filter. Third, ask the vendor what inputs the alerting layer reads. If the answer is only 'a confidence threshold on the detector', you have object alerts. If the answer enumerates zone, time, dwell, badge, or posture, you have intent alerts. Pipelines that skip the intent step look fine on a demo and degrade by month two.

