"Unknown object" is not a classification — it's an admission. It tells the path planner that the perception system detected something but couldn't make a confident determination about what it is, how fast it's moving, or whether it presents an immediate hazard. The conservative response to that admission is a full stop, which is correct from a safety standpoint. But in a working warehouse with 10–15 robots sharing the same aisles with human pickers, errant carts, dropped packaging, stretch wrap on the floor, and partial LiDAR returns off glossy surfaces — if every ambiguous detection triggers a stop, you've built a system that's safe and useless.
We've spent a lot of engineering time on the calibration problem: how to give the pipeline enough classification confidence to make intelligent responses to real obstacles, without creating false-positive detection chains that bring throughput to a crawl. Here's what we've learned.
Why "Unknown" Is Overused in Practice
The unknown classification is overused for two distinct reasons, and the fix is different for each.
Reason 1: under-represented training classes. Most AMR perception models are trained heavily on the high-frequency objects — people, forklifts, stationary rack structures. But warehouses contain a long tail of lower-frequency objects: pallet jacks, hand trucks, empty slip sheets, cardboard bale segments, mezzanine support columns at unusual angles, autonomous tuggers from different manufacturers, mobile workstations, maintenance equipment. When the model encounters these objects, confidence on all known classes drops below threshold and the detection falls into the "unknown" bucket.
Reason 2: LiDAR return artifacts creating partial point sets. Glossy floor surfaces, safety netting, transparent objects, and objects with complex geometry all create partial or spurious point sets that don't cleanly match any trained class signature. A pallet wrap semi-transparent to LiDAR can produce a point set that looks like "something" at certain approach angles and nothing at others. The classifier sees incomplete evidence and — correctly — can't make a high-confidence assignment.
The two reasons call for different engineering responses. Reason 1 is a training data problem: you need more labeled examples of the underrepresented classes. Reason 2 is a modeling problem: you need a classifier architecture that can reason about partial evidence rather than requiring a full signature match.
The Classification Tier Approach
The highest-leverage change we've made to classification handling is replacing the binary "known/unknown" output with a four-tier output schema:
- Tier 1 — High-confidence classified: Confidence > 0.85 on a known class. Normal response: treat as the assigned class (e.g., "person at 3.2m, moving at 0.8 m/s eastward").
- Tier 2 — Low-confidence classified: Best-match confidence 0.50–0.85. Something was detected and the most likely class is known, but certainty is low. Response: apply classified-class behavior with increased safety margins (larger virtual bubble, shorter re-evaluation interval).
- Tier 3 — Unknown, static: No class confidence above threshold, object is not moving. Response: yield around it if path allows; short hold and re-evaluate if path doesn't allow yield. Do not treat as equivalent to "unknown mobile."
- Tier 4 — Unknown, mobile: No class confidence above threshold, object shows velocity. Response: full stop and re-evaluate. This is the case that actually warrants the most conservative behavior.
The critical insight is that Tier 3 and Tier 4 are not the same situation. An unclassified static object on a warehouse floor is most likely a piece of debris, a small equipment item, or a packaging fragment. An unclassified moving object could be a person, an unstaffed vehicle, or a powered piece of equipment acting outside its normal pattern. The hazard profile is completely different, and treating them identically produces the wrong response for one of them.
Tiering the unknown category this way — by adding motion state as a second axis alongside classification confidence — significantly reduces unnecessary full stops from static unknowns while keeping the conservative response where it actually matters: mobile unknowns.
Calibrating Classification Thresholds Per-Facility
The thresholds that define each tier are not universal. The right threshold for a facility running a mix of human pickers and forklifts in narrow 2.4m aisles is different from the right threshold for an automated dark warehouse with only robot traffic. Higher human density requires lower confidence thresholds for the "person" class — you want to classify a partial human signature as "person, low-confidence" rather than "unknown" so the planner applies the right safety bubble even when the confidence is marginal.
We expose thresholds as configurable parameters per obstacle class rather than a single global threshold:
# Example per-class confidence thresholds (mobvynt perception config)
classification_thresholds:
person:
tier1_min: 0.80 # Reduced from default 0.85 for high human-density facilities
tier2_min: 0.45 # Lower bound for low-confidence classification
forklift:
tier1_min: 0.85
tier2_min: 0.55
static_cart:
tier1_min: 0.75
tier2_min: 0.40
unknown_mobile_velocity_threshold: 0.15 # m/s — below this, treat as static for tier assignment
The unknown_mobile_velocity_threshold parameter deserves specific attention. A low value means many objects get classified as "mobile" and trigger Tier 4 responses. Too low and you're treating slow-drifting cardboard on an imperfectly level floor as a mobile unknown. Too high and you're letting a slowly-moving pedestrian or a coasting pallet jack fall into Tier 3 instead of Tier 4. We default this to 0.15 m/s and tune up or down based on the facility floor quality and whether unattended equipment is present in the space.
The False Positive Cost: Measuring It Honestly
A false-positive unknown classification — detecting something as Tier 3 or Tier 4 when there's actually nothing there, or when what's there is a known static element like a rack upright — has a direct throughput cost. Each unnecessary stop or yield maneuver adds 8–35 seconds of latency depending on the response type and the clearance check duration.
We track false-unknown rate as an operational metric: the number of unknown-class detections per 100 robot-minutes of operation, broken down by Tier 3 and Tier 4. A well-tuned deployment in an ambient warehouse runs at roughly 1–3 Tier 3 events per 100 robot-minutes and under 0.5 Tier 4 events per 100 robot-minutes. Newly deployed robots in unfamiliar facilities often start at 8–15 Tier 3 events per 100 robot-minutes as the system encounters the full range of that facility's object diversity — and that rate should fall over the first few weeks as the classifier adapts to facility-specific object distributions.
We're not claiming these numbers represent a hard safety-throughput tradeoff that one tuning approach resolves cleanly. The goal isn't to minimize unknown detections — some genuinely unknown objects are real hazards and the system should stop. The goal is to minimize unnecessary unknown classifications: cases where the system has enough information to make a better-than-unknown determination but didn't because the threshold was set too conservatively or the classification architecture collapsed ambiguous point sets into the unknown bucket by default.
When to Add New Classes vs. Tune Thresholds
There's a temptation to handle every new obstacle type by tuning the unknown threshold down — lowering the bar for what counts as "classified" so fewer objects fall into Tier 3 or Tier 4. That approach has limits. If an autonomous tugger shows up repeatedly as a low-confidence forklift, the right fix is to add a tugger class to the model with representative training data, not to lower the forklift confidence threshold until the tugger gets misclassified as "forklift" with high confidence. A forklift response and a tugger response have different speed, size, and motion profiles. Getting the class right matters for the response to be appropriate.
The discipline is: tune thresholds within a tier to adjust sensitivity on known classes; add new training classes when an object type is genuinely under-represented and the behavior response should differ from existing classes. Conflating these two levers produces perception behavior that's hard to reason about and hard to audit when something goes wrong.