Operations June 28, 2024

Why Warehouse Robots Keep Getting Lost

By Aaliyah Washington

Autonomous mobile robot stopped in a warehouse aisle blocked by a misplaced pallet

Here is the thing nobody says during the pilot: every AMR deployment we have been part of worked great the first week. The warehouse was staged for the demo. Pallets in their marked positions. Forklifts on predictable routes. Pick density low enough that the aisles stayed clear. The robot hit every slot on time and the operations manager was happy.

Then the facility went back to normal operations. Pallets migrated six inches from their marked spots. A seasonal pallet flow change moved an entire staging area. A new hire parked a cart in aisle 7 every morning at 7 AM. Within two weeks, the average mission completion rate had dropped from 94% to 71%, and the maintenance team was spending 40 minutes per shift manually clearing robot stalls.

We have seen this exact pattern more than once. It is not a hardware problem. It is not even primarily a software problem. It is a map problem — and the map problem is more fundamental than most people realize.

What "Getting Lost" Actually Means

When an AMR stalls in an unexpected location, operators typically describe it as the robot "getting confused" or "losing its position." The localization framing is misleading. In most modern AMR deployments, the robot knows exactly where it is — AMCL or a similar Monte Carlo localization algorithm is maintaining a reasonably accurate pose estimate against the base occupancy map. The robot is not lost in the GPS sense.

What has actually happened is that the robot's occupancy grid — the representation of the world it is navigating against — no longer matches the world that exists. The static occupancy grid was built during initial SLAM mapping and records the facility as it was on mapping day. Every pallet that has moved since then, every temporary partition, every cart parked in a new spot exists in the real world but not in the robot's map. From the planner's perspective, the path ahead is clear. From the robot's sensor readings, there is something in the way.
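The mismatch can be made concrete with a minimal sketch. It assumes two hypothetical grids that share the same resolution and origin, where 0 means free and 1 means occupied. The grid contents and the function names are illustrative and do not come from any Nav2 API. A real live view would also cover only the part of the facility the sensors can currently see.

```python
# Hypothetical 4 x 5 aisle: walls on the top and bottom rows, free space between.
STATIC_MAP = [
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
]

# Same aisle as currently sensed: a pallet now occupies column 2.
LIVE_SCAN = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
]


def unmapped_obstacles(static_map, live_scan):
    """Return (row, col) cells occupied in the live scan but free in the static map."""
    return [
        (r, c)
        for r, row in enumerate(live_scan)
        for c, occupied in enumerate(row)
        if occupied and not static_map[r][c]
    ]


def map_delta_ratio(static_map, live_scan):
    """Fraction of statically free cells that the live scan reports as occupied."""
    free_cells = sum(cell == 0 for row in static_map for cell in row)
    if free_cells == 0:
        return 0.0
    return len(unmapped_obstacles(static_map, live_scan)) / free_cells


if __name__ == "__main__":
    print(unmapped_obstacles(STATIC_MAP, LIVE_SCAN))
    print(map_delta_ratio(STATIC_MAP, LIVE_SCAN))
```

A planner that consults only STATIC_MAP sees a clear aisle, while the sensors report two blocked cells across its full width.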

In Nav2, the costmap (the nav2_costmap_2d package) handles this with an obstacle layer and an inflation layer: detected obstacles are marked in the local costmap, a cost radius is inflated around them, and the planner tries to route around the result. This works reasonably well for genuinely transient obstacles. Where it breaks down is when obstacles are semi-permanent (that staging pallet has been there for six weeks, but it never made it into the static map), densely clustered (three obstacles within a two-meter aisle segment), or moving (a forklift reversing at low speed in the same corridor the robot is trying to traverse).

From the robot's perspective, the result is a series of plan failures, recovery behaviors, and an eventual stall, which looks to the operator like the robot "got lost."

The Map Debt Problem

In the facilities we have worked in directly, the operational cost of maintaining a static map is routinely underestimated before deployment. The initial SLAM mapping session gets scheduled, completed, and the resulting map gets loaded into the fleet management system. Then the facility changes, as facilities do.

A mid-sized ambient fulfillment center (say 150,000 square feet with 18 AMRs) will typically have 8 to 12 meaningful layout changes per month during normal operations: seasonal product flow reconfigurations, new storage locations for velocity items, temporary overflow staging from receiving. Each of these changes creates a delta between the map and reality. Depending on where the change falls relative to the robot's regular paths, the impact ranges from a slight increase in replanning frequency to a complete aisle becoming unnavigable.

Correcting this requires bringing a robot offline, running a new SLAM session in that zone, validating the updated map segment, merging it into the fleet map, and reloading across all robots. At most operations we have seen, a thorough map update takes between 2 and 4 hours of engineering time including validation. Do it once a month and it is a manageable overhead. Do it every time a forklift driver moves a pallet to a slightly different spot — which is every shift — and it is operationally impossible.
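As a rough worked example of that overhead, the figures above can be multiplied out. The sketch assumes, hypothetically, that each meaningful layout change triggers one full map update. That assumption is a simplification for illustration only, and the function name is made up.

```python
def monthly_map_update_hours(changes_per_month, hours_per_update):
    """Engineering hours per month if every layout change gets its own map update."""
    return changes_per_month * hours_per_update


# Ranges quoted in the text: 8 to 12 changes per month, 2 to 4 hours per update.
low_estimate = monthly_map_update_hours(8, 2)
high_estimate = monthly_map_update_hours(12, 4)

if __name__ == "__main__":
    print(f"{low_estimate} to {high_estimate} engineering hours per month")
```

Under that assumption the range is 16 to 48 engineering hours per month, before counting any of the smaller per-shift pallet shifts.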

The result is map debt: a growing gap between what the map says and what is actually there. Operators learn to compensate by reducing robot speed in problem zones, marking certain aisles as no-go, or assigning a staff member to clear robot stalls during peak hours. These compensations add up to a real throughput penalty that never shows up in the pilot metrics.

Where the Failure Actually Lives

It is worth being precise about what part of the navigation stack is failing, because the solution depends on it.

The localization layer is mostly fine. Robots running AMCL against the existing map, or a graph-SLAM variant, can maintain adequate pose estimates in changed environments as long as there are sufficient stable landmarks (walls, fixed shelving, structural columns) for scan matching. Localization degrades in environments where most features are mobile, but in typical fulfillment centers, the structural layout provides enough stable reference.

The planning layer, whether that is Nav2's NavFn global planner or its DWB local controller (the Nav2 successor to the ROS 1 DWA local planner), is also doing roughly what it is supposed to do given its inputs. The problem is that its inputs are wrong. It is planning against a costmap that contains a combination of static map data (accurate at mapping time, increasingly stale thereafter) and local obstacle detections (current but with no memory and no classification).

The gap is at the interface between perception and planning: the system has no way to distinguish between a misplaced pallet that has been there for six weeks (stable enough to treat as a static obstacle and update the map) and a forklift that drove in two seconds ago (moving obstacle that needs trajectory prediction). Both show up as the same kind of costmap inflation in the standard Nav2 stack. The robot treats the reversing forklift as if it might stay forever and treats the six-week pallet as if it might disappear any moment. Neither assumption is right.

What Needs to Be Different

We are not saying the static occupancy map is wrong to use. It is still the right foundation for localization and for representing the parts of the facility that genuinely do not change. The wall is always where the map says it is. The structural columns are where the map says they are. That stable layer should stay.

What needs to be different is the dynamic obstacle layer that sits on top of it. A robot navigating a live warehouse needs the ability to classify what it is seeing — is this a person? A pallet? A forklift in motion? — and to predict where that object will be in the next 500 to 1000 milliseconds. With that classification and short-horizon trajectory estimate, the planner can make a genuinely better decision than "inflate a cost radius and see if A* can find a path around it."
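A minimal sketch of the short-horizon prediction idea follows. It assumes a constant-velocity motion model, which is a deliberately simple stand-in and not a description of any particular product's predictor. The TrackedObstacle type, the will_block helper, and all numeric values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class TrackedObstacle:
    x: float   # position in metres, map frame
    y: float
    vx: float  # velocity in metres per second
    vy: float


def predict_position(obstacle, horizon_s):
    """Constant-velocity extrapolation of the obstacle position after horizon_s seconds."""
    return (obstacle.x + obstacle.vx * horizon_s,
            obstacle.y + obstacle.vy * horizon_s)


def will_block(obstacle, waypoint, clearance_m, horizons_s=(0.5, 0.75, 1.0)):
    """True if the obstacle is predicted within clearance_m of waypoint at any horizon."""
    wx, wy = waypoint
    for horizon in horizons_s:
        px, py = predict_position(obstacle, horizon)
        if math.hypot(px - wx, py - wy) <= clearance_m:
            return True
    return False


# Assumed scenario: a forklift reversing toward the robot's next waypoint,
# and a pallet sitting further down the aisle.
forklift = TrackedObstacle(x=4.0, y=0.0, vx=-1.0, vy=0.0)
pallet = TrackedObstacle(x=6.0, y=0.0, vx=0.0, vy=0.0)
next_waypoint = (3.2, 0.0)

if __name__ == "__main__":
    print(will_block(forklift, next_waypoint, clearance_m=0.5))
    print(will_block(pallet, next_waypoint, clearance_m=0.5))
```

The horizons sampled here match the 500 to 1000 millisecond window mentioned above. A real system would likely check the whole planned path instead of a single waypoint.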

Specifically: a robot that knows it is looking at a reversing forklift can predict the forklift's trajectory and wait for it to pass rather than attempting a path that will fail anyway. A robot that knows the large rectangular object blocking aisle 7 is a stationary pallet (no motion in the last 30 scan frames) can treat it as a semi-permanent obstacle, route around it confidently, and even flag it to the operations system for map update rather than attempting increasingly desperate local replanning.
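The two cases above can be sketched as a small decision rule. This is an illustrative sketch under stated assumptions and not the actual planner logic. The 30-frame window comes from the text, but the motion tolerance, the class labels, and the action names are hypothetical.

```python
import math
from collections import deque

STATIONARY_FRAMES = 30      # from the text: no motion in the last 30 scan frames
MOTION_TOLERANCE_M = 0.05   # assumed sensor-noise threshold, in metres


class ObstacleTrack:
    """Keeps the most recent observed positions of one classified obstacle."""

    def __init__(self, label):
        self.label = label  # hypothetical class label, e.g. "pallet" or "forklift"
        self.positions = deque(maxlen=STATIONARY_FRAMES)

    def observe(self, x, y):
        self.positions.append((x, y))

    def is_stationary(self):
        """True only after a full window of frames with no movement beyond tolerance."""
        if len(self.positions) < STATIONARY_FRAMES:
            return False
        x0, y0 = self.positions[0]
        return all(math.hypot(x - x0, y - y0) <= MOTION_TOLERANCE_M
                   for x, y in self.positions)


def choose_action(track):
    """Map a typed, tracked obstacle to a coarse planner action."""
    if track.is_stationary():
        return "reroute_and_flag_for_map_update"
    if track.label == "forklift":
        return "wait_for_pass"
    return "replan_locally"


# Assumed scenario: a pallet that has not moved for 30 frames,
# and a forklift backing up 10 cm per frame.
pallet_track = ObstacleTrack("pallet")
forklift_track = ObstacleTrack("forklift")
for frame in range(STATIONARY_FRAMES):
    pallet_track.observe(7.0, 2.0)
    forklift_track.observe(4.0 - 0.1 * frame, 0.0)

if __name__ == "__main__":
    print(choose_action(pallet_track))
    print(choose_action(forklift_track))
```

Requiring a full window before declaring an obstacle stationary keeps a newly seen object from being flagged for a map update too early. Until the window fills, the rule falls back to ordinary local replanning.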

This is the problem we built Mobvynt's perception layer to address. The underlying architecture is two-stage: an obstacle classifier that runs on each sensor frame and produces typed, tracked obstacle detections, feeding into a planner that uses those typed detections to make temporally aware routing decisions. The net result is that robots spend significantly less time in recovery behaviors and significantly more time making forward progress, even in facilities that have not had their static map updated in months.

The Honest Caveat

Better dynamic obstacle handling does not eliminate map maintenance. It reduces the frequency and urgency of map updates by making the robot more resilient to map delta, but it does not make static maps irrelevant. If a facility undergoes a major layout reconfiguration — an entire warehouse zone gets reorganized — you still need to update the base map. The localization layer depends on having structural features roughly where the map says they are.

What it does change is the operational calculus. Map updates become a scheduled maintenance activity rather than an emergency response to robot stalls. The facility team can plan a monthly or quarterly map refresh on a shift with low robot traffic rather than scrambling to push an update because three robots are stalled on the pick floor during peak hours.

That shift alone — from reactive map updates to scheduled map maintenance — tends to recover most of the throughput penalty that operators were compensating for with reduced speeds and human stall-clearing. It does not require new hardware. It requires a perception layer that actually understands what it is looking at.
