Path Planning March 14, 2025

AMR Fleet Coordination Without a Centralized Planner

Multiple robot paths in warehouse top-down abstract coordination view

Most multi-robot fleet systems are organized around a centralized traffic manager: a server process that knows the position and planned path of every robot, assigns priority in conflict zones, and tells each robot when it can proceed. This works. It is also a single point of failure that introduces network latency into every navigation decision, and it scales poorly to robot counts above 20 to 30 in environments with high aisle density.

We took a different approach with MV Bridge's fleet coordination layer. Rather than routing every inter-robot decision through a central authority, each robot shares a lightweight event stream with its neighbors and makes its own avoidance decisions locally. This post describes how it works, where it is better than centralized coordination, and where it is not.

The Problem with Centralized Fleet Management in High-Density Environments

A centralized fleet traffic manager needs to know the real-time position and intended path of every robot to resolve conflicts. In a facility running 20 robots, the manager is receiving 20 position updates at whatever the reporting frequency is — typically 2 to 5 Hz — processing conflict detection across all robot path pairs, issuing priority decisions, and distributing those decisions back to each robot. At 5 Hz per robot with 20 robots, that is 100 incoming position updates per second and potentially dozens of conflict resolution decisions per second during peak periods.

The latency of this loop matters. A robot traveling at 1.0 m/s that is waiting for a priority decision from the traffic manager has already moved 200 mm from its last reported position by the time the decision arrives at 200 ms round-trip latency. In a 1.2 m wide aisle with two robots approaching each other, 200 ms is the difference between a smooth coordinated slowdown and a hard stop.

More critically: if the central traffic manager becomes unavailable — network interruption, server load spike, software fault — the robots lose the ability to make coordinated decisions. The safe fallback is to stop all robots and wait for reconnection, which is unacceptable during peak picking operations. The less safe fallback is to have each robot navigate independently, which can cause deadlocks or near-misses in tight corridors.

Neither outcome is acceptable as the default failure mode for a production logistics fleet.

The Event-Sharing Model

MV Bridge's coordination layer is built on a peer-to-peer obstacle event broadcast. Each robot running Mobvynt publishes its detected obstacle events — including detections of other robots — to a local MQTT broker on the facility network. Every other robot on the same network subscribes to all robot event topics and incorporates those detections into its own obstacle layer.

The event message structure is minimal:

{
  "robot_id": "mvnt-r04",
  "timestamp_ms": 1741916040000,
  "event_type": "obstacle_detected",
  "obstacle": {
    "class": "amr",
    "centroid_x_m": 12.4,
    "centroid_y_m": 6.8,
    "velocity_x_ms": -0.72,
    "velocity_y_ms": 0.1,
    "bbox_w_m": 0.65,
    "bbox_l_m": 0.90,
    "confidence": 0.91
  },
  "ttl_ms": 800
}

The ttl_ms field is the time-to-live for this detection event. When a receiving robot ingests this event, it adds the obstacle to its costmap with a confidence decay function set to expire the detection after ttl_ms milliseconds unless refreshed by a new event from the same reporting robot. This means that if a robot stops broadcasting (network loss, robot offline), its obstacle events age out of every other robot's costmap within 800 ms rather than persisting indefinitely.

Each robot also broadcasts a periodic self-position event — not for collision avoidance (the robot detects other robots via its own LiDAR), but for the planner's planning horizon. When a robot knows the planned paths of nearby robots 2 to 3 seconds ahead, it can route to avoid predicted future positions rather than just reacting to current positions.

Local Decision Making: Priority Without a Coordinator

With each robot having a real-time picture of nearby robots' positions, velocities, and obstacle detections, inter-robot conflict resolution can happen locally. Our priority model uses a deterministic tie-breaking rule: the robot with the lower ID number has right-of-way in a head-on aisle conflict. This is simple, stateless, requires no coordination, and produces consistent behavior.

The priority rule is intentionally simple because complex local negotiation protocols tend to fail in ways that are hard to predict at system scale. A priority scheme that requires robots to exchange negotiation messages introduces its own latency and failure modes. The deterministic rule means no negotiation: robot 03 always yields to robot 02 in a head-on scenario. Both robots know this without communicating, because both know their own ID and both can see the other robot's ID in its broadcast events.

This is not a complete multi-robot planning solution. It does not globally optimize fleet throughput — a centralized planner that knows all robot paths can make globally optimal assignments that a local priority model cannot. We are not saying local coordination is better than centralized planning in every metric. We are saying it is significantly more resilient to the failure modes that matter in production operations — network interruption, server load, robot count scaling — and that the throughput difference is smaller than it appears in theoretical analysis once you account for the latency overhead of centralized coordination in real network conditions.

Deadlock Detection and Resolution

One scenario where local priority rules can fail is a cyclic deadlock: robot A is blocked by robot B, which is blocked by robot C, which is blocked by robot A. A deterministic priority rule resolves head-on conflicts but does not automatically break cycles.

We handle this with a deadlock detector that runs independently on each robot. Each robot monitors its own mission progress: if it has been in a stopped or near-stopped state for more than a configurable threshold (default: 8 seconds) without completing forward progress, it declares a potential deadlock and broadcasts a deadlock event to the fleet. Any robot receiving a deadlock event from a neighbor checks whether it is also stalled. If two or more robots in a proximity cluster are simultaneously reporting stall states, the robot with the highest ID number initiates a reverse-and-reroute behavior — backing up 0.5 to 1.0 m to break the geometric lock and replanning with the new positions of all involved robots included in its costmap.

In practice, deadlocks in our test environments occurred at a rate of approximately 0.3 to 0.8 per 100 robot-missions in high-density configurations (8+ robots in an aisle zone). The auto-resolution behavior resolved deadlocks within 6 to 12 seconds in all observed cases without requiring operator intervention.

What This Architecture Does Not Handle Well

The event-sharing model assumes reasonably reliable network connectivity between robots. In facilities with significant wireless coverage dead zones — thick concrete walls, metal shelving creating RF shadows — robots may lose sight of each other's event broadcasts for periods long enough that their shared obstacle pictures diverge. We mitigate this with the TTL expiry (stale events age out conservatively) and with the robot's own LiDAR-based detection of other robots, which provides a local ground truth that does not depend on network. But in environments with very poor wireless coverage, the local LiDAR detection alone is the fallback and the planner's forward planning horizon shrinks to what the robot can directly observe.

The architecture also does not globally optimize task assignment. MV Bridge handles collision avoidance and traffic coordination; it does not tell a robot which pick slot to go to next. Task assignment remains in the fleet management system or WMS. The two systems interact through the Mobvynt obstacle event layer, but they are not the same function.

For facilities with very large robot counts — 50 or more robots in a single zone — the MQTT event broadcast volume can become a bottleneck. Each robot broadcasting at 5 Hz means 250 events per second in a 50-robot zone. We have not tested above 32 robots in a single zone at production frequency; at that scale, the architecture may need a broker cluster rather than a single broker instance. That is an infrastructure question, not a protocol limitation, but it is worth knowing before scaling up.