What GTFS-RT Actually Carries
Most agencies that have deployed GTFS-RT are using it primarily as a passenger information tool — feeding arrival predictions to trip-planning apps or digital signage. That's a legitimate use, but it represents a fraction of what the feed actually contains. A properly maintained GTFS-RT implementation broadcasts three distinct feed types: TripUpdates, VehiclePositions, and ServiceAlerts. Of these, VehiclePositions and TripUpdates together contain signal that, when processed correctly, gives a planning analyst a surprisingly detailed picture of real-time demand pressure across the network.
TripUpdates carry stop-level timing data: predicted or observed arrival and departure times at each stop in the sequence, expressed either as absolute times or as delays against the schedule. VehiclePositions carry latitude/longitude, speed, and, where AVL systems are integrated with AFC and APC, occupancy status. The occupancy field (occupancy_status) is specified in the GTFS-RT standard as an enum whose load levels run from EMPTY through CRUSHED_STANDING_ROOM_ONLY and FULL, alongside a few non-load values such as NOT_ACCEPTING_PASSENGERS and NO_DATA_AVAILABLE. Many agencies have this wired up but aren't actually reading the occupancy data downstream for planning purposes.
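To make the occupancy field concrete, here is a minimal Python sketch of the enum values defined for occupancy_status in the GTFS-RT specification, which include a few values beyond the basic load range. The LOAD_LEVELS tuple and the load_rank helper are illustrative names, not part of any library, and treating FULL as the highest load level is an assumption of this sketch.

```python
from enum import IntEnum
from typing import Optional


class OccupancyStatus(IntEnum):
    """Values of VehiclePosition.occupancy_status in the GTFS-RT spec."""
    EMPTY = 0
    MANY_SEATS_AVAILABLE = 1
    FEW_SEATS_AVAILABLE = 2
    STANDING_ROOM_ONLY = 3
    CRUSHED_STANDING_ROOM_ONLY = 4
    FULL = 5
    NOT_ACCEPTING_PASSENGERS = 6
    NO_DATA_AVAILABLE = 7
    NOT_BOARDABLE = 8


# Assumption: only these values say anything about passenger load,
# and they are ordered from lightest to heaviest.
LOAD_LEVELS = (
    OccupancyStatus.EMPTY,
    OccupancyStatus.MANY_SEATS_AVAILABLE,
    OccupancyStatus.FEW_SEATS_AVAILABLE,
    OccupancyStatus.STANDING_ROOM_ONLY,
    OccupancyStatus.CRUSHED_STANDING_ROOM_ONLY,
    OccupancyStatus.FULL,
)


def load_rank(status: OccupancyStatus) -> Optional[int]:
    """Return 0..5 for load-related statuses, None for the rest."""
    return LOAD_LEVELS.index(status) if status in LOAD_LEVELS else None
```

Filtering out the non-load values early keeps records such as NO_DATA_AVAILABLE from being misread as a heavy load in later aggregation.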
The Gap Between Data Publication and Data Utilization
Consider a mid-size transit authority running a fleet of 180 buses with onboard APC units. The APCs count boardings and alightings at every stop, and those counts are aggregated into the occupancy enum that flows into the GTFS-RT VehiclePositions feed. The feed updates every 15–30 seconds. Over a single weekday, that's on the order of several hundred thousand to a million position records (180 vehicles × 2–4 updates per minute × 18–24 hours in service), most of them containing an occupancy status.
What happens to those records? In practice, at most agencies: they're consumed by the CAD/AVL dispatcher display, and then either discarded or archived in a format that's difficult to query. NTD Section 30 reporting requires historical ridership data, but it typically comes from APC aggregates processed separately by the operations department — not from the real-time stream. The planning department may never touch the GTFS-RT data at all.
This is the gap we're describing — not a technology failure, but a workflow gap. The data is being generated; the demand signal is there; it's just not being routed to the people who need it for corridor analysis and service planning decisions.
Extracting Demand Signal: A Practical Approach
The core analytical technique is reconstructing load profiles from VehiclePositions + TripUpdates over time. Here's the general approach:
- Archive the feed at ingestion: GTFS-RT feeds are ephemeral; they reflect the current state of the network. You need to ingest and store the feed at consistent intervals (15–30 second polling is standard) to build a historical record. Most agencies do not do this systematically; if you're starting from scratch, you'll need either a custom archiver or a platform that handles this for you.
- Join VehiclePositions to GTFS schedule via trip_id: Each vehicle position record carries a trip_id that links back to the GTFS static schedule. This gives you the planned route, direction, and stop sequence, which is essential context for interpreting where the vehicle is in its run.
- Interpolate stop-level load from occupancy transitions: If occupancy goes from MANY_SEATS_AVAILABLE at stop 12 to STANDING_ROOM_ONLY at stop 14, you have a load event: a corridor segment where demand is exceeding comfortable capacity. This isn't APC-precision data, but it's directionally accurate for planning purposes.
- Aggregate to time-of-day and day-of-week bins: Single-day snapshots are noise. The signal emerges when you stack 60–90 days of data and look at consistent patterns: which stops regularly see load spikes in the AM peak, which corridors are chronically underloaded at midday.
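The last two steps can be sketched in a few lines of Python. This is a rough illustration, not a reference implementation: it assumes the archived records have already been joined to the schedule, the Observation class and its field names are hypothetical, and treating STANDING_ROOM_ONLY as the crowding threshold is a choice made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

# Assumed ordering of load-related occupancy values, lightest to heaviest.
LOAD_ORDER = ["EMPTY", "MANY_SEATS_AVAILABLE", "FEW_SEATS_AVAILABLE",
              "STANDING_ROOM_ONLY", "CRUSHED_STANDING_ROOM_ONLY", "FULL"]
RANK = {name: i for i, name in enumerate(LOAD_ORDER)}
CROWDED = RANK["STANDING_ROOM_ONLY"]  # assumed threshold for a load event


@dataclass
class Observation:
    """One archived vehicle position, already joined to the static schedule."""
    trip_id: str
    route_id: str          # looked up from the schedule via trip_id
    stop_sequence: int     # position of the vehicle in the trip's stop list
    occupancy: str         # occupancy_status name as reported in the feed
    observed_at: datetime


def load_events(observations):
    """Yield (previous, current) pairs where a trip crosses into crowding."""
    by_run = {}
    for obs in observations:
        if obs.occupancy not in RANK:
            continue  # skip NO_DATA_AVAILABLE and other non-load values
        # trip_id repeats every service day, so key by calendar date as well
        # (simplification: ignores trips that run past midnight).
        key = (obs.trip_id, obs.observed_at.date())
        by_run.setdefault(key, []).append(obs)
    for run in by_run.values():
        run.sort(key=lambda o: o.observed_at)
        for prev, cur in zip(run, run[1:]):
            if RANK[prev.occupancy] < CROWDED <= RANK[cur.occupancy]:
                yield prev, cur


def bin_events(events):
    """Count load events per (route, weekday, hour, stop_sequence) bin."""
    counts = Counter()
    for _prev, cur in events:
        t = cur.observed_at
        counts[(cur.route_id, t.weekday(), t.hour, cur.stop_sequence)] += 1
    return counts
```

Fed 60–90 days of observations, the bins with consistently high counts point to the stops and hours where crowding recurs; bins that only light up once or twice are more likely noise.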
TripUpdate Delay Patterns as Latent Demand Indicator
There's a less obvious signal in TripUpdates that's worth understanding. Stop-level delays, measured as actual departure time minus scheduled departure time, tend to correlate with dwell time, which in turn correlates with boarding volumes. A bus running 4–6 minutes late by stop 8 of a 22-stop run may simply be stuck in traffic, but when the lateness builds at the same stops day after day, the more likely cause is dwell time accumulating at high-demand stops.
This is not a perfect proxy for boardings: dwell time is also driven by wheelchair lift operations and fare disputes, and overall delay by operator breaks and traffic signal timing. But when you see consistent 3–5 minute positive delays accumulating at the same 3–4 stops every weekday morning over multiple months, that's a demand pattern worth investigating further with APC data or field observation.
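One way to look for this pattern is to compute how much delay is added between consecutive stops and then ask which stops add delay on most trips. The Python sketch below assumes you have already extracted per-trip lists of (stop_sequence, departure delay in seconds) from archived TripUpdates; the function names and the 45-second and 60% thresholds are illustrative assumptions, not recommended values.

```python
from collections import defaultdict
from statistics import median


def delay_increments(stop_delays):
    """stop_delays: list of (stop_sequence, delay_seconds) for one trip.

    Returns {stop_sequence: delay added since the previous stop}. With
    departure delays, the increment at a stop combines extra running time
    on the inbound segment with extra dwell at the stop itself.
    """
    ordered = sorted(stop_delays)
    return {seq: delay - prev_delay
            for (_, prev_delay), (seq, delay) in zip(ordered, ordered[1:])}


def recurring_hotspots(trips, min_seconds=45, min_share=0.6):
    """trips: per-trip stop_delays lists for one route, direction and period.

    Returns {stop_sequence: median increment} for stops where delay grew by
    at least min_seconds on at least min_share of the trips.
    """
    gains = defaultdict(list)
    n_trips = 0
    for stop_delays in trips:
        n_trips += 1
        for seq, inc in delay_increments(stop_delays).items():
            gains[seq].append(inc)
    hotspots = {}
    for seq, incs in gains.items():
        share = sum(inc >= min_seconds for inc in incs) / n_trips
        if share >= min_share:
            hotspots[seq] = median(incs)
    return hotspots
```

Looking at increments rather than cumulative delay matters: a bus that picks up four minutes at stop 5 is still four minutes late at stop 9, but only stop 5 is a candidate for further investigation.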
We're not saying TripUpdate delay data replaces APC counts for load analysis. It doesn't. APC data is better for precise ridership accounting and NTD reporting. What delay pattern analysis gives you is a quick, low-cost screening tool to prioritize where to focus your detailed APC analysis — particularly useful when APC coverage is incomplete or APC data quality is inconsistent across the fleet.
Occupancy Data Quality: Know Your Limitations
Before you build planning decisions on GTFS-RT occupancy data, you need to audit its quality in your specific fleet. Occupancy status in GTFS-RT is only as good as the sensor or algorithm producing it. Agencies derive occupancy from several sources:
- APC-to-occupancy translation: Real-time boardings minus alightings running total, mapped to the occupancy enum. This is reasonably accurate when APC sensors are well-calibrated, but sensor drift is common and re-calibration schedules vary.
- Weight sensors: Some newer bus models use vehicle suspension load sensors. Generally accurate but can miscount when passengers cluster near doorways.
- Manual operator entry: Still common in smaller fleets. Operators enter an occupancy level at each timepoint stop. Notoriously inconsistent — operators tend toward the middle of the range and don't update frequently enough for real-time use.
Before using occupancy data for planning analysis, run a validation check: compare occupancy-derived load estimates against APC counts for the same trips over a 2–4 week period. If the two agree directionally (higher APC loads line up with higher occupancy levels, even if the numbers are not precise), the occupancy data is usable for screening analysis. If the feed reports FEW_SEATS_AVAILABLE on trips that APC shows as 30% loaded, the occupancy pipeline is broken somewhere and you shouldn't use it.
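A validation pass along these lines might look like the following Python sketch. The cut-offs in expected_level are assumptions made for illustration: the GTFS-RT specification leaves the boundaries between occupancy levels to the feed producer, so substitute your own agency's definitions. Load fraction here means APC passengers on board divided by seated capacity.

```python
LEVELS = ["EMPTY", "MANY_SEATS_AVAILABLE", "FEW_SEATS_AVAILABLE",
          "STANDING_ROOM_ONLY", "CRUSHED_STANDING_ROOM_ONLY"]


def expected_level(load_fraction):
    """Map an APC load fraction to the level we would expect the feed to show.

    Thresholds are hypothetical; replace them with your producer's rules.
    """
    if load_fraction < 0.05:
        return 0  # EMPTY
    if load_fraction < 0.5:
        return 1  # MANY_SEATS_AVAILABLE
    if load_fraction < 1.0:
        return 2  # FEW_SEATS_AVAILABLE
    if load_fraction < 1.3:
        return 3  # STANDING_ROOM_ONLY
    return 4      # CRUSHED_STANDING_ROOM_ONLY


def validate(pairs):
    """pairs: iterable of (reported status name, APC load fraction).

    Returns the share of exact matches and the mean signed difference in
    levels (positive means the feed reports fuller vehicles than APC does),
    or None when no usable pairs are supplied.
    """
    diffs = [LEVELS.index(status) - expected_level(frac)
             for status, frac in pairs if status in LEVELS]
    if not diffs:
        return None
    return {
        "exact_share": sum(d == 0 for d in diffs) / len(diffs),
        "mean_bias": sum(diffs) / len(diffs),
    }
```

A low exact-match share with a bias near zero suggests noisy but usable data; a consistently positive or negative bias suggests a systematic problem in the translation from counts to occupancy levels.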
Building a Continuous Demand Monitoring Workflow
The goal isn't a one-time analysis — it's an ongoing monitoring system that surfaces anomalies and emerging demand patterns without requiring manual queries. A practical continuous monitoring stack for a mid-size agency might look like this:
- A GTFS-RT archiver running on a lightweight cloud instance, polling the feed every 20 seconds and writing to columnar storage (Parquet works well for this)
- A weekly batch job that computes route-level and stop-level load metrics: average peak-hour load factor, frequency of occupancy threshold crossings, delay accumulation by stop
- A simple dashboard or report that flags routes where load factors have shifted more than 15–20% from the 90-day rolling average
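The flagging step in the last item can be very simple. The sketch below, in Python, assumes the weekly batch job has produced one load factor per route per day; the function name, the input shape, and the choice to compare the most recent 7 days against the 90 days before them are all assumptions of this example.

```python
from statistics import mean


def flag_shifts(history, threshold=0.15, window_days=90, recent_days=7):
    """history: {route_id: [daily load factor, oldest first]}.

    Returns {route_id: relative change} for routes whose recent mean load
    factor differs from the preceding window's mean by more than threshold.
    """
    flagged = {}
    for route_id, series in history.items():
        baseline = series[-(window_days + recent_days):-recent_days]
        recent = series[-recent_days:]
        if len(baseline) < window_days or len(recent) < recent_days:
            continue  # not enough history to compare yet
        base = mean(baseline)
        if base == 0:
            continue  # relative change is undefined against a zero baseline
        change = (mean(recent) - base) / base
        if abs(change) > threshold:
            flagged[route_id] = change
    return flagged
```

Excluding the recent week from the baseline keeps a sudden shift from diluting the average it is being compared against. Raising threshold to 0.20 gives the stricter end of the 15–20% range.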
This doesn't require expensive commercial software if your team has data engineering capacity. What it does require is organizational commitment to treating the GTFS-RT feed as planning infrastructure, not just a passenger information service. The agencies that do this well have a planner or analyst with explicit ownership of the GTFS-RT data pipeline — someone whose job description includes keeping the archiver running and the monitoring metrics current.
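As a rough indication of how small the archiving piece can be, here is a Python sketch that uses only the standard library. It stores each snapshot as raw protobuf bytes rather than Parquet; decoding and conversion to columnar storage would be a separate step with additional libraries. FEED_URL is a placeholder rather than a real feed, and the directory layout is an assumption of this example.

```python
import time
import urllib.request
from datetime import datetime, timezone
from pathlib import Path

FEED_URL = "https://example.org/gtfs-rt/vehicle-positions.pb"  # placeholder


def snapshot_path(root: Path, fetched_at: datetime) -> Path:
    """Partition snapshots by date so batch jobs can read one day at a time."""
    return root / fetched_at.strftime("%Y-%m-%d") / fetched_at.strftime("%H%M%S.pb")


def archive_once(root: Path, url: str = FEED_URL) -> Path:
    """Fetch the feed once and store the raw bytes unmodified."""
    fetched_at = datetime.now(timezone.utc)
    with urllib.request.urlopen(url, timeout=10) as resp:
        payload = resp.read()
    path = snapshot_path(root, fetched_at)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(payload)
    return path


def run(root: Path, interval_seconds: int = 20) -> None:
    """Poll indefinitely; a failed fetch is reported and skipped, not fatal."""
    while True:
        started = time.monotonic()
        try:
            archive_once(root)
        except OSError as exc:  # covers URLError and socket timeouts
            print(f"fetch failed: {exc}")
        # Sleep for the remainder of the interval to keep polling evenly spaced.
        time.sleep(max(0.0, interval_seconds - (time.monotonic() - started)))
```

Keeping the raw bytes means a parsing bug found later can be fixed by reprocessing the archive instead of losing the data. The harder part is operational: someone has to notice when the process stops.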
For agencies without that internal capacity, a platform that handles ingestion, archiving, and demand surface computation from GTFS-RT feeds delivers the same analytical output without requiring in-house data engineering. Either path gets you to the same place: planning decisions informed by what's actually happening on the network today, not what a travel demand model predicted would happen five years ago.
The agencies that get the most out of GTFS-RT for planning treat the feed as continuous demand intelligence — not a real-time display for passengers.
What to Do With This Analysis
Once you have a functioning GTFS-RT demand monitoring workflow, the outputs feed directly into service planning decisions: headway adjustments on overloaded corridors, span extensions where late-evening load factors indicate unmet demand, stop consolidation where dwell time analysis shows multiple underutilized stops in close proximity. The loop from data to decision is shorter than it is with traditional annual NTD-based planning cycles — and that matters when ridership patterns are shifting as rapidly as they have in the post-pandemic period.
Start small: pick one corridor where you have both GTFS-RT occupancy data and APC counts, validate the correlation, and build your first load profile. The methodology generalizes from there.