Why "Fusing" Data Is Harder Than It Sounds
The word "fusion" gets used a lot in transportation data contexts without much explanation of what it actually involves. If you've encountered it in vendor presentations or research literature, you might have come away with the impression that multimodal data fusion is either trivially easy (just load the data into a database and join the tables) or so technically complex that only data scientists should worry about it. Neither is quite right. The actual process sits in between — it requires careful attention to a specific set of problems that arise whenever you try to combine data from different sources that weren't designed to work together.
This article is a non-technical explanation of what data fusion means in the context of Mobvynt's demand surface platform, and more broadly in the urban mobility analytics field. I'll explain what problems arise, how we address them, and where honest uncertainty lives in the process.
The Three Core Problems in Multimodal Fusion
1. Temporal Alignment
Transit ridership data (from APC units or AFC systems), bike-share trip records, and intersection count data are collected at different time resolutions and with different reference timestamps. APC data is typically recorded per stop event: a boarding and alighting count for each stop a vehicle serves on a trip, timestamped to when the vehicle visited that stop. GBFS bike-share feeds report station and vehicle availability in near real time; individual trip records, where an operator shares them, are timestamped to trip start and end. Intersection camera count data may be aggregated to 15-minute or hourly bins by the intersection controller before being exported.
To combine these into a single demand picture, you need to put them all on the same temporal basis. This sounds simple (just convert everything to the same time resolution), but it involves choices that affect what patterns you can and can't see. If you aggregate to 1-hour bins to match the coarsest data source (intersection counts exported hourly), you lose the finer temporal structure in the transit data. If you instead disaggregate hourly intersection counts to 15-minute intervals, you're making assumptions about the within-hour distribution that may not be well supported by the data.
The practical answer is usually to work at a resolution consistent with the planning question: for corridor analysis and demand surface mapping, hourly or 3-hour temporal bins are usually sufficient. For real-time operations analysis, you want to stay closer to the native resolution of each data source and use temporal joins with appropriate window tolerances rather than forcing all data into the same bin.
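The two strategies described above, fixed planning bins and tolerance-based matching, can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not Mobvynt's implementation. The function names (`floor_to_bin`, `aggregate`, `nearest_within`), the sample events, and the 5-minute tolerance are all assumptions chosen for the example.

```python
from bisect import bisect_left
from collections import defaultdict
from datetime import datetime, timedelta


def floor_to_bin(ts, bin_hours):
    """Floor a timestamp to the start of its bin; bin_hours should divide 24."""
    return ts.replace(hour=ts.hour - ts.hour % bin_hours,
                      minute=0, second=0, microsecond=0)


def aggregate(events, bin_hours):
    """Sum (timestamp, count) pairs into fixed-width bins."""
    totals = defaultdict(int)
    for ts, count in events:
        totals[floor_to_bin(ts, bin_hours)] += count
    return dict(totals)


def nearest_within(ts, sorted_times, tolerance):
    """Return the closest entry of sorted_times within tolerance, else None."""
    i = bisect_left(sorted_times, ts)
    candidates = sorted_times[max(i - 1, 0):i + 1]
    best = min(candidates, key=lambda t: abs(t - ts), default=None)
    if best is not None and abs(best - ts) <= tolerance:
        return best
    return None


# Hypothetical APC stop events: (timestamp, boardings)
apc_events = [
    (datetime(2024, 5, 1, 7, 12), 14),
    (datetime(2024, 5, 1, 8, 48), 9),
    (datetime(2024, 5, 1, 9, 5), 6),
]
hourly = aggregate(apc_events, 1)      # planning view, 1-hour bins
three_hour = aggregate(apc_events, 3)  # coarser planning view

# Hypothetical 15-minute intersection bin start times: 07:00, 07:15, 07:30
bin_starts = [datetime(2024, 5, 1, 7, 0) + timedelta(minutes=15 * k)
              for k in range(3)]
tolerance = timedelta(minutes=5)  # assumed window, not a recommended value
match = nearest_within(apc_events[0][0], bin_starts, tolerance)
no_match = nearest_within(apc_events[1][0], bin_starts, tolerance)
```

In the 3-hour view the 07:12 and 08:48 events collapse into a single 06:00 bin, which is the loss of temporal structure that coarse binning implies. In the tolerance join, the 07:12 event pairs with the 07:15 bin, while the 08:48 event finds nothing within five minutes and stays unmatched rather than being forced into a bin.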
2. Spatial Alignment
Each data source has its own spatial reference: transit stop locations (from GTFS), bike-share station or vehicle GPS coordinates (from GBFS), intersection locations (from the city's signal asset management database), and population/employment data (from ACS and LEHD at census geography). These don't naturally align to a common spatial grid — and the spatial errors in each source are different in character.
GTFS stop locations are generally accurate but occasionally have known positioning errors (a stop listed in the feed at the wrong side of an intersection, or a stop that's been relocated but whose GTFS coordinates haven't been updated). GBFS GPS coordinates for dockless vehicles are accurate to the vehicle's GPS uncertainty, which is typically 3–10 meters in open sky conditions but can be significantly worse in urban canyons. Intersection coordinates from city asset databases can be defined variously as the center of the intersection, the stop bar, or an arbitrary reference point associated with the signal controller — none of which is quite the same thing.
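Snapping all of these to an H3 grid at resolution 8 or 9 is the practical solution. At these resolutions a cell is far larger than typical GPS uncertainty or minor coordinate errors (a resolution 9 cell averages roughly 0.1 square kilometers), so a vehicle that's 8 meters off from its true position will usually map to the same cell. The exception is a point that sits within a few meters of a cell boundary, where a small error can push it into the neighboring cell; for aggregate demand mapping that leakage is usually small relative to cell totals. The grid provides a consistent spatial reference that absorbs most of the positional variance in the underlying data.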
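How often does a small position error actually change the cell a point lands in? The back-of-envelope sketch below approximates a cell as a regular hexagon and estimates the answer geometrically. It does not call the H3 library. The 200-meter edge length is an assumed round figure for resolution 9 (real cells vary somewhat in size and shape), and the function names are hypothetical.

```python
import math


def boundary_band_fraction(edge_m, error_m):
    """Share of a regular hexagon's area lying within error_m of its boundary."""
    apothem = edge_m * math.sqrt(3) / 2  # center-to-edge distance
    if error_m >= apothem:
        return 1.0
    # Points farther than error_m from every edge form a smaller, similar
    # hexagon, so the area ratio is the squared ratio of the apothems.
    return 1.0 - ((apothem - error_m) / apothem) ** 2


def approx_crossing_probability(edge_m, error_m):
    """Rough chance that a uniformly placed point, displaced by error_m in a
    random direction, ends up in a neighboring cell. Ignores corner effects:
    a point inside the boundary band crosses about 1/pi of the time."""
    return boundary_band_fraction(edge_m, error_m) / math.pi


EDGE_M = 200.0  # assumed approximate edge length for a resolution 9 cell
band = boundary_band_fraction(EDGE_M, 8.0)
crossing = approx_crossing_probability(EDGE_M, 8.0)
print(f"within 8 m of a boundary: {band:.1%}")
print(f"approx. chance of landing in a neighbor: {crossing:.1%}")
```

Under these assumptions roughly 9% of points lie within 8 meters of a cell boundary, and about 3% would be assigned to a neighboring cell by an 8-meter error. That is small enough for aggregate demand mapping, but it is not zero, which is one reason to treat sharp differences between adjacent cells with some caution.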
3. Normalization: Making Different Quantities Comparable
This is the least obvious problem and the most analytically consequential. Transit boardings, bike-share trip starts, and intersection turning movement counts measure different things. You can't simply add them together and call the result "mobility demand." The quantities need to be normalized before they can be meaningfully combined.
The normalization approach depends on what question you're answering. For demand surface mapping, Mobvynt's approach is to convert each data source to a comparable indicator of person-trip demand within an H3 cell and time window. Transit boardings translate directly to person-trips. Bike-share trip starts and ends are also person-trips. Intersection turning movement counts first need to be broken down by mode (vehicles vs. pedestrians vs. cyclists) based on available count data or regional assumptions; the vehicle counts are then converted to person-trips using an assumed vehicle occupancy rate.
Each of these normalization steps introduces uncertainty. Vehicle occupancy assumptions vary by corridor type, time of day, and regional context. The pedestrian and cyclist shares of intersection counts depend on whether the intersection count data includes those modes (many older intersection systems don't). We document these assumptions explicitly in our methodology brief, because a demand surface is only as reliable as its normalization assumptions — and those assumptions should be visible to the planners using the output.
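As a rough illustration of the conversion described above, the sketch below turns raw counts for one cell and time window into per-source person-trip indicators and carries the assumption used alongside the result. The `CellCounts` structure, the `to_person_trips` function, the sample counts, and the occupancy value of 1.5 are all hypothetical placeholders, not Mobvynt's schema or recommended parameters.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CellCounts:
    """Raw counts for one H3 cell and time window (hypothetical structure)."""
    transit_boardings: int
    bike_trip_starts: int
    vehicle_count: int
    pedestrian_count: Optional[int] = None  # None: this mode was not counted
    cyclist_count: Optional[int] = None


def _as_trips(count):
    """One counted person is one person-trip; keep None for uncounted modes."""
    return None if count is None else float(count)


def to_person_trips(counts, vehicle_occupancy):
    """Convert raw counts to person-trip indicators, one entry per source."""
    if vehicle_occupancy <= 0:
        raise ValueError("vehicle_occupancy must be positive")
    return {
        "transit": float(counts.transit_boardings),
        "bike_share": float(counts.bike_trip_starts),
        # Vehicles are the only source that needs an occupancy assumption.
        "vehicle": counts.vehicle_count * vehicle_occupancy,
        "pedestrian": _as_trips(counts.pedestrian_count),
        "cyclist": _as_trips(counts.cyclist_count),
        # Record the assumption so it travels with the estimate.
        "assumptions": {"vehicle_occupancy": vehicle_occupancy},
    }


# Illustrative counts; the intersection here reports vehicles only.
counts = CellCounts(transit_boardings=120, bike_trip_starts=35, vehicle_count=400)
result = to_person_trips(counts, vehicle_occupancy=1.5)  # placeholder occupancy
```

Keeping uncounted modes as `None` rather than zero preserves the difference between "no pedestrians observed" and "pedestrians not measured," and returning the occupancy value with the output keeps the assumption visible to whoever reads the estimate.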
What the Fused Demand Surface Tells You (and What It Doesn't)
A properly constructed multimodal demand surface shows the distribution of person-trip demand across a geographic area over a defined time window — a map of where people are actually moving, combining all observable mobility modes. That's genuinely useful for:
- Identifying high-demand areas not currently served well by transit: Where the demand surface shows concentrated person-trip activity but current transit service is infrequent or absent, you have a candidate for route expansion or BRT investment.
- Understanding how different modes relate spatially: Where bike-share activity concentrates near transit stops, that's a first/last mile connection pattern worth supporting with infrastructure. Where bike-share and transit activity are geographically separate, bike-share is more likely substituting for short transit trips, or serving areas the fixed-route network doesn't reach, than feeding that network.
- Screening corridors for detailed analysis: The demand surface provides a data-driven first filter for the long list of potential investments in an LRTP or TIP development process.
What the fused demand surface does not tell you:
- Why people are traveling: Trip purpose — commute, shopping, healthcare, education — requires survey data or inference from land use. The demand surface tells you where trips are happening, not why.
- Who is traveling: Mode share and trip volume data are aggregate; they don't provide demographic characteristics of travelers without an overlay of survey or census data. Equity analysis requires pairing the demand surface with demographic data, because the demand surface alone doesn't reveal whether high-demand areas are serving or excluding transit-dependent populations.
- Future demand: The demand surface is a current-conditions picture. Forecasting requires additional modeling; the demand surface provides the most current observed baseline for that modeling, but the forecast itself depends on demand model assumptions that aren't embedded in the surface.
Data Quality: The Honest Constraint
Multimodal data fusion is only as good as the data going in. The practical reality for most transit agencies is that data quality is uneven across sources: GTFS schedule data is generally clean and well-maintained; GTFS-RT occupancy data quality varies by fleet and APC calibration; GBFS feeds from different operators have different update frequencies and different levels of trip data sharing; intersection camera APIs differ significantly in the completeness and format of the data they expose.
A demand surface built on incomplete or degraded data still produces a spatially coherent picture, but with variable confidence across different parts of the study area. At Mobvynt, we track data completeness per H3 cell and per time window, and surface that completeness information alongside the demand estimates. A cell with high demand and high data completeness is a reliable signal; a cell with high demand but low completeness (because, say, the relevant APC units were offline for much of the analysis period) needs to be flagged for independent verification.
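The pairing of demand with completeness can be sketched as a simple classification rule. This is a minimal illustration, assuming completeness is measured as the share of expected reporting intervals that actually produced data. The function names, both thresholds, and the sample cells are hypothetical, not the values Mobvynt uses.

```python
def completeness(observed_intervals, expected_intervals):
    """Share of expected reporting intervals that actually produced data."""
    if expected_intervals <= 0:
        raise ValueError("expected_intervals must be positive")
    return min(observed_intervals / expected_intervals, 1.0)


def classify_cell(demand, data_completeness, high_demand, min_completeness):
    """Label a cell by whether its demand estimate can be taken at face value."""
    if demand < high_demand:
        return "below high-demand threshold"
    if data_completeness >= min_completeness:
        return "reliable signal"
    return "flag for independent verification"


# Hypothetical cells: (person-trip estimate, observed hours, expected hours)
cells = {
    "cell_a": (950, 22, 24),
    "cell_b": (980, 6, 24),   # e.g. APC units offline for most of the period
    "cell_c": (120, 24, 24),
}
labels = {
    name: classify_cell(demand, completeness(obs, exp),
                        high_demand=500, min_completeness=0.8)
    for name, (demand, obs, exp) in cells.items()
}
```

Here `cell_a` and `cell_b` show similar demand, but only `cell_a` has enough supporting data to be read as a reliable signal; `cell_b` is flagged for independent verification.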
Transparency about data quality isn't an admission of failure — it's a methodological obligation when the analysis is being used to inform capital investment decisions that will affect communities for decades. Planners deserve to know where the analysis is confident and where it's making inferences from incomplete data.
Fusing data from different sources doesn't create certainty — it creates a more complete picture with visible uncertainty. That's the honest value proposition of multimodal demand analysis.