Harnessing household travel survey with smart card data to generate spatiotemporally-diverse activity schedules for transit users

Khoa D. Vo et al.

Transportation Research Part B: Methodological, 2026 · https://doi.org/10.1016/j.trb.2025.103388 · article
AJG 4 · ABDC A*
Weight
0.37

Abstract

• Two-stage framework fuses survey and smart card (SC) data to generate full daily activity schedules.
• Latent-variable design preserves key distributions from both data sources.
• Optimally constructed latent space increases the diversity of synthesized activity patterns.
• Seoul case study produces 2.92M unique schedules, over 80 times the survey sample.
• Synthesized schedules are validated using external cellular trace data.

Current activity-based models (ABMs) rely on household travel survey (HTS) data to generate daily activity schedules for transit users. However, HTS suffers from limited sampling, which results in low spatiotemporal diversity. Smart card (SC) data offer broader transit coverage but lack sociodemographic attributes, non-transit trips, and trip-level details, which makes integration with HTS challenging. This study introduces a novel two-stage data fusion framework that combines detailed but sparse HTS data with high-coverage SC data to generate complete, diverse, and up-to-date activity schedules for transit users. In Stage 1, the framework learns a latent class structure that aligns the spatiotemporal characteristics of transit trips across datasets and estimates a fused joint distribution over all attributes except the spatiotemporal details of non-transit trips. Stage 2 imputes these missing spatiotemporal details to complete full trip chains. A key innovation is the construction of a latent space with optimal complexity that preserves key statistical properties while enhancing the diversity of synthesized activity patterns. The framework ensures scalability by decomposing the fusion task into analytically tractable sub-problems. The model properties are first validated in a controlled experiment.
Further validation using data from 3.4 million SC users in Seoul, South Korea, shows that the fused population closely aligns with external cellular signaling data and significantly outperforms HTS alone, generating up to 2.92 million unique synthetic schedules (an 82.8× increase over HTS). In sum, the proposed method lays the groundwork for integrating diverse data sources into ABMs, enhancing their ability to generate diverse synthetic mobility patterns, including for underrepresented segments.
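The Stage 1 / Stage 2 logic described in the abstract can be illustrated with a generic latent class model. This is a minimal sketch under stated assumptions, not the authors' formulation: each record is treated as a vector of categorical attributes assumed conditionally independent given a latent class; classes are fitted by EM on pooled survey and smart card rows, where smart card rows mark their survey-only attributes as missing and therefore inform the classes only through the shared transit attributes; missing attributes are then imputed by sampling. The names `MISSING`, `posterior`, `fit_latent_class`, and `impute`, the class count `K`, and the smoothing constant `alpha` are hypothetical. The paper's optimal choice of latent space complexity and its decomposition into tractable sub-problems are not reproduced here.

```python
# Hypothetical sketch: generic latent class fusion of survey + smart card rows,
# not the paper's model. Requires NumPy.
import numpy as np

MISSING = -1  # hypothetical code: attribute not recorded by this data source


def posterior(X, pi, theta):
    """P(class k | observed attributes of row i); missing entries are marginalized out."""
    log_r = np.tile(np.log(pi), (X.shape[0], 1))  # (N, K) log class priors
    for j, th in enumerate(theta):  # th: (K, C_j) category probabilities per class
        obs = X[:, j] != MISSING
        log_r[obs] += np.log(th[:, X[obs, j]]).T  # add log-likelihood of observed values
    log_r -= log_r.max(axis=1, keepdims=True)  # stabilize before exponentiating
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)


def fit_latent_class(X, n_cats, K, n_iter=200, alpha=1e-2, seed=0):
    """EM for a mixture of independent categorical attributes.

    X: (N, J) int array, X[i, j] in 0..n_cats[j]-1 or MISSING. Survey rows carry
    every attribute; smart card rows carry only the shared transit attributes,
    so both sources inform the same class-conditional distributions.
    """
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    pi = np.full(K, 1.0 / K)
    theta = [rng.dirichlet(np.ones(c), size=K) for c in n_cats]  # random start
    for _ in range(n_iter):
        resp = posterior(X, pi, theta)  # E-step
        # M-step with additive smoothing so no probability collapses to zero
        pi = (resp.sum(axis=0) + alpha) / (N + K * alpha)
        for j, c in enumerate(n_cats):
            counts = np.full((K, c), alpha)
            for cat in range(c):
                # rows where attribute j is MISSING never match cat >= 0
                counts[:, cat] += resp[X[:, j] == cat].sum(axis=0)
            theta[j] = counts / counts.sum(axis=1, keepdims=True)
    return pi, theta


def impute(X, pi, theta, seed=1):
    """Fill each missing attribute: draw a class from the row's posterior,
    then draw the attribute from that class's category distribution."""
    rng = np.random.default_rng(seed)
    resp = posterior(X, pi, theta)
    X_out = X.copy()
    for i in range(X.shape[0]):
        k = rng.choice(len(pi), p=resp[i])
        for j in np.flatnonzero(X[i] == MISSING):
            X_out[i, j] = rng.choice(theta[j].shape[1], p=theta[j][k])
    return X_out
```

Because the shared transit attributes are the only link between the two sources, they alone determine how smart card rows are assigned to classes. In this sketch `K` is fixed by hand, whereas the abstract describes constructing a latent space of optimal complexity that preserves key statistical properties while enhancing diversity.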

1 citation

Cite this paper

https://doi.org/10.1016/j.trb.2025.103388

Formatted citation (BibTeX):

@article{khoa2026,
  title        = {{Harnessing household travel survey with smart card data to generate spatiotemporally-diverse activity schedules for transit users}},
  author       = {Vo, Khoa D. and others},
  journal      = {Transportation Research Part B: Methodological},
  year         = {2026},
  doi          = {10.1016/j.trb.2025.103388},
}

Evidence weight

0.37

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact: 0.16 × 0.40 = 0.06
M · momentum: 0.53 × 0.15 = 0.08
V · venue signal: 0.50 × 0.05 = 0.03
R · text relevance†: 0.50 × 0.40 = 0.20

† Text relevance is estimated at 0.50 on the detail page; for your query’s actual relevance score, open this paper from a search result.
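The evidence weight above appears to be a weighted sum of the four component scores. A minimal sketch, assuming the composite is exactly the sum of score × weight under the Balanced-mode weights (the names `WEIGHTS`, `SCORES`, and `evidence_weight` are hypothetical): the unrounded terms are 0.064 + 0.0795 + 0.025 + 0.20 = 0.3685, which rounds to the displayed 0.37.

```python
# Hypothetical reconstruction: composite evidence weight as a weighted sum.
WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}  # Balanced-mode weights
SCORES = {"F": 0.16, "M": 0.53, "V": 0.50, "R": 0.50}   # component scores shown above


def evidence_weight(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Sum of score x weight over components; weights are assumed to sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(scores[k] * weights[k] for k in weights)


total = evidence_weight(SCORES, WEIGHTS)
print(f"{total:.2f}")  # prints 0.37
```

Summing the already-rounded per-term values (0.06 + 0.08 + 0.03 + 0.20) also gives 0.37, so both readings agree with the page here.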