Abstract: Although automatically collected human travel records can accurately capture the time and location of human
movements, they do not directly explain the hidden semantic structures behind the data, e.g., activity types. This work
proposes a probabilistic topic model, adapted from Latent Dirichlet Allocation (LDA), to discover representative and
interpretable activity categorization from individual-level spatiotemporal data in an unsupervised manner. Specifically,
the activity-travel episodes of an individual user are treated as words in a document, and each topic is a distribution
over space and time that corresponds to a certain type of activity. The model accounts for a mixture of discrete and
continuous attributes---the location, start time of day, start day of week, and duration of each activity episode. The
proposed methodology is demonstrated using pseudonymized transit smart card data from London, U.K. The results
show that the model can successfully distinguish the three most basic types of activities---home, work, and other. As
the specified number of activity categories increases, more specific subpatterns for home and work emerge, and both
the goodness of fit and predictive performance for travel behavior improve. This work makes it possible to enrich
human mobility data with representative and interpretable activity patterns without relying on predefined activity categories or heuristic rules.
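To make the document analogy concrete, the following is a minimal sketch of an LDA-style generative process in which each user is a document and each activity episode is a word with four attributes. It is an illustration only, not the model specification used in this work: the distribution families (categorical for location and day of week, Gaussian for start time, log-normal for duration), the names `sample_dirichlet`, `generate_episodes`, and `EXAMPLE_TOPICS`, and all parameter values are assumptions chosen for readability.

```python
import random

# Hypothetical topic parameters for two activity categories over three
# locations. All values are invented for illustration.
EXAMPLE_TOPICS = [
    {   # a "home"-like topic: evening start, long duration
        "location_probs": [0.8, 0.1, 0.1],
        "day_probs": [1 / 7] * 7,
        "start_mean": 19.0, "start_sd": 1.5,
        "log_dur_mean": 2.5, "log_dur_sd": 0.3,
    },
    {   # a "work"-like topic: morning start on weekdays
        "location_probs": [0.1, 0.8, 0.1],
        "day_probs": [0.19, 0.19, 0.19, 0.19, 0.19, 0.025, 0.025],
        "start_mean": 9.0, "start_sd": 1.0,
        "log_dur_mean": 2.1, "log_dur_sd": 0.2,
    },
]


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_episodes(n_episodes, topics, alpha, rng):
    """Generate activity episodes for one user (one 'document')."""
    # Per-user mixture over activity categories, as in standard LDA.
    theta = sample_dirichlet(alpha, rng)
    episodes = []
    for _ in range(n_episodes):
        # Pick the latent activity category for this episode.
        z = rng.choices(range(len(topics)), weights=theta)[0]
        t = topics[z]
        n_locations = len(t["location_probs"])
        episodes.append({
            "topic": z,
            # Discrete attributes: assumed categorical.
            "location": rng.choices(range(n_locations),
                                    weights=t["location_probs"])[0],
            "day_of_week": rng.choices(range(7), weights=t["day_probs"])[0],
            # Continuous attributes: assumed Gaussian (wrapped to a
            # 24-hour clock) and log-normal.
            "start_hour": rng.gauss(t["start_mean"], t["start_sd"]) % 24,
            "duration_hours": rng.lognormvariate(t["log_dur_mean"],
                                                 t["log_dur_sd"]),
        })
    return theta, episodes


if __name__ == "__main__":
    rng = random.Random(0)
    theta, episodes = generate_episodes(5, EXAMPLE_TOPICS, [1.0, 1.0], rng)
    print(theta)
    for episode in episodes:
        print(episode)
```

Inference runs in the opposite direction: given only the observed attributes of each episode, the latent category of every episode and the parameters of every topic are estimated without labels.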