Incorporating Mobile Activity Tracking Data In A Transit Agency: Collecting, Comparing, And Trip Mode Inference

Title: Incorporating Mobile Activity Tracking Data In A Transit Agency: Collecting, Comparing, And Trip Mode Inference
Publication Type: Conference Paper
Year of Publication: 2017
Authors: Tim Scully, John Attanucci, Jinhua Zhao
Conference Name: Transportation Research Board 96th Annual Meeting
Keywords: GIS, GPS, Machine Learning, Mobile Activity Tracking, Mode Inference, Smartphone Surveys, Survey

The near ubiquity of smartphones has the potential to transform how researchers, companies, and public transit agencies understand travel behavior. This research analyzes how an emerging class of automatically collected data based on smartphone GPS and sensor information – referred to here as mobile activity-tracking data – can be used in a transit agency to better understand travel behavior. Through a collaboration with Transport for London (TfL), multiple weeks of mobile activity-tracking data of London residents was collected between 2015 and 2016 using an application called Moves. Using this case study, this paper discusses the benefits of this new data, compares it with other data at TfL and elsewhere, and examines the process of collecting it.

This paper then compares the trip records from the mobile activity tracking data with those from the automatic fare card data collected during the same period for the same individuals. By comparing mobile activity tracking with an established, well-researched data source like AFC, we observe that the trip match rate between the two data sources is high (68%) but not perfect. Next, the paper proposes a probabilistic framework to distinguish between motorized trip modes using mobile activity tracking data and the public transit network. Specifically, the model uses both spatial characteristics, such as distance to the public transit network, and trip characteristics, such as speed, to identify the trip mode as bus, rail, subway, or non-public transit. Using logistic regression, a classification tree, and a random forest, this model achieves accuracies of 90%, 91%, and 92%, respectively.
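
As a rough illustration of the two kinds of inputs named above (a spatial characteristic and a trip characteristic), the following minimal sketch computes a mean speed and a mean distance to the nearest transit stop for one trip. It is not the paper's implementation: the function names, the input layout (time-ordered `(timestamp_s, lat, lon)` tuples and a list of stop coordinates), and the choice of "distance to nearest stop" as a stand-in for distance to the transit network are all assumptions made for illustration. Features of this kind could then be passed to a classifier such as logistic regression or a random forest.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def trip_features(points, stops):
    """Compute two illustrative features for a single trip.

    points: time-ordered list of (timestamp_s, lat, lon) tuples (hypothetical layout).
    stops:  list of (lat, lon) transit stop coordinates (hypothetical layout).
    """
    if len(points) < 2 or not stops:
        raise ValueError("need at least two GPS points and one stop")
    duration_s = points[-1][0] - points[0][0]
    if duration_s <= 0:
        raise ValueError("trip duration must be positive")

    # Trip characteristic: path length along consecutive points divided by duration.
    length_m = sum(
        haversine_m(p[1], p[2], q[1], q[2]) for p, q in zip(points, points[1:])
    )

    # Spatial characteristic: average distance from each point to its nearest stop.
    nearest_m = [
        min(haversine_m(lat, lon, s_lat, s_lon) for s_lat, s_lon in stops)
        for _, lat, lon in points
    ]

    return {
        "mean_speed_mps": length_m / duration_s,
        "mean_dist_to_stop_m": sum(nearest_m) / len(nearest_m),
    }
```

For example, `trip_features([(0, 0.0, 0.0), (60, 0.0, 0.01)], [(0.0, 0.0)])` describes a one-minute trip of roughly 1.1 km along the equator that starts at a stop; the brute-force nearest-stop search is kept for clarity and would likely be replaced by a spatial index for a full network.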