Mobility Sensing & Prediction – MIT JTL-Transit Lab

For a century, transportation agencies have relied on costly and unreliable manual data collection systems. These approaches have hampered the effective planning, management, and evaluation of mobility services, ultimately reducing efficiency and degrading the quality of customer service.

The development of Information and Communication Technology (ICT), however, has transformed what was once a data-starved arena into a data-rich environment for planners and managers.

At JTL, we use automatic data sources, such as smart-card transactions, GPS-based vehicle locations, cell phone records, and mobility apps, to estimate and predict travel demand, explore behavioral patterns, quantify service reliability, and evaluate the performance of the transportation system as a whole.

Impacts of remote work on vehicle miles traveled and transit ridership in the USA
Yunhan Zheng
Shenhao Wang
Lun Liu
Jim Aloisi
Jinhua Zhao
Nature Cities (2024)
Remote work's potential as a sustainable mobility solution has garnered attention, particularly due to its widespread adoption during the COVID-19 pandemic. Our study systematically examines the impacts of remote work on vehicle-miles traveled (VMT) and transit ridership in the United States from April 2020 to October 2022. Using pre-pandemic levels as baselines, we find that a mere 1% decrease in on-site workers corresponds to a 0.99% reduction in state-level VMT and a 2.26% drop in Metropolitan Statistical Area (MSA)-level transit ridership. Notably, a 10% decrease in on-site workers compared to the pre-pandemic level could yield a consequential annual reduction of 191.8 million metric tons (10%) in CO2 emissions from the transportation sector, alongside a substantial $3.7 billion (26.7%) annual loss in transit fare revenues within the contiguous US. These findings offer policymakers crucial insights into how different remote work policies can affect urban transport and environmental sustainability as remote work persists.
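The elasticities above lend themselves to quick what-if arithmetic. The Python sketch below is illustrative only: it assumes the reported 1% elasticities scale linearly to larger changes, which the abstract does not claim, and the function name `projected_reductions` is hypothetical.

```python
# Elasticities reported in the abstract: % reduction in each outcome per 1%
# decrease in on-site workers, relative to pre-pandemic baselines.
ELASTICITIES = {
    "state_vmt": 0.99,
    "msa_transit_ridership": 2.26,
}


def projected_reductions(pct_decrease_onsite):
    """Scale each reported elasticity to a given % decrease in on-site workers.

    Assumption: the relationship is linear over the range considered. This is an
    illustrative extrapolation, not the paper's estimation model.
    """
    return {name: e * pct_decrease_onsite for name, e in ELASTICITIES.items()}


if __name__ == "__main__":
    for name, pct in projected_reductions(10.0).items():
        print(f"{name}: {pct:.1f}% reduction")
```

Under that linear assumption, a 10% decrease in on-site workers would correspond to roughly a 9.9% drop in state-level VMT and a 22.6% drop in MSA-level transit ridership.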
The Mobility Pattern of Dockless Bike Sharing: A Four-month Study in Singapore
Xiaohu Zhang
Yu Shen
Jinhua Zhao
Transportation Research Part D (2021)
Many cities around the world have adopted dockless bike-sharing programs in the hope that this new service could enhance last-mile public transit connections. However, our understanding of travel patterns under dockless bike sharing is still limited. To advance knowledge of the new service, this study investigates mobility patterns of dockless bike sharing in Singapore using a four-month dataset. An exploratory spatiotemporal analysis is conducted to show daily travel patterns, while community detection on networks is used to explore the spatial clusters that emerge from cycling behaviors. A series of Poisson regression models are then estimated to characterize the generation, attraction, and resistance factors of bike trips in different periods of the day. The proposed regression model, which considers built environment factors at both origin and destination simultaneously, proves effective in deciphering mobility. The empirical findings shed light on policy implications for sustainable transportation planning.
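To make the regression step concrete, here is a minimal Python sketch of fitting a Poisson regression, log E[trips] = b0 + b1 * x, by Newton-Raphson. It is a simplification under stated assumptions: a single hypothetical covariate `x` (for example, a standardized built-environment measure) stands in for the paper's generation, attraction, and resistance factors, and `fit_poisson_regression` is an illustrative name, not the authors' code.

```python
import math


def fit_poisson_regression(x, y, iters=50, tol=1e-10):
    """Fit log E[y] = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start at the intercept-only estimate
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Negative Hessian, which equals the Fisher information for the log link
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H @ delta = g by Cramer's rule
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1
```

Because the Poisson log-likelihood with a log link is concave, Newton steps from the intercept-only start usually converge in a handful of iterations; a real analysis would rely on an established GLM library and include the full set of origin and destination covariates.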
Predictive decision support platform and its application in crowding prediction and passenger information generation
Peyman Noursalehi
Haris N. Koutsopoulos
Jinhua Zhao
Transportation Research Part C (2021)
Demand for public transport has grown steadily over the last decade in many densely populated cities around the world. However, capacity has not always matched this increased demand. As a result, passengers experience long waiting times and are denied boarding during peak hours. Crowded platforms and the resulting customer dissatisfaction and safety issues have become a serious concern. The COVID-19 pandemic has dramatically reduced passengers’ willingness to board crowded trains, causing a surge in demand for real-time crowding information. In this paper, we propose a real-time predictive decision support platform that addresses both operations control and customer information needs. The system provides crowding predictions on trains and platforms, communicates this information to passengers, and takes into account their response to it. A case study demonstrates that providing predictive information to passengers can potentially reduce denied boarding and lead to better utilization of train capacity.
Bundled Mobility Passes in Chicago: Consumer Preference and Revenue Implications
Apaar Bansal
Jinhua Zhao
Transportation Research Board 99th Annual Meeting, Washington, D.C. (2020)
Competition provided by “new” mobility services to public transit has often soured the relationship between the two transportation players. This paper proposes bundled mobility passes between public transit, bikesharing, and Transportation Network Companies (TNCs) as a potential framework in which the popularity of new mobility can be tapped to increase public transit revenue and pass sales while at the same time enabling public institutions to regulate these services more effectively. A total of 1,467 employees in the Chicago area answered a stated preference (SP) survey gauging preferences toward a hypothetical bundled “Superpass” offered by the Chicago Transit Authority (CTA). The bundled mobility pass would include a CTA bus and rail pass, a bikeshare pass, and a fixed number of shared ridehail rides per month, and could potentially be added on to an existing commuter rail pass for a discounted price. A discrete choice model was developed to estimate Superpass demand under different scenarios. This analysis found that the CTA, the bikeshare operator, and the TNC operator can all increase either the number of passes they sell or the number of rides they provide to the market. They can also all increase their revenue or at least remain revenue neutral. This result shows that there is room for mutual benefit across all stakeholders through partnership in mobility bundles. This paper ends with five key recommendations for policymakers regarding bundled fare products, including the need to conduct innovative fare policy pilots.
Deep Neural Networks for Choice Analysis: Extracting Complete Economic Information for Interpretation
Shenhao Wang
Qingyi Wang
Jinhua Zhao
Transportation Research Part C: Emerging Technologies (2020)
While deep neural networks (DNNs) have been increasingly applied to choice analysis and show high predictive power, it is unclear to what extent researchers can interpret economic information from DNNs. This paper demonstrates that DNNs can provide economic information as complete as that of classical discrete choice models (DCMs). The economic information includes choice predictions, choice probabilities, market shares, substitution patterns of alternatives, social welfare, probability derivatives, elasticities, marginal rates of substitution, and heterogeneous values of time. Unlike DCMs, DNNs can automatically learn utility functions and reveal behavioral patterns that are not prespecified by domain experts, particularly when the sample size is large. However, the economic information obtained from DNNs can be unreliable when the sample size is small, because of three challenges associated with the automatic learning capacity: high sensitivity to hyperparameters, model non-identification, and local irregularity. The first challenge is related to the statistical challenge of balancing the approximation and estimation errors of DNNs, the second to the optimization challenge of identifying the global optimum in DNN training, and the third to the robustness challenge of mitigating locally irregular patterns of estimated functions. To demonstrate these strengths and challenges, we estimated DNNs using stated preference survey data from Singapore and revealed preference data from London, extracted the full list of economic information from the DNNs, and compared it with that from the DCMs. We found that economic information aggregated over trainings or populations is more reliable than disaggregate information for individual observations or trainings, and that larger sample sizes, hyperparameter searching, model ensembles, and effective regularization can significantly improve the reliability of the economic information extracted from DNNs.
Future studies should investigate the requirement of sample size, better ensemble mechanisms, other regularization and DNN architectures, better optimization algorithms, and robust DNN training methods to address DNNs’ three challenges to provide more reliable economic information for DNN-based choice models.
Deep Neural Networks for Choice Analysis: Architecture Design with Alternative-Specific Utility Functions
Shenhao Wang
Baichuan Mo
Jinhua Zhao
Transportation Research Part C (2020)
Whereas deep neural networks (DNNs) are increasingly applied to choice analysis, it is challenging to reconcile domain-specific behavioral knowledge with general-purpose DNNs, to improve DNNs’ interpretability and predictive power, and to identify effective regularization methods for specific tasks. To address these challenges, this study demonstrates the use of behavioral knowledge for designing a particular DNN architecture with alternative-specific utility functions (ASU-DNN), thereby improving both predictive power and interpretability. Unlike a fully connected DNN (F-DNN), which computes the utility value of an alternative k by using the attributes of all the alternatives, ASU-DNN computes it by using only k's own attributes. Theoretically, ASU-DNN can substantially reduce the estimation error of F-DNN because of its lighter architecture and sparser connectivity, although the constraint of alternative-specific utility can cause ASU-DNN to exhibit a larger approximation error. Empirically, ASU-DNN has 2-3% higher prediction accuracy than F-DNN over the whole hyperparameter space on a private dataset collected in Singapore and a public dataset available in the R mlogit package. The alternative-specific connectivity is associated with the independence of irrelevant alternatives (IIA) constraint, which, as a domain-knowledge-based regularization method, is more effective than the most popular general-purpose explicit and implicit regularization methods and architectural hyperparameters. ASU-DNN provides a more regular substitution pattern of travel mode choices than F-DNN does, rendering ASU-DNN more interpretable. The comparison between ASU-DNN and F-DNN also aids in testing behavioral knowledge. Our results reveal that individuals are more likely to compute utility by using an alternative’s own attributes, supporting the long-standing practice in choice modeling.
Overall, this study demonstrates that behavioral knowledge can guide the architecture design of DNN, function as an effective domain-knowledge-based regularization method, and improve both the interpretability and predictive power of DNN in choice analysis. Future studies can explore the generalizability of ASU-DNN and other possibilities of using utility theory to design DNN architectures.
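The architectural difference between ASU-DNN and F-DNN can be shown with a forward pass alone. In this Python sketch, each alternative gets its own small network that sees only that alternative's attributes, and the utilities feed a softmax. The weights are random and untrained, and the layer sizes and function names (`make_subnet`, `asu_choice_probs`) are hypothetical; training by maximum likelihood is omitted.

```python
import math
import random


def make_subnet(n_attrs, n_hidden, rng):
    """Randomly initialize a one-hidden-layer network for one alternative's utility.

    Weights are untrained; a real model would fit them by maximizing the choice log-likelihood.
    """
    w1 = [[rng.gauss(0.0, 1.0) for _ in range(n_attrs)] for _ in range(n_hidden)]
    b1 = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    w2 = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    return w1, b1, w2


def subnet_utility(params, attrs):
    w1, b1, w2 = params
    # ReLU hidden layer followed by a linear output unit
    hidden = [max(0.0, sum(w * a for w, a in zip(row, attrs)) + b) for row, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden))


def asu_choice_probs(subnets, attrs_by_alt):
    """Alternative-specific forward pass: utility k sees only alternative k's own attributes."""
    utils = [subnet_utility(net, x) for net, x in zip(subnets, attrs_by_alt)]
    top = max(utils)
    exps = [math.exp(u - top) for u in utils]  # softmax, shifted for numerical stability
    total = sum(exps)
    return [e / total for e in exps]
```

Because each utility depends only on its own alternative and the probabilities come from a softmax, changing alternative 2's attributes leaves the ratio of the probabilities of alternatives 0 and 1 unchanged, which is the IIA property the abstract links to alternative-specific connectivity. An F-DNN that feeds every alternative's attributes into each utility would generally not preserve that ratio.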
Capacity-Constrained Network Performance Model for Urban Rail Systems
Baichuan Mo
Zhenliang Ma
Haris Koutsopoulos
Jinhua Zhao
Transportation Research Record (2020)
This paper proposes a general Network Performance Model (NPM) for monitoring the performance of urban rail systems using smart card data. NPM is a schedule-based network loading model with strict capacity constraints and boarding priorities. It distributes passengers over the network given origin-destination (OD) demand, operations, route choice, and effective train capacity. A Bayesian simulation-based optimization method for calibrating the effective train capacity is introduced, which explicitly recognizes that capacity may differ across stations depending on congestion levels. Case studies with data from the Mass Transit Railway (MTR) network in Hong Kong are used to validate the model and illustrate its applicability. NPM is validated using left-behind survey data and exit passenger flows extracted from smart card data. The use of NPM for performance monitoring is demonstrated by analyzing spatiotemporal crowding patterns in the system and evaluating dispatching strategies.
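The core of a capacity-constrained loading step can be illustrated at a single platform. This Python sketch assumes one station, one direction, first-come-first-served boarding as the priority rule, and a fixed effective capacity per train; the paper's NPM covers a full network with route choice and calibrates station-specific effective capacity, none of which is shown here. `load_platform` is a hypothetical name.

```python
from collections import deque


def load_platform(arrivals, train_times, capacity):
    """Board waiting passengers first-come-first-served onto capacity-limited trains.

    arrivals: sorted passenger arrival times at the platform
    train_times: sorted train departure times
    capacity: effective number of passengers each train can still take here
    Returns a list of (train_time, boarded, left_behind) tuples.
    """
    queue = deque()
    i = 0
    results = []
    for t in train_times:
        # Passengers who reached the platform by departure time join the queue
        while i < len(arrivals) and arrivals[i] <= t:
            queue.append(arrivals[i])
            i += 1
        boarded = 0
        # Strict capacity: earliest arrivals board first until the train is full
        while queue and boarded < capacity:
            queue.popleft()
            boarded += 1
        results.append((t, boarded, len(queue)))  # remaining queue = left behind
    return results
```

For example, seven passengers arriving at minutes 0 through 6, served by trains at minutes 3 and 7 with room for 3 each, leave one passenger behind after each departure.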
Discovering Latent Activity Patterns from Transit Smart Card Data: A Spatiotemporal Topic Model
Zhan Zhao
Haris Koutsopoulos
Jinhua Zhao
Transportation Research Part C (2020)
Although automatically collected human travel records can accurately capture the time and location of human movements, they do not directly explain the hidden semantic structures behind the data, e.g., activity types. This work proposes a probabilistic topic model, adapted from Latent Dirichlet Allocation (LDA), to discover representative and interpretable activity categorization from individual-level spatiotemporal data in an unsupervised manner. Specifically, the activity-travel episodes of an individual user are treated as words in a document, and each topic is a distribution over space and time that corresponds to a certain type of activity. The model accounts for a mixture of discrete and continuous attributes: the location, start time of day, start day of week, and duration of each activity episode. The proposed methodology is demonstrated using pseudonymized transit smart card data from London, U.K. The results show that the model can successfully distinguish the three most basic types of activities: home, work, and other. As the specified number of activity categories increases, more specific subpatterns for home and work emerge, and both the goodness of fit and the predictive performance for travel behavior improve. This work makes it possible to enrich human mobility data with representative and interpretable activity patterns without relying on predefined activity categories or heuristic rules.
Is Ridesourcing More Efficient than Taxis?
Hui Kong
Xiaohu Zhang
Jinhua Zhao
Applied Geography (2020)
Ridesourcing services such as Uber, Lyft, and DiDi are purported to be more efficient than traditional taxis because they can match passengers with drivers more effectively. Previous studies have compared the efficiency of ridesourcing and taxis in several cities. However, gaps remain in how the two modes are measured and compared, and the reasons for the higher efficiency of ridesourcing have not been empirically examined. This paper aims to measure, compare, and explain the efficiency and variation of DiDi and taxis. The case study is conducted in Chengdu, China. We use vehicle occupancy rate (VOR) as the efficiency measure: the percentage of time that a vehicle is occupied by a fare-paying passenger. We measure the VORs of DiDi and taxis and their spatial and temporal variations using trip origin-destination data from DiDi and trajectory data for taxis. The VOR patterns of DiDi and taxis are compared and contrasted, and the underlying factors that affect the difference are examined: a more efficient driver-rider matching algorithm, the larger scale of ridesourcing services, and the number of taxi trips per capita. Results show that the overall VOR of DiDi is six percentage points higher than that of taxis on weekdays and 12 percentage points higher on weekends. However, the VOR of taxis is slightly higher than that of DiDi during the weekday morning peak in downtown areas. Regression models reveal that the more efficient matching and the greater scale of DiDi drivers enlarge the VOR gap between DiDi and taxis, while the number of taxi trips per capita reduces the gap. The findings have implications for both business operations and transportation policies in terms of service design, service coordination, and location-specific regulations.
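The VOR definition translates directly into a ratio of occupied time to observed time. This Python sketch assumes each vehicle's paid trips are available as non-overlapping (pickup, dropoff) intervals in minutes and that the observation window is known; the function name `vehicle_occupancy_rate` is hypothetical.

```python
def vehicle_occupancy_rate(occupied_intervals, window_start, window_end):
    """Share of the observation window during which the vehicle carries a fare-paying passenger.

    occupied_intervals: non-overlapping (pickup_time, dropoff_time) pairs, in minutes
    """
    period = window_end - window_start
    occupied = 0.0
    for pickup, dropoff in occupied_intervals:
        # Clip each paid trip to the observation window before summing
        start, end = max(pickup, window_start), min(dropoff, window_end)
        if end > start:
            occupied += end - start
    return occupied / period
```

For example, paid trips at minutes 0-20, 30-60, and 100-130 within a 0-120 window give 70 occupied minutes, a VOR of about 58%.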
Unexpected Bus Operator Absence and Extraboard Scheduling – MBTA Case Study
Qingyi Wang
Haris Koutsopoulos
Nigel Wilson
Transportation Research Board 99th Annual Meeting, Washington, D.C. (2020)
Improving service reliability and reducing cost have always been priorities for transit agencies, and workforce planning affects both performance metrics. An important workforce planning function is the management of the extraboard operators who cover for absent drivers. Despite its importance, extraboard planning is an understudied area, in part due to the lack of detailed and reliable data. In this paper, using data from HASTUS Daily at the MBTA, we investigate open work caused by operator absence and how it affects extraboard scheduling. Using k-means clustering, representative time-of-day absence profiles are identified, and a logistic regression model is estimated to classify each day into the identified clusters; the time-of-day absence distribution is then predicted by combining the clustered profiles with the classification results. Daily total absent hours are modeled by negative binomial regression. An integer optimization program is formulated to analyze the impact of wrong predictions on scheduling. Key findings are: 1) Time-of-day absence patterns follow regular service schedules well. 2) There is a large variation in the number of extraboard operators needed from week to week, resulting in inherent inefficiencies. 3) Time-of-day profile alignment error is around 26% on average. 4) The average error in predicting daily total absent hours using negative binomial regression is around 22% (19h) for weekdays and 32% (21h) for weekends. 5) Optimal extraboard assignment is much more sensitive to the total number of hours than to the time-of-day distribution of absences.
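The clustering step can be sketched with plain k-means (Lloyd's algorithm). In this Python illustration, each day is represented by a hypothetical four-period vector of absence-hour shares, and the first k days serve as initial centroids so the result is deterministic; the logistic classification, negative binomial model, and integer program from the abstract are not reproduced, and `kmeans` here is an illustrative helper rather than the authors' code.

```python
def kmeans(profiles, k, iters=100):
    """Lloyd's k-means on time-of-day profiles, initialized with the first k profiles."""
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean profile of its members
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Each resulting centroid is a representative absence profile; a classifier would then predict which profile a future day follows.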
Automated Information Extraction From Textual Data: Application In Transit Disruption Management
Peyman Noursalehi
Haris Koutsopoulos
Jinhua Zhao
Transportation Research Board 99th Annual Meeting, Washington, D.C. (2020)
Despite rapid advances in automated text processing, many related tasks in transit and other transportation agencies are still performed manually. For example, incident management reports are often manually processed and subsequently stored in a standardized format for later use. The information contained in such reports can be valuable for many reasons: identification of issues with response actions, underlying causes of each incident, impacts on the system, etc. In this paper, we develop a comprehensive, pragmatic automated framework for analyzing rail incident reports to support a wide range of applications and functions, depending on the constraints of the available data. The objectives are twofold: a) extract information that is required in the standard report forms (automation), and b) extract other useful content and insights from the unstructured text in the original report that would have otherwise been lost/ignored (knowledge discovery). The approach is demonstrated through a case study involving analysis of 23,728 records of general incidents in the London Underground (LU). The results show that it is possible to automatically extract delays, impacts on trains, mitigating strategies, underlying incident causes, and insights related to the potential actions and causes, as well as accurate classification of incidents into predefined categories.
Modeling Epidemic Spreading through Public Transit using Time-Varying Encounter Network
Baichuan Mo
Kairui Feng
Yu Shen
Clarence Tam
Daqing Li
Yafeng Yin
Jinhua Zhao
Transportation Research Part C (2020)
Passenger contact in public transit (PT) networks can be a key mediator in the spreading of infectious diseases. This paper proposes a time-varying weighted PT encounter network to model the spreading of infectious diseases through PT systems. Social activity contacts at both local and global levels are also considered. We select the epidemiological characteristics of coronavirus disease 2019 (COVID-19) as a case study, along with smart card data from Singapore, to illustrate the model at the metropolitan level. A scalable and lightweight theoretical framework is derived to capture the time-varying and heterogeneous network structures, which makes it possible to solve the problem at the whole-population level with low computational costs. Different control policies from both the public health side and the transportation side are evaluated. We find that people’s preventative behavior is one of the most effective measures to control the spreading of epidemics. On the transportation side, partial closure of bus routes helps to slow down but cannot fully contain the spreading of epidemics. Identifying “influential passengers” using smart card data and isolating them at an early stage can also effectively reduce epidemic spreading.
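The encounter network's edge weights can be derived from smart card records by measuring how long two passengers share a vehicle. This Python sketch aggregates over a single time window (a time-varying network would repeat it for each window), assumes each record carries a vehicle identifier with boarding and alighting times, and ignores the local and global social-activity contacts the paper also models. `encounter_weights` is an illustrative name.

```python
from collections import defaultdict
from itertools import combinations


def encounter_weights(trips):
    """Weighted passenger encounters from transit trips within one time window.

    trips: iterable of (passenger_id, vehicle_id, board_time, alight_time)
    Returns {(p1, p2): total minutes the pair spent on the same vehicle}.
    """
    by_vehicle = defaultdict(list)
    for pid, vid, board, alight in trips:
        by_vehicle[vid].append((pid, board, alight))
    weights = defaultdict(float)
    for riders in by_vehicle.values():
        for (p1, b1, a1), (p2, b2, a2) in combinations(riders, 2):
            # Shared on-board time is the overlap of the two riding intervals
            overlap = min(a1, a2) - max(b1, b2)
            if overlap > 0 and p1 != p2:
                weights[tuple(sorted((p1, p2)))] += overlap
    return dict(weights)
```

Contact duration is one plausible edge weight for exposure; other weightings (for example, accounting for vehicle crowding) would fit the same structure.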
Machine-learning-augmented analysis of textual data: application in transit disruption management
Peyman Noursalehi
Haris N Koutsopoulos
Jinhua Zhao
IEEE Open Journal of Intelligent Transportation Systems (2020)
Despite rapid advances in automated text processing, many related tasks in transit and other transportation agencies are still performed manually. For example, incident management reports are often manually processed and subsequently stored in a standardized format for later use. The information contained in such reports can be valuable for many reasons: identification of issues with response actions, underlying causes of each incident, impacts on the system, etc. In this paper, we develop a comprehensive, pragmatic automated framework for analyzing rail incident reports to support a wide range of applications and functions, depending on the constraints of the available data. The objectives are twofold: a) extract information that is required in the standard report forms (automation), and b) extract other useful content and insights from the unstructured text in the original report that would have otherwise been lost/ignored (knowledge discovery). The approach is demonstrated through a case study involving an analysis of 23,728 records of general incidents in the London Underground (LU). The results show that it is possible to automatically extract delays, impacts on trains, mitigating strategies, underlying incident causes, and insights related to the potential actions and causes, as well as accurate classification of incidents into predefined categories.
Predictive decision support for real-time crowding prediction and information generation
Peyman Noursalehi
Haris Koutsopoulos
Jinhua Zhao
Transportation Research Board 98th Annual Meeting, Washington, D.C. (2019)
This paper proposes a predictive decision support platform for urban rail systems. It provides predictive information of crowding on trains and at stations. Additionally, it generates information on the expected likelihood of being able to board upcoming trains, which can be communicated to passengers. Using this information, passengers can make better-informed decisions as to which train to board.

The proposed decision support platform comprises two components: demand prediction and an on-line simulation. The first module provides short-term (e.g., 15-minute) predictions of the number of passengers arriving at each station and their destinations.

Subsequently, the on-line simulation models the interactions of trains and passengers for the duration of the prediction horizon. It is assumed that the predictive crowding information is displayed on platforms, influencing passengers’ boarding decisions. The system incorporates the predicted passenger response to the information in its crowding predictions. Outputs from the simulation include the predicted crowding of trains and platforms during that time period and the expected number of passengers who will be denied boarding.

The decision support platform was tested on a subset of the London Underground network. Aggregate automatic fare collection data were used to develop the predictive demand models. The results demonstrate the accuracy of the denied boarding and platform crowding predictions. The value of providing train crowding information to passengers waiting on platforms is also discussed. It is shown that as passengers become more responsive to the information, the number of left-behind passengers decreases.
Home-work Carpooling for Social Mixing
Federico Librino
Elena Renda
Giovanni Resta
Paolo Santi
Fabio Duarte
Carlo Ratti
Jinhua Zhao
Transportation, Washington, D.C. (2019)
Shared mobility is widely recognized for its contribution to reducing carbon footprint, traffic congestion, parking needs, and transportation-related costs in urban and suburban areas. In this context, carpooling for home-work commutes is particularly appealing for its potential to lessen the number of cars and kilometers traveled, consequently reducing major causes of traffic in cities. Accordingly, most carpooling algorithms are optimized to reduce total travel time, cost, and other transportation-related metrics. In this paper, we analyze carpooling from a new perspective, investigating whether it can also be used as a tool to favor social integration, and to what extent social benefits should be traded off against transportation efficiency. By incorporating travelers’ social characteristics into a recently introduced network-based approach to modeling ride-sharing opportunities, we define two social-related carpooling problems: how to maximize the number of rides shared between people belonging to different social groups, and how to maximize the amount of time people spend together during the ride. For each of the problems, we provide corresponding optimal and computationally efficient solutions. We then demonstrate our approach on two datasets collected in Pisa, Italy, and Cambridge, US, and quantify the potential social benefits of carpooling and how they can be traded off against traditional transportation-related metrics. Taken together, the models, algorithms, and results presented in this paper broaden the perspective from which carpooling problems are typically analyzed to encompass multiple disciplines, including urban planning, public policy, and social sciences.
Demand Management of Congested Public Transport Systems: A Conceptual Framework and Application Using Smart Card Data
Anne Halvorsen
Haris Koutsopoulos
Zhenliang Ma
Jinhua Zhao
Transportation (2019)
Transportation Demand Management (TDM), long used to reduce car traffic, is receiving attention among public transport operators as a means to reduce congestion in crowded public transportation systems. Though far less studied, a more structured approach to Public Transport Demand Management (PTDM) can help agencies make informed decisions on the combination of PTDM and infrastructure investments that best manages crowding. Automated fare collection (AFC) data, readily available in many public transport agencies, provide a unique platform to advance systematic approaches for the design and evaluation of PTDM strategies. The paper discusses the main steps for developing PTDM programs: a) problem identification and formulation of program goals; b) program design; c) evaluation; and d) monitoring. The problem identification phase examines bottlenecks in the system based on a spatiotemporal passenger flow analysis. The design phase identifies the main design parameters based on a categorization of potential interventions along spatial, temporal, modal, and targeted user group dimensions. Evaluation takes place at the system, group, and individual levels, taking advantage of the detailed information obtained from smart card transaction data. The monitoring phase addresses the long-term sustainability of the intervention and informs potential changes to improve its effectiveness. A case study of a pre-peak fare discount policy in Hong Kong’s MTR network is used to illustrate the application of the various steps, with a focus on evaluating and analyzing the impacts from a behavioral point of view. Smart card data from a panel of users before and after the implementation of the scheme were used to study policy-induced behavior shifts. A cluster analysis inferred customer groups relevant to the analysis based on their usage patterns.
Users who shifted their behavior were identified based on a change point analysis, and a logit model was estimated to identify the main factors that contribute to this change: the amount of time a user needed to shift his/her departure time, departure time variability, fare savings, and price sensitivity. User heterogeneity suggests that future incentives may be improved if they target specific groups.
Value of Demand Information in Autonomous Mobility-on-Demand Systems
Jian Wen
Neema Nassir
Jinhua Zhao
Transportation Research Part A (2019)
Effective management of demand information is a critical factor in the successful operation of autonomous mobility-on-demand (AMoD) systems. This paper classifies, measures, and evaluates the demand information for an AMoD system. First, the paper studies demand information at both individual and aggregate levels and measures two critical attributes: dynamism and granularity. We identify the trade-offs between both attributes during the data collection and information inference processes and discuss the compatibility of AMoD dispatching algorithms with different types of information. Second, the paper assesses the value of demand information through agent-based simulation experiments with the actual road network and travel demand in a major European city, where we assume a single operator monopolizes the AMoD service in the case study area but competes with other transportation modes. The performance of the AMoD system is evaluated from the perspectives of travelers, AMoD operators, and the transportation authority in terms of overall system performance. The paper tests multiple scenarios, combining different information levels, information dynamism, and information granularity, as well as various fleet sizes. Results show that aggregate demand information leads to more served requests, shorter wait times, and higher profit through effective rebalancing, especially when supply is high and demand information is spatially granular. Individual demand information from in-advance requests also improves system performance, to a degree that depends on the spatial disparity of requests and their coupled service priority. By designing hailing policies accordingly, the operator is able to maximize the potential benefits. The paper concludes that strategic trade-offs of demand information need to be made regarding the information level, information dynamism, and information granularity.
It also offers a broader discussion on the benefits and costs of demand information for key stakeholders including the users, the operator, and the society.
Detecting Pattern Changes in Individual Travel Behavior: A Bayesian Approach
Zhan Zhao
Haris Koutsopoulos
Jinhua Zhao
Transportation Research Part B (2018)
Although stable in the short term, individual travel patterns are subject to change in the long term. The ability to detect such changes is critical for developing behavior models that are adaptive over time. We define travel pattern change as "abrupt, substantial, and persistent changes in the underlying pattern of travel behavior" and develop a methodology to detect such changes in individual travel patterns. We specify one distribution for each of the three dimensions of travel behavior (the frequency of travel, time of travel, and origins/destinations), and interpret a change in the parameters of these distributions as indicating the occurrence of a pattern change. A Bayesian method is developed to estimate the probability that a pattern change occurs at any given time for each behavior dimension. The proposed methodology is tested using pseudonymized smart card records of 3,210 users from London, U.K., over two years. The results show that the method can successfully identify significant changepoints in travel patterns. Compared to the traditional generalized likelihood ratio (GLR) approach, the Bayesian method requires fewer predefined parameters and is more robust. The methodology presented in this paper is generalizable and can be applied to detect changes in other aspects of travel behavior and human behavior in general.
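As a simplified illustration of Bayesian change detection on the travel-frequency dimension, the Python sketch below assumes weekly trip counts follow a Poisson distribution with a conjugate Gamma(shape a, rate b) prior on the rate, and that exactly one changepoint exists, with a uniform prior over its location. These are assumptions for illustration; the paper's specification for each behavior dimension may differ, and the function names are hypothetical.

```python
import math


def log_marginal_poisson_gamma(counts, a=1.0, b=1.0):
    """Log marginal likelihood of Poisson counts with a Gamma(shape=a, rate=b) prior on the rate."""
    n, s = len(counts), sum(counts)
    return (
        a * math.log(b) - math.lgamma(a)
        + math.lgamma(a + s) - (a + s) * math.log(b + n)
        - sum(math.lgamma(x + 1) for x in counts)
    )


def changepoint_posterior(counts, a=1.0, b=1.0):
    """Posterior over a single changepoint tau (new regime starts at index tau), uniform prior."""
    taus = list(range(1, len(counts)))
    # Each hypothesis splits the series into two independent Poisson-Gamma segments
    logs = [
        log_marginal_poisson_gamma(counts[:t], a, b) + log_marginal_poisson_gamma(counts[t:], a, b)
        for t in taus
    ]
    top = max(logs)
    weights = [math.exp(lp - top) for lp in logs]  # log-sum-exp style normalization
    z = sum(weights)
    return {t: w / z for t, w in zip(taus, weights)}
```

Integrating out the rate keeps each hypothesis to a closed-form score, so no rate threshold has to be tuned, which loosely mirrors the abstract's point that the Bayesian method needs fewer predefined parameters than GLR.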
Individual mobility prediction using transit smart card data
Zhan Zhao
Haris Koutsopoulos
Jinhua Zhao
Transportation Research Part C, 89 (2018)
For intelligent urban transportation systems, the ability to predict individual mobility is crucial for personalized traveler information, targeted demand management, and dynamic system operations. Whereas existing methods focus on predicting the next location of users, little is known regarding the prediction of the next trip. The paper develops a methodology for predicting daily individual mobility represented as a chain of trips (including the null set, no travel), each defined as a combination of the trip start time t, origin o, and destination d. To predict individual mobility, we first predict whether the user will travel (trip making prediction), and then, if so, predict the attributes of the next trip (t, o, d) (trip attribute prediction). Each of the two problems can be further decomposed into two subproblems based on the triggering event. For trip attribute prediction, we propose a new model, based on the Bayesian n-gram model used in language modeling, to estimate the probability distribution of the next trip conditional on the previous one. The proposed methodology is tested using the pseudonymized transit smart card records from more than 10,000 users in London, U.K. over two years. Based on regularized logistic regression, our trip making prediction models achieve median accuracy levels of over 80%. The prediction accuracy for trip attributes varies by attribute: around 40% for t, 70-80% for o and 60-70% for d. The first trip of the day is relatively more difficult to predict. Significant variations are found across individuals in terms of the model performance, implying diverse travel behavior patterns.
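The next-trip idea can be sketched as a bigram model over trip tokens `(t, o, d)`, where the first trip of each day is conditioned on a start-of-day token. This sketch uses simple Dirichlet (additive) smoothing toward the user's overall trip frequencies as a stand-in for the paper's Bayesian n-gram; the class name, the `START` token, and the `alpha`/`beta` values are hypothetical, and the trip-making step (whether the user travels at all) is not modeled.

```python
from collections import Counter, defaultdict

START = "<day>"  # hypothetical start-of-day token, so the first trip also has a context


class SmoothedBigram:
    """P(next trip | previous trip), smoothed toward the user's overall trip frequencies."""

    def __init__(self, days, alpha=1.0, beta=0.5):
        self.alpha, self.beta = alpha, beta
        self.unigram = Counter()
        self.bigram = defaultdict(Counter)
        for trips in days:
            prev = START
            for trip in trips:
                self.unigram[trip] += 1
                self.bigram[prev][trip] += 1
                prev = trip
        self.vocab = sorted(self.unigram)
        self.total = sum(self.unigram.values())

    def p_unigram(self, trip):
        # Additive smoothing over the trips this user has made before
        return (self.unigram[trip] + self.beta) / (self.total + self.beta * len(self.vocab))

    def p_next(self, trip, prev=START):
        # Context counts plus alpha pseudo-counts distributed according to p_unigram
        ctx = self.bigram.get(prev, Counter())
        return (ctx[trip] + self.alpha * self.p_unigram(trip)) / (sum(ctx.values()) + self.alpha)

    def predict(self, prev=START):
        return max(self.vocab, key=lambda trip: self.p_next(trip, prev))


# Hypothetical history: each trip is (start hour, origin, destination)
days = [
    [(8, "Home", "Work"), (18, "Work", "Home")],
    [(8, "Home", "Work"), (18, "Work", "Home")],
    [(9, "Home", "Work"), (18, "Work", "Home")],
]
model = SmoothedBigram(days)
first_trip = model.predict()                      # (8, "Home", "Work")
next_trip = model.predict((8, "Home", "Work"))    # (18, "Work", "Home")
```

Because the pseudo-counts are spread by the user's own trip frequencies, an unseen or rarely seen context falls back toward the trips that user makes most often rather than toward a uniform guess.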
Identifying Hidden Visits from Sparse Call Detail Record Data
Zhan Zhao
Haris Koutsopoulos
Jinhua Zhao
Working paper (2018)
Despite a large body of literature on trip inference using call detail record (CDR) data, a fundamental understanding of their limitations is lacking. In particular, because of the sparse nature of CDR data, users may travel to a location without the visit appearing in the data, which we refer to as hidden visits. The existence of hidden visits hinders our ability to extract reliable information about human mobility and travel behavior from CDR data. In this study, we propose a data fusion approach to obtain labeled data for statistical inference of hidden visits. In the absence of complementary data, this can be accomplished by extracting labeled observations from more granular cellular data access records, and extracting features from voice call and text messaging records. The proposed approach is demonstrated using a real-world CDR dataset of 3 million users from a large Chinese city. Logistic regression, support vector machine, and AdaBoost are used to develop classification models for hidden visit inference, and the test results show significant improvement over the naive no-hidden-visit rule, which is an implicit assumption adopted by most existing studies. Based on the proposed model, we estimate that over 10% of the displacements extracted from CDR data involve hidden visits. The proposed data fusion method offers a systematic statistical approach to inferring individual mobility patterns based on telecommunication records. The method can be generalized to fit other types of large-scale data as well.
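As a toy illustration of how even a simple classifier can beat the naive no-hidden-visit rule, the sketch below fits a one-feature logistic regression by batch gradient descent. The feature (hours between two consecutive observed records), the labels, and all names are hypothetical; the paper's actual features come from voice call, text messaging, and data access records, and its models also include support vector machines and AdaBoost.

```python
import math


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """One-feature logistic regression fit by batch gradient descent on the log loss."""
    n = len(xs)
    mu = sum(xs) / n  # center the feature so the initial boundary (b = 0) sits at its mean
    w = b = 0.0
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * (x - mu) + b)))
            gw += (p - y) * (x - mu)
            gb += p - y
        w -= lr * gw / n
        b -= lr * gb / n
    return lambda x: 1.0 / (1.0 + math.exp(-(w * (x - mu) + b)))


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Hypothetical labeled pairs: gap in hours between two observed CDR visits, and
# whether the denser data access records revealed a hidden visit in between.
gaps = [0.5, 1.0, 1.5, 2.0, 5.0, 6.0, 7.0, 8.0]
hidden = [0, 0, 0, 0, 1, 1, 1, 1]

prob_hidden = fit_logistic(gaps, hidden)
model_preds = [1 if prob_hidden(g) >= 0.5 else 0 for g in gaps]
naive_preds = [0] * len(gaps)  # the naive rule: never infer a hidden visit
```

The toy data are balanced for readability. On real data where hidden visits are a minority of displacements, the naive rule already scores high raw accuracy, so class-aware metrics would be the fairer comparison.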
Real time transit demand prediction capturing station interactions and impact of special events
Peyman Noursalehi
Haris Koutsopoulos
Jinhua Zhao
Transportation Research Part C (2018)
Demand for public transportation is highly affected by passengers’ experience and the level of service provided. Thus, it is vital for transit agencies to deploy adaptive strategies to respond to changes in demand or supply in a timely manner, and prevent unwanted deterioration in service quality. In this paper, a real time prediction methodology, based on univariate and multivariate state-space models, is developed to predict the short-term passenger arrivals at transit stations. A univariate state-space model is developed at the station level. Through a hierarchical clustering algorithm with correlation distance, stations with similar demand patterns are identified. A dynamic factor model is proposed for each cluster, capturing station interdependencies through a set of common factors. Both approaches can model the effect of exogenous events (such as football games). Ensemble predictions are then obtained by combining the outputs from the two models, based on their respective accuracy. We evaluate these models using data from the 32 stations on the Central line of the London Underground (LU), operated by Transport for London (TfL). The results indicate that the proposed methodology performs well in predicting short-term station arrivals for the set of test days. For most stations, ensemble prediction has the lowest mean error, as well as the smallest range of error, and exhibits more robust performance across the test days.
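The abstract says the ensemble combines the two models' outputs according to their respective accuracy, without stating the rule. One common choice, assumed in this sketch, is to weight each model by the inverse of its recent mean squared error. The model names, error values, and predictions are hypothetical.

```python
def inverse_mse_weights(recent_errors, eps=1e-9):
    """Weight each model by 1 / MSE of its recent errors, normalized to sum to 1."""
    inv = {name: 1.0 / (sum(e * e for e in errs) / len(errs) + eps)
           for name, errs in recent_errors.items()}
    total = sum(inv.values())
    return {name: v / total for name, v in inv.items()}


def ensemble_prediction(predictions, weights):
    """Weighted average of the models' predicted station arrivals for the next interval."""
    return sum(weights[name] * predictions[name] for name in predictions)


recent_errors = {
    "station_state_space": [2, -2, 2, -2],  # MSE 4
    "dynamic_factor": [1, -1, 1, -1],       # MSE 1
}
weights = inverse_mse_weights(recent_errors)  # about 0.2 and 0.8
forecast = ensemble_prediction(
    {"station_state_space": 120, "dynamic_factor": 100}, weights)  # about 104
```

The small `eps` only guards against division by zero when a model's recent errors are all zero.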
Measuring Regularity of Individual Travel Patterns
Gabriel Goulet-Langlois
Haris Koutsopoulos
Zhan Zhao
Jinhua Zhao
IEEE Transactions on Intelligent Transportation Systems (2017)
Regularity is an important property of individual travel behavior, and the ability to measure it enables advances in behavior modeling, mobility prediction, and customer analytics. In this paper, we propose a methodology to measure travel behavior regularity based on the order in which trips or activities are organized. We represent individuals’ travel over multiple days as sequences of “travel events”—discrete and repeatable behavior units explicitly defined based on the research question and the available data. We then present a metric of regularity based on entropy rate, which is sensitive to both the frequency of travel events and the order in which they occur. The methodology is demonstrated using a large sample of transit smart card transaction records from London, UK. The entropy rate is estimated with a procedure based on the Burrows-Wheeler transform. The results confirm that the order of travel events is an essential component of regularity in travel behavior. They also demonstrate that the proposed measure of regularity captures both conventional patterns and atypical routine patterns that are regular but not matched to the 9-to-5 working day or working week. Unlike existing measures of regularity, our approach is agnostic to calendar definitions and makes no assumptions regarding periodicity of travel behavior. The proposed methodology is flexible and can be adapted to study other aspects of individual mobility using different data sources.
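To see why order matters, the sketch below compares the Shannon entropy of event frequencies with an entropy-rate estimate for two sequences that contain exactly the same events. It swaps in a simple Lempel-Ziv match-length estimator rather than the Burrows-Wheeler-based procedure used in the paper, and the two-symbol "H"/"W" (home/work) event coding is hypothetical.

```python
import math
import random
from collections import Counter


def shannon_entropy(events):
    """Entropy of the event frequencies in bits; ignores the order of events."""
    n = len(events)
    return -sum(c / n * math.log2(c / n) for c in Counter(events).values())


def _seen_before(history, pattern):
    k = len(pattern)
    return any(history[j:j + k] == pattern for j in range(len(history) - k + 1))


def lz_entropy_rate(events):
    """Lempel-Ziv match-length estimate of the entropy rate, in bits per event."""
    n = len(events)
    total = 0
    for i in range(n):
        # Length of the shortest pattern starting at i that never occurred in events[:i];
        # set to n - i + 1 when the whole remainder already occurred earlier.
        k = 1
        while i + k <= n and _seen_before(events[:i], events[i:i + k]):
            k += 1
        total += k
    return n * math.log2(n) / total


regular = ["H", "W"] * 28           # strict home/work alternation
shuffled = regular[:]
random.Random(0).shuffle(shuffled)  # same events, irregular order
```

Both sequences have a frequency entropy of 1 bit, but the alternating sequence gets a much lower entropy-rate estimate, since long stretches of it repeat earlier history.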
Mobility as A Language: Predicting Individual Mobility In Public Transportation Using N-Gram Models
Zhan Zhao
Haris Koutsopoulos
Jinhua Zhao
Transportation Research Board 96th Annual Meeting (2017)
For public transportation agencies, the ability to provide personalized and dynamic passenger information is crucial for improving the efficiency of demand management and enhancing customer experience. This requires understanding and especially predicting individual travel behavior in the public transportation system, which is challenging because of the heterogeneity among passengers and the variability of their behaviors. This paper presents, to the best of our knowledge, the first attempt to predict individual spatiotemporal behavior of public transportation passengers using smartcard data. In this study, each trip is coded as a combination of trip start time, an entry station and an exit station. A passenger’s daily mobility is represented as a chain of travel decisions. We propose a new modeling framework, inspired by Bayesian n-gram models used in natural language processing, to estimate the probability distribution of the next decision in the sequence. Empirical analysis using Oyster card data from London shows promising results. It is found that the exact time of travel is most challenging to predict, but the difference between the predicted time and the true value is usually small. Model performance varies greatly across individuals for the prediction of entry and particularly exit stations. Overall, our proposed model shows significant improvement over the regular n-gram models, or Markov chain-based models in general. The improvement is even larger for weekend trips when travel behavior is flexible, irregular, and considerably less predictable.
Incorporating Mobile Activity Tracking Data In A Transit Agency: Collecting, Comparing, And Trip Mode Inference
Tim Scully
John Attanucci
Jinhua Zhao
Transportation Research Board 96th Annual Meeting (2017)
The near ubiquity of smartphones has the potential to transform how researchers, companies, and public transit agencies understand travel behavior. This research analyzes how an emerging class of automatically collected data based on smartphone GPS and sensor information – referred to here as mobile activity-tracking data – can be used in a transit agency to better understand travel behavior. Through a collaboration with Transport for London, multiple weeks of mobile activity-tracking data of London residents were collected between 2015 and 2016 using an application called Moves. Using this case study, this paper discusses the benefits of this new data, how it compares with other data at TfL and elsewhere, and the process of collecting the data.

Using the resulting data, this paper then compares the trip records from the mobile activity-tracking data with those from the automatic fare card (AFC) data collected during the same period for the same individuals. By comparing mobile activity tracking with an established, well-researched data source like AFC, we observe that the trip match rate between the two data sources is high (68%) but not perfect. Next, the paper proposes a probabilistic framework to distinguish between motorized trip modes using mobile activity-tracking data and the public transit network. Specifically, the model uses both spatial characteristics, such as distance to the public transit network, and trip characteristics, such as speed, to identify the trip mode as bus, rail, subway, or non-public transit. Using logistic regression, classification tree, and random forest, this model achieves accuracies of 90%, 91%, and 92%, respectively.
Enabling Transit Service Quality Co-monitoring Through a Smartphone-Based Platform
Corinna Li
Christopher Zegras
Fang Zhao
Zhengquan Qin
Ayesha Shahid
Moshe Ben-Akiva
Francisco Pereira
Jinhua Zhao
Transportation Research Record: Journal of the Transportation Research Board (2017)
The growing ubiquity of smartphones offers public transit agencies an opportunity to transform ways to measure, monitor, and manage service performance. We demonstrate the potential of a new tool for actively engaging customers in measuring satisfaction and co-monitoring bus service quality. The pilot initiative adapted a smartphone-based travel survey system, Future Mobility Sensing (FMS), to collect real-time customer feedback and objective operational measurements on specific bus trips. The system uses a combination of GPS, Wi-Fi, Bluetooth, and cellphone accelerometer data to track transit trips, while soliciting users’ feedback on trip experience. While not necessarily intended to replace traditional monitoring channels and processes, these data can complement official performance monitoring through a more customer-centric perspective in relative real time. The pilot operated publicly for three months on Boston’s Silver Line (SL) bus rapid transit, in collaboration with the Massachusetts Bay Transportation Authority (MBTA). Seventy-six participants completed the entrance survey, half of whom actively participated, completing over 500 questionnaires while on board, at the end of a trip and/or at the end of a day. Participation was biased towards frequent SL users, who were majority White and of higher income. Indicative models of user-reported satisfaction reveal some interesting relationships, but the models can be improved by fusing the app-collected data with performance characteristics obtained through the automatic vehicle location system. Broader and more sustained user engagement remains a critical future challenge.
Uncertainty in Bus Arrival Time Predictions: Treating Heteroscedasticity With a Metamodel Approach
Aidan O’Sullivan
Francisco Pereira
Jinhua Zhao
Harilaos Koutsopoulos
IEEE Transactions on Intelligent Transportation Systems (2016)
Arrival time predictions for the next available bus or train are a key component of modern Traveller Information Systems (TIS). A great deal of research has been conducted within the ITS community developing an assortment of different algorithms that seek to increase the accuracy of these predictions. However, the inherent stochastic and non-linear nature of these systems, particularly in the case of bus transport, means that these predictions suffer from variable sources of error, stemming from variations in weather conditions, bus bunching and numerous other sources. In this paper we tackle the issue of uncertainty in bus arrival time predictions using an alternative approach. Rather than endeavour to develop a superior method for prediction, we take existing predictions from a TIS and treat the algorithm generating them as a black box. The presence of heteroscedasticity in the predictions is demonstrated, and then a meta-model approach is deployed that augments existing predictive systems using quantile regression to place bounds on the associated error. As a case study, this approach is applied to data from a real-world TIS in Boston. This method allows bounds on the predicted arrival time to be estimated, which give a measure of the uncertainty associated with the individual predictions. To the best of our knowledge, this represents the first application of methods to handle the uncertainty in bus arrival times that explicitly takes into account the inherent heteroscedasticity. The meta-model approach is agnostic to the process generating the predictions, which ensures the methodology is implementable in any system.
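A simplified sketch of the meta-model idea: treat the arrival-time predictor as a black box, group its past errors by prediction horizon, and take empirical lower and upper quantiles in each group as bounds. Binning by horizon is an assumption made here for illustration (the abstract does not list the meta-model's covariates), and per-bucket sample quantiles are a simple, closely related stand-in for fitting quantile regression on bucket indicator variables. The function name and data are hypothetical.

```python
import statistics
from collections import defaultdict


def error_bounds_by_horizon(records, bucket_minutes=5):
    """Empirical 10th/90th percentile of prediction error within each horizon bucket.

    records: iterable of (predicted_minutes_until_arrival, actual_minus_predicted_seconds).
    Returns {bucket_index: (lower_error, upper_error)}.
    """
    groups = defaultdict(list)
    for horizon, error in records:
        groups[int(horizon // bucket_minutes)].append(error)
    bounds = {}
    for bucket, errors in sorted(groups.items()):
        deciles = statistics.quantiles(errors, n=10, method="inclusive")
        bounds[bucket] = (deciles[0], deciles[-1])
    return bounds


# Hypothetical heteroscedastic errors: the spread grows with the prediction horizon
records = [(h, h * k) for h in (2, 12) for k in range(-10, 11)]
bounds = error_bounds_by_horizon(records)
# {0: (-16.0, 16.0), 2: (-96.0, 96.0)}
```

A new prediction in a given bucket would then be reported with those error bounds added to the predicted arrival time; a single constant-width interval would be too wide for short horizons and too narrow for long ones.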
Individual-Level Trip Detection using Sparse Call Detail Record Data based on Supervised Statistical Learning
Zhan Zhao
Jinhua Zhao
Haris Koutsopoulos
Transportation Research Board 95th Annual Meeting (2016)
Despite a large body of literature related to trip detection using Call Detail Record (CDR) data, the fundamental understanding of the limitations of the data is lacking and, particularly, its sparse nature is not well addressed in existing work. This paper develops a conceptual framework to make an explicit distinction between telecommunication patterns captured by CDRs and travel patterns that are of interest to the transportation community. Motivated by the over-reliance of existing trip detection methodology on heuristics and assumptions, the authors propose to use data fusion to form labeled data for supervised statistical learning. In the absence of complementary data, this can be done by extracting labeled observations from more granular cellular data access records and extracting feature vectors from voice-call and SMS records. The proposed approach is demonstrated, using real-world CDR data from a Chinese city, by inferring whether a hidden visit exists between two consecutive visits observed from CDR data. Logistic regression, support vector machine (SVM) and artificial neural network (ANN) are used to develop statistical classification models, and all show significant improvement over the naïve rule that assumes no hidden visit. This study provides a deeper understanding of how trips in human mobility can, and should, be extracted from CDRs in telecommunication. The proposed data fusion approach offers a flexible and systematic way to infer individual mobility patterns, even when only CDR data is available.
Supervised Statistical Learning for Individual Level Trip Detection using Sparse Call Detail Record Data
Zhan Zhao
Haris Koutsopoulos
Jinhua Zhao
95th Transportation Research Board Annual Meeting, Washington, D.C. (2016)
Despite a large body of literature related to trip detection using Call Detail Record (CDR) data, the fundamental understanding of the limitations of the data is lacking and, particularly, its sparse nature is not well addressed in existing work. This paper proposes a conceptual framework to make an explicit distinction between telecommunication patterns captured by CDRs and travel patterns that are of interest to the transportation community. A process is proposed to extract trips from CDRs at the individual level. A literature review reveals a lack of statistical approaches for trip detection beyond simple heuristic-based decision rules. Statistical modeling is a suitable approach to reduce bias, increase robustness, and produce probabilistic estimates. In order to perform supervised statistical learning, we propose to use data fusion to form labeled data for model training. In the absence of complementary data with more frequent observations (such as GPS data), which is often the case, this can be done by extracting labeled observations from more granular cellular data access records and extracting feature vectors from voice-call and SMS records. The proposed approach is demonstrated, using real-world CDR data from a Chinese city, by inferring whether a hidden visit exists between two consecutive visits that we can directly observe from CDR data. The model results show significant improvement in accurately identifying hidden visits compared to the naïve rule. The model performance can be further improved by mitigating signal noise, extracting more powerful features, and accounting for individual heterogeneity. This study provides a deeper understanding of the mapping between CDRs in telecommunication and trips in human mobility, and of how we can, and should, extract the latter from the former.
The proposed data fusion approach offers a flexible and systematic way to make inference of individual mobility patterns, even when only CDR data is available.
Clustering the Multi-week Activity Sequences of Public Transport Users
Gabriel Goulet Langlois
Haris Koutsopoulos
Jinhua Zhao
95th Transportation Research Board Annual Meeting, Washington, D.C. (2016)
The public transport networks of dense cities such as London serve passengers with widely different travel patterns. In line with the diverse lives of urban dwellers, activities and journeys are combined within days and across days in diverse sequences. From personalized customer information to improved travel demand models, understanding this type of heterogeneity among transit users is relevant to a number of applications core to public transport agencies' function. In this study, passenger heterogeneity is investigated based on a longitudinal representation of each user's multi-week activity sequence derived from smart card data. We propose a methodology leveraging this representation to identify clusters of users with similar activity sequence structure. The methodology is applied to a large sample from London's public transport network, in which each passenger is represented by a continuous 4-week activity sequence. The application reveals 11 clusters, each characterized by a distinct sequence structure. Socio-demographic information available for a small sample of users is combined with smart card transactions to analyze associations between the identified patterns and demographic attributes including passenger age, occupation, household composition and income, and vehicle ownership. The analysis reveals that significant connections exist between the demographic attributes of users and activity patterns identified exclusively from fare transactions.
FMS-TQ: Combining Smartphone and iBeacon Technologies in A Transit Quality Survey
Corinna Li
Christopher Zegras
Fang Zhao
Francisco Pereira
Kalan Vishwanath Nawarathne
Zhengquan Qin
Moshe Ben-Akiva
Jinhua Zhao
95th Transportation Research Board Annual Meeting, Washington, D.C. (2016)
The Internet of Things (IoT) will offer transit agencies an opportunity to transform ways to measure, monitor, and manage performance. We demonstrate the potential value of two combined technologies, smartphones and iBeacons, for actively engaging customers in measuring satisfaction and co-monitoring bus service quality. Specifically, we adapt our smartphone-based survey system, Future Mobility Sensing (FMS), to connect with iBeacons for an event-driven approach to measure user-reported satisfaction before (i.e., at the stop), during (i.e., while traveling), and after (reflectively) transit trips. The system collects a combination of sensor (GPS, WiFi, GSM and accelerometer) data to track transit trips, while soliciting users’ feedback on trip experience with in-app pop-up surveys. Both bus trip data and passenger feedback are collected and uploaded onto the server at the end of each day. These data are not intended to replace traditional monitoring channels and processes, but, rather, they complement official performance monitoring through a more customer-centric perspective in relative real time. The paper presents the theoretical foundations, describes a pilot implementation of the platform in Singapore, and discusses preliminary results that demonstrate technical feasibility.