Machine Learning for Transportation

JTL’s machine learning cluster applies novel machine-learning perspectives to understand travel behavior and solve transportation challenges. Moving beyond the traditional approach of using discrete choice models (DCMs), we use deep neural networks (DNNs) to predict individual trip-making decisions and to detect changes in travel patterns.

Our studies harness insights from DCMs to enrich DNN models and achieve both high predictability and interpretability. Since travel behavior is often uncertain, we model it through the synthesis of prospect theory and DNNs. To examine sequential decision making under uncertainty, we apply dynamic programming and reinforcement learning algorithms. For example, we use these approaches to develop methods for rebalancing fleets and for setting optimal dynamic prices for shared ride-hailing services.

Moreover, activity patterns are important underlying factors of travel behavior but are only latently revealed in travel data. In several studies, we therefore use graphical models and unsupervised learning methods to detect changes in activity patterns, with the goal of understanding the impacts of transit fare changes on rider groups.

  • In last-mile delivery, drivers frequently deviate from planned delivery routes because of their tacit knowledge of the road and curbside infrastructure, customer availability, and other characteristics of the respective service areas. Hence, the actual stop sequences chosen by an experienced human driver may be preferable to the theoretical shortest-distance routing under real-life operational conditions. Thus, being able to predict the actual stop sequence that a human driver would follow can help to improve route planning in last-mile delivery. This paper proposes a pair-wise attention-based pointer neural network for this prediction task using drivers’ historical delivery trajectory data. In addition to the commonly used encoder–decoder architecture for sequence-to-sequence prediction, we propose a new attention mechanism based on an alternative-specific neural network to capture the local pair-wise information for each pair of stops. To further capture the global efficiency of the route, we propose a new iterative sequence generation algorithm that is used after model training to identify the first stop of a route that yields the lowest operational cost. Results from an extensive case study on real operational data from Amazon’s last-mile delivery operations in the US show that our proposed method can significantly outperform traditional optimization-based approaches and other machine learning methods (such as the Long Short-Term Memory encoder–decoder and the original pointer network) in finding stop sequences that are closer to high-quality routes executed by experienced drivers in the field. Compared to benchmark models, the proposed model can increase the average prediction accuracy of the first four stops from around 0.229 to 0.312, and reduce the disparity between the predicted route and the actual route by around 15%.

  • Although researchers increasingly adopt machine learning to model travel behavior, they predominantly focus on prediction accuracy, ignoring the ethical challenges embedded in machine learning algorithms. This study introduces an important missing dimension, computational fairness, to travel behavior analysis. It highlights the accuracy-fairness tradeoff instead of the single-dimensional focus on prediction accuracy in the contexts of deep neural networks (DNNs) and discrete choice models (DCMs). We first operationalize computational fairness by equality of opportunity, then differentiate between the bias inherent in data and the bias introduced by modeling. Models that inherit the inherent biases risk perpetuating the existing inequality in the data structure, and the biases in modeling can further exacerbate it. We then demonstrate the prediction disparities in travel behavior modeling using the 2017 National Household Travel Survey (NHTS) and the 2018-2019 My Daily Travel Survey in Chicago. Empirically, DNN and DCM reveal consistent prediction disparities across multiple social groups: both over-predict the false negative rate of frequent driving for the ethnic minorities, the low-income and the disabled populations, and falsely predict a higher travel burden of the socially disadvantaged groups and the rural populations than reality. Comparing DNN with DCM, we find that DNN can outperform DCM in prediction disparities because of DNN’s smaller misspecification error. To mitigate prediction disparities, this study introduces an absolute correlation regularization method, which is evaluated with synthetic and real-world data. The results demonstrate the prevalence of prediction disparities in travel behavior modeling, and the disparities persist across a variety of model specifics such as the number of DNN layers, batch size, and weight initialization. Since these prediction disparities can exacerbate social inequity if prediction results without fairness adjustment are used for transportation policy making, we advocate for careful consideration of the fairness problem in travel behavior modeling, and the use of bias mitigation algorithms for fair transport decisions.
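The equality-of-opportunity criterion mentioned above can be illustrated with a short sketch. Equality of opportunity asks that the false negative rate (equivalently, the true positive rate) be the same across groups. The function names, the toy labels, and the group codes below are hypothetical and are not taken from the study; the gap measure (largest minus smallest group rate) is one simple way to summarize the disparity, not necessarily the one the authors report.

```python
def false_negative_rate(y_true, y_pred):
    """Share of actual positives (label 1) that were predicted as 0."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not predictions_for_positives:
        raise ValueError("no positive observations in this group")
    misses = sum(1 for p in predictions_for_positives if p == 0)
    return misses / len(predictions_for_positives)


def fnr_by_group(y_true, y_pred, groups):
    """False negative rate computed separately for each group label."""
    rates = {}
    for g in sorted(set(groups)):
        members = [i for i, value in enumerate(groups) if value == g]
        rates[g] = false_negative_rate(
            [y_true[i] for i in members], [y_pred[i] for i in members]
        )
    return rates


def equal_opportunity_gap(rates):
    """Largest minus smallest group rate; 0 means equal opportunity holds."""
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: 1 = frequent driver, groups "a" and "b" are made up.
y_true = [1, 1, 1, 1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = fnr_by_group(y_true, y_pred, groups)
print(rates, equal_opportunity_gap(rates))
```

In this toy example the model misses one of four frequent drivers in group "a" but both frequent drivers in group "b", so the gap is large even though overall accuracy might look acceptable.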

  • Although researchers increasingly use deep neural networks (DNNs) to analyze individual choices, overfitting and interpretability issues remain obstacles in theory and practice. This study presents a statistical learning theoretical framework to examine the tradeoff between estimation and approximation errors, and between the quality of prediction and of interpretation. It provides an upper bound on the estimation error of the prediction quality in DNN, measured by zero-one and log losses, shedding light on why DNN models do not overfit. It proposes a metric for interpretation quality by formulating a function approximation loss that measures the difference between true and estimated choice probability functions. It argues that the binary logit (BNL) and multinomial logit (MNL) models are special cases in the model family of DNNs, since the latter always has smaller approximation errors. We explore the relative performance of DNN and classical choice models through three simulation scenarios comparing DNN, BNL, and binary mixed logit models (BXL), as well as one experiment comparing DNN to BNL, BXL, MNL, and mixed logit (MXL) in analyzing the choice of trip purposes based on the National Household Travel Survey 2017. The results indicate that DNN can be used for choice analysis beyond the current practice of demand forecasting because it has the inherent utility interpretation and the power of automatically learning utility specification. Our results suggest DNN outperforms BNL, BXL, MNL, and MXL models in both prediction and interpretation when the sample size is large (≥ O(10^4)), the input dimension is high, or the true data generating process is complex, while performing worse when the opposite is true. DNN outperforms BNL and BXL in zero-one, log, and approximation losses for most of the experiments, and a larger sample size leads to greater incremental value of using DNN over classical discrete choice models. Overall, this study introduces statistical learning theory as a new foundation for high-dimensional data, complex statistical models, and non-asymptotic data regimes in choice analysis, and the experiments show the effective prediction and interpretation of DNN for its applications to policy and behavioral analysis.
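The two prediction losses named above can be written down in a few lines. The sketch below computes multinomial logit choice probabilities with a softmax and then evaluates the zero-one loss and the log loss on them. The utilities and choices are made-up numbers for illustration; nothing here reproduces the study's data or models.

```python
import math


def softmax(utilities):
    """Multinomial logit choice probabilities from a list of utilities."""
    largest = max(utilities)  # subtract the max for numerical stability
    exps = [math.exp(u - largest) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def zero_one_loss(prob_rows, choices):
    """Share of observations whose most likely alternative is not the chosen one."""
    wrong = 0
    for probs, chosen in zip(prob_rows, choices):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if predicted != chosen:
            wrong += 1
    return wrong / len(choices)


def log_loss(prob_rows, choices):
    """Average negative log probability assigned to the chosen alternative."""
    return -sum(math.log(p[c]) for p, c in zip(prob_rows, choices)) / len(choices)


# Hypothetical utilities for two travelers choosing among three alternatives.
utility_rows = [[1.0, 0.2, -0.5], [0.0, 0.0, 0.0]]
choices = [0, 2]
prob_rows = [softmax(u) for u in utility_rows]
print(zero_one_loss(prob_rows, choices), log_loss(prob_rows, choices))
```

The zero-one loss only looks at which alternative is ranked first, while the log loss also penalizes poorly calibrated probabilities, which is why a model can do well on one and worse on the other.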

  • Researchers often treat data-driven and theory-driven models as two disparate or even conflicting methods in travel behavior analysis. However, the two methods are highly complementary because data-driven methods are more predictive but less interpretable and robust, while theory-driven methods are more interpretable and robust but less predictive. Using their complementary nature, this study designs a theory-based residual neural network (TB-ResNet) framework, which synergizes discrete choice models (DCMs) and deep neural networks (DNNs) based on their shared utility interpretation. The TB-ResNet framework is simple, as it uses a (δ, 1-δ) weighting to take advantage of DCMs' simplicity and DNNs' richness, and to prevent underfitting from the DCMs and overfitting from the DNNs. This framework is also flexible: three instances of TB-ResNets are designed based on the multinomial logit model (MNL-ResNets), prospect theory (PT-ResNets), and hyperbolic discounting (HD-ResNets), which are tested on three data sets. Compared to pure DCMs, the TB-ResNets provide greater prediction accuracy and reveal a richer set of behavioral mechanisms owing to the utility function augmented by the DNN component in the TB-ResNets. Compared to pure DNNs, the TB-ResNets can modestly improve prediction and significantly improve interpretation and robustness, because the DCM component in the TB-ResNets stabilizes the utility functions and input gradients. Overall, this study demonstrates that it is both feasible and desirable to synergize DCMs and DNNs by combining their utility specifications under a TB-ResNet framework. Although some limitations remain, this TB-ResNet framework is an important first step to create mutual benefits between DCMs and DNNs for travel behavior modeling, with joint improvement in prediction, interpretation, and robustness.
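The (δ, 1-δ) weighting can be pictured as a convex combination of two utility vectors followed by a softmax. The sketch below is a minimal illustration under stated assumptions: which component receives δ and which receives 1-δ is assumed here (δ on the DNN part), and the utility values are placeholders rather than outputs of an estimated DCM or a trained network.

```python
import math


def softmax(utilities):
    """Choice probabilities from utilities (shifted by the max for stability)."""
    largest = max(utilities)
    exps = [math.exp(u - largest) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def combined_utilities(dcm_utilities, dnn_utilities, delta):
    """Blend theory-based and data-driven utilities for each alternative.

    Assumption for this sketch: the DCM part is weighted by (1 - delta)
    and the DNN part by delta.
    """
    if not 0.0 <= delta <= 1.0:
        raise ValueError("delta must lie in [0, 1]")
    return [
        (1.0 - delta) * v_dcm + delta * v_dnn
        for v_dcm, v_dnn in zip(dcm_utilities, dnn_utilities)
    ]


# Placeholder utilities for three alternatives (hypothetical numbers).
dcm_part = [0.5, -0.2, 0.1]
dnn_part = [0.3, 0.4, -0.6]
for delta in (0.0, 0.5, 1.0):
    print(delta, softmax(combined_utilities(dcm_part, dnn_part, delta)))
```

With delta at 0 the sketch collapses to the pure DCM probabilities and with delta at 1 to the pure DNN probabilities, which matches the intuition that the weight trades off underfitting against overfitting.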

  • Short-term demand predictions, typically defined as less than an hour into the future, are essential for implementing dynamic control strategies and providing useful customer information in transit applications. Knowing the expected demand enables transit operators to deploy real-time control strategies in advance of the demand surge, and minimize the impact of abnormalities on the service quality and passenger experience. One of the most useful applications of demand prediction models in transit is in predicting the congestion on station platforms and crowding on vehicles. These require information about the origin-destination (OD) demand, providing a detailed profile of how and when passengers enter and exit the service. However, existing work in the literature is limited and overwhelmingly focuses on forecasting passenger arrivals at stations. This information, while useful, is incomplete for many practical applications. We address this gap by developing a scalable methodology for real-time, short-term OD demand prediction in transit systems. Our proposed model consists of three modules: a multi-resolution spatial feature extraction module for capturing the local spatial dependencies with a channel-wise attention block, an auxiliary information encoding (AIE) module for encoding the exogenous information, and a module for capturing the temporal evolution of demand. The OD demand at time t, represented as an N × N matrix, is processed in two separate branches. In one branch we use the discrete wavelet transform (DWT) to decompose the demand into its different time and frequency variations, detecting patterns that are not visible in the raw data. In the other, three convolutional neural network (CNN) layers are utilized to learn the spatial dependencies from the OD demand directly. Instead of treating each channel of the resultant transformation equally, we use a squeeze-and-excitation layer to weight feature maps based on their contribution to the final prediction. A Convolutional Long Short-Term Memory network (ConvLSTM) is then used to capture the temporal evolution of demand. The approach is demonstrated through a case study using 2 months of Automated Fare Collection (AFC) data from the Hong Kong Mass Transit Railway (MTR) system. The extensive evaluation shows that the proposed model outperforms the other methods compared.
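The channel weighting step can be sketched with the generic squeeze-and-excitation pattern: average each feature map to one number, pass those numbers through two small dense layers, and rescale each map by the resulting gate in (0, 1). The weights, shapes, and values below are hypothetical and fixed by hand; the paper's actual layer sizes and trained parameters are not given in the abstract.

```python
import math


def squeeze(feature_maps):
    """Global average of each channel; feature_maps is C channels of N x N values."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]


def dense(inputs, weights, bias):
    """Plain fully connected layer: one output per weight row."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, bias)
    ]


def excite(channel_means, w1, b1, w2, b2):
    """Dense -> ReLU -> dense -> sigmoid, giving one gate per channel."""
    hidden = [max(0.0, h) for h in dense(channel_means, w1, b1)]
    return [1.0 / (1.0 + math.exp(-s)) for s in dense(hidden, w2, b2)]


def se_block(feature_maps, w1, b1, w2, b2):
    """Rescale every feature map by its learned gate."""
    gates = excite(squeeze(feature_maps), w1, b1, w2, b2)
    scaled = [
        [[gate * value for value in row] for row in channel]
        for channel, gate in zip(feature_maps, gates)
    ]
    return scaled, gates


# Two hypothetical 2 x 2 feature maps and hand-picked weights.
maps = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
w1, b1 = [[1.0, 1.0]], [0.0]          # 2 channels -> 1 hidden unit
w2, b2 = [[0.0], [1.0]], [0.0, 0.0]   # 1 hidden unit -> 2 gates
scaled, gates = se_block(maps, w1, b1, w2, b2)
print(gates)
```

In a trained network the two dense layers are learned jointly with the rest of the model, so channels that help the final prediction end up with gates closer to 1.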

  • While deep neural networks (DNNs) have been increasingly applied to choice analysis showing high predictive power, it is unclear to what extent researchers can interpret economic information from DNNs. This paper demonstrates that DNNs can provide economic information as complete as classical discrete choice models (DCMs). The economic information includes choice predictions, choice probabilities, market shares, substitution patterns of alternatives, social welfare, probability derivatives, elasticities, marginal rates of substitution, and heterogeneous values of time. Unlike DCMs, DNNs can automatically learn utility functions and reveal behavioral patterns that are not prespecified by domain experts, particularly when the sample size is large. However, the economic information obtained from DNNs can be unreliable when the sample size is small, because of three challenges associated with the automatic learning capacity: high sensitivity to hyperparameters, model non-identification, and local irregularity. The first challenge is related to the statistical challenge of balancing approximation and estimation errors of DNNs, the second to the optimization challenge of identifying the global optimum in the DNN training, and the third to the robustness challenge of mitigating locally irregular patterns of estimated functions. To demonstrate the strength and challenges, we estimated the DNNs using a stated preference survey from Singapore and revealed preference data from London, extracted the full list of economic information from the DNNs, and compared them with those from the DCMs. We found that the economic information aggregated over either trainings or the population is more reliable than the disaggregate information of individual observations or trainings, and that larger sample size, hyperparameter searching, model ensemble, and effective regularization can significantly improve the reliability of the economic information extracted from the DNNs. Future studies should investigate sample size requirements, better ensemble mechanisms, other regularization and DNN architectures, better optimization algorithms, and robust DNN training methods to address DNNs’ three challenges and provide more reliable economic information for DNN-based choice models.
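Probability derivatives and elasticities can be read off any model that maps inputs to choice probabilities, whether a DCM or a DNN. The sketch below uses a central finite difference on a generic probability function, which is a simple stand-in for the automatic differentiation one would typically use with a trained network. The two-alternative logit function and its cost coefficients are hypothetical and serve only to check the numbers against the closed-form logit elasticity.

```python
import math


def softmax(utilities):
    """Choice probabilities from utilities (shifted by the max for stability)."""
    largest = max(utilities)
    exps = [math.exp(u - largest) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def elasticity(prob_fn, x, input_index, alternative, step=1e-5):
    """Percent change in P(alternative) per percent change in x[input_index].

    Uses a central finite difference, so prob_fn can be any model that
    returns a list of choice probabilities for an input vector.
    """
    up = list(x)
    down = list(x)
    up[input_index] += step
    down[input_index] -= step
    derivative = (prob_fn(up)[alternative] - prob_fn(down)[alternative]) / (2 * step)
    return derivative * x[input_index] / prob_fn(x)[alternative]


def toy_logit(costs):
    """Hypothetical two-alternative logit with made-up cost coefficients."""
    return softmax([-0.1 * costs[0], -0.2 * costs[1]])


costs = [10.0, 5.0]
print(elasticity(toy_logit, costs, input_index=0, alternative=0))  # own elasticity
print(elasticity(toy_logit, costs, input_index=0, alternative=1))  # cross elasticity
```

For a logit model the own elasticity has the closed form β·x·(1 − P), so the numerical value here can be checked by hand; for a DNN no such formula exists and the numerical or automatic derivative is the only route.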

  • Whereas deep neural networks (DNNs) are increasingly applied to choice analysis, it is challenging to reconcile domain-specific behavioral knowledge with generic-purpose DNNs, to improve DNNs’ interpretability and predictive power, and to identify effective regularization methods for specific tasks. To address these challenges, this study demonstrates the use of behavioral knowledge for designing a particular DNN architecture with alternative-specific utility functions (ASU-DNN) and thereby improving both the predictive power and interpretability. Unlike a fully connected DNN (F-DNN), which computes the utility value of an alternative k by using the attributes of all the alternatives, ASU-DNN computes it by using only k's own attributes. Theoretically, ASU-DNN can substantially reduce the estimation error of F-DNN because of its lighter architecture and sparser connectivity, although the constraint of alternative-specific utility can cause ASU-DNN to exhibit a larger approximation error. Empirically, ASU-DNN has 2-3% higher prediction accuracy than F-DNN over the whole hyperparameter space in a private dataset collected in Singapore and a public dataset available in the R mlogit package. The alternative-specific connectivity is associated with the independence of irrelevant alternatives (IIA) constraint, which as a domain-knowledge-based regularization method is more effective than the most popular generic-purpose explicit and implicit regularization methods and architectural hyperparameters. ASU-DNN provides a more regular substitution pattern of travel mode choices than F-DNN does, rendering ASU-DNN more interpretable. The comparison between ASU-DNN and F-DNN also aids in testing behavioral knowledge. Our results reveal that individuals are more likely to compute utility by using an alternative’s own attributes, supporting the long-standing practice in choice modeling. Overall, this study demonstrates that behavioral knowledge can guide the architecture design of DNN, function as an effective domain-knowledge-based regularization method, and improve both the interpretability and predictive power of DNN in choice analysis. Future studies can explore the generalizability of ASU-DNN and other possibilities of using utility theory to design DNN architectures.
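The difference in connectivity between the two architectures can be shown with a toy forward pass. In the sketch below each alternative's utility comes from a tiny one-hidden-layer network with hand-picked weights; the ASU variant feeds each network only that alternative's attributes, while the fully connected variant feeds every network the attributes of all alternatives. All weights and attribute values are hypothetical, and the networks are far smaller than anything used in the study.

```python
import math


def softmax(utilities):
    """Choice probabilities from utilities (shifted by the max for stability)."""
    largest = max(utilities)
    exps = [math.exp(u - largest) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


def tiny_net(inputs, hidden_weights, output_weights):
    """One ReLU hidden layer followed by a linear output (a scalar utility)."""
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, inputs)))
        for row in hidden_weights
    ]
    return sum(w * h for w, h in zip(output_weights, hidden))


def asu_utilities(alternatives, nets):
    """ASU-style: utility of alternative k depends only on k's own attributes."""
    return [
        tiny_net(attributes, hidden_w, output_w)
        for attributes, (hidden_w, output_w) in zip(alternatives, nets)
    ]


def fdnn_utilities(alternatives, nets):
    """F-DNN-style: every utility sees the attributes of all alternatives."""
    flat = [value for attributes in alternatives for value in attributes]
    return [tiny_net(flat, hidden_w, output_w) for hidden_w, output_w in nets]


# Three alternatives with two attributes each (hypothetical values and weights).
alternatives = [[1.0, 2.0], [3.0, 1.0], [2.0, 2.0]]
asu_nets = [([[1.0, -0.5], [0.2, 0.3]], [0.5, -1.0])] * 3
fdnn_nets = [([[0.1] * 6, [0.2, -0.1, 0.3, 0.1, -0.2, 0.1]], [1.0, 0.5])] * 3

print(softmax(asu_utilities(alternatives, asu_nets)))
print(softmax(fdnn_utilities(alternatives, fdnn_nets)))
```

Changing the attributes of the second alternative leaves the first and third ASU utilities untouched, so the ratio of their choice probabilities is unchanged, which is the IIA property mentioned above. Under the fully connected variant the same change moves every utility.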

  • It is an enduring question how to combine revealed preference (RP) and stated preference (SP) data to analyze individual choices. While the nested logit (NL) model is the classical way to address the question, this study presents multitask learning deep neural networks (MTLDNNs) as an alternative framework, and discusses their theoretical foundation, empirical performance, and behavioral intuition. We first demonstrate that the MTLDNNs are theoretically more general than the NL models because of MTLDNNs' automatic feature learning, flexible regularizations, and diverse architectures. By analyzing the adoption of autonomous vehicles (AVs), we illustrate that the MTLDNNs outperform the NL models in terms of prediction accuracy but underperform in terms of cross-entropy losses. To interpret the MTLDNNs, we compute the elasticities and visualize the relationship between choice probabilities and input variables. The MTLDNNs reveal that AVs mainly substitute driving and ride hailing, and that the variables specific to AVs are more important than the socio-economic variables in determining AV adoption. Overall, this work demonstrates that MTLDNNs are theoretically appealing in leveraging the information shared by RP and SP and capable of revealing meaningful behavioral patterns, although their performance gain over the classical NL model is still limited. To improve upon this work, future studies can investigate the inconsistency between prediction accuracy and cross-entropy losses, novel MTLDNN architectures, regularization design for the RP-SP question, MTLDNN applications to other choice scenarios, and deeper theoretical connections between choice models and the MTLDNN framework.

  • Despite rapid advances in automated text processing, many related tasks in transit and other transportation agencies are still performed manually. For example, incident management reports are often manually processed and subsequently stored in a standardized format for later use. The information contained in such reports can be valuable for many reasons: identification of issues with response actions, underlying causes of each incident, impacts on the system, etc. In this paper, we develop a comprehensive, pragmatic automated framework for analyzing rail incident reports to support a wide range of applications and functions, depending on the constraints of the available data. The objectives are twofold: a) extract information that is required in the standard report forms (automation), and b) extract other useful content and insights from the unstructured text in the original report that would have otherwise been lost/ignored (knowledge discovery). The approach is demonstrated through a case study involving an analysis of 23,728 records of general incidents in the London Underground (LU). The results show that it is possible to automatically extract delays, impacts on trains, mitigating strategies, underlying incident causes, and insights related to the potential actions and causes, as well as accurate classification of incidents into predefined categories.

  • Although automatically collected human travel records can accurately capture the time and location of human movements, they do not directly explain the hidden semantic structures behind the data, e.g., activity types. This work proposes a probabilistic topic model, adapted from Latent Dirichlet Allocation (LDA), to discover representative and interpretable activity categorization from individual-level spatiotemporal data in an unsupervised manner. Specifically, the activity-travel episodes of an individual user are treated as words in a document, and each topic is a distribution over space and time that corresponds to a certain type of activity. The model accounts for a mixture of discrete and continuous attributes: the location, start time of day, start day of week, and duration of each activity episode. The proposed methodology is demonstrated using pseudonymized transit smart card data from London, U.K. The results show that the model can successfully distinguish the three most basic types of activities: home, work, and other. As the specified number of activity categories increases, more specific subpatterns for home and work emerge, and both the goodness of fit and predictive performance for travel behavior improve. This work makes it possible to enrich human mobility data with representative and interpretable activity patterns without relying on predefined activity categories or heuristic rules.

  • Researchers are applying a large number of machine learning (ML) classifiers to predict travel behavior, but the results are data-specific and the selection of ML classifiers is author-specific. To obtain generalizable results, this paper provides an empirical benchmark by using 86 classifiers from 14 model families to predict the travel mode choice based on the National Household Travel Survey (NHTS) 2017 dataset. The 86 ML classifiers from 14 model families incorporate all the important ML classifiers discussed in previous studies. The large number of observations (about 800,000) in the NHTS2017 dataset enables us to analyze the effect of different sample sizes as a meta-dimension on prediction accuracy. We found that ensemble models, including boosting, bagging, and random forests, perform the best among all the classifiers, and that deep neural networks (DNNs) perform the best among all the non-ensemble models. Classical discrete choice models (DCMs) only predict at the medium or relatively low range of prediction accuracy among all the models. In particular, the mixed logit model cannot be trained in a reasonable amount of time owing to its computational difficulty in sampling. Larger sample size generally leads to higher prediction accuracy, particularly for the models with high model complexity. Overall, this study provides an empirical benchmark result for the future, and future studies can build upon our results by testing more ML classifiers on the same NHTS2017 dataset, thus yielding more comparable, replicable, and generalizable knowledge shared by the whole research community.

  • While researchers increasingly use deep neural networks (DNNs) to analyze individual choices, overfitting and interpretability issues remain obstacles in theory and practice. By using statistical learning theory, this study presents a framework to examine the tradeoff between estimation and approximation errors, and between prediction and interpretation losses. It operationalizes DNN interpretability in choice analysis by formulating the interpretation loss as the difference between true and estimated choice probability functions. This study also uses statistical learning theory to upper bound the estimation error of both prediction and interpretation losses in DNN, shedding light on why DNN does not have the overfitting issue. Three scenarios are then simulated to compare DNN to the binary logit model (BNL). We found that DNN outperforms BNL in terms of both prediction and interpretation for most of the scenarios, and larger sample size unleashes the predictive power of DNN but not BNL. DNN is also used to analyze the choice of trip purposes and travel modes based on the National Household Travel Survey 2017 (NHTS2017) dataset. These experiments indicate that DNN can be used for choice analysis beyond the current practice of demand forecasting because it has the inherent utility interpretation, the flexibility of accommodating various information formats, and the power of automatically learning utility specification. DNN is both more predictive and more interpretable than BNL unless the modelers have complete knowledge about the choice task and the sample size is small. Overall, statistical learning theory can be a foundation for future studies in the non-asymptotic data regime or using high-dimensional statistical models in choice analysis, and the experiments show the feasibility and effectiveness of DNN for its wide applications to policy and behavioral analysis.

  • Dynamic Pricing in Shared Mobility on Demand Service and its Social Impacts

    Transportation Research Board 97th Annual Meeting, Washington, D.C. (2018)

    The authors consider a daily-level profit maximization of a shared mobility on-demand (MoD) service with request-level control. The authors use discrete choice models to describe traveler behavior, apply the assortment and price optimization framework to model the request-level dynamics, and leverage insights from dynamic programming to develop a daily-level optimization problem. The authors solve this problem by designing a parametric rollout policy and utilizing the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to search for the optimal parameters. The authors evaluate their algorithm with a case study in Langfang, China. The authors develop a simulation system for both the MoD service operations and the city transportation dynamics, and design scenarios with varying supply size, demand size, congestion level, and fare structure. In this case study, the optimal pricing strategy generates considerably more profit than basic strategies (those without assortment or dynamic pricing) and myopic strategies (dynamic pricing at each request level), but it increases the congestion level and reduces the capacity in the transportation system.

  • Rebalancing Shared Mobility-on-Demand Systems: A Reinforcement Learning Approach

    Transportation Research Board 97th Annual Meeting, Washington, D.C. (2018)

    Shared mobility-on-demand systems have very promising prospects in making urban transportation efficient and affordable. However, due to operational challenges, among others, many mobility applications remain niche products. This paper addresses rebalancing needs that are critical for effective fleet management in order to offset the inevitable imbalance of vehicle supply and travel demand. Specifically, the authors propose a reinforcement learning approach which adopts a deep Q network and adaptively moves idle vehicles to regain balance. This innovative model-free approach takes a very different perspective from the state-of-the-art network-based methods and is able to cope with large-scale shared systems in real time with partial or full data availability. The authors apply this approach to an agent-based simulator and test it on a London case study. Results show that the proposed method outperforms the local anticipatory method by reducing the fleet size by 14% while inducing little extra vehicle distance traveled. The performance is close to the optimal solution, yet the computational speed is 2.5 times faster. Collectively, the paper concludes that the proposed rebalancing approach is effective under various demand scenarios and will benefit both travelers and operators if implemented in a shared mobility-on-demand system.

  • Although stable in the short term, individual travel patterns are subject to changes in the long term. The ability to detect such changes is critical for developing behavior models that are adaptive over time. We define travel pattern change as "abrupt, substantial, and persistent changes in the underlying pattern of travel behavior" and develop a methodology to detect such changes in individual travel patterns. We specify one distribution for each of the three dimensions of travel behavior (the frequency of travel, time of travel, and origins/destinations), and interpret the change of the parameters of the distributions as indicating the occurrence of the pattern change. A Bayesian method is developed to estimate the probability that a pattern change occurs at any given time for each behavior dimension. The proposed methodology is tested using pseudonymized smart card records of 3,210 users from London, U.K. over two years. The results show that the method can successfully identify significant changepoints in travel patterns. Compared to the traditional generalized likelihood ratio (GLR) approach, the Bayesian method requires fewer predefined parameters and is more robust. The methodology presented in this paper is generalizable and can be applied to detect changes in other aspects of travel behavior and human behavior in general.

  • For intelligent urban transportation systems, the ability to predict individual mobility is crucial for personalized traveler information, targeted demand management, and dynamic system operations. Whereas existing methods focus on predicting the next location of users, little is known regarding the prediction of the next trip. The paper develops a methodology for predicting daily individual mobility represented as a chain of trips (including the null set, no travel), each defined as a combination of the trip start time t, origin o, and destination d. To predict individual mobility, we first predict whether the user will travel (trip making prediction), and then, if so, predict the attributes of the next trip (t, o, d) (trip attribute prediction). Each of the two problems can be further decomposed into two subproblems based on the triggering event. For trip attribute prediction, we propose a new model, based on the Bayesian n-gram model used in language modeling, to estimate the probability distribution of the next trip conditional on the previous one. The proposed methodology is tested using the pseudonymized transit smart card records from more than 10,000 users in London, U.K. over two years. Based on regularized logistic regression, our trip making prediction models achieve median accuracy levels of over 80%. The prediction accuracy for trip attributes varies by the attribute considered: around 40% for t, 70-80% for o, and 60-70% for d. The first trip of the day is relatively more difficult to predict. Significant variations are found across individuals in terms of the model performance, implying diverse travel behavior patterns.
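The idea of predicting the next trip conditional on the previous one can be illustrated with a plain bigram model. The sketch below counts trip-to-trip transitions and applies additive smoothing, which corresponds to a symmetric Dirichlet prior. It is a simplified stand-in, not the Bayesian n-gram model proposed in the paper, and the class name, the smoothing constant, and the toy trips (encoded as time-band, origin, destination tuples) are all hypothetical.

```python
from collections import Counter, defaultdict


class BigramTripModel:
    """Next-trip distribution conditional on the previous trip (smoothed counts)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                    # additive smoothing constant
        self.counts = defaultdict(Counter)    # previous trip -> Counter of next trips
        self.vocabulary = set()               # every trip seen during fitting

    def fit(self, trip_chains):
        for chain in trip_chains:
            self.vocabulary.update(chain)
            for previous, following in zip(chain, chain[1:]):
                self.counts[previous][following] += 1
        return self

    def predict_proba(self, previous):
        """Smoothed probability of each known trip following `previous`."""
        seen = self.counts.get(previous, Counter())
        total = sum(seen.values()) + self.alpha * len(self.vocabulary)
        return {
            trip: (seen[trip] + self.alpha) / total
            for trip in self.vocabulary
        }


# Hypothetical daily chains; each trip is (time band, origin, destination).
home_to_work = ("am", "H", "W")
work_to_home = ("pm", "W", "H")
work_to_gym = ("pm", "W", "G")
chains = [
    [home_to_work, work_to_home, home_to_work, work_to_home],
    [home_to_work, work_to_gym],
]
model = BigramTripModel(alpha=1.0).fit(chains)
print(model.predict_proba(home_to_work))
```

Smoothing keeps unseen transitions at a small positive probability, which matters for individual-level data where each user contributes only a limited number of observed trips.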

  • Demand for public transportation is highly affected by passengers’ experience and the level of service provided. Thus, it is vital for transit agencies to deploy adaptive strategies to respond to changes in demand or supply in a timely manner, and prevent unwanted deterioration in service quality. In this paper, a real-time prediction methodology, based on univariate and multivariate state-space models, is developed to predict the short-term passenger arrivals at transit stations. A univariate state-space model is developed at the station level. Through a hierarchical clustering algorithm with correlation distance, stations with similar demand patterns are identified. A dynamic factor model is proposed for each cluster, capturing station interdependencies through a set of common factors. Both approaches can model the effect of exogenous events (such as football games). Ensemble predictions are then obtained by combining the outputs from the two models, based on their respective accuracy. We evaluate these models using data from the 32 stations on the Central line of the London Underground (LU), operated by Transport for London (TfL). The results indicate that the proposed methodology performs well in predicting short-term station arrivals for the set of test days. For most stations, ensemble prediction has the lowest mean error, as well as the smallest range of error, and exhibits more robust performance across the test days.
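Two ingredients of this pipeline are easy to sketch: the correlation distance used to compare station demand profiles, and an accuracy-based combination of two forecasts. The abstract does not say how the ensemble weights are derived from accuracy, so the inverse-error weighting below is an assumption chosen for illustration, and all numbers are made up.

```python
import math


def correlation_distance(series_a, series_b):
    """1 minus the Pearson correlation: 0 for identical shapes, 2 for opposite ones."""
    n = len(series_a)
    mean_a = sum(series_a) / n
    mean_b = sum(series_b) / n
    covariance = sum((a - mean_a) * (b - mean_b) for a, b in zip(series_a, series_b))
    variance_a = sum((a - mean_a) ** 2 for a in series_a)
    variance_b = sum((b - mean_b) ** 2 for b in series_b)
    return 1.0 - covariance / math.sqrt(variance_a * variance_b)


def inverse_error_weights(errors):
    """Weights proportional to 1 / error, normalized to sum to 1 (an assumed scheme)."""
    inverse = [1.0 / e for e in errors]
    total = sum(inverse)
    return [value / total for value in inverse]


def ensemble_forecast(forecasts, errors):
    """Weighted average of model forecasts, favoring the historically more accurate model."""
    weights = inverse_error_weights(errors)
    return sum(w * f for w, f in zip(weights, forecasts))


# Hypothetical demand profiles for two stations and two model forecasts.
station_a = [120.0, 300.0, 180.0, 90.0]
station_b = [60.0, 150.0, 90.0, 45.0]
print(correlation_distance(station_a, station_b))
print(ensemble_forecast([210.0, 230.0], errors=[12.0, 18.0]))
```

Correlation distance ignores scale, so a busy station and a quiet station with the same daily shape fall into the same cluster, which is the property that makes a shared factor model per cluster plausible.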

  • The public transport networks of dense cities such as London serve passengers with widely different travel patterns. In line with the diverse lives of urban dwellers, activities and journeys are combined within days and across days in diverse sequences. From personalized customer information to improved travel demand models, understanding this type of heterogeneity among transit users is relevant to a number of applications core to public transport agencies’ function. In this study, passenger heterogeneity is investigated based on a longitudinal representation of each user’s multi-week activity sequence derived from smart card data. We propose a methodology leveraging this representation to identify clusters of users with similar activity sequence structure. The methodology is applied to a large sample (n = 33,026) from London’s public transport network, in which each passenger is represented by a continuous 4-week activity sequence. The application reveals 11 clusters, each characterized by a distinct sequence structure. Socio-demographic information available for a small sample of users (n = 1,973) is combined with smart card transactions to analyze associations between the identified patterns and demographic attributes including passenger age, occupation, household composition and income, and vehicle ownership. The analysis reveals that significant connections exist between the demographic attributes of users and activity patterns identified exclusively from fare transactions.

Team Members