|Predicting Travel Mode Choice with 86 Machine Learning Classifiers: An Empirical Benchmark Study
|Year of Publication
|Shenhao Wang, Baichuan Mo, Jinhua Zhao
|Transportation Research Board 99th Annual Meeting
Researchers are applying a large number of machine learning (ML) classifiers to predict travel behavior, but the results are data-specific and the selection of ML classifiers is author-specific. To obtain generalizable results, this paper provides an empirical benchmark that uses 86 classifiers from 14 model families to predict travel mode choice on the National Household Travel Survey (NHTS) 2017 dataset. These classifiers cover all the important ML classifiers discussed in previous studies. The large number of observations (about 800,000) in the NHTS 2017 dataset enables us to analyze the effect of sample size, as a meta-dimension, on prediction accuracy. We found that ensemble models, including boosting, bagging, and random forests, perform the best among all the classifiers, and that deep neural networks (DNNs) perform the best among the non-ensemble models. Classical discrete choice models (DCMs) fall in the medium or relatively low range of prediction accuracy among all the models. In particular, the mixed logit model cannot be trained in a reasonable amount of time owing to the computational difficulty of its sampling procedure. A larger sample size generally leads to higher prediction accuracy, particularly for models with high complexity. Overall, this study provides an empirical benchmark for future work: subsequent studies can build on our results by testing more ML classifiers on the same NHTS 2017 dataset, yielding more comparable, replicable, and generalizable knowledge shared by the whole research community.
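The benchmark design described above (several classifiers, each trained at several sample sizes and scored on a common test set) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the synthetic trip generator, the two toy classifiers, and names such as `make_synthetic_trips` and `run_benchmark` are hypothetical stand-ins, and no NHTS data is used.

```python
import random
from collections import Counter


def make_synthetic_trips(n, seed=0):
    """Hypothetical stand-in for survey trips: ((distance_km, has_car), mode)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        distance = rng.uniform(0.1, 30.0)
        has_car = rng.random() < 0.7
        # Assumed toy rule: short trips walk, car owners drive, others use transit.
        if distance < 2.0:
            mode = "walk"
        elif has_car:
            mode = "car"
        else:
            mode = "transit"
        if rng.random() < 0.1:  # inject label noise
            mode = rng.choice(["walk", "car", "transit"])
        rows.append(((distance, float(has_car)), mode))
    return rows


class MajorityClassifier:
    """Baseline that always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class NearestCentroidClassifier:
    """Predicts the label whose mean feature vector is closest."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for features, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(features))
            for i, value in enumerate(features):
                acc[i] += value
        self.centroids_ = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, X):
        def closest(features):
            return min(
                self.centroids_,
                key=lambda label: sum(
                    (a - b) ** 2 for a, b in zip(features, self.centroids_[label])
                ),
            )

        return [closest(f) for f in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def run_benchmark(classifiers, sample_sizes, test_size=2000, seed=0):
    """Train every classifier at every sample size; score on one shared test set."""
    test = make_synthetic_trips(test_size, seed=seed + 1)
    X_test = [x for x, _ in test]
    y_test = [y for _, y in test]
    pool = make_synthetic_trips(max(sample_sizes), seed=seed)
    results = {}
    for n in sample_sizes:
        X_train = [x for x, _ in pool[:n]]
        y_train = [y for _, y in pool[:n]]
        for name, factory in classifiers.items():
            model = factory().fit(X_train, y_train)
            results[(name, n)] = accuracy(y_test, model.predict(X_test))
    return results


CLASSIFIERS = {
    "majority": MajorityClassifier,
    "nearest_centroid": NearestCentroidClassifier,
}
results = run_benchmark(CLASSIFIERS, sample_sizes=[100, 1000, 10000])
for (name, n), acc in sorted(results.items()):
    print(f"{name:>16} n={n:>6} accuracy={acc:.3f}")
```

Holding the test set fixed across sample sizes keeps the accuracy figures comparable, which is the point of treating sample size as a meta-dimension. A real replication would swap in the NHTS 2017 features and the model families listed in the paper.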