| Title | Deep Neural Networks for Choice Analysis: A Statistical Learning Theory Perspective |
| Publication Type | Journal Article |
| Year of Publication | 2021 |
| Authors | Shenhao Wang, Qingyi Wang, Nate Bailey, Jinhua Zhao |
| Journal | Transportation Research Part B |
Although researchers increasingly use deep neural networks (DNNs) to analyze individual choices, overfitting and interpretability issues remain obstacles in theory and practice. This study presents a statistical learning theoretical framework to examine the tradeoff between estimation and approximation errors, and between prediction quality and interpretation quality. It provides an upper bound on the estimation error of prediction quality in DNNs, measured by zero-one and log losses, shedding light on why DNN models do not overfit. It proposes a metric for interpretation quality by formulating a function approximation loss that measures the difference between the true and estimated choice probability functions. It argues that the binary logit (BNL) and multinomial logit (MNL) models are special cases within the model family of DNNs, so the latter always has smaller approximation errors. We explore the relative performance of DNNs and classical choice models through three simulation scenarios comparing DNN, BNL, and binary mixed logit (BXL) models, as well as one experiment comparing DNN to BNL, BXL, MNL, and mixed logit (MXL) models in analyzing the choice of trip purpose based on the 2017 National Household Travel Survey. The results indicate that DNNs can be used for choice analysis beyond the current practice of demand forecasting because they offer an inherent utility interpretation and the power of automatically learning utility specifications. Our results suggest that DNNs outperform the BNL, BXL, MNL, and MXL models in both prediction and interpretation when the sample size is large (≥ O(10^4)), the input dimension is high, or the true data generating process is complex, while performing worse when the opposite is true. DNNs outperform BNL and BXL in zero-one, log, and approximation losses in most of the experiments, and larger sample sizes lead to greater incremental value of using DNNs over classical discrete choice models.
Overall, this study introduces statistical learning theory as a new foundation for choice analysis with high-dimensional data, complex statistical models, and non-asymptotic data regimes, and the experiments demonstrate the effective prediction and interpretation of DNNs for applications to policy and behavioral analysis.
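The three losses discussed in the abstract (zero-one, log, and function approximation loss) can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names are invented for exposition, and the use of an averaged L2 distance for the approximation loss is an assumption about how the gap between true and estimated choice probability functions might be measured.

```python
import numpy as np

def zero_one_loss(y_true, p_hat):
    """Fraction of observations where the highest-probability
    alternative differs from the chosen alternative."""
    return np.mean(np.argmax(p_hat, axis=1) != y_true)

def log_loss(y_true, p_hat, eps=1e-12):
    """Average negative log probability assigned to the chosen
    alternative (the negative log likelihood per observation)."""
    chosen = np.clip(p_hat[np.arange(len(y_true)), y_true], eps, 1.0)
    return -np.mean(np.log(chosen))

def approximation_loss(p_true, p_hat):
    """Illustrative interpretation-quality metric: mean L2 distance
    between the true and estimated choice probability vectors,
    averaged over a sample of inputs."""
    return np.mean(np.linalg.norm(p_true - p_hat, axis=1))
```

In simulation settings like those in the study, `p_true` is available from the known data generating process, so the approximation loss can be evaluated directly; with real data, only the zero-one and log losses are observable.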