Mode choice modeling is central to predicting and understanding travel behavior. For this purpose, machine learning (ML) models have increasingly been applied to stated preference data and to traditional self-recorded revealed preference data, with promising results, particularly for extreme gradient boosting (XGBoost) and random forest (RF) models. Because tracking-based smartphone applications are increasingly used to record travel behavior, we address the important and previously unaddressed task of testing these ML models for mode choice modeling on such data. Furthermore, as ML approaches are still criticized for producing results that are hard to understand, we consider it essential to provide an in-depth interpretability analysis of the best-performing model. Our results show that the XGBoost and RF models far outperform a conventional multinomial logit model, both overall and for each mode. The interpretability analysis, which uses the Shapley additive explanations approach, reveals that the XGBoost model can be explained well at both the overall and the mode level. In addition, we demonstrate how to analyze individual predictions. Lastly, a sensitivity analysis gives insight into the relative importance of different data sources, sample size, and user involvement. We conclude that the XGBoost model performs best while also being explainable. Insights generated by such models can be used, for instance, to predict mode choice decisions for arbitrary origin–destination pairs and thereby assess what impact infrastructural changes would have on the mode share.
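To make the idea behind analyzing individual predictions concrete, the following minimal sketch computes exact Shapley values for a single prediction by enumerating all feature coalitions. It is an illustration only: the scoring function `bike_score`, the feature names `distance_km` and `rain`, and the all-zero reference trip are hypothetical and are not taken from the paper. Practical Shapley additive explanations implementations rely on faster, model-specific algorithms rather than this brute-force enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values for one prediction via coalition enumeration."""
    features = list(instance)
    n = len(features)

    def value(coalition):
        # Features in the coalition take the instance's values;
        # all other features stay at their baseline values.
        x = {f: (instance[f] if f in coalition else baseline[f]) for f in features}
        return predict(x)

    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Hypothetical toy score for choosing "bike"; NOT the model from the paper.
def bike_score(x):
    return (2.0 - 0.3 * x["distance_km"] - 1.5 * x["rain"]
            - 0.2 * x["distance_km"] * x["rain"])


trip = {"distance_km": 4.0, "rain": 1.0}       # trip to be explained (assumed values)
reference = {"distance_km": 0.0, "rain": 0.0}  # baseline trip (assumed values)

phi = shapley_values(bike_score, trip, reference)
for feature, contribution in phi.items():
    print(f"{feature}: {contribution:+.2f}")
```

The contributions sum to the difference between the score for the explained trip and the score for the reference trip, which is the additivity property that makes per-prediction explanations of this kind readable.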
Dahmen, V., Weikl, S., & Bogenberger, K. (2024). Interpretable Machine Learning for Mode Choice Modeling on Tracking-Based Revealed Preference Data. Transportation Research Record. https://doi.org/10.1177/03611981241246973