Alleviating Data Sparsity Problems in Estimated Time of Arrival via Auxiliary Metric Learning

Title: Alleviating Data Sparsity Problems in Estimated Time of Arrival via Auxiliary Metric Learning
Publication Type: Journal Article
Year of Publication: 2022
Authors: Sun Y, Hu W, Zhou D, Mo B, Fu K, Che Z, Wang Z, Wang S, Zhao J, Ye J, Tang J, Zhang C
Journal: IEEE Transactions on Intelligent Transportation Systems

With millions of people using ride-hailing platforms for daily travel, estimated time of arrival (ETA) has become a significant problem in intelligent transportation systems and has attracted considerable attention recently. Deep learning-based ETA methods have achieved promising results using massive spatial-temporal data. However, we find that their prediction accuracy is not satisfactory in practical applications because of prevalent data sparsity problems. Instead of focusing on average prediction performance, as many other methods do, this study aims to alleviate the data sparsity problems in ETA to enhance user experience. In general, the data sparsity problems arise from two sources. The first is the road network, where many links are traversed by only a few floating cars. The second is drivers, many of whom have too few trajectories (e.g., only 3 trip records). To alleviate sparsity in the road network, we propose a Road Network Metric Learning framework for ETA, in which an auxiliary metric learning task is used to improve the link embeddings, especially for links with insufficient data. A novel triangle loss is proposed to improve the effectiveness of metric learning for links. Experiments on massive real-world data show that this framework outperforms competing methods by improving predictions for cold links with limited data. Furthermore, we propose a novel unified framework to Alleviate Data Sparsity problems in ETA by extending the road network framework with an additional auxiliary task for driver ID embedding. Extensive experiments demonstrate that the unified framework can effectively alleviate the data sparsity problems caused by road network and driver sparsity.
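The abstract does not define the triangle loss, so what follows is only a minimal sketch of the general idea of training link embeddings with an auxiliary metric-learning term alongside the main ETA regression loss. It assumes a standard triplet margin loss as a stand-in for the paper's triangle loss, and it assumes that (anchor, positive, negative) link triplets are already available, for example chosen by similar road attributes. The names `joint_loss`, `aux_weight`, `margin`, and the link IDs are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical types: each road link ID maps to a learned embedding vector.
Embedding = Sequence[float]
Triplet = Tuple[str, str, str]  # (anchor link, positive link, negative link)


def euclidean(a: Embedding, b: Embedding) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor: Embedding, positive: Embedding,
                        negative: Embedding, margin: float = 1.0) -> float:
    """Standard triplet loss, used here as a stand-in (NOT the paper's triangle loss).

    The loss is zero once the anchor is closer to the positive link than to
    the negative link by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def mse(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Main ETA regression loss: mean squared error over trip durations."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def joint_loss(pred_eta: Sequence[float], true_eta: Sequence[float],
               triplets: List[Triplet], embeddings: Dict[str, Embedding],
               aux_weight: float = 0.1, margin: float = 1.0) -> float:
    """Main ETA loss plus a weighted auxiliary metric-learning loss (assumed form)."""
    main = mse(pred_eta, true_eta)
    # Average the triplet loss over all sampled link triplets. Sparse ("cold")
    # links that appear in triplets still receive a training signal here.
    aux = sum(
        triplet_margin_loss(embeddings[a], embeddings[p], embeddings[n], margin)
        for a, p, n in triplets
    ) / len(triplets)
    return main + aux_weight * aux


if __name__ == "__main__":
    # Toy 2-D embeddings for three hypothetical links.
    emb = {"L1": [0.0, 0.0], "L2": [0.0, 1.0], "L3": [3.0, 4.0]}
    triplets = [("L1", "L2", "L3"), ("L1", "L3", "L2")]
    print(joint_loss([600.0, 900.0], [620.0, 880.0], triplets, emb))
```

In this toy run the regression term is 400 and the auxiliary term averages to 2.5, so the joint loss is roughly 400.25. `aux_weight` trades off the auxiliary task against ETA accuracy. The abstract's unified framework adds a further auxiliary task for driver ID embeddings; in this sketch that would correspond to adding another weighted term to `joint_loss`, although the paper's actual formulation may differ.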