Trip Detection using Sparse Call Detail Record Data

Title: Trip Detection using Sparse Call Detail Record Data
Publication Type: Journal Article
Year of Publication: Submitted
Authors: Zhan Zhao, Haris N. Koutsopoulos, Jinhua Zhao
Journal: IEEE Transactions on Intelligent Transportation Systems
Keywords: Call Detail Record, data fusion, elapsed time interval, hidden visit, statistical inference, supervised learning
Abstract

Despite a large body of literature on trip detection using call detail record (CDR) data, a fundamental understanding of their limitations is lacking. In particular, the sparse nature of CDR data is not well addressed. This study defines a process that allows physical travel patterns (important to the transportation community) to be inferred from telecommunication patterns captured by CDRs. To reduce the reliance of existing CDR-based trip detection methods on heuristics and arbitrary assumptions, we use data fusion to obtain labeled data for supervised statistical learning. In the absence of complementary data, this can be accomplished by extracting labeled observations from more granular cellular data access records, and extracting features from voice call and SMS records. The proposed approach is demonstrated using a real-world CDR dataset of 3 million users from a large Chinese city. The approach functions by inferring whether a hidden visit exists between two consecutive visits observed from CDR data. Logistic regression, support vector machine, and AdaBoost are used to develop classification models for hidden visit inference, and the test results show significant improvement over the naïve no-hidden-visit rule, which is an implicit assumption adopted by most existing studies. Based on the proposed model, we estimate that over 10% of the displacements extracted from CDR data involve hidden visits. The proposed data fusion approach offers a systematic statistical approach to inferring individual mobility patterns based on telecommunication records. The method can be generalized to fit other types of large-scale data as well.
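
The hidden visit inference described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are purely synthetic, the single feature (elapsed time between two consecutive observed visits, in hours) is an assumption suggested by the keyword list, and the assumed relationship between elapsed time and hidden visits is invented for illustration. The sketch fits a one-feature logistic regression by gradient descent and compares it with the naive no-hidden-visit rule, which always predicts that no hidden visit exists.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_synthetic_pairs(n, rng):
    """Return (elapsed_hours, has_hidden_visit) pairs. Purely synthetic."""
    pairs = []
    for _ in range(n):
        elapsed = rng.uniform(0.0, 12.0)
        # Assumption for illustration only: longer gaps between two observed
        # visits make a hidden visit in between more likely.
        p_hidden = sigmoid(1.5 * (elapsed - 8.0))
        pairs.append((elapsed, 1 if rng.random() < p_hidden else 0))
    return pairs


def fit_logistic(xs, ys, lr=0.5, steps=500):
    """Fit a one-feature logistic regression by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def accuracy(preds, ys):
    return sum(int(p == y) for p, y in zip(preds, ys)) / len(ys)


rng = random.Random(0)
data = make_synthetic_pairs(400, rng)
train, test = data[:300], data[300:]

# Standardize the feature using training statistics only.
mean = sum(x for x, _ in train) / len(train)
std = math.sqrt(sum((x - mean) ** 2 for x, _ in train) / len(train))


def scale(x):
    return (x - mean) / std


w, b = fit_logistic([scale(x) for x, _ in train], [y for _, y in train])

test_y = [y for _, y in test]
model_preds = [1 if sigmoid(w * scale(x) + b) >= 0.5 else 0 for x, _ in test]
naive_preds = [0] * len(test)  # naive rule: never a hidden visit

model_acc = accuracy(model_preds, test_y)
naive_acc = accuracy(naive_preds, test_y)
print(f"logistic regression accuracy: {model_acc:.2f}")
print(f"naive no-hidden-visit accuracy: {naive_acc:.2f}")
```

In the setting the abstract describes, the labels would come from the more granular cellular data access records and the features from voice call and SMS records. The same comparison against the naive rule would apply to the support vector machine and AdaBoost classifiers.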