|Title||Identifying Hidden Visits from Sparse Call Detail Record Data|
|Publication Type||Journal Article|
|Year of Publication||Submitted|
|Authors||Zhan Zhao, Haris N. Koutsopoulos, Jinhua Zhao|
|Journal||European Physical Journal: Data Science|
|Keywords||Call Detail Record; statistical inference; data fusion; hidden visit|
Despite a large body of literature on trip inference using call detail record (CDR) data, a fundamental understanding of their limitations is lacking. In particular, because of the sparse nature of CDR data, users may travel to a location without being revealed in the data, which we refer to as hidden visits. The existence of hidden visits hinders our ability to extract reliable information about human mobility and travel behavior from CDR data. In this study, we propose a data fusion approach to obtain labeled data for statistical inference of hidden visits. In the absence of complementary data, this can be accomplished by extracting labeled observations from more granular cellular data access records, and extracting features from voice call and text messaging records. The proposed approach is demonstrated using a real-world CDR dataset of 3 million users from a large Chinese city. Logistic regression, support vector machine, and AdaBoost are used to develop classification models for hidden visit inference, and the test results show significant improvement over the naive no-hidden-visit rule, which is an implicit assumption adopted by most existing studies. Based on the proposed model, we estimate that over 10% of the displacements extracted from CDR data involve hidden visits. The proposed data fusion method offers a systematic statistical approach to inferring individual mobility patterns based on telecommunication records. The method can be generalized to fit other types of large-scale data as well.
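The labeling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record schema (timestamp, location id), the function name `label_displacements`, and the working definition of a hidden visit (a location seen in the denser data access records between two consecutive call/text records that matches neither endpoint) are all assumptions made for illustration, and the records are made-up toy values.

```python
from typing import List, Tuple

# Hypothetical schema: (timestamp in seconds, location / cell id).
Record = Tuple[int, str]


def label_displacements(
    sparse: List[Record], dense: List[Record]
) -> List[Tuple[str, str, bool]]:
    """Label each pair of consecutive sparse (call/text) records.

    A pair is flagged True when the dense (data access) records observed
    strictly between the two timestamps include a location other than the
    pair's own endpoints. This is an assumed working definition of a
    hidden visit, not necessarily the one used in the paper.
    """
    sparse = sorted(sparse)
    dense = sorted(dense)
    labels = []
    for (t0, origin), (t1, destination) in zip(sparse, sparse[1:]):
        # Locations revealed only by the denser data source in this interval.
        between = {loc for t, loc in dense if t0 < t < t1}
        hidden = bool(between - {origin, destination})
        labels.append((origin, destination, hidden))
    return labels


# Toy, made-up records for illustration only.
calls = [(0, "A"), (3600, "B"), (7200, "B")]
data_sessions = [(600, "A"), (1800, "C"), (3000, "B"), (5000, "B")]

labels = label_displacements(calls, data_sessions)
print(labels)
```

Under these assumptions, the naive no-hidden-visit rule corresponds to predicting `False` for every pair, so labels of this kind give a baseline against which a classifier trained on call and text features could be compared.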