Supervised Statistical Learning for Individual Level Trip Detection using Sparse Call Detail Record Data

Title: Supervised Statistical Learning for Individual Level Trip Detection using Sparse Call Detail Record Data
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Zhan Zhao, H. Koutsopoulos, Jinhua Zhao
Conference Name: 95th Transportation Research Board Annual Meeting
Date Published: 08/2015
Publisher: Transportation Research Board
Conference Location: Washington, D.C.
Keywords: Call Detail Record, data fusion, hidden visit, statistical inference, supervised learning
Abstract

Despite a large body of literature on trip detection using Call Detail Record (CDR) data, a fundamental understanding of the data's limitations is lacking; in particular, its sparse nature is not well addressed in existing work. This paper proposes a conceptual framework that makes an explicit distinction between the telecommunication patterns captured by CDRs and the travel patterns of interest to the transportation community. A process is also proposed for extracting trips from CDRs at the individual level. A literature review reveals a lack of statistical approaches to trip detection beyond simple heuristic decision rules. Statistical modeling is well suited to reducing bias, increasing robustness, and producing probabilistic estimates. To perform supervised statistical learning, we propose using data fusion to form labeled data for model training. When complementary data with more frequent observations (such as GPS data) are unavailable, which is often the case, this can be done by extracting labeled observations from the more granular cellular data access records and extracting feature vectors from voice-call and SMS records. The proposed approach is demonstrated with real-world CDR data from a Chinese city by inferring whether a hidden visit exists between two consecutive visits that are directly observable in the CDR data. The model results show a significant improvement over the naïve rule in accurately identifying hidden visits. Model performance can be further improved by mitigating signal noise, extracting more powerful features, and accounting for individual heterogeneity. This study provides a deeper understanding of the mapping between CDRs in telecommunication and trips in human mobility, and of how we can, and should, extract the latter from the former.
The proposed data fusion approach offers a flexible and systematic way to infer individual mobility patterns, even when only CDR data are available.
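The labeling idea described in the abstract (labels from data access records, features from voice-call and SMS records) can be illustrated with a minimal sketch. The abstract does not specify the model, features, or exact label definition, so everything below is a hypothetical assumption, not the authors' method: the `Record` type, the helper names `make_examples`, `fit_logistic` and `predict_proba`, the three features (time gap, distance between towers, same-tower flag), planar tower coordinates in km, and the rule that a data record at a third tower marks a hidden visit. Plain logistic regression trained by batch gradient descent stands in for whatever classifier the paper uses.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    t: float      # timestamp in hours (hypothetical unit)
    tower: str    # serving cell tower ID
    kind: str     # "voice", "sms" or "data"

def make_examples(records, towers):
    """Pair consecutive voice/SMS records (the observed visits). Label each pair 1
    if a data record between them was served by a third tower (assumed hidden visit).
    Note: this ignores tower-switching noise, which can create false labels."""
    records = sorted(records, key=lambda r: r.t)
    observed = [r for r in records if r.kind in ("voice", "sms")]
    data = [r for r in records if r.kind == "data"]
    X, y = [], []
    for a, b in zip(observed, observed[1:]):
        between = [d for d in data if a.t < d.t < b.t]
        hidden = any(d.tower not in (a.tower, b.tower) for d in between)
        (ax, ay), (bx, by) = towers[a.tower], towers[b.tower]
        dist_km = math.hypot(bx - ax, by - ay)
        # Features use only voice/SMS information, so they are available at prediction time.
        X.append([b.t - a.t, dist_km, 1.0 if a.tower == b.tower else 0.0])
        y.append(1 if hidden else 0)
    return X, y

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    # w holds one weight per feature followed by the intercept.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean logistic (cross-entropy) loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

# Toy example for illustration only: planar tower coordinates in km.
towers = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (5.0, 5.0)}
records = [
    Record(8.0, "A", "voice"),
    Record(9.0, "C", "data"),    # data session at a third tower -> labeled hidden visit
    Record(10.0, "B", "sms"),
    Record(11.0, "B", "data"),
    Record(12.0, "B", "voice"),
    Record(13.5, "B", "voice"),
]
X, y = make_examples(records, towers)
w = fit_logistic(X, y)
for xi, yi in zip(X, y):
    print(xi, yi, round(predict_proba(w, xi), 3))
```

Once trained, such a model needs only the voice/SMS features, so it could in principle be applied to users or periods without data access records. A real evaluation would score it on held-out pairs against a baseline rule rather than on the training pairs, as this toy run does.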