Reliable and Scalable Account Correlation Across Large Social Networks

Speaker: Oana Goga
Lip6/UPMC
Date: 13/11/2013
Time: 2:00 pm - 2:30 pm
Location: LINCS Meeting Room 40

Abstract

There is a lot of interest and concern, both in research and industry, about the potential for correlating user accounts across
multiple online social networking sites. In this paper, we focus on the challenge of designing account correlation schemes that achieve
high reliability, i.e., low error rates, in matching accounts, even when applied in large-scale networks with hundreds of millions of user
accounts. We begin by identifying four important properties (Availability, Consistency, non-Impersonability, and Discriminability,
or ACID) that features used for matching accounts need to satisfy in order to achieve reliable and scalable account correlation. Even
though public attributes like name, location, profile photo, and friends do not satisfy all the ACID properties, we show how it is
possible to leverage multiple attributes to build SCALABLE-LINKER, a reliable and scalable account correlator. We evaluate the performance
of SCALABLE-LINKER in correlating accounts from Twitter and Facebook, two of the largest real-world social networks. Our tests, using
ground-truth data about correlated accounts, show that SCALABLE-LINKER can correlate as many as 89% of accounts (true positive rate) with
a false positive rate below 1% when evaluated over small, thousand-node subsets of Facebook accounts. However, the true positive rate
drops to 21% (at the same 1% false positive rate) when the evaluation scales to include all of the more than one billion Facebook accounts. Our findings
reflect the potential as well as the limits of reliably correlating accounts at scale using only public attributes of accounts.