Gold mining in a River of Internet Content Traffic

Speaker: Giuseppe Scavo
Alcatel Lucent Bell Labs and INRIA
Date: 02/04/2014
Time: 2:00 pm - 3:00 pm
Location: LINCS Meeting Room 40


With the advent of Over-The-Top content providers (OTTs), Internet Service Providers (ISPs) saw their portfolio of services shrink to the low-margin role of data transporters. In order to counter this effect, some ISPs started to follow big OTTs like Facebook and Google in trying to turn their data into a valuable asset. In this paper, we explore the questions of what meaningful information can be extracted from network data, and what interesting insights it can provide. To this end, we tackle the first challenge of detecting user-URLs, i.e., those links that were clicked by users as opposed to those objects automatically downloaded by browsers and applications. We devise algorithms to pinpoint such URLs, and validate them on manually collected ground truth traces. We then apply them to a three-day-long traffic trace spanning more than 19,000 residential users that generated around 190 million HTTP transactions. We find that only 1.6% of these observed URLs were actually clicked by users. As a first application of our methods, we answer the question of which platforms participate most in promoting Internet content. Surprisingly, we find that, despite its notoriety, only 11% of the user-URL visits come from Google Search.