PhD thesis defense “Longitudinal, large scale and unbiased Internet Measurements”

Speaker: Flavia Salutari
Télécom-Paris
Date: 21/09/2021
Time: 2:00 pm - 5:00 pm
Location: Zoom + Amphi 4 at Télécom-Paris

Abstract

Today, a world without the Internet is unimaginable. By interconnecting billions of people worldwide and offering countless services, it is now fully embedded in modern society. Yet, despite technological evolution and development, its pervasiveness and heterogeneity still raise new challenges, such as security concerns, monitoring of users’ Quality of Experience (QoE), and the need for transparency and fairness. Accordingly, the goal of this thesis is to shed new light on some of the challenges that have emerged in recent years. In particular, we provide an in-depth analysis of some of the most prominent aspects of the modern Internet. Particular emphasis is placed on the World Wide Web, which is undoubtedly one of the most popular Internet applications, and on its interaction with machine learning. The first part of this work studies the Quality of Experience of users browsing the Web, with measurements conducted both in the wild and in controlled environments. Our contributions follow with an original analysis of both the subjective user feedback and the objective QoE metrics, showing how hard it is to build accurate supervised data-driven models capable of predicting user satisfaction, along with an in-depth discussion of the multi-modal nature of subjective user opinions.
In the second part of this work, we analyze and discuss the fairness of state-of-the-art transformer-based language models, which are pre-trained on Web-based corpora and are typically used to solve a wide variety of Natural Language Processing (NLP) tasks. Here, we question whether the sheer size and heterogeneity of the Web guarantee diversity in the models. The core of our contributions lies in measuring the bias embedded in the models, which we discuss from different angles.
Finally, the last part of this dissertation addresses the classification of machine-generated objects using some of the simplest state-of-the-art supervised machine learning algorithms. Through a minimally intrusive, robust and lightweight framework, we show that the different behaviors of one field of the IP packet, the IP identification (IP-ID), can be easily classified with a few features that have high discriminative power. We finally apply our technique to an Internet-wide census and provide an updated view of the adoption of the different implementations in the Internet.

Jury composition:
Isabelle CHRISMENT, Professor, LORIA Campus Scientifique (President, Reviewer)
Pedro CASAS, Senior Scientist, AIT Austrian Institute Of Technology (Reviewer)
Chadi BARAKAT, Senior Researcher, INRIA (Examiner)
Tobias HOßFELD, Professor, University of Würzburg (Examiner)
Marco MELLIA, Professor, Politecnico di Torino (Examiner)
Philippe OWEZARSKI, Director of Research, LAAS-CNRS (Examiner)
 
Mauro SOZIO, Professor, Télécom Paris (PhD Supervisor)
Dario ROSSI, Chief Expert, Huawei Technologies France (PhD Co-Supervisor)
 
To watch the defense:
Meeting ID: 945 2685 8841
Passcode: 340704