Design of algorithms for the production of training data

Speakers: Quentin Lutz and Élie de Panafieu
Nokia Bell Labs France
Date: 07/04/2021
Time: 11:00 am - 12:00 pm


There is a well-known saying in the supervised machine learning community: “garbage in, garbage out”. The performance of a supervised learning algorithm depends critically on the quantity and quality of training data. This training data is obtained from raw data through data labeling performed by human experts. The growing use of machine learning tools makes the design of efficient data labeling algorithms more relevant than ever.

We will present progress made in collaboration with Maya Stein (University of Chile) and Alexander Scott (University of Oxford) on a problem proposed by Maria Laura Maag (Nokia). The problem is to reconstruct a set partition using as few queries as possible, where each query asks whether two elements belong to the same block. We characterize the optimal algorithms and analyze the distribution of the number of queries they use.
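
To make the query model concrete, here is a minimal sketch in Python of one simple way to reconstruct a partition from pairwise "same block?" queries. It is only an illustrative baseline and is not claimed to be one of the optimal algorithms discussed in the talk. The names `reconstruct_partition` and `same_block` are hypothetical: `same_block` stands in for the human expert (the oracle) who answers each query.

```python
from typing import Callable, Hashable, List, Sequence, Tuple


def reconstruct_partition(
    elements: Sequence[Hashable],
    same_block: Callable[[Hashable, Hashable], bool],
) -> Tuple[List[list], int]:
    """Rebuild the hidden partition and count the queries used.

    Each new element is compared with one representative of every
    block found so far, stopping at the first positive answer.
    """
    blocks: List[list] = []  # blocks discovered so far
    queries = 0              # number of oracle calls made
    for x in elements:
        for block in blocks:
            queries += 1
            # The first element of a block serves as its representative.
            if same_block(block[0], x):
                block.append(x)
                break
        else:
            # No representative matched: x starts a new block.
            blocks.append([x])
    return blocks, queries


# Hypothetical hidden labeling, used here only to simulate the expert.
labels = {"a": 0, "b": 0, "c": 1, "d": 2, "e": 2}
blocks, queries = reconstruct_partition(
    list(labels), lambda u, v: labels[u] == labels[v]
)
print(blocks, queries)
```

In this sketch the number of queries depends on the order in which elements and blocks are examined, which gives a sense of why the choice of algorithm and the resulting distribution of the query count are worth studying.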

Slides (Quentin)

Slides (Élie)