Design of algorithms for the production of training data

When

07/04/2021

11:00 am-12:00 pm

Quentin Lutz and Élie de Panafieu

Nokia Bell Labs France

Event Type

Network Theory
Working Group
YouTube

There is a well-known saying in the supervised machine learning community: “garbage in, garbage out”. The performance of a supervised learning algorithm depends critically on the quantity and quality of training data. This training data is obtained from raw data through data labeling performed by human experts. The growing use of machine learning tools makes the design of efficient data labeling algorithms more relevant than ever.

We will present progress made in collaboration with Maya Stein (University of Chile) and Alexander Scott (Oxford University) on a problem proposed by Maria Laura Maag (Nokia). The problem is to reconstruct an unknown set partition using as few queries as possible, where each query asks whether two given elements belong to the same block. We characterize the optimal algorithms and analyze the distribution of the number of queries they require.
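To make the query model concrete, here is a minimal Python sketch of one simple reconstruction strategy. It assumes an oracle `same_block(u, v)` (a hypothetical name, not from the talk) that answers whether two elements lie in the same block. Each new element is compared with one representative of every block found so far. It joins the first block that answers yes, and otherwise opens a new block. This is an illustration of the setting only; it is not claimed to be one of the optimal algorithms characterized in the talk.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def reconstruct_partition(
    elements: Sequence[T], same_block: Callable[[T, T], bool]
) -> Tuple[List[List[T]], int]:
    """Recover a set partition from pairwise 'same block?' queries.

    `same_block` is a hypothetical oracle standing in for a human labeler.
    Returns the recovered blocks and the number of queries used.
    """
    blocks: List[List[T]] = []
    queries = 0
    for x in elements:
        # Probe existing blocks in creation order, one query per block,
        # using the block's first element as its representative.
        for block in blocks:
            queries += 1
            if same_block(block[0], x):
                block.append(x)
                break
        else:
            # No existing block matched: x starts a new block.
            blocks.append([x])
    return blocks, queries


# Usage: a hidden partition {0, 3, 5}, {1, 4}, {2} encoded by labels.
labels = {0: "a", 1: "b", 2: "c", 3: "a", 4: "b", 5: "a"}
blocks, n_queries = reconstruct_partition(
    list(range(6)), lambda u, v: labels[u] == labels[v]
)
print(blocks, n_queries)  # [[0, 3, 5], [1, 4], [2]] 7
```

The number of queries depends on the order in which elements arrive and on the order in which blocks are probed, which is why the choice of strategy, and the resulting distribution of the number of queries, is the interesting question.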

Slides (Quentin)

Slides (Élie)