Thesis defense “Contributions to the representation of multivariate time series and graphs”

Speaker: Edouard Pineau
Télécom Paris
Date: 07/12/2020
Time: 2:00 pm - 5:00 pm

Abstract

Machine learning (ML) algorithms are designed to learn models that can make decisions or predictions from data, across a wide range of tasks such as image classification or the monitoring of mechanical systems. In general, the learned models are statistical approximations of the unknown true (optimal) decision models. The efficiency of a learning algorithm depends on a balance between model richness, the complexity of the data distribution and the complexity of the task to be solved. For computational convenience, statistical decision models often make simplifying assumptions about the data (e.g. linear separability, independence of the observed variables, etc.). However, when the data distribution is complex (e.g. high-dimensional with nonlinear interactions between the observed variables), these simplifying assumptions can be counterproductive. One solution is then to feed the model an alternative representation of the data. The objective of data representation is to separate the information relevant to the task from the noise, in particular when the relevant information is hidden (latent), in order to help the statistical model. Until the recent rise of modern ML, many standard representations relied on expert-based, handcrafted preprocessing of the data. Recently, a branch of ML called deep learning (DL) completely shifted this paradigm. DL uses neural networks (NNs), a family of powerful parametric functions, as pipelines that learn data representations. These advances have outperformed most handcrafted representations in many domains.

In this thesis, we are interested in learning representations of multivariate time series (MTS) and graphs. MTS and graphs are objects that do not directly meet the standard input requirements of ML algorithms. They can have variable size and non-trivial alignment, so comparing two MTS or two graphs with standard metrics is generally not meaningful. Hence, dedicated representations are required to analyze them with ML approaches. The contributions of this thesis consist of practical and theoretical results presenting new representation learning frameworks for MTS and graphs.

Two MTS representation learning frameworks are dedicated to detecting the ageing of mechanical systems. First, we propose a model-based MTS representation learning framework called Sequence-to-graph (Seq2Graph). Seq2Graph assumes that the observed data were generated by a model whose graphical representation is a causality graph. It then uses an appropriate neural network to represent each sample on this graph. When appropriate, this representation reveals useful information about the state of the mechanical system under study. Second, we propose a generic trend detection method called Contrastive Trend Estimation (CTE). CTE learns to classify pairs of samples according to the monotonicity of the trend between them. We show that, under a few assumptions, this method identifies the true state underlying the mechanical system under study, up to a monotone scalar transformation.
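To make the pairwise idea behind CTE concrete, here is a minimal sketch in Python/NumPy. It is not the thesis's implementation: it assumes a single time-ordered trajectory, labels a pair (i, j) as 1 when sample j was observed after sample i, and replaces the neural network with a linear score w·x trained through a logistic pair classifier on score differences (a RankNet-style pairwise ranking loss). The synthetic data, the latent state t², the loadings `mixing`, and the helper names `make_pairs`, `train_pair_classifier` and `spearman` are all hypothetical.

```python
import numpy as np

def make_pairs(x, rng, n_pairs):
    """Sample index pairs (i, j) from one time-ordered trajectory.
    Label is 1 if sample j was observed after sample i, else 0 (hypothetical labeling)."""
    n = len(x)
    i = rng.integers(0, n, size=n_pairs)
    j = rng.integers(0, n, size=n_pairs)
    keep = i != j  # drop pairs made of the same sample
    i, j = i[keep], j[keep]
    labels = (j > i).astype(float)
    return x[i], x[j], labels

def train_pair_classifier(xa, xb, y, lr=0.1, epochs=500):
    """Logistic model P(b after a) = sigmoid(w . (x_b - x_a)), fit by gradient descent.
    The linear score w . x then acts as a learned trend indicator."""
    w = np.zeros(xa.shape[1])
    diff = xb - xa
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-diff @ w))
        grad = diff.T @ (p - y) / len(y)  # gradient of the mean logistic loss
        w -= lr * grad
    return w

def spearman(a, b):
    """Rank correlation, computed without SciPy (ties are not handled)."""
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

# Hypothetical synthetic system: a monotone latent ageing state seen through 3 noisy sensors.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
state = t ** 2
mixing = np.array([1.0, -0.5, 2.0])
x = np.outer(state, mixing) + 0.05 * rng.standard_normal((200, 3))

xa, xb, y = make_pairs(x, rng, 5000)
w = train_pair_classifier(xa, xb, y)
indicator = x @ w
print(spearman(indicator, state))  # close to 1 when the trend is recovered
```

A rank correlation is used rather than a Pearson correlation because, as in the statement above, only the ordering of the state is identified, not its scale: any monotone transformation of the learned indicator would be equally valid.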

Two graph representation learning frameworks are dedicated to graph classification. First, we propose to view graphs as sequences of nodes and build a framework based on recurrent neural networks to represent and classify them. Second, we analyze a simple baseline feature for graph classification: the Laplacian spectrum. We show that this feature meets the minimal requirements for classifying graphs when all the meaningful information lies in the graph structure.
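The Laplacian spectrum feature can be sketched in a few lines of NumPy. This sketch assumes the combinatorial Laplacian L = D - A of an undirected graph (the normalized Laplacian is a common alternative), and pads or truncates the sorted eigenvalues to a hypothetical fixed length `k` so that graphs of different sizes map to vectors of the same size; the function name `laplacian_spectrum` and the example graphs are illustrative, not taken from the thesis.

```python
import numpy as np

def laplacian_spectrum(adj, k):
    """Sorted eigenvalues of L = D - A, zero-padded (or truncated) to length k.
    Keeping the k smallest eigenvalues on truncation is an arbitrary choice here."""
    adj = np.asarray(adj, dtype=float)
    lap = np.diag(adj.sum(axis=1)) - adj
    eig = np.linalg.eigvalsh(lap)  # real and ascending, since L is symmetric
    feat = np.zeros(k)
    m = min(k, len(eig))
    feat[:m] = eig[:m]
    return feat

# Two small undirected graphs on 3 nodes.
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])

# Relabel the path's nodes: rows and columns are permuted together.
perm = np.array([2, 0, 1])
path_permuted = path[np.ix_(perm, perm)]

print(laplacian_spectrum(triangle, k=5))  # approximately [0, 3, 3, 0, 0]
print(np.allclose(laplacian_spectrum(path, k=5),
                  laplacian_spectrum(path_permuted, k=5)))  # True
```

Relabeling the nodes turns A into PAP^T for a permutation matrix P, which leaves the eigenvalues unchanged, so the feature does not depend on node ordering. The spectrum is not a complete graph invariant, though: some non-isomorphic graphs are cospectral and therefore share the same feature vector.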

Here’s the streaming link: https://safran-group.webex.com/safran-group-fr/j.php?MTID=mfda124766b7a93fc255f3da5a14ab4ec