Pattern Matching, text generation and sequence complexity

Speakers: Philippe Jacquet & Dimitrios Milioris
Inria & Nokia
Date: 21/04/2021
Time: 2:00 pm - 3:00 pm
Location: Zoom + LINCS

Abstract

Pattern matching is a powerful tool that applies to many kinds of data, provided they have a linear support, as texts and other sequences do. It can also be extended to regular supports in any Euclidean space. It is the engine of the most powerful text compression algorithms and the foundation of many universal predictors over time sequences. Recently, pattern matching has been applied to short text classification under the name “Joint Complexity”, in a perspective of Accelerated Artificial Intelligence. One of the interesting aspects of pattern matching is that it rests on extremely strong foundations in information theory and in the asymptotic analysis of algorithms, in particular when the source of the sequence is well defined and parametrized. We will investigate the special case of Markov source models and show how they are “good enough” to model natural language texts. We will present some small games that illustrate this fact, in parallel with the description of deeper theoretical results.
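
For readers unfamiliar with the term, the following is a minimal sketch of Joint Complexity, assuming the usual definition as the number of distinct factors (substrings) that two sequences have in common. The function names are hypothetical, the empty factor is left out by assumption, and the naive enumeration is quadratic in the text length, so it is only meant for short texts; it is not the speakers' implementation.

```python
def factors(text):
    """Return the set of distinct non-empty substrings (factors) of text."""
    n = len(text)
    return {text[i:j] for i in range(n) for j in range(i + 1, n + 1)}


def joint_complexity(x, y):
    """Count the distinct non-empty factors shared by x and y.

    Assumption: the empty factor is not counted. Some definitions include it,
    which would add one to every result.
    """
    return len(factors(x) & factors(y))


if __name__ == "__main__":
    # Common factors of "abab" and "baba": a, b, ab, ba, aba, bab
    print(joint_complexity("abab", "baba"))  # 6
```

Two texts that share many factors get a high score, which is the intuition behind using this count as a similarity measure for short text classification.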