Speaker: | Léo Laugier, Télécom Paris |
Date: | 08/11/2022 |
Time: | 3:00 pm - 6:00 pm |
Location: | Zoom + Amphi 4 at Télécom Paris |
Abstract
Natural Language Processing is motivated by applications where computers should gain a semantic and syntactic understanding of human language. Recently, the field has undergone a paradigm shift. Deep learning architectures coupled with self-supervised training have become the core of state-of-the-art models used in Natural Language Understanding and Natural Language Generation. Sometimes referred to as foundation models, these systems pave the way for novel use cases. Driven by an academic-industrial partnership between the Institut Polytechnique de Paris and Google AI Research, the present research has focused on investigating how pretrained neural Natural Language Processing models could be leveraged to improve online interactions.
This thesis first explored how self-supervised style transfer could be applied to the toxic-to-civil rephrasing of offensive comments found in online conversations. In the context of toxic content moderation online, we proposed to fine-tune a pretrained text-to-text model (T5) with a denoising and cyclic auto-encoder loss.
Subsequent work then investigated the human labeling and automatic detection of toxic spans in online conversations. We released a new labeled dataset to train and evaluate systems, which led to a shared task at the 15th International Workshop on Semantic Evaluation.
Finally, we developed a recommender system based on online reviews of items, contributing to work on explaining the user tastes that underlie predicted recommendations. The method uses textual semantic similarity models to represent a user’s preferences as a graph of textual snippets, where the edges are defined by semantic similarity.
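As a rough illustration of the graph representation described above, the following minimal sketch links review snippets whose pairwise similarity reaches a threshold. It is not the thesis's implementation: the bag-of-words `embed` function is a toy stand-in for a textual semantic similarity model, and the names `build_preference_graph` and `threshold`, the threshold value, and the example snippets are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def embed(snippet: str) -> Counter:
    """Toy stand-in (assumption) for a semantic similarity model: bag-of-words counts."""
    return Counter(snippet.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_preference_graph(snippets, threshold=0.3):
    """Return an adjacency dict: nodes are snippet indices, edges carry similarity.

    An edge is added only when the similarity reaches the (hypothetical) threshold.
    """
    vectors = [embed(s) for s in snippets]
    graph = {i: {} for i in range(len(snippets))}
    for i, j in combinations(range(len(snippets)), 2):
        sim = cosine(vectors[i], vectors[j])
        if sim >= threshold:
            graph[i][j] = sim
            graph[j][i] = sim
    return graph


# Hypothetical snippets taken from one user's reviews.
snippets = [
    "great battery life",
    "battery life is great",
    "the screen is too dim",
]
graph = build_preference_graph(snippets)
```

Here the two battery-related snippets end up connected, while the screen-related snippet stays isolated. In practice, a learned sentence encoder would replace `embed` so that paraphrases with little word overlap are also linked.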
The jury is composed as follows:
- Mr. Benoît Sagot, Research Director, INRIA, France (Examiner, President)
- Ms. Serena Villata, Tenured Researcher, CNRS, France (Reviewer)
- Mr. François Yvon, Research Director, CNRS, France (Reviewer)
- Mr. Ion Androutsopoulos, Professor, Athens University of Economics and Business, Greece (Examiner)
- Ms. Marine Carpuat, Assistant Professor, University of Maryland, United States (Examiner)
- Mr. Slav Petrov, Distinguished Scientist, Google AI Research, United States (Examiner)
- Mr. Thomas Bonald, Professor, Télécom Paris, France (Ph.D. supervisor)
- Mr. Lucas Dixon, Research Scientist, Google AI Research, France (Ph.D. co-supervisor)