Speaker: Fabien Mathieu (LINCS)
Date: 10/06/2020
Time: 2:00 pm - 3:00 pm
Abstract
Searching for documents is a task everyone faces regularly, whether looking for a relevant Internet page, an e-mail, or a document on an intranet. An effective search relies on a precise and well-organized search engine.
Most current techniques combine keyword search with structural information (ontologies, relationships between elements) to rank the documents of a corpus by relevance.
In this talk, we present Gismo (Generic Information Search with a Mind of its Own), a new lightweight navigation engine. Gismo exploits only the textual content of documents and requires no ontology, metadata, or prior training. It can therefore be used on any corpus, without assumptions about the type of documents or their language. Its underlying model and algorithms make Gismo extremely fast, even on large corpora. Finally, Gismo lets users find, sort, and organize documents by topic and relevance, which makes it a navigation engine rather than a mere search engine.
Gismo is available as an open-source Python module.