This thesis exploits partial system knowledge to design more efficient reinforcement learning (RL) algorithms for three problems: (1) admission control, (2) electricity storage optimization, and (3) accelerating the computation of the bias function.
For (1), the system is modeled as an M/M/c/S queue with m job classes. We propose a model-based algorithm, named UCRL-AC, with a finite-time regret bound dominated by O(S\log T + \sqrt{mT \log T}), where T is the total running time. UCRL-AC exploits the queuing structure by learning the arrival rates.
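As a rough illustration of what learning the arrival rates can look like, the sketch below computes the maximum-likelihood estimate of Poisson arrival rates (arrival count divided by observation time) for each job class. It is not UCRL-AC itself. The function name `estimate_arrival_rates`, the three example rates, and the assumption that arrivals of every class are observed whether or not they are admitted are all hypothetical choices made for this example.

```python
import numpy as np

def estimate_arrival_rates(arrival_counts, elapsed_time):
    """Maximum-likelihood estimate of Poisson arrival rates, one per job class.

    Assumes (hypothetically) that every arrival is observed, admitted or not.
    """
    counts = np.asarray(arrival_counts, dtype=float)
    if elapsed_time <= 0:
        raise ValueError("elapsed_time must be positive")
    return counts / elapsed_time

# Simulated observations for m = 3 classes (illustrative rates, not thesis data).
rng = np.random.default_rng(0)
true_rates = np.array([2.0, 0.5, 1.0])
T = 1000.0
counts = rng.poisson(true_rates * T)  # arrivals per class over [0, T]
rates_hat = estimate_arrival_rates(counts, T)
```

The standard deviation of each estimate is sqrt(rate / T), so the estimates tighten as the observation time grows.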
For (2), we design an RL algorithm that minimizes energy and demand charges by controlling a battery. Knowledge of the battery dynamics allows efficient offline exploration, which enables fast training with minimal data. The algorithm is tested on real-world data.
For (3), we show that, for a fixed policy, the computation of the bias function can be accelerated using knowledge of the eigenvalues of the transition probability matrix.
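For context, the sketch below computes the gain g and bias h of a fixed policy by directly solving the Poisson equation h + g = r + P h, normalized so that mu^T h = 0, where mu is the stationary distribution. This is the standard textbook computation, not the accelerated method of the thesis. It assumes an ergodic chain; the function names and the two-state example are hypothetical. The second-largest eigenvalue modulus is also reported because, for an aperiodic chain, truncating the series h = sum_t P^t (r - g) typically leaves an error that decays geometrically at a rate governed by that modulus.

```python
import numpy as np

def stationary_distribution(P):
    """Solve mu^T P = mu^T with sum(mu) = 1 by least squares (ergodic chain assumed)."""
    n = P.shape[0]
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu

def gain_and_bias(P, r):
    """Return (g, h, mu) with h + g = r + P h and mu^T h = 0."""
    n = P.shape[0]
    mu = stationary_distribution(P)
    g = float(mu @ r)
    # Fundamental-matrix form: h = (I - P + 1 mu^T)^{-1} (r - g 1).
    h = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), mu), r - g)
    return g, h, mu

def second_eigenvalue_modulus(P):
    """Second-largest modulus among the eigenvalues of P."""
    moduli = np.sort(np.abs(np.linalg.eigvals(P)))[::-1]
    return float(moduli[1])

# Hypothetical two-state chain and reward vector.
P = np.array([[0.9, 0.1], [0.5, 0.5]])
r = np.array([1.0, 0.0])
g, h, mu = gain_and_bias(P, r)
lam2 = second_eigenvalue_modulus(P)
```

The direct solve costs on the order of n^3 operations for n states, which is the kind of cost that structural knowledge of the chain can help reduce.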
Jury composition:
- Urtzi Ayesta, IRIT (reviewer)
- Giovanni Neglia, Centre Inria d’Université Côte d’Azur (reviewer)
- Johanne Cohen, LISN – Université Paris-Saclay (examiner)
- Bruno Gaujal, Inria Grenoble (examiner)
- Alain Jean-Marie, Inria Montpellier (examiner)
- Lorenzo Maggi, NVIDIA (examiner)
- Ana Bušić, Inria Paris (thesis supervisor)
- Jiamin Zhu, IFP Energies Nouvelles (thesis co-supervisor)
- Tristan Charrier, AMIAD (invited member, DGA supervisor)