Speaker: François Durand (Nokia Bell Labs France)
Time: 11:00 am - 12:00 pm
Location: Paris-Rennes Room (EIT Digital)
Many reinforcement learning problems can be viewed through the lens of stochastic approximation. Unfortunately, classic stochastic approximation algorithms, such as Robbins-Monro, may have infinite asymptotic variance. The class of "Zap" algorithms aims to solve that problem. We then examine the application of Zap algorithms to reinforcement learning, with the example of Zap Q-learning.
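As a minimal illustration of the classic Robbins-Monro scheme mentioned above (a generic sketch, not taken from the talk): to find the root of f(θ) = E[F(θ, X)] from noisy samples, one iterates θ ← θ + a_n · F(θ, X_n) with diminishing step sizes a_n. The toy choice F(θ, x) = x − θ below estimates the mean of a distribution; the function name and parameters are illustrative.

```python
import random

def robbins_monro(sample, theta0=0.0, n_iter=100_000):
    """Robbins-Monro iteration for the root of E[sample() - theta] = 0,
    i.e. estimating the mean of the sampled distribution."""
    theta = theta0
    for n in range(1, n_iter + 1):
        a_n = 1.0 / n                      # classic step size a_n = 1/n
        theta += a_n * (sample() - theta)  # noisy fixed-point update
    return theta

random.seed(0)
# Samples from N(3, 1): the root of E[X - theta] = 0 is theta = 3.
est = robbins_monro(lambda: random.gauss(3.0, 1.0))
print(est)  # close to 3
```

With a_n = 1/n this recursion is exactly the running sample mean; the asymptotic variance of such schemes is what Zap-style algorithms, which adapt a matrix gain, aim to control.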