|Nokia Bell Labs
|2:00 pm - 3:00 pm
|Paris-Rennes Room (EIT Digital)
Unimodal functions arise whenever the expected value f(x) of the metric to be optimized has a single peak as a function of the available control variable x.
The exact shape of f is often not known in advance, hence it can only be learned via sampling.
For this purpose, we exploit the formalism of multi-armed bandits to design a learning algorithm that i) converges in probability to the optimal arm maximizing f and ii) does not require to know the learning horizon in advance, i.e., it is “anytime”. Moreover, it adapts naturally to the scenario where iii) prior knowledge on the location of the peak is available, even if the prior is inaccurate.
We also present an efficient heuristic that can use explicitly the prior belief on the location of the optimal arm. We demonstrate the applicability of our approaches to a basic resource allocation problem for cloud computing, where the orchestrator aims at adapting the resources allocated to a client to its unknown and dynamic requests.