BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//wp-events-plugin.com//7.2.3.1//EN
BEGIN:VEVENT
UID:705@lincs.fr
DTSTART;TZID=Europe/Paris:20220502T100000
DTEND;TZID=Europe/Paris:20220502T130000
DTSTAMP:20220426T074500Z
URL:https://www.lincs.fr/events/phd-thesis-defense-model-based-reinforceme
 nt-learning-for-dynamic-resource-allocation-in-cloud-environments/
SUMMARY:PhD Thesis defense "Model-Based Reinforcement Learning for
 Dynamic Resource Allocation in Cloud Environments"
DESCRIPTION:The emergence of new technologies (Internet of Things\, smart
 cities\, autonomous vehicles\, health\, industrial automation\, ...)
 requires efficient resource allocation to satisfy the demand. These new
 services are well suited to the new 5G network infrastructure\, since it
 can provide low latency and high reliability. However\, these new needs
 require high computational power\, implying more energy consumption\, in
 particular in cloud infrastructures and more specifically in data
 centers. It is therefore critical to find new solutions that satisfy
 these needs while reducing the power usage of resources in cloud
 environments. In this thesis we propose and compare new AI solutions
 (Reinforcement Learning) to orchestrate virtual resources in virtual
 network environments such that performance is guaranteed and operational
 costs are minimised. We consider queuing systems as a model for cloud
 IaaS infrastructures and bring learning methodologies to efficiently
 allocate the right number of resources for the users. Our objective is
 to minimise a cost function combining performance costs and operational
 costs. We go through different types of reinforcement learning
 algorithms (from model-free to relational model-based) to learn the best
 policy. Reinforcement learning is concerned with how a software agent
 ought to take actions in an environment to maximise some cumulative
 reward. We first develop a queuing model of a cloud system with one
 physical node hosting several virtual resources. In this first part we
 assume the agent perfectly knows the model (the dynamics of the
 environment and the cost function)\, giving it the opportunity to apply
 dynamic programming methods to compute the optimal policy. Since the
 model is known in this part\, we also study the properties of the
 optimal policies\, which are threshold-based and hysteresis-based rules.
 This allows us to integrate the structural properties of the policies
 into MDP algorithms. After providing a concrete cloud model with
 exponential arrivals based on real intensities and energy data from a
 cloud provider\, we compare in this first approach the efficiency and
 computation time of MDP algorithms against heuristics built on top of
 the stationary distributions of the queuing Markov chain. In a second
 part we consider that the agent does not have access to the model of the
 environment and concentrate our work on reinforcement learning
 techniques\, especially model-based reinforcement learning. We first
 develop model-based reinforcement learning methods where the agent can
 reuse its experience replay to update its value function. We also
 consider online MDP techniques where the autonomous agent approximates
 the environment model to perform dynamic programming. This part is
 evaluated in a larger network environment with two physical nodes in
 tandem\, where we assess the convergence time and accuracy of different
 reinforcement learning methods\, mainly model-based techniques versus
 state-of-the-art model-free methods (e.g. Q-Learning). The last part
 focuses on model-based reinforcement learning techniques with a
 relational structure between environment variables. As these tandem
 networks have structural properties due to their infrastructure shape\,
 we investigate factored and causal approaches built into reinforcement
 learning methods to integrate this information. We provide the
 autonomous agent with relational knowledge of the environment so that it
 can understand how variables are related to each other. The main goal is
 to accelerate convergence by: first\, obtaining a more compact
 representation through factorisation\, for which we devise a factored
 online MDP algorithm that we evaluate and compare with model-free and
 model-based reinforcement learning algorithms\; second\, integrating
 causal and counterfactual reasoning that can tackle environments with
 partial observations and unobserved confounders.
CATEGORIES:PhD Defense
LOCATION:LINCS Seminars room\, 23\, avenue d'Italie\, Paris\, 75013\,
 France
GEO:48.828400;2.356897
X-APPLE-STRUCTURED-LOCATION;VALUE=URI;X-ADDRESS=23\, avenue d'Italie\,
 Paris\, 75013\, France;X-APPLE-RADIUS=100;X-TITLE=LINCS Seminars
 room:geo:48.828400,2.356897
END:VEVENT
BEGIN:VTIMEZONE
TZID:Europe/Paris
X-LIC-LOCATION:Europe/Paris
BEGIN:DAYLIGHT
DTSTART:20220327T030000
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
TZNAME:CEST
END:DAYLIGHT
END:VTIMEZONE
END:VCALENDAR