Network acceleration is critical for modern AI workloads, particularly for optimizing communication and data transfer among distributed GPUs. The network has emerged as a significant bottleneck for Large Language Model (LLM) training and inference, driving rapid advancements in GPU network technologies. This evolution, however, introduces complex research challenges spanning load balancing, congestion control, collective communications, and novel network architectures. This presentation will first establish the fundamental distinctions between AI data centers and conventional cloud computing environments, followed by an overview of ongoing research topics in this rapidly evolving field.
Bio:
Jeremie Leguay received his Ph.D. degree in computer science from Pierre et Marie Curie University, Paris, France. He is Department Head of Network Systems Research at Nokia Bell Labs Paris-Saclay. Starting in 2004, he conducted research and led the Networking Lab at Thales Communications and Security (SIX GTS division), where he developed activities on sensor networks, mobile networks, and software-defined networks for mission-critical networked systems. In 2014, he joined Huawei Technologies as leader of the Network and Traffic Optimization Team, conducting research on the planning and control of IP networks; he has also served there as a Senior Expert and Director of the Datacom Dijkstra Lab. He is a Senior Member of the IEEE. His current activities focus mainly on routing, network management and optimization, self-driving networks, automation, and networks for AI.
