In this thesis, we address the challenge of efficient resource allocation under uncertainty for transporting time-critical, ultra-reliable traffic in next-generation networks. We develop optimization and online-learning methods that provide rigorous performance guarantees for both short-horizon probabilistic requirements and long-term cumulative constraints.

We begin with Ultra-Reliable Low-Latency Communications (URLLC). Prior models of probabilistic delay either impose strong assumptions on the arrival process or focus primarily on queue stability. We relax these assumptions and formulate a chance-constrained minimization of resource usage that holds under general arrival processes. By exploiting structural properties of the policy space, we design efficient bandit-based algorithms for both the offline setting (known statistics) and the online setting (unknown statistics). These algorithms provably converge within a fixed number of iterations while meeting stringent \(1\,\mathrm{ms}\) delay and \(10^{-5}\) reliability targets with minimal resource consumption.

We then push these guarantees to extreme URLLC, targeting \(0.1\,\mathrm{ms}\) latency and reliability on the order of \(10^{-7}\), where queuing is impermissible and resource allocation must rely on limited arrival information (historical samples, mean, variance). Static methods tend to over-allocate resources. We introduce an online, dynamic reservation policy based on a sliding-window scenario approach that is both robust and safe: it tracks the minimal reservation supported by empirical data and avoids conservative over-provisioning while preserving the stringent QoS constraints.

Next, we consider goal-oriented communications, focusing on haptic applications that are highly sensitive to bursts of packet losses. We propose a queuing-theoretic framework that minimizes resource costs in the presence of losses caused both by collisions with other haptic packets and by poor radio conditions. We design a joint control policy, governed by thresholds, that combines adaptive transmit-power boosting with preemption of resources initially provisioned for enhanced Mobile Broadband (eMBB). For heterogeneous users, interdependence across user groups induces a high-dimensional decision space that rules out exhaustive search. To address this complexity, we employ a modified simulated-annealing algorithm that handles constraints either by directly rejecting infeasible policies or by adding cost-based penalties.

Finally, we study long-term compliance and introduce Constrained Online Convex Optimization with Memory (COCO-M), in which losses and constraints depend on the last \(m\) decisions. Prior work mainly considered a fixed memory length. We generalize to arbitrary memory lengths and incorporate untrusted short-horizon predictions, providing the first algorithms with provable sublinear regret and sublinear cumulative constraint violation in this general setting. The result is a versatile toolbox for online learning and predictive network control under adversarial conditions.
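As an illustrative sketch of the chance-constrained formulation described above (the notation here is generic and assumed, not the thesis's own), the URLLC problem can be read as minimizing expected resource usage subject to a probabilistic delay constraint:
\[
\min_{\pi \in \Pi} \; \mathbb{E}\!\left[ R(\pi) \right]
\quad \text{s.t.} \quad
\mathbb{P}\!\left( D(\pi) > 1\,\mathrm{ms} \right) \le 10^{-5},
\]
where \(\pi\) denotes an allocation policy, \(R(\pi)\) the resources it consumes, and \(D(\pi)\) the resulting packet delay under a general, possibly unknown, arrival process.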
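A minimal sketch of the sliding-window scenario idea, under the simplifying assumption that the only available information at slot \(t\) is the demand samples \(A_{t-W}, \dots, A_{t-1}\) observed over the last \(W\) slots: reserving enough capacity to cover every observed scenario reduces, for a single resource, to
\[
r_t = \max_{t-W \le s \le t-1} A_s ,
\]
with the window length \(W\) chosen, following scenario-approach arguments, large enough that the probability of under-reserving stays below the prescribed reliability level. The sliding window lets the reservation track non-stationary traffic instead of provisioning for a worst case fixed offline.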
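For the COCO-M setting, the stated guarantees can be expressed, again in assumed generic notation, through regret and cumulative constraint violation with memory \(m\):
\[
\mathrm{Reg}_T = \sum_{t=m}^{T} f_t\!\left(x_{t-m+1}, \dots, x_t\right) - \min_{x \in \mathcal{X}} \sum_{t=m}^{T} f_t(x, \dots, x),
\qquad
\mathrm{Viol}_T = \sum_{t=m}^{T} \left[ g_t\!\left(x_{t-m+1}, \dots, x_t\right) \right]_{+},
\]
where \(f_t\) and \(g_t\) are the (possibly adversarial) loss and constraint functions at round \(t\) and \(x_t \in \mathcal{X}\) the decision; the proposed algorithms keep both \(\mathrm{Reg}_T\) and \(\mathrm{Viol}_T\) sublinear in \(T\).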
