Speaker: | Petr Kuznetsov, Télécom-Paris |
Date: | 19/01/2022 |
Time: | 3:00 pm - 4:00 pm |
Location: | LINCS + Zoom |
Abstract
There are two major ways to deal with failures in distributed computing: fault-tolerance and accountability. Fault-tolerance aims to anticipate failures by investing in replication and synchronization, so that the system’s correctness is not affected by faulty components. In contrast, accountability enables detecting failures a posteriori and raising undeniable evidence against faulty components.
In this talk, we discuss how accountability can be achieved, both in generic and application-specific ways. We also discuss how fault detection can be combined with reconfiguration, opening an avenue towards “self-healing” systems that seamlessly replace faulty components with correct ones.