Accountable distributed computing

Speaker: Petr Kuznetsov
Télécom-Paris
Date: 19/01/2022
Time: 3:00 pm - 4:00 pm
Location: LINCS + Zoom

Abstract

There are two major ways to deal with failures in distributed computing: fault tolerance and accountability. Fault tolerance aims to anticipate failures by investing in replication and synchronization, so that the system’s correctness is not affected by faulty components. In contrast, accountability enables detecting failures a posteriori and raising undeniable evidence against faulty components.

In this talk, we discuss how accountability can be achieved, both in generic and application-specific ways. We also discuss how fault detection can be combined with reconfiguration, opening an avenue towards “self-healing” systems that seamlessly replace faulty components with correct ones.