Accountable distributed computing

When

19/01/2022
3:00 pm-4:00 pm
Petr Kuznetsov
Télécom-Paris

Where

LINCS + Zoom
23 avenue d'Italie, Paris, 75013

There are two major ways to deal with failures in distributed computing: fault-tolerance and accountability. Fault-tolerance aims to anticipate failures by investing in replication and synchronization, so that the system’s correctness is not affected by faulty components. In contrast, accountability enables detecting failures a posteriori and raising undeniable evidence against faulty components.

In this talk, we discuss how accountability can be achieved, both in generic and application-specific ways. We also discuss how fault detection can be combined with reconfiguration, opening an avenue towards “self-healing” systems that seamlessly replace faulty components with correct ones.