|Speaker:||Petr Kuznetsov|
|Time:||3:00 pm - 4:00 pm|
|Location:||LINCS + Zoom|
There are two major ways to deal with failures in distributed computing: fault tolerance and accountability. Fault tolerance aims to anticipate failures by investing in replication and synchronization, so that the system’s correctness is not affected by faulty components. In contrast, accountability enables detecting failures a posteriori and producing undeniable evidence against faulty components.
In this talk, we discuss how accountability can be achieved in both generic and application-specific ways. We also discuss how fault detection can be combined with reconfiguration, opening an avenue towards “self-healing” systems that seamlessly replace faulty components with correct ones.