Nokia Bell Labs
2:00 pm - 3:00 pm
Paris-Rennes Room (EIT Digital)
The growing number of machines and technologies involved in existing infrastructures and networks makes them harder to manage. Although many monitoring solutions help detect faulty behavior, understanding its causes is not always straightforward, especially when the relevant information is scattered across logs issued by different software or hardware components. This paper proposes a new methodology, inspired by pattern matching, that can find alarm correlations with or without prior knowledge of the monitored system. The proposed data structure stores every observed pattern of correlated alarms by processing logs online, and it can be queried to extract the patterns of alarms leading to an arbitrary failure. This paper makes three main contributions. First, we propose a framework that represents alarm logs according to their spatio-temporal dependencies. Second, we design a new scalable data structure that can store every observed pattern of alarms, and we validate it by simulation on real and artificial datasets. Third, we show how to exploit this data structure for fault diagnosis.
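The abstract does not describe the data structure itself. Purely as a toy illustration of the general idea of storing observed alarm patterns online and querying the ones that lead to a failure, the sketch below assumes that alarms arrive as time-ordered `(timestamp, name)` pairs and that two alarms are "correlated" when they fall within a fixed time window. It is not the paper's method: the class `AlarmPatternStore`, its methods, the `window` parameter and the alarm names are all hypothetical, and the spatial (topology) dimension the talk mentions is ignored. Each alarm roots a trie, and the alarms seen just before it are inserted most recent first, so every root-to-node path is a pattern that ended with the root alarm.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    count: int = 0                                # times this pattern was observed
    children: dict = field(default_factory=dict)  # alarm name -> Node


class AlarmPatternStore:
    """Hypothetical toy store, NOT the data structure proposed in the paper."""

    def __init__(self, window):
        self.window = window   # assumed correlation window, in seconds
        self.roots = {}        # alarm name -> trie of patterns ending with it
        self.recent = deque()  # (timestamp, alarm) pairs still inside the window

    def observe(self, timestamp, alarm):
        # Evict alarms older than the window (assumes time-ordered input).
        while self.recent and timestamp - self.recent[0][0] > self.window:
            self.recent.popleft()
        node = self.roots.setdefault(alarm, Node())
        node.count += 1
        # Walk backward in time: each prefix of this path is a pattern ending in `alarm`.
        for _, previous in reversed(self.recent):
            node = node.children.setdefault(previous, Node())
            node.count += 1
        self.recent.append((timestamp, alarm))

    def patterns_leading_to(self, failure, min_count=1):
        """Return (pattern, count) pairs; patterns are in chronological order."""
        results = []

        def walk(node, path):
            for alarm, child in node.children.items():
                # A child is never more frequent than its parent, so pruning is safe.
                if child.count >= min_count:
                    pattern = path + [alarm]
                    results.append((tuple(reversed(pattern)) + (failure,), child.count))
                    walk(child, pattern)

        if failure in self.roots:
            walk(self.roots[failure], [])
        return sorted(results, key=lambda r: (-r[1], r[0]))


# Usage on a small made-up log (alarm names are illustrative only).
store = AlarmPatternStore(window=10.0)
log = [(0, "LINK_DOWN"), (2, "BGP_RESET"), (5, "PKT_LOSS"),
       (30, "LINK_DOWN"), (33, "BGP_RESET"), (36, "PKT_LOSS"),
       (60, "FAN_FAIL")]
for t, a in log:
    store.observe(t, a)
print(store.patterns_leading_to("PKT_LOSS", min_count=2))
```

Inserting predecessors most recent first keeps every node's count at most its parent's, which is what lets the query prune rare branches early. The price is memory: an event arriving when the window already holds k alarms adds up to k nodes, so a realistic log would call for a far more compact representation, in line with the scalability the abstract emphasizes.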