World-class data management and storage solutions in the biggest public clouds.Visit Cloud Services
Our industry-leading solutions are built so you can protect and secure your sensitive company data.Visit Cyber Resilience
November 18, 2016
When distributed systems fail in the field, identifying the root cause and pinpoint the faulty software component or machine can be extremely hard and time-consuming. This research aims to provide an end-to-end solution to automate the diagnosis of production failures on distributed software stack solely using the unstructured logs output. It builds this in three parts. First, it aims to design a new postmortem diagnosis tool to automatically reconstruct the extensive domain knowledge of the programmers who wrote the code; it does this by relying on the principle ("Flow Reconstruction Principle”) that programmers log events such that one can reliably reconstruct the execution flow a posteriori. However, any postmortem debugging relying on log output hinges on the efficacy of such logging. This is the focus of the second part, which is, to measure the quality of software’s log output. Finally, they intend to use this measurement to ultimately automate software logging itself.