Mar 01, 2016: Ale Strooisma: Incident Detection and Resolution for the Ovis Telematics Shepherd platform

March 01, 2016
Incident Detection and Resolution for the Ovis Telematics Shepherd platform
Room: HB 2F
Ale Strooisma
12:30-13:30

The Ovis Telematics Shepherd platform allows management of a large number of stationary and roaming sensors. Information flows in from various sources – both in real time and as periodic batches – and is processed and aggregated into reports, which are presented to customers through various means.
There are many points of failure in this complex data processing pipeline, and reports indeed often present incorrect data to customers. Additionally, when a problem is detected, it is hard for developers to get the information required to find and fix the underlying error.
To resolve these issues, I am creating a system that monitors the Shepherd platform. When a failure is detected, the monitor tries to prevent it or recover from it and provides information to help track down the underlying issue.