Incident Resolution as Code

Julien Pivotto @roidelapluie roidelapluie

B.1.017 - Monday 4th February 2019 - 16:30 → 17:25

In this talk I will go into detail about how we quickly resolve incidents using Prometheus and Ansible.

Come and discover how we linked monitoring, queuing and orchestration systems to do unattended incident management.

One of the main advantages of this is that we can apply technical remediation or mitigation based on business metrics, even when the symptom and the fix sit at different levels of the stack.
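As a rough illustration of that chain (not the setup presented in the talk), the sketch below turns an Alertmanager webhook payload into queued `ansible-playbook` commands. The alert name, the playbook file, the `host` label and the in-process queue are all hypothetical placeholders; only the payload shape (`alerts`, `status`, `labels`) and the `--limit` flag are taken from Alertmanager and Ansible themselves.

```python
import queue

# Hypothetical mapping from a business-level alert to a remediation playbook.
PLAYBOOKS = {
    "CheckoutErrorRateHigh": "restart_checkout.yml",
}


def commands_for(payload):
    """Build one ansible-playbook command per firing alert with a known playbook."""
    commands = []
    for alert in payload.get("alerts", []):
        # Resolved alerts need no remediation.
        if alert.get("status") != "firing":
            continue
        labels = alert.get("labels", {})
        playbook = PLAYBOOKS.get(labels.get("alertname"))
        if playbook is None:
            continue
        cmd = ["ansible-playbook", playbook]
        # Assumption: the alert carries a "host" label matching an inventory name.
        host = labels.get("host")
        if host:
            cmd += ["--limit", host]
        commands.append(cmd)
    return commands


def enqueue(payload, jobs):
    """Put the commands on a queue; a separate worker would run them unattended."""
    for cmd in commands_for(payload):
        jobs.put(cmd)
    return jobs.qsize()


if __name__ == "__main__":
    example = {
        "alerts": [
            {
                "status": "firing",
                "labels": {"alertname": "CheckoutErrorRateHigh", "host": "web01"},
            }
        ]
    }
    jobs = queue.Queue()
    enqueue(example, jobs)
    print(jobs.get())
```

A worker could pop each command and pass it to `subprocess.run`; keeping the queue between alerting and execution is what lets remediation be retried or rate-limited independently of the monitoring side.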

All of this works anywhere Prometheus and Ansible run, so not only in a Kubernetes environment, but also on traditional infrastructure.

Speaker Info