Monday, November 18 • 5:38pm - 5:43pm
Lightning Talk: How the Observability Team at Spotify Radically Decreased On-Call Alerts - Lauren Muhlhauser, Spotify

The Reliability team at Spotify took over the monitoring stack and decreased incident pages by 42% within 6 months. At first, the team devoted all of its time to managing on-call alerts and tech debt. Now, on-call alerts are manageable and infrequent, and the team is on a path to using entirely open source products.

This stack was developed years prior, when there were few well-developed open source solutions available. Lauren describes how migrations to new tools (Grafana and Prometheus) decreased their backlog and on-call pages. She will also cover the improvements the team made to their own open source products (Heroic and FFWD) and why they chose to continue using and maintaining them. Lastly, she will discuss a new tool that the team will be repurposing and open sourcing in the near future.

Lauren Muhlhauser

Site Reliability Engineer, Spotify
Lauren is a Site Reliability Engineer at Spotify on the Observability team. She is currently working on maintaining the monitoring and alerting stack, as well as implementing tracing.

Monday November 18, 2019 5:38pm - 5:43pm PST
Exhibit Hall AB - San Diego Convention Center Ground Level