Tuesday, November 19 • 4:25pm - 5:00pm
Measuring and Optimizing Kubeflow Clusters at Lyft - Konstantin Gizdarski, Lyft & Richard Liu, Google

Machine learning workloads are often resource-intensive. As companies adopt more of these workloads, tracking resource consumption and optimizing spending become more challenging.

At Lyft, we developed a system that scrapes metrics from Kubernetes clusters and persists them in data warehouses. We then built a pipeline that transforms these snapshots into cluster utilization metrics along the dimensions of CPU, memory, and GPU. Finally, we join these metrics with our cost and usage dataset, so teams can budget resources accordingly and reduce spending.
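
As a rough illustration of the snapshot-to-utilization step and the cost join described above, here is a minimal Python sketch. Everything in it is a hypothetical assumption rather than Infraspend's actual schema or code: the `PodSnapshot` fields, the fixed scrape interval, the per-unit-hour prices, and the function names `utilization_by_namespace` and `join_cost`. It aggregates requested and used resource-hours per namespace, derives utilization as used divided by requested, and prices the requested hours to estimate total and idle cost.

```python
from collections import defaultdict
from dataclasses import dataclass

DIMENSIONS = ("cpu", "mem", "gpu")

# Hypothetical snapshot record: one row per pod per scrape interval.
@dataclass(frozen=True)
class PodSnapshot:
    namespace: str
    cpu_requested: float  # cores
    cpu_used: float       # cores
    mem_requested: float  # GiB
    mem_used: float       # GiB
    gpu_requested: float  # devices
    gpu_used: float       # devices (e.g. busy fraction summed over GPUs)

def utilization_by_namespace(snapshots, interval_hours):
    """Sum requested/used resource-hours per namespace and compute utilization."""
    # totals[namespace][dimension] = [requested_hours, used_hours]
    totals = defaultdict(lambda: {d: [0.0, 0.0] for d in DIMENSIONS})
    for s in snapshots:
        for d in DIMENSIONS:
            totals[s.namespace][d][0] += getattr(s, f"{d}_requested") * interval_hours
            totals[s.namespace][d][1] += getattr(s, f"{d}_used") * interval_hours
    return {
        ns: {
            d: {
                "requested_hours": req,
                "used_hours": used,
                "utilization": used / req if req else 0.0,
            }
            for d, (req, used) in dims.items()
        }
        for ns, dims in totals.items()
    }

def join_cost(utilization, unit_cost):
    """Join utilization with a hypothetical per-unit-hour price table."""
    report = {}
    for ns, dims in utilization.items():
        cost = sum(dims[d]["requested_hours"] * unit_cost[d] for d in DIMENSIONS)
        # Usage above requests (e.g. CPU bursting) counts as zero idle, not negative.
        idle = sum(
            max(0.0, dims[d]["requested_hours"] - dims[d]["used_hours"]) * unit_cost[d]
            for d in DIMENSIONS
        )
        report[ns] = {"cost": cost, "idle_cost": idle}
    return report
```

Charging for requested rather than used resource-hours is only one possible design choice; under it, the `idle_cost` figure surfaces over-provisioned requests, which is the kind of signal that could guide right-sizing without touching workloads that actually use their allocation.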

In this talk, we will give an overview of Infraspend, our infrastructure for tracking Kubernetes usage. Attendees will learn how the data we collected helped Lyft reduce spending on Kubeflow clusters. The audience will also gain insight into how Kubernetes clusters can be optimized without compromising performance or stability.

Richard Liu

Senior Software Engineer, Google
Richard Liu is a Senior Software Engineer at Google Cloud. He is currently an owner and maintainer of the TensorFlow operator and Katib projects in Kubeflow. Previously, he worked as a software developer at Microsoft Azure.

Konstantin Gizdarski

Software Engineer, Lyft
Konstantin Gizdarski is a Software Engineer at Lyft, where he has worked on, among other things, surfacing the utilization and efficiency of Kubernetes infrastructure. Previously, he worked on machine learning and product at Facebook and Stripe.

Tuesday November 19, 2019 4:25pm - 5:00pm PST
Room 6C - San Diego Convention Center Upper Level
Machine Learning + Data