Case Studies
Tuesday, November 19

10:55am PST

Kubernetes at Cruise: Two Years of Multitenancy - Karl Isenberg, Cruise
Cruise has been working on self-driving cars for six years and growing exponentially for most of that time. Two years ago they started using Kubernetes, betting on namespace-level multitenancy to provide isolation between teams and projects. Today they have over 40 internal tenants, 100,000 pods, 4,000 nodes, and… an embarrassing number of KubeDNS replicas.

This session will take you through the motivations, story, and results of migrating to multitenant Kubernetes, along with some hard-earned Pro Tips from the trenches.

You’ll also learn about the open source tooling they built around Spinnaker, Vault, Google Cloud, and Istio in order to integrate with their multitenant Kubernetes.

Come see how they went from barely isolated to very isolated and saved a few million dollars doing it!

Karl Isenberg

Anthos Solutions Architect, Google
Karl Isenberg is on the Blueprint Solutions team at Google. Prior to Google, Karl led the PaaS team at Cruise. Before that, Karl worked on the vendor side on container platforms for more than 5 years as a committer on Kubernetes, DC/OS, and CloudFoundry at Mesosphere and Pivotal...

Tuesday November 19, 2019 10:55am - 11:30am PST
Room 6F - San Diego Convention Center Upper Level
  Case Studies

11:50am PST

Scaling Resilient Systems: A Journey into Slack's Database Service - Rafael Chacon & Guido Iaquinti, Slack
Monitoring and observability are important concepts, especially in complex and distributed systems. Redundancy and defensive programming are important as well, but sometimes they are not enough. Designing systems to minimize the blast radius when the unexpected happens is often the key.

In this talk, Rafael and Guido will give an overview of how Slack designed, built, scaled and then iterated to improve its distributed database service built on top of Vitess, now a CNCF project. The Databases team at Slack scaled a Vitess cluster from 0 to spikes of 2.7 million queries per second. This journey has taught us how to operate a database cluster with more than 2,000 nodes, which is expected to grow to more than 3,500 in the next 12 months.

Guido Iaquinti

Site Reliability Engineer, Freelance
Guido is a systems engineer with an academic background and experience in high-volume, high-availability Internet architectures. He is a technology enthusiast excited about open source software. His passion is to develop, scale and automate complex systems.

Rafael Chacon

Engineer, Slack
Rafael Chacon is a Staff Software Engineer on the infrastructure team at Slack, where he is working on the MySQL database layer on top of Vitess. Rafael has been part of the team that has migrated more than 30% of Slack database traffic from MySQL to Vitess. He is also now a core...

Tuesday November 19, 2019 11:50am - 12:25pm PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level
  Case Studies

2:25pm PST

Running Istio and Kubernetes On-prem at Yahoo Scale - Suresh Visvanathan & Mrunmayi Dhume, Verizon
At Yahoo!, there are 18+ production-grade Kubernetes (K8s) clusters, and my team operates one of the largest on-prem K8s clusters, handling 150K+ containers and 500+ applications and serving 1 million+ requests per second. Mission-critical applications such as Yahoo! Sports/Finance/Home are deployed and enabled by the K8s/Istio platforms. The journey started 2 years ago as a ‘proof of concept’ with K8s and with signing up for the ‘early engagement program’ with the Istio team to adopt Istio/Envoy, modernize our stack, and move towards a microservice architecture. During this journey, we:
1. Built an identity platform that provides a unique identity for workloads.
2. Enabled workloads with a sidecar Envoy proxy and integrated it with an in-house custom CA and RBAC for authN/Z.
3. Built tools to manage both Istio and K8s clusters at scale.
This talk will detail how K8s and Istio/Envoy are used to deploy, secure, and connect workloads at Yahoo scale.


Suresh Visvanathan

Sr Architect, Verizon Media
Suresh Visvanathan, Sr Architect, has over 13 years of experience in IT and Software. Suresh’s current responsibilities include the architecture, vision, strategy and design of cloud platform as-a-service (PaaS). Suresh has been architecting solutions and building products around...

Mrunmayi Dhume

Principal Software Engineer, Verizon Media (Yahoo)
Mrunmayi Dhume is a Principal Software Engineer in the Core Infrastructure team at Verizon Media. She is part of the team responsible for providing L3/L4 routing solutions and leads the design and implementation of the routing layer and identity provider system components for Kubernetes...

Tuesday November 19, 2019 2:25pm - 3:00pm PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level
  Case Studies

3:20pm PST

10 Weird Ways to Blow Up Your Kubernetes - Melanie Cebula & Bruce Sherrod, Airbnb
It’s a brand new world in infrastructure with the advent of microservices, containerization, Kubernetes, and service mesh. And all is well. Or is it? Find out how easy it is to break container runtimes, abuse your service mesh, and take all of your production services down-- the results will surprise you! In the last year Airbnb scaled up to over 700 services in Kubernetes, running all types of workloads across thousands of nodes and dozens of clusters. We’ve learned a lot along the way and have some of our favorite stories to share-- from weird bugs, to hacky workarounds, to serious downtime. Favorites include:
- “Just what is the autoscaler doing?”
- “Knock knock, it’s Kube-DNS”
- “Whose PID is it anyway?”
and more!


Melanie Cebula

Staff Software Engineer, Airbnb
Melanie Cebula is an expert in Cloud Infrastructure, where she is recognized worldwide for explaining radically new ways of thinking about cloud efficiency and usability. She is an international keynote speaker, presenting complex technical topics to a broad range of audiences, both...

Bruce Sherrod

Software Engineer, Airbnb
Bruce Sherrod is a software engineer on the service orchestration team at Airbnb.

Tuesday November 19, 2019 3:20pm - 3:55pm PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level
  Case Studies

4:25pm PST

Making an Internal Kubernetes Offering Generally Available - James Wen, Spotify
In the span of two years, Spotify went from two developers investigating what a potential migration to Kubernetes might involve to having an internal, multi-tenant offering of Kubernetes become generally available for all its developers as the new, primary runtime offering.

Spotify has previously given talks on the earlier bootstrapping, experimentation, alpha, and beta phases of this migration process. However, this talk will focus on the later work involved in bringing the internal offering of Kubernetes “across the finish line.” The talk will cover what was required to bring the offering to general availability, including work shoring up scalability and reliability via a multicluster strategy, DIRT testing, and operational metrics and alerts. This talk will also cover the technical and process elements involved in designing a successful self-service migration experience for developers.

James Wen

Senior Site Reliability Engineer, Spotify
James Wen is a senior site reliability engineer at Spotify, where he’s currently focused on revamping Spotify’s runtime infrastructure. Previously, James was the team lead (anchor) of the Cloud Foundry Buildpacks team at Pivotal and served as a core contributor and maintainer...

Tuesday November 19, 2019 4:25pm - 5:00pm PST
Room 30ABCDE - San Diego Convention Center Upper Level
  Case Studies
Wednesday, November 20

10:55am PST

How Spotify Migrated Ingress HTTP Systems to Envoy - Erica Manno & Vladimir Shakhov, Spotify
Erica and Vladimir are on the team responsible for perimeter systems that sit between Spotify’s clients and its backend services. They started unifying those systems from a range of different technologies and protocols to a solution based on Envoy proxies and a unified control plane.

This talk introduces Spotify’s vision for the next-gen perimeter. However, it will mainly focus on the migration of all HTTP ingress traffic from a brittle, custom Nginx/HAProxy setup to an Envoy-based solution.

The speakers will discuss how they’re migrating multiple high-volume web services, serving millions of requests/sec, with minimal disruption and zero downtime for the feature teams that maintain Spotify’s backend services.

This talk will also illustrate how Spotify’s engineering culture of loosely coupled but highly aligned teams has informed the decisions taken during the migration.


Erica Manno

Senior Engineer, Spotify
Erica Manno is a Software Engineer in Spotify's Infrastructure and Operations department in Stockholm, Sweden. Her team maintains and operates critical infrastructure that handles all ingress and egress traffic at the edge of Spotify's network. Apart from that Erica is a dedicated...

Vladimir Shakhov

Software engineer, Spotify
Vladimir is a software engineer. He works on Spotify's Infrastructure and Operations team in Stockholm, mainly focused on client-to-backend messaging. Vladimir previously worked at Yandex, where he helped develop its task-tracking product offering. He is a geek and has a dog.

Wednesday November 20, 2019 10:55am - 11:30am PST
Exhibit Hall AB - San Diego Convention Center Ground Level
  Case Studies

11:50am PST

Case Study: AI-as-a-Service on Kubernetes at Scale and In Production - Itay Gabbay, Israel Ministry of Defense (MOD) & Tushar Katarki, Red Hat
AI is popular and yet faces two big challenges in the industry: 1) self-service and automation, and 2) use in real production.

At the Israel Ministry of Defense we are taking on these challenges with containers and Kubernetes. We have built AI-as-a-service with open source tools and Kubernetes. Our data scientists use the service for data, experimentation and to deliver models into production iteratively with self-service and automation.

Using Kubernetes, we are able to run massive machine learning pipelines automatically, and improve our machine learning models. We implemented several principles of AutoML, an active research area nowadays. Using AutoML and Kubernetes, we can further improve our machine learning models and pipelines automatically.

Come find out how we built our AI service on Kubernetes, issues we ran into and best practices with a live demo and supporting slides.

Tushar Katarki

Product Manager, Red Hat
Tushar Katarki is a senior technology professional with experience in cloud architecture, product management and engineering. He is currently at Red Hat as a product manager for OpenShift with a focus on AI/ML on OpenShift. Tushar is involved with several open source projects around...

Itay Gabbay

Machine Learning Engineer, MOD Israel
Itay Gabbay is a software engineer specializing in machine learning and AutoML. He is currently at the Israel Ministry of Defense, responsible for a machine learning platform he designed and implemented, based on OpenShift.

Wednesday November 20, 2019 11:50am - 12:25pm PST
Room 6C - San Diego Convention Center Upper Level
  Case Studies

2:25pm PST

Moving from Legacy Infrastructure to the Cloud in a Government Organization - Chris Carty, City Of Ottawa
Cloud native tech isn’t just for start-ups. But, if you’re in a government organization looking to go cloud native, you can expect to face extra challenges. How can you select the best tools that will work with the processes you already have? What new skills are needed? How do you train staff? How to get anyone to actually use the framework once it’s in place? How to even start?

The City of Ottawa (yes, the capital of Canada) is an organization that started applying DevOps practices just a few years ago. It now has a Kubernetes platform with fully automated CI/CD pipelines that is used by multiple teams and growing. Using The City as a case study, we will examine the common issues faced by government organizations and how The City developed workable solutions on its cloud native journey.

Chris Carty

Customer Engineer, Google Cloud
He is a Certified Kubernetes Administrator, Certified Kubernetes Application Developer, panelist for the Kubernetes Office Hours and a member of the Kubernetes 1.16/1.17 Release Notes teams.

Wednesday November 20, 2019 2:25pm - 3:00pm PST
Room 6F - San Diego Convention Center Upper Level
  Case Studies

3:20pm PST

Panel: Improving and Managing Kubernetes at Scale - Xiang Li, Alibaba; Corin Dwyer, Netflix; Amit Bose, Uber; June Liu & Harry Zhang, Pinterest
Companies like Alibaba, Uber, and Pinterest are managing huge fleets of machines with demanding and complicated workloads. To evolve our infrastructure and adopt Kubernetes, we faced many challenges around scalability, reliability, flexibility and operability. And today, having overcome those difficulties, we are running some of the largest Kubernetes clusters in the world.

In this panel, we would like to share our real-world experience of improving and managing Kubernetes under harsh requirements. We believe the stories are interesting in themselves, and many of the lessons we learned also apply to operators and users of small and mid-size clusters.


Amit Bose

Senior Software Engineer II, Uber

June Liu

Staff Software Engineer, Pinterest Inc
After spending years at a large organization, June joined Pinterest to explore the vast ocean of open source and startup spirit. Her interests focus on container orchestration, large scale cluster operations and developer tools. She currently works on the compute platform team at Pinterest...

Xiang Li

Senior Staff Engineer, Alibaba
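Xiang Li is a senior staff engineer at Alibaba Cloud Intelligence, responsible for Alibaba's large-scale cluster scheduling and management system. He has helped Alibaba complete the initial transformation of its infrastructure through cloud native technology, greatly improving resource utilization and the efficiency of software development and deployment, while also supporting the technical evolution of its cloud products. CNCF...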

Harry Zhang

Software Engineer, Pinterest
Harry is a Software Engineer at Pinterest working on its Kubernetes-based next-generation container cloud. Harry is interested in large scale cluster management solutions and related technologies. Harry is currently a Kubernetes contributor and a CNCF Certified Kubernetes Administrator...

Corin Dwyer

Senior Software Engineer, Netflix
Corin Dwyer is a senior software engineer within the Netflix compute platform development team. Before working on Titus, Netflix's container platform, he worked on infrastructure engineering for the Netflix content organization and before that in healthcare. He has worked across the...

Wednesday November 20, 2019 3:20pm - 3:55pm PST
Room 6C - San Diego Convention Center Upper Level

4:25pm PST

Cruise’s Self-Driving Networking Journey - Bernard Van De Walle & Can Yucel, Cruise
Through its exponential growth, the Platform team at Cruise experienced a networking self-driving journey. We scaled our network across numerous clusters, multiple tenants, and thousands of new pod instances a day.

We will take you on a tour of our architecture and you will get a better understanding of how we choose to configure our network and security in order to support Kubernetes loads across multiple regions and multiple environments. We will specifically showcase how we do this on a public cloud (GCP) even though similar results could be achieved on-prem.

You will come out of this session with concrete examples of what it takes to meet your network and security needs for internal tenants at scale while keeping internal stakeholders happy (platform, security and networking).

Can “Jon” Yucel

Senior Software Engineer, Cruise
Can “Jon” Yucel is a software engineer and technical lead of the PaaS Traffic team at Cruise with the primary focus of internal/external/multi-cluster load balancers, service meshes, hybrid DNS and platform level networking.

Bernard Van De Walle

Principal traffic engineer, Splunk
Bernard is a traffic engineer at Splunk. He is leading the Istio and service mesh efforts as part of the traffic engineering team. Before this, Bernard gained experience in operations for large-scale deployments of Kubernetes and reverse proxies such as Envoy and Nginx.

Wednesday November 20, 2019 4:25pm - 5:00pm PST
Room 11AB - San Diego Convention Center Upper Level
  Case Studies

5:20pm PST

Education as a Service: Containerization and Orchestration of CS50 IDE - Kareem Zidane & David J. Malan, Harvard University
CS50 is Harvard University's introductory course in computer science, freely available as OpenCourseWare, with hundreds of students on campus and more than one million registrants online. So that students have a uniform environment with which to begin programming (without client-side technical difficulties in the way), the course provides CS50 IDE, a free, cloud-based solution.

To minimize cost and avoid homegrown orchestration of VMs, the course transitioned to pods, one container per student. But the migration was not without challenges. How to provide users with ephemeral containers but persistent storage? How to proxy arbitrary ports to students' own web services? And, ultimately, how to provide students with the abstraction of their own machine, without k8s-specific implementation details clouding their own understanding thereof? In this talk, CS50's own solutions thereto.

David J. Malan

Gordon McKay Professor of the Practice of Computer Science, Harvard University
Dr. David J. Malan is Gordon McKay Professor of the Practice of Computer Science at Harvard University. He teaches Computer Science 50, otherwise known as CS50, which is Harvard University's largest course, one of Yale University's largest courses, and edX's largest MOOC. He also...

Kareem Zidane

Software Engineer, Harvard University
Kareem Zidane is a software developer, system administrator, and teaching fellow for CS50 at Harvard University. He is a self-taught programmer from Egypt who discovered computer science, including CS50 itself, online. He is the chief architect of CS50 IDE.

Wednesday November 20, 2019 5:20pm - 5:55pm PST
Pacific Ballroom, Salon 14-15 - Marriott Marquis San Diego Marina Hotel
  Case Studies
Thursday, November 21

10:55am PST

Balancing Power and Pain: Moving a Startup From a PaaS to Kubernetes - David Sudia, GoSpotCheck & Toni Rib, Gusto
By hiding a lot of complexity and allowing a team to move fast and simply "heroku push" applications, PaaS solutions like Heroku are a perfect fit when you are an early-stage startup. However, what do you do when your business starts to get traction, and your scale or use case begins to stretch the limitations of a PaaS? This talk will share the story of a startup's successful migration away from a PaaS to a self-built platform powered by CNCF technology.

We'll share the highlights of our journey, such as how we translated PaaS concepts to our new infrastructure, and explain the series of choices we made, like assembling our platform from Kubernetes and other CNCF components. We will also share some of our difficulties, with the goal that other organisations can avoid making the same mistakes.

David Sudia

Senior DevOps Engineer, GoSpotCheck
David Sudia is a former educator turned developer turned DevOps Engineer. He's passionate about supporting other developers in doing their best work by making sure they have the right tools and environments. In his day-to-day he's responsible for managing Kubernetes clusters, deploying...

Toni Rib

Software Engineer, Gusto
Toni Rib is a Software Engineer at Gusto. While she focuses mainly on application development, she isn't happy unless she understands not only the application she's developing, but also the database and infrastructure it relies on. This resulted in her being named "honorary DevOps...

Thursday November 21, 2019 10:55am - 11:30am PST
Room 14AB - San Diego Convention Center Mezzanine Level
  Case Studies

11:50am PST

Security Beyond Buzzwords: How to Secure Kubernetes with Empathy? - Pushkar Joglekar, Visa
Your developers are excited about containerizing their apps for elastic scaling. Your operations team is busy drooling over the resource optimizations and cost savings predicted from a move away from giant VMs to tiny containers. The security person assigned to review this is utterly clueless when words like multi-tenancy, service meshes, CRI, CNI and kubectl are thrown around.
In this presentation, Pushkar Joglekar will share his real-world experience of going from being that security person four years ago to becoming the "go-to" security person for his Ops & Dev teams today. Using a simple formula of risk = likelihood * severity, we will show that not all vulnerabilities are created equal and how “secure by design” Kubernetes deployments can reduce the likelihood and surface area of a possible attack exploiting any vulnerabilities.

Pushkar Joglekar

Security Engineer, Visa
Pushkar Joglekar is a Security Engineer who is the first ever open source contributor for his current company. He has architected several “secure by design” large scale containerized deployments in the last four years. This is his first attempt to speak on a topic that he has...

Thursday November 21, 2019 11:50am - 12:25pm PST
Room 17AB - San Diego Convention Center Mezzanine Level
  Case Studies

2:25pm PST

Gone in 60 Minutes: Migrating 20 TB from AKS to GKE in an Hour with Vitess - Derek Perkins, Nozzle
The holy grail of Cloud Native tech is to have zero vendor lock-in. That becomes extra challenging when dealing with stateful applications. By leveraging out-of-the-box Kubernetes and Vitess features, Derek and his team were able to migrate a high-throughput production workload of 20 TB from Azure (AKS) to Google (GKE) in under an hour. This workload consisted of dozens of services writing to MySQL, including heavy usage of the under-marketed pub/sub style message queue feature of Vitess. Derek will go into detail about the public Helm charts that were used to set up these workloads and how Kubernetes and Vitess were configured. We will also touch on a few ecosystem projects, like external-dns and cert-manager, that helped make the transition low-touch and seamless.

Derek Perkins

Founder & CEO, Nozzle
Derek is the Founder and CEO of Nozzle, an enterprise rank tracking solution that helps companies understand where they and their competitors rank on Google and other search engines. He has been an evangelist for Vitess since it was open sourced, speaking about it often and was responsible...

Thursday November 21, 2019 2:25pm - 3:00pm PST
Room 14AB - San Diego Convention Center Mezzanine Level

3:20pm PST

Kubernetes at Reddit: Tales from Production - Greg Taylor, Reddit, Inc
This talk is the EAGERLY-anticipated sequel to last year's "Kubernetes at Reddit: An Origin Story". Whereas the saga's first installment focused on early results, thoughts, and a rough higher-level vision, this year's edition serves as a retrospective on how it all shook out over Reddit's last year of rapid Kubernetes adoption.

The audience will hear of successes, share in the heartbreak of production explosions, and gain insight into what has and hasn't worked well for one of the world's busiest web properties. Topics covered include:

* A brief recap of InfraRed, our internal Infrastructure product
* How org-wide adoption has progressed
* Scaling challenges (Infrastructure and Inter/Intra-team)
* Fires, near-misses, and outages, oh my!
* Successes and celebration
* Lingering questions and challenges
* The impact of Kubernetes at Reddit

Greg Taylor

Engineering Manager, Reddit, Inc
Greg Taylor leads the Release Engineering team within Reddit's Infrastructure division. He and his team steward the internal Kubernetes-based infrastructure product (InfraRed) and build tooling and process to empower service owners to get their ideas to production. Greg has recently...

Thursday November 21, 2019 3:20pm - 3:55pm PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level
  Case Studies

4:25pm PST

Tinder's Move to Kubernetes - Chris O'Brien & Chris Thomas, Tinder
Almost 2 years ago, Tinder decided to move its platform to Kubernetes. Kubernetes afforded us an opportunity to drive Tinder Engineering toward containerization and low-touch operation through immutable deployment. Application build, deployment, and infrastructure would be defined as code.

We were also looking to address challenges of scale and stability. When scaling became critical, we often suffered through several minutes of waiting for new EC2 instances to come online. The idea of containers scheduling and serving traffic within seconds as opposed to minutes was appealing to us.

During our migration in early 2019, we reached critical mass within our Kubernetes cluster and began encountering various challenges due to traffic volume, cluster size, and DNS. We solved interesting challenges to migrate 200 services and run a Kubernetes cluster at scale.  


Chris O'Brien

Senior Engineering Manager, Tinder
Chris is a Software Engineer who works in Cloud Infrastructure—Kubernetes, CI/CD, AWS, Linux, Automation and Configuration Management (Terraform, Ansible, Chef, Puppet), and other open source technologies.

Chris Thomas

Engineering Manager, Tinder
Chris is an Engineering Manager for Tinder Cloud Infrastructure. He leads the Resiliency team, which is responsible for much of the infrastructure powering the Tinder backend platform, as well as Observability.

Thursday November 21, 2019 4:25pm - 5:00pm PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level
  Case Studies

5:20pm PST

Creating a Micro Open-Source Community with Helm - Katie Gamanji, Condé Nast International
For over a century Condé Nast International has set the benchmark for print and digital publishing. Our portfolio is composed of luxury and fashion-oriented brands, like Vogue, GQ, Wired, Glamour and many more. Condé Nast International is a digital-first company, aiming to migrate 34 out of 62 existing websites to Kubernetes clusters across the globe.

Kubernetes underpins Condé Nast International's entire infrastructure, and Helm is used as the de facto deployment package manager. These two components were critical for delivering the best possible developer experience.

In time, the development teams became self-sufficient and started to contribute to the base Helm charts instead of going the feature-request route. This created a substantial and agile environment in which developers are able to initiate changes and contribute to the internal developer community.

Katie Gamanji

Senior Field Engineer, Apple
Katie is a cloud-native leader, practitioner, and contributor, currently in a Senior Field Engineer role at Apple and a member of the CNCF TOC. As a cloud platform engineer, Katie has contributed to the buildout of infrastructure at Conde Nast and American Express, gravitating towards cloud-native...

Thursday November 21, 2019 5:20pm - 5:55pm PST
Room 14AB - San Diego Convention Center Mezzanine Level
  Case Studies
