Ballroom Sec 20AB - San Diego Convention Center [clear filter]
Tuesday, November 19

10:55am PST

Building Reusable DevSecOps Pipelines on a Secure Kubernetes Platform - Steven Terrana, Booz Allen Hamilton & Michael Ducy, Sysdig
Onboarding development teams can often be the critical point in determining if a team will adopt modern Cloud Native and DevSecOps practices. If there is too much friction for developers to build, scan, and test their applications or to secure their application environments then these best practices are often pushed aside. In this talk we’ll cover how we automated the creation of a trusted software supply chain. Through a live demonstration, we will show how this approach accelerates adoption by allowing developers to inherit a preconfigured pipeline performing various security tests (and underlying tooling) as well as safeguards (via the CNCF Sandbox project Falco) put in place to monitor production workloads for security problems.

avatar for Steven Terrana

Steven Terrana

Chief Engineer, Booz Allen Hamilton
Steven is a Chief Engineer at Booz Allen Hamilton focused on building reusable capabilities for the Firm and industry. He uses these capabilities to help organizations adopt all things modern software delivery: DevSecOps, Cloud Infrastructure, Container Orchestration, and Microservice... Read More →
avatar for Michael Ducy

Michael Ducy

Director of Open Source, Sysdig
Michael Ducy started his technology journey at a young age. Always curious, he was once threatened that he’d never have toys bought for him again if he didn’t stop taking them apart to see how they worked. His first workbench was given to him at the age of 5. His first programming... Read More →

Tuesday November 19, 2019 10:55am - 11:30am PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level

11:50am PST

Scaling Resilient Systems: A Journey into Slack's Database Service - Rafael Chacon & Guido Iaquinti, Slack
Monitoring and observability are important concepts, especially in complex and distributed systems. Redundancy and defensive programming are important as well, but sometimes they are not enough. Designing systems to minimize the blast radius when the unexpected happens is often the key.

In this talk, Rafael and Guido will share an overview about how Slack designed, built, scaled and then iterated to improve its distributed database service based on top of Vitess, now a CNCF project. The Databases team at Slack scaled a Vitess cluster from 0 to spikes of 2.7 Million queries per second. This journey has taught us how to operate a database cluster with more than 2000 nodes and expecting to growth to more than 3500 in the next 12 months.

avatar for Guido Iaquinti

Guido Iaquinti

Site Reliability Engineer, Freelance
Guido is a system engineer with academic background and experience in high volume/high availability Internet architectures. He is a technology enthusiast excited about open source software. His passion is to develop, scale and automate complex systems.
avatar for Rafael Chacon

Rafael Chacon

Engineer, Slack
Rafael Chacon is a Staff Software Engineer on the infrastructure team at Slack, where he is working on the MySQL database layer on top of Vitess. Rafael has been part of the team that has migrated more than 30% of Slack database traffic from MySQL to Vitess. He is also now a core... Read More →

Tuesday November 19, 2019 11:50am - 12:25pm PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level
  Case Studies

2:25pm PST

Running Istio and Kubernetes On-prem at Yahoo Scale - Suresh Visvanathan & Mrunmayi Dhume, Verizon
At Yahoo!, there are 18+ production grade Kubernetes(K8s) clusters and my team operates one of the largest on-prem K8s clusters handling 150K+ containers, 500+ applications and serving 1Million+ request per second. Mission critical Applications, such as Yahoo! Sports/Finance/Home are deployed and enabled by K8s/Istio platforms. The journey started 2 years ago as a ‘proof of concept’ with K8s and signing up for ‘early engagement program’ with Istio team to adopt Istio/Envoy to modernize our stack and move towards micro service architecture. During this journey, 1.Build Identity platform which provide unique identity for workloads 2.Enabled workload with sidecar envoy proxy and integrated with in-house Custom CA & RBAC for authN/Z 3. Build tools to manage both Istio & K8s cluster at scale.This talk will detail how K8s and Istio/Envoy used to deploy/secure/connect workloads @ Yahoo Scale.


Suresh Visvanathan

Sr Architect, Verizon Media
Suresh Visvanathan, Sr Architect, has over 13 years of experience in IT and Software. Suresh’s current responsibilities include the architecture, vision, strategy and design of cloud platform as-a-service (PaaS). Suresh has been architecting solutions and building products around... Read More →

Mrunmayi Dhume

Principal Software Engineer, Verizon Media (Yahoo)
Mrunmayi Dhume is a Principal Software Engineer in the Core Infrastructure team at Verizon Media. She is part of the team responsible for providing L3/L4 routing solutions and leads the design and implementation of the routing layer and identity provider system components for Kubernetes... Read More →

Tuesday November 19, 2019 2:25pm - 3:00pm PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level
  Case Studies

3:20pm PST

10 Weird Ways to Blow Up Your Kubernetes - Melanie Cebula & Bruce Sherrod, Airbnb
It’s a brand new world in infrastructure with the advent of microservices, containerization, Kubernetes, and service mesh. And all is well. Or is it? Find out how easy it is to break container runtimes, abuse your service mesh, and take all of your production services down-- the results will surprise you! In the last year Airbnb scaled up to over 700 services in Kubernetes, running on all types of workloads across 1000s of nodes and dozens of clusters. We’ve learned a lot along the way and have some of our favorite stories to share-- from weird bugs, to hacky workarounds, to serious downtime. Favorites include:
- “Just what is the autoscaler doing”?
- “Knock knock, It’s Kube-DNS”
- “Whose PID is it anyway”?
and more!


Melanie Cebula

Staff Software Engineer, Airbnb
Melanie Cebula is an expert in Cloud Infrastructure, where she is recognized worldwide for explaining radically new ways of thinking about cloud efficiency and usability. She is an international keynote speaker, presenting complex technical topics to a broad range of audiences, both... Read More →

Bruce Sherrod

Software Engineer, Airbnb
Bruce Sherrod is a software engineer on the service orchestration team at Airbnb.

Tuesday November 19, 2019 3:20pm - 3:55pm PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level
  Case Studies

4:25pm PST

How to Backup and Restore Your Kubernetes Cluster - Annette Clewett & Dylan Murray, Red Hat
Operating Kubernetes clusters introduces many new practices, but does not change the need to be able to backup and recover your applications and data. Yet traditional methods of server backup work poorly with Kubernetes clusters. How can you make sure your cluster is protected? How can persistent data get saved in a manner so there is minimal or no corruption to the application if recovery is required?

In this session we will explore how to use open-source disaster recovery tools you can use today such as Velero and Restic. We’ll also discuss how to use the Noobaa S3 API to reliably save and store backups for all resources including snapshots housed in Rook-Ceph. To prove this is not just smoke and mirrors, we will demonstrate in a live Kubernetes cluster deleting everything in a namespace and then continue on to show complete recovery of all resources and data.

avatar for Annette Clewett

Annette Clewett

Principal Architect, Red Hat
Red Hat Storage Architect with broad knowledge across a spectrum of technologies – network, storage, virtual, and platform. Have successfully delivered countless studies that improved end-user experience and created a more efficient and available infrastructure. Current projects... Read More →
avatar for Dylan Murray

Dylan Murray

Senior Software Engineer, Red Hat
Red Hat Software Engineer

Tuesday November 19, 2019 4:25pm - 5:00pm PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level
Wednesday, November 20

10:55am PST

Running Large-Scale Stateful Workloads On Kubernetes at Lyft - Surinder Singh & Anmol Khurana, Lyft
Along with core services, K8s at Lyft also forms the base to run a large variety of data processing stateful data processing jobs which includes Spark, Flink and other jobs via various ML and Data processing pipelines.

At Lyft, K8s has become the driver for the majority of our data processing needs running 10s of thousands of concurrent jobs. Operating the platform at this scale presents an unique set of challenges which get more complex with highly variable load pattern.

In this talk, the speakers will share their journey through some of these challenges and learnings.
- Potential pitfalls of running stateful jobs on K8s.
- Knobs/tweaks to optimize K8s for stateful jobs.
- Running k8s in a cloud environment.
- Building a fault-tolerant self-healing system with multiple K8s clusters underneath.

Talk will also focus on optimizations done to support the widely used workloads at Lyft.

avatar for Surinder Singh

Surinder Singh

Software Engineer, Lyft
Surinder Singh is a software engineer at Lyft in Seattle. He led execution plane for Flyte, Lyft’s open-source Machine learning and Data processing pipelines platform. Before Lyft, Surinder was at Microsoft where he worked on Azure Storage and SQL Server Query Optimizer.

Anmol Khurana

Software Engineer, Lyft
Anmol Khurana is a software engineer at Lyft. He is part of Data Platform team responsible for leading effort on Containerized Spark on K8s. Before Lyft, Anmol was at Amazon for 5+ years mostly with AWS Elastic Block Store team.

Wednesday November 20, 2019 10:55am - 11:30am PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level

11:50am PST

Doing Things Prometheus Can’t Do with Prometheus - Tim Simmons, DigitalOcean
The current Cloud Native Observability dogma is that metrics (and logs and traces) are “not good enough” and that this brave new world needs brave new Observability tools. This is false.

This session will focus on how to utilize Prometheus and friends to solve problems that are typically cited as limitations. This talk is for anyone interested in learning how Prometheus can solve the majority of your Observability problems, no vendor required.

An outline of this talk is:
- How to thoughtfully utilize existing Observability tools
- Deploying High Availability Prometheus
- Effectively interacting with high-cardinality data
- Long-term metrics storage
- Doing “machine learning” on metrics
- Handling thousands of alerts in a sane way (https://twitter.com/timsimlol/status/1145790451129167872)
- How to measure *everything* with Prometheus
- Fostering a healthy Observability culture with SLOs

avatar for Tim Simmons

Tim Simmons

Senior Engineer, DigitalOcean
Tim Simmons is a Senior Engineer on the Observability Platforms team at DigitalOcean. He primarily cares for DigitalOcean's internal Prometheus infrastructure. On a normal day, he helps his colleagues with PromQL queries, writes custom Prometheus exporters, and builds tools around... Read More →

Wednesday November 20, 2019 11:50am - 12:25pm PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level

2:25pm PST

Liberating Kubernetes From Kube-proxy and Iptables - Martynas Pumputis, Cilium
iptables and Netfilter are the two foundational technologies of kube-proxy for implementing a Service abstraction. They carry legacy accumulated over 20 years of development grounded in a more traditional networking environment that is typically far more static than your average Kubernetes cluster. In the age of containers, they are no longer the best tool for the job, especially in terms of performance, reliability, scalability, and operations.

Companies like Google, Facebook and Cloudflare have long realised this and therefore embraced eBPF as technology, which lets one to dynamically reprogram the kernel. Can we replicate the same success story in Kubernetes?

In this talk, the audience will learn about running a fully functioning Kubernetes cluster without iptables, Netfilter and thus without kube-proxy in a scalable and secure way with the help of eBPF and Cilium.

avatar for Martynas Pumputis

Martynas Pumputis

Software Engineer, Isovalent
Martynas Pumputis is a Software Engineer at Isovalent working on Cilium, eBPF and Linux kernel.

Wednesday November 20, 2019 2:25pm - 3:00pm PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level

3:20pm PST

Storage on Kubernetes - Learning From Failures - Hemant Kumar & Jan Šafránek, Red Hat
Using persistent storage with Kubernetes has been continuously improved with each release, but getting where we are was not easy. In this talk, we are going to cover a series of war stories and failure scenarios. We will talk about bugs (or design) that resulted in data loss, file system corruption, or storage simply refusing to come up. The limitations of storage subsystems, both what it can and can not do, will also be discussed

These failures have led to numerous enhancements in Kubernetes. We will review the lessons these failures have provided, and discuss how they have been vital to improving our handling of the storage subsystem.

avatar for Jan Šafránek

Jan Šafránek

Principal Software Engineer, Red Hat
Jan is a Principal Software Engineer at Red Hat working on storage aspects of Kubernetes. He started developing Kubernetes more than 4 years ago, and is one of the founding members of SIG-Storage. He’s the author of PersistentVolume controller, dynamic provisioning and StorageClass... Read More →

Hemant Kumar

Principal Software Engineer, Red Hat
Hemant is a Principal Software Engineer at Red Hat working on storage subsystem of Kubernetes. He is a member of SIG-Storage and author of persistent volume expansion, volume limits, mount options and various instrumentation bits in storage subsystems of Kubernetes. He is also a maintainer... Read More →

slides pdf

Wednesday November 20, 2019 3:20pm - 3:55pm PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level

4:25pm PST

Debugging Live Applications the Kubernetes Way: From a Sidecar - Joe Elliott, Grafana Labs
Linux features a number of powerful debugging tools that give us insight into how our applications run in a real environment. Through live demonstration this session will present a straightforward way to begin debugging applications in a Kubernetes native way: from a sidecar. Sidecars offer a low impact way of profiling applications without installing packages or making messy changes to your nodes.

The techniques demonstrated will include recording LTTng events, cpu profiling, generating Flame Graphs and dynamic tracing with BCC. These techniques will be performed against a .NET Core sample application, but that will not be the focus of the session.

avatar for Joe Elliott

Joe Elliott

Backend Engineer, Grafana Labs
Joe Elliott is a Backend Engineer at Grafana Labs. Since Kubernetes 1.5 he has been building and maintaining microservice platforms on AWS for development teams to deploy their applications. Joe maintains several open source applications in github that publish metrics, manage Grafana... Read More →

Wednesday November 20, 2019 4:25pm - 5:00pm PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level

5:20pm PST

Introducing Metal³: Kubernetes Native Bare Metal Host Management - Russell Bryant & Doug Hellmann, Red Hat
Metal³ (“metal kubed”) is a new open source bare metal host provisioning tool created to enable Kubernetes-native infrastructure management. Metal³ enables the management of bare metal hosts via custom resources managed through the Kubernetes API as well as the monitoring of bare metal host metrics to Prometheus. This presentation will explain the motivations behind creating the project and what has been accomplished so far. This will be followed by an architectural overview and description of the Custom Resource Definitions (CRDs) for describing bare metal hosts, leading to a demonstration of using Metal³ in a Kubernetes cluster.

avatar for Russell Bryant

Russell Bryant

Distinguished Engineer, Red Hat
Russell is a Distinguished Engineer in Service Delivery, leading SD's adoption of OVN across our managed services. Russell also has a long history with OVN, having helped create the project back in 2015 and leading the planning for product teams to take over ownership of OVN by 2... Read More →
avatar for Doug Hellmann

Doug Hellmann

Senior Principal Software Engineer, Red Hat
Doug Hellmann is a Senior Principal Software Engineer at Red Hat. He has been a professional developer since the mid 1990s and has worked on a variety of projects in fields such as mapping, medical news publishing, banking, data center automation, and hardware provisioning. He has... Read More →

Wednesday November 20, 2019 5:20pm - 5:55pm PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level
Thursday, November 21

10:55am PST

Prometheus Deep Dive - Ben Kochie, GitLab
After the Intro session we will go into a mix of advanced use cases, news, and open Q&A with all Prometheus maintainers who are at CloudNativeCon.

avatar for Ben Kochie

Ben Kochie

Contributor, Prometheus Team

Thursday November 21, 2019 10:55am - 11:30am PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level
  Maintainer Track Sessions

11:50am PST

Am I Using It Right? Checking Best Practices on Live Kubernetes Clusters - Varsha Varadarajan & Adam Wolfe Gordon, DigitalOcean
While Kubernetes is stable, best practices for using it are a moving target. Some are generally applicable, others unique to a particular configuration or platform. Following best practices helps ensure workloads stay running as expected through cluster maintenance and upgrades, but checking them can feel like playing whack-a-mole in the dark.

This talk introduces a new open source tool, clusterlint, that checks compliance with best practices. Unlike other linters that work on deployment manifests, clusterlint identifies risks and problems in running Kubernetes clusters, making it useful for finding potential problems before performing cluster maintenance.

We'll discuss what clusterlint checks, why, how it works, how we use it in DigitalOcean's managed Kubernetes product to warn users of danger, and future plans for the tool.

avatar for Adam Wolfe Gordon

Adam Wolfe Gordon

Senior Engineer II, DigitalOcean
Adam Wolfe Gordon is a senior engineer focused on product strategy at DigitalOcean. Among other things, he previously worked as the tech lead for DigitalOcean's Kubernetes and container registry products. Adam is interested in infrastructure products, and likes to spend as much time... Read More →

Varsha Varadarajan

Engineering Intern, DigitalOcean
Varsha is a software engineer currently pursuing a Master's degree in Computer Science. She previously worked at ThoughtWorks in the continuous delivery domain; and as an intern at DigitalOcean on managed Kubernetes, where clusterlint was created. She likes working on Kubernetes related... Read More →

Thursday November 21, 2019 11:50am - 12:25pm PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level

2:25pm PST

The Gotchas of Zero-Downtime Traffic /w Kubernetes - Leigh Capili, Weaveworks
Noticing your customers receive 503's every now-and-then?
Do they spike when you're updating your app or rotating your k8s cluster nodes?
Maybe you used to have this problem -- then you added some strange settings and it's mostly working now…

What most people need from Kubernetes regarding web-traffic is a repeatable but under-documented combo of esoteric, non-default options.

We'll walk through the basic needs of shaping traffic and apply that knowledge to the states of compute, rollout, and canonical networking we see with k8s.
Expect tidbits about CRI, CNI, Ingress, and the design trade-offs present in Kubernetes and its API's.

You’ll leave this session knowing how to keep your apps serving successful requests for a myriad of edge-cases.

avatar for Leigh Capili

Leigh Capili

Developer Experience Engineer, Weaveworks
Leigh is a Kubernetes Contributor and works in Developer Experience with Weaveworks. :wheel_of_dharma: He authored kubeadm's etcd mTLS implementation and is currently working toward k8s component-standards and cluster-addons. Previously, he helped design a functional state-store for... Read More →

Thursday November 21, 2019 2:25pm - 3:00pm PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level

3:20pm PST

Kubernetes at Reddit: Tales from Production - Greg Taylor, Reddit, Inc
This talk is the EAGERLY-anticipated sequel to last year's "Kubernetes at Reddit: An Origin story". Whereas the saga's first installment focused on early results, thoughts, and a rough higher-level vision, this year's edition serves as a retrospective for how it all shook out over Reddit's last year of rapid Kubernetes adoption.

The audience will hear of successes, share in the heartbreak of production explosions, and gain insight into what has and hasn't worked well for one of the world's busiest web properties. Topics covered include:

* A brief recap of InfraRed, our internal Infrastructure product
* How org-wide adoption has progressed
* Scaling challenges (Infrastructure and Inter/Intra-team)
* Fires, near-misses, and outages, oh my!
* Successes and celebration
* Lingering questions and challenges
* The impact of Kubernetes at Reddit

avatar for Greg Taylor

Greg Taylor

Engineering Manager, Reddit, Inc
Greg Taylor leads the Release Engineering team within the Reddit's Infrastructure division. He and his team steward the internal Kubernetes-based infrastructure product (InfraRed) and build tooling and process to empower service owners to get their ideas to production. Greg has recently... Read More →

Thursday November 21, 2019 3:20pm - 3:55pm PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level
  Case Studies

4:25pm PST

Tinder's Move to Kubernetes - Chris O'Brien & Chris Thomas, Tinder
Almost 2 years ago, Tinder decided to move its platform to Kubernetes. Kubernetes afforded us an opportunity to drive Tinder Engineering toward containerization and low-touch operation through immutable deployment. Application build, deployment, and infrastructure would be defined as code.

We were also looking to address challenges of scale and stability. When scaling became critical, we often suffered through several minutes of waiting for new EC2 instances to come online. The idea of containers scheduling and serving traffic within seconds as opposed to minutes was appealing to us.

During our migration in early 2019, we reached critical mass within our Kubernetes cluster and began encountering various challenges due to traffic volume, cluster size, and DNS. We solved interesting challenges to migrate 200 services and run a Kubernetes cluster at scale.  


Chris O'Brien

Senior Engineering Manager, Tinder
Chris is a Software Engineer who works in Cloud Infrastructure—Kubernetes, CI/CD, AWS, Linux, Automation and Configuration Management (Terraform, Ansible, Chef, Puppet), and other open source technologies.

Chris Thomas

Engineering Manager, Tinder
Chris is an Engineering Manager for Tinder Cloud Infrastructure. He leads the Resiliency team, which is responsible for much of the infrastructure powering the Tinder backend platform, as well as Observability.

Thursday November 21, 2019 4:25pm - 5:00pm PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level
  Case Studies

5:20pm PST

Envoy on Fire: A Practical Look at Debugging a Service Mesh - Lita Cho & Ryan Cox, Lyft
In this talk, presenters will share lessons from several years of experience running Envoy in production at scale. They will explore practical techniques for triaging issues in a service mesh, along with the intuition behind them. The presenters will cover a broad range of topics including traffic capture, issues specific to GRPC, health checks, and techniques useful during incident mitigation. The talk will end with a deep dive into Envoy stats and their use in resolving issues.

avatar for Lita Cho

Lita Cho

Software Engineer, Lyft
Lita is a senior software engineer on the Networking team, building out the service mesh to handle both Kubernetes and legacy systems at Lyft. Before that, she worked on building out the API infrastructure using Protocol Buffers, creating systems that would generate code and bring... Read More →
avatar for Ryan Cox

Ryan Cox

Software Engineer, Lyft
Ryan Cox is a software engineer at Lyft focused on infrastructure resilience. His career includes the creation of large-scale ecommerce platforms and extensive time working on systems and infrastructure. He holds patents related to distributed filesystems and is an active member of... Read More →

Thursday November 21, 2019 5:20pm - 5:55pm PST
Ballroom Sec 20AB - San Diego Convention Center Upper Level

Filter sessions
Apply filters to sessions.