Uber started using Docker containers at scale in 2015 and has gone through a few generations of cluster management and service discovery technologies. In early 2019, we started working on a migration from Mesos to Kubernetes to support secure service mesh and machine learning workloads.
This is a complex problem: there are thousands of services and tens of millions of containers to launch daily, all while maintaining high machine resource utilization. To that end, many customizations are built into our Kubernetes stack, including elastic resource sharing, oversubscription, fast rollback and deploy, and changes to service discovery and attestation.
This talk will cover:
- Overview of Uber Compute Infra
- API server benchmark and tweaks
- Custom controller and scheduler logic
- CRI: resource, health check, logging, isolation
- SPIRE and service discovery setup at Uber
The Domain Name System (DNS) is the component that provides the most vital piece of information for one to locate and communicate with services running in a Kubernetes cluster. This technology provides a set of features for name resolution, service discovery, metrics collection, query tracing, etc. However, this is only sufficient to satisfy the requirements of traditional workloads, and modern enterprises demand more.
In this talk, we will discuss the state of the art in the modern enterprise in the context of Kubernetes DNS. We will present use cases like extensive aliasing, multi-tenancy, security, etc. that stretch the capabilities of currently available DNS solutions like CoreDNS, Kube-DNS, etc. We will then examine possible approaches to solving these challenges and see where these technologies fall short and how they could be improved.
Deepa Kalani is a Staff Engineer at VMware, responsible for development of service mesh technologies with a focus on Istio and Envoy integrations for the enterprise. Prior to VMware, Deepa held various engineering roles at PLUMgrid and Cisco Systems.
Venil Noronha is an engineer with the Tanzu Service Mesh team at VMware. He also contributes upstream to open source projects in the service mesh domain, like Istio and Envoy proxy. In the past, he has contributed to several open source projects including Kubernetes, Spring, and...
Stateful, scalable storage on Kubernetes is an unsolved problem. Creating it as a service is even more difficult. The cloud-native ecosystem offers many tools such as the operator-sdk, Prometheus, Grafana, etcd, Vitess, and much more, but integrating them isn't necessarily intuitive.
Two PlanetScale employees who engineered and managed the project describe the journey of leveraging all of these open source technologies to build out a database as a service on Kubernetes.
Abhi is a confused economist who enjoys writing backend code for various parts of PlanetScale's Vitess management software. In his spare time he is a DJ, podcast host, and competitive Super Smash Bros. player.
Kubernetes is complicated. Instrumenting it can be worse. Measuring the components of a distributed system shouldn't be as daunting as being asked to weigh a literal cloud.
In this talk, we'll go over the components of a Kubernetes control-plane and show you where to look to figure out what is actually happening. We will show you common cluster issues and how they would look in your instrumentation, so that you can more effectively diagnose clusters.
Starting in version 1.14, Kubernetes metrics were overhauled to make them consistent and high quality. Han Kang and Elana Hashman will go over the changes, the potential ingestion implications of this overhaul, and how it may affect you.
Han Kang is a Senior Staff Software Engineer at Google. Han co-chairs SIG Instrumentation while also participating in SIG API Machinery, focusing on operational aspects of managing Kubernetes clusters.
Elana Hashman currently works for Red Hat as a Principal Software Engineer on the OpenShift Container Platform Node Team, working upstream in Kubernetes SIG Node. Previously, she served as an SRE and technical lead on Azure Red Hat OpenShift. She is a subproject lead for the SIG Node...
In the span of two years, Spotify went from two developers investigating what a potential migration to Kubernetes might involve to having an internal, multi-tenant offering of Kubernetes become generally available for all its developers as the new, primary runtime offering.
Spotify has previously given talks on the earlier bootstrapping, experimentation, alpha, and beta phases of this migration process. This talk will instead focus on the later work involved in bringing the internal offering of Kubernetes “across the finish line.” The talk will cover what was required to bring the offering to general availability, including work shoring up scalability and reliability via a multicluster strategy, DIRT testing, and operational metrics and alerts. This talk will also cover the technical and process elements involved in designing a successful self-service migration experience for developers.
James Wen is a senior site reliability engineer at Spotify, where he’s currently focused on revamping Spotify’s runtime infrastructure. Previously, James was the team lead (anchor) of the Cloud Foundry Buildpacks team at Pivotal and served as a core contributor and maintainer...
Please bring your laptop fully charged as we will have limited charging stations available in the room.
Your Kubernetes application is running well, and then all of a sudden the service stops responding. How do you debug? You created a deployment but it's not coming up. Is your pod status shown as pending? How do you debug deployments and pods, get their logs, and see the filesystem layout? Horizontal Pod Autoscaler is not scaling pods. Is your cluster running out of capacity? Or are the metrics not available? Having DNS lookup failures for services? Is your PVC status shown as pending? Is kubectl not able to find nodes? This session will be loaded with different ways your applications on k8s crash and burn and, more importantly, how to recover from them.
Vice President and General Manager, Open Ecosystem, Intel
Arun Gupta is vice president and general manager of Open Ecosystem Initiatives at Intel Corporation. As an open source strategist, advocate, and practitioner for nearly two decades, Arun has taken companies such as Apple, Amazon, and Sun Microsystems through systemic changes to embrace...
Please bring your laptop fully charged as we will have limited charging stations available in the room.
Service mesh is often presented as a solution for network engineering and system operability, increasing security, reliability, and observability. However, service mesh is also an incredibly useful tool for developers, and understanding how to leverage this technology can dramatically simplify your day-to-day workflow.
By leveraging free and open-source tools and a scenario-based approach, we will illustrate how a service mesh can help with application resilience, observability, and debugging.
By the end of this workshop you will understand:
- How to use metrics and distributed tracing effectively
- Reliability patterns like retries, timeouts, and circuit breaking
- How to leverage Canary deployments
- How you can effectively debug distributed systems
The cloud-native, open-source technologies used in this tutorial include:
- Envoy
- Prometheus
- Gloo shot
- Consul Service Mesh
- Loop
- Squash
- OpenCensus
Nic Jackson is a developer advocate at HashiCorp, and the author of “Building Microservices in Go”, a book which examines the best patterns and practices for building microservices with the Go,
Christian Posta (@christianposta) is VP, Global Field CTO at Solo.io. He is the author of Istio in Action as well as many other books on cloud-native architecture and is well known in the cloud-native community for being a speaker, blogger (https://blog.christianposta.com) and contributor...
Please bring your laptop fully charged as we will have limited charging stations available in the room.
Is your Kubernetes cluster able to resist the most common attacks? And are all the necessary detection mechanisms in place to know when a security issue has occurred?
In this hands-on workshop, the instructors will dive into the art and science of Kubernetes security through a series of interactive attack and defense scenarios. Attendees will learn through instructor-led exercises how to identify and exploit realistic misconfigurations in Kubernetes clusters to achieve full cluster compromise. Each attack step will be matched with hardening measures and specific methods for detection and response workflows.
Each workshop attendee will be provided with a pre-configured Kubernetes cluster running realistic workloads in a cloud-based lab environment. The tools and methodologies covered by these exercises will directly help attendees secure their own organization's clusters.
Peter Benjamin is a Software Engineer with a background in Security and a co-organizer for the San Diego Kubernetes and Go meet-ups. He has a passion for enabling engineers to build secure and scalable applications, services, and platforms on modern distributed systems.
Brad Geesaman is a Staff Security Engineer at Ghost Security and focuses on researching and building cloud-native systems with a security practitioner's mindset. When he’s not hacking on containerized environments, he enjoys spending time with his family in Virginia, eating Mexican...
Jimmy Mesta is the Co-Founder and CTO at KSOC. He is a veteran security engineering leader focusing on building cloud-native security products. Prior to KSOC, Jimmy held senior leadership positions at a number of enterprises including Signal Sciences (acquired by Fastly) where he...
Tabitha Sable never met a system she didn't want to take apart. She serves the Kubernetes community as co-chair of SIG Security and a member of the Security Response Committee. At work, Tabitha leads Runtime Infrastructure Security at Datadog. She writes exploits, hardens infrastructure...