Performance
Wednesday, November 20

10:55am PST

Implementing a Consumer Focused SLA for a Kubernetes Based PaaS - Shrenik Dedhia, Box
Box's (internal) Platform as a Service empowers other Box teams to deliver 100s of micro services, on 1,000s of hosts, across 10,000s of pods. As they scaled to support a large number of micro services and clusters, they ran into several scaling challenges around both the control and data planes. In order to deliver a production-grade platform, they realized the need for a Service Level Agreement (SLA) for their platform to not only demonstrate availability for infrastructure, but also "value" for a consumer, and serve as a benchmark to prioritize those challenges.

In this talk, Shrenik Dedhia will present how their team approached the problem of defining an SLA, principles used, options explored, path chosen, and future work to improve the platform's availability from ~99.4% to ~99.99%, thereby improving the overall availability of micro services that power Box.com.
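
To put the two availability figures in perspective, here is a minimal sketch of the downtime budget each one implies. It is plain arithmetic and not Box's SLA methodology; the function name `downtime_minutes_per_year` is hypothetical, and a 365-day year is assumed.

```python
# Illustrative arithmetic only: converts an availability target into a downtime budget.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the minutes per year a service may be unavailable at this target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.4, 99.99):
    budget = downtime_minutes_per_year(target)
    print(f"{target}% -> {budget:.1f} min/year ({budget / 60:.1f} hours)")
```

Under these assumptions, 99.4% allows roughly 52.6 hours of downtime per year, while 99.99% allows roughly 52.6 minutes.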

Shrenik Dedhia

Sr. Staff Engineer / TLM, Box
Shrenik has been at Box for about 2 years as a Sr. Staff Engineer, with 10+ years of total experience in designing and implementing secure and scalable platforms. Shrenik is currently leading the Platform As A Service team at Box.

Wednesday November 20, 2019 10:55am - 11:30am PST
Room 11AB - San Diego Convention Center Upper Level

11:50am PST

Did Kubernetes Make My p95s Worse? - Jian Cheung & Stephen Chan, Airbnb
When Airbnb first evaluated Kubernetes, they explicitly tested for performance and saw no significant differences. Then in 2019, as Airbnb’s migration of services from EC2/Chef to Kubernetes went into full swing, performance problems started cropping up. Service owners noticed significant latency increases which threatened to halt the overall move to Kubernetes. This talk will share Airbnb’s journey on performance gains and losses in its mass migration to Kubernetes. It will dive into the investigations Airbnb has done, from hardware differences, to cluster settings, to container configurations, to service language problems, and more.


Stephen Chan

Software Engineer, Airbnb
Stephen has worked at Airbnb during much of its Kubernetes migration, from the first production service to hundreds of services running across many clusters and different environments. He previously spoke about a few custom controllers in use at Airbnb at KubeCon 2018.

Jian Cheung

Software Engineer, Airbnb
Jian Cheung is a software engineer on the Compute Infrastructure Team at Airbnb. He works on supporting application and infrastructure service abstractions running on Kubernetes. He has previously spoken about [performance gotchas on Kubernetes](https://kccncna19.sched.com/event/UaXm/did-kubernetes-make-my-p95s-worse-jian-cheung-stephen-chan-airbnb...)

Wednesday November 20, 2019 11:50am - 12:25pm PST
Room 15AB - San Diego Convention Center Mezzanine Level

2:25pm PST

NHD - A Topology-Aware Scheduler for K8s for Low-Latency & HPC Applications - Cliff Burdick, ViaSat
With an increasing number of HPC, NFV, and other low-latency applications moving to containers, the ability to schedule these workloads efficiently is important for increasing user adoption. The default scheduler in Kubernetes does an excellent job at scheduling cloud-native workloads, but is lacking the ability to schedule low-latency workloads properly. NHD attempts to bridge this gap by introducing a custom scheduler for Kubernetes that’s aware of hardware topology, CPU characteristics, and the application’s threading model. In this talk, we’ll go over the ways NHD integrates with Kubernetes, how it’s used, and the features it offers.

Cliff Burdick

Senior DevTech Engineer, NVIDIA
Cliff is working at NVIDIA where he focuses on optimizing GPU code for signal processing, numerical computing, and GPU/networking IO. Previously he worked at ViaSat designing the ground system software for high-throughput satellites. At ViaSat he developed an open-source Kubernetes...

Wednesday November 20, 2019 2:25pm - 3:00pm PST
Pacific Ballroom, Salon 14-15 - Marriott Marquis San Diego Marina Hotel

3:20pm PST

How Container Networking Affects Database Performance - Tyler Duzan & Vadim Tkachenko, Percona
Through benchmarks, Percona Labs explores the effects of different container networking drivers used in Kubernetes when hosting database workloads. For this talk, we will perform benchmarks using Percona's PXC Operator deploying a 3-member PXC MySQL cluster on top of Kubernetes and use our standard database benchmarking stack with TPCC and Sysbench to analyze query throughput and replication performance as affected by our choice of networking driver. Drivers we'll test will be CNI core plugins, Flannel, Cilium, Calico, Kube-Router, and the new Red Hat SR-IOV driver. This Dual Presentation (35 minutes) will address our benchmark methodology and results, as well as recommendations regarding networking and tuning database performance on Kubernetes with a focus on MySQL. Both speakers are experts on this topic, and Vadim co-authored "High Performance MySQL", now in its 3rd Edition.

Vadim Tkachenko

CTO, Percona
Vadim Tkachenko co-founded Percona in 2006 and serves as its Chief Technology Officer. He leads Percona CTO Labs, which focuses on technology research and performance evaluations of Percona and third-party products, designing hardware, filesystems, storage engines, and databases that surpass...

Tyler Duzan

Product Manager, Percona
Tyler Duzan joined Percona in 2017 as a Product Manager and has led their MySQL software and Cloud technology initiatives since, including the recent GA launch of Percona's Kubernetes operators for their Percona Server for MongoDB and Percona XtraDB Cluster database server products...

Wednesday November 20, 2019 3:20pm - 3:55pm PST
Room 6F - San Diego Convention Center Upper Level

4:25pm PST

Throttling: New Developments in Application Performance with CPU Limits - Dave Chiluk, Indeed
Are you seeing excessively long tail response times from your applications running on containerized clouds (Kubernetes, Docker, Marathon)? Have you ever seen an application be throttled even though it’s nowhere near its CPU limit?

Up until now, the answer has always been to simply turn off hard limits, but that has potentially nasty performance implications in shared environments. Now there's another option! This session will explain the real root cause of what has been happening. We'll introduce the kernel mechanisms that Kubernetes and other Container Orchestrators rely on to enforce CPU limits. We'll then show how they were broken, how we fixed them, and what those changes mean for you and your clouds.

By the end of this session you'll understand exactly what you are getting when you set the CPU limits on your pods.
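
As background for the kernel mechanism mentioned above, the following is a rough sketch of how a CPU limit maps to a CFS quota, assuming the default 100 ms enforcement period. The function names are hypothetical and the model is deliberately simplified: it shows ordinary quota exhaustion by a multi-threaded workload, not the kernel bug this session describes.

```python
# Sketch of CFS bandwidth control arithmetic; not the kernel's actual code.
CFS_PERIOD_US = 100_000  # assumed default enforcement period: 100 ms


def cfs_quota_us(cpu_limit_millicores: int, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (microseconds) the container may use in each period."""
    return cpu_limit_millicores * period_us // 1000


def throttled_us_per_period(cpu_limit_millicores: int, busy_threads: int,
                            period_us: int = CFS_PERIOD_US) -> int:
    """Wall-clock microseconds per period spent throttled, assuming
    `busy_threads` threads run flat out on separate cores."""
    quota = cfs_quota_us(cpu_limit_millicores, period_us)
    wall_time_to_exhaust = quota // busy_threads
    return max(0, period_us - wall_time_to_exhaust)


print(cfs_quota_us(500))                # quota for a 500m limit
print(throttled_us_per_period(500, 4))  # four busy threads under a 500m limit
```

In this simplified model, a 500m limit grants 50 ms of CPU time per 100 ms period. Four busy threads use that up in 12.5 ms of wall-clock time and are then throttled for the remaining 87.5 ms, which is one way tail latency can grow under a hard limit.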

Dave Chiluk

Linux Platform Software Engineer, Indeed
Dave is a Linux Platform Software Engineer at Indeed. He works closely with the DevOps and Site Reliability teams improving reliability, scalability, and performance across Indeed’s hybrid cloud. He has commits in the mainline kernel and has numerous fixes to stable kernels. He’s...

Wednesday November 20, 2019 4:25pm - 5:00pm PST
Room 17AB - San Diego Convention Center Mezzanine Level

5:20pm PST

How Ancestry Got Kubernetes to Run 2x Better Per Dollar Using AI - Darek Gajewski, Ancestry
Darek Gajewski, Principal Infrastructure Analyst for Ancestry.com, relies on Kubernetes to quickly integrate and deploy applications across Ancestry’s website, which receives 50 million visitors a month and generates more than a billion dollars in revenue.

To get optimum performance out of Ancestry’s cloud applications, Ancestry employed artificial intelligence for continuous optimization of the application runtime environment. AI brings continuous optimization (CO) to the CI/CD process. In a PoC, Ancestry used AI to cut the resources of one application by more than 50 percent, with zero drop in performance. In this instance, Ancestry has been able to get two times the performance out of Kubernetes for every dollar spent.

AI-powered CO delivers a well-optimized infrastructure personalized to the workload and delivers better reliability, at higher performance, for much lower costs.

Darek Gajewski

Principal Infrastructure Analyst, Ancestry
Darek has spent 10 years in the role of capacity planning and management, cost governance, optimizing infrastructure at both BlackBerry and Ancestry operations. He has successfully saved millions in infrastructure spend at both Ancestry and BlackBerry. With a background in development...

Wednesday November 20, 2019 5:20pm - 5:55pm PST
Room 15AB - San Diego Convention Center Mezzanine Level

Thursday, November 21

4:25pm PST

Ready to Serve! Speeding-Up Startup Time of Istio-Powered Workloads - Michal Malka & Etai Lev-Ran, IBM
Pod startup time has long been a focus area for cloud-native platforms. Optimizing startup time is critical to support use cases such as autoscaling, upgrades, and failure recovery. The recent rise of the serverless model, along with its key value proposition of scale-to-zero of idle workloads, has made pod startup time more important than ever: the platform must be able to start the pod fairly quickly, such that the latency of request-triggered scale-from-zero is acceptable.

In this talk, we'll analyze the latency contributed by Istio service mesh to pod startup time, right from pod creation and up to the pod becoming ready to service requests. We'll also examine various techniques to reduce it, including using Istio CNI to bootstrap the pod's network, launching the sidecar proxy with an initial routing configuration, and using manual sidecar injection.

Etai Lev Ran

System Architect, IBM Research
Etai works for the IBM research lab in Haifa and is responsible for application networking research efforts. He has previously worked on cloud infrastructure services, distributed file systems and high performance networked systems.

Michal Malka

Manager, IBM Cloud Foundations, IBM
Michal is working as a manager of the Cloud Foundations group at the IBM Haifa Research Lab, focusing on several projects in the area of Hybrid Cloud. Michal has deep knowledge in microservices technologies and is currently working on new directions for Istio as the microservices...

Thursday November 21, 2019 4:25pm - 5:00pm PST
Room 11AB - San Diego Convention Center Upper Level

5:20pm PST

Staying in Tune: Optimize Kubernetes for Stability and Utilization - Randy Johnson & Koushik Radhakrishnan, VMware
Kubernetes provides a number of primitives to manage resource consumption. Implementing resource limits, requests and quotas is often the first step taken to solve this problem at the pod or namespace level. However, the behaviour of an overall Kubernetes cluster as it nears capacity and the parameters available to tune it are often overlooked. To ensure optimal stability and utilization of a cluster, users must learn how to implement, test and manage these parameters over time.

With their field engineering work done for healthcare and financial customers, Randy and Koushik have gathered valuable lessons on how one should approach this problem. This talk will illustrate how you should approach resource limits, resource requests, eviction policies and node allocatable constraints to get the most out of your Kubernetes clusters.
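
For readers unfamiliar with node allocatable constraints, the sketch below shows the usual relationship between a node's capacity and what is left for pods: capacity minus the kube-reserved and system-reserved amounts and the hard eviction threshold. The function name `allocatable_mib` and all numeric values are hypothetical and are not taken from the talk.

```python
# Illustrative values only; real reservations depend on the node and its workloads.
def allocatable_mib(capacity_mib: int, kube_reserved_mib: int,
                    system_reserved_mib: int, eviction_hard_mib: int) -> int:
    """Memory left for pods once reservations and the hard eviction threshold are subtracted."""
    return capacity_mib - kube_reserved_mib - system_reserved_mib - eviction_hard_mib


# Hypothetical 32 GiB node with 1 GiB reserved for Kubernetes daemons,
# 1 GiB for the operating system, and a 100 MiB hard eviction threshold.
node = allocatable_mib(capacity_mib=32_768, kube_reserved_mib=1_024,
                       system_reserved_mib=1_024, eviction_hard_mib=100)
print(f"Allocatable memory: {node} MiB")
```

With these assumed values, 30,620 MiB of the node's 32,768 MiB remain schedulable for pods.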

Koushik Radhakrishnan

Cloud Native Architect, VMware
Koushik has helped build and roll out infrastructure for some of the largest service providers and enterprise customers. In his role as a Cloud Native Architect at VMware, he is passionate about helping organizations adopt and build solutions around the Kubernetes ecosystem and making...

Randy Johnson

Cloud Native Architect, VMware
Randy is a Cloud Native Architect on the Kubernetes Architecture team at VMware. He is passionate about container orchestration, distributed systems and solving hard problems. Prior to joining VMware, he was guiding organizations along their cloud modernization journey at Red Hat...

Thursday November 21, 2019 5:20pm - 5:55pm PST
Room 6F - San Diego Convention Center Upper Level