DeveloperWeek New York 2020

What We’ve Learned Building a Multi-Region DBaaS on Kubernetes

- EST
Session Stage

Josh Imhoff
Cockroach Labs, Site Reliability Engineer

Josh spent his college years programming robots to play soccer in the RoboCup Standard Platform League (SPL). There, he learned to love when very complicated computer systems break. This passion for broken computers led him to take an SRE job at Google working on source control systems. Six months ago he moved to Cockroach Labs to work on a DBaaS. Josh enjoyed Borg while at Google and has since spent many long hours puzzling over the strange land of Kubernetes. Josh can often be found roaming the Cockroach Labs NYC office yelling about things like strange node pool API semantics, much to the confusion of his coworkers who don’t work on Kubernetes.


When the engineers at Cockroach Labs started development on a global Database as a Service (DBaaS), they weren’t sure whether Kubernetes would be the right choice for the underlying orchestration system. They wanted to harness Kubernetes’s powerful orchestration capabilities, but building a system to run geo-distributed Cockroach clusters on Kubernetes presents unique challenges. First, the clusters must run across multiple regions, which complicates networking and service discovery. Second, the clusters must store data, which requires StatefulSets and persistent volumes. Third, the system must programmatically create Kubernetes clusters on AWS and GKE, which have different APIs for node pools and firewalls. In this presentation, they share their experience of overcoming these challenges to build a global DBaaS.


Benefits to the Ecosystem: We are presenting a unique case study from a Kubernetes user. We will share our team's experience using Kubernetes to build a multi-region database as a service. Unique aspects include (a) running a stateful service across multiple regions, (b) heavily using the k8s API to build automation on top of k8s, and (c) offering a service that dynamically allocates k8s clusters on public cloud providers. In particular, (a) could help inform the design of future k8s multi-region networking and federation capabilities.