Thursday, October 28, 2021

Elasticity across the Facebook Cloud
Ariel Rao
Facebook, Software Engineer

Facebook operates an internal cloud to support its family of products. Our strategy for scaling that cloud includes an investment in elastic capacity management. Elasticity means several things. At the physical infrastructure level, we mobilize buffer capacity to mitigate dynamic unavailability and to coordinate maintenance. At the workload management level, we AutoScale capacity allocations based on predictive and real-time models of workload demand. At the global resource management level, we time-shift flexible workloads based on time series models of supply availability. And at the global efficiency level, we use spare capacity for opportunistic workloads. In this talk, we will dive into each dimension and show how they fit together. We will show how elasticity across the stack lets us meet a high bar for reliability and availability while making efficient use of all deployed capacity.