PRO SESSION (AI): Serving Very Large Numbers of Low Latency ML Models


Manoj Agarwal
Salesforce, Distributed Systems Architect and a Continuous Learner

Manoj has 20 years of industry experience. He likes solving high-availability and scalability problems in distributed systems, and he is an AWS-certified architect and developer. In his spare time, he enjoys cycling, hiking, and playing board games.


Serving machine learning models is a scalability challenge at many companies. Most applications need only a small number of models (often fewer than 100) to serve predictions. Cloud platforms that offer model serving do support hundreds of thousands of models, but they provision separate hardware for different customers. Salesforce faces a challenge that very few companies deal with: to stay cost-effective, it must run hundreds of thousands of models on infrastructure shared across multiple tenants. In this talk we will explain how Salesforce hosts hundreds of thousands of models on multi-tenant infrastructure while supporting low-latency predictions.