MLOps / AIOps

Tuesday, October 25, 2022

Wednesday, October 26, 2022

- PDT
PRO TALK (AI): ML Drift Monitoring: What to Observe, How to Analyze & When to Act
Kumaran Ponnambalam
Cisco, Principal Engineer

Deploying a new ML model to production is a great achievement, but it is also the beginning of a persistent challenge: keeping the model performing at expected levels. Models in production will drift and decay, and the value they provide to the business will drop. ML drift monitoring is a challenging task, from identifying the right data to collect and the right metrics to compute, to the right trends to analyze and the right actions to take. This session will explore the process of model drift monitoring, from model instrumentation to determining the next best action. Real-life challenges will be explored, and best practices and recommendations will be discussed.
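The abstract does not prescribe a particular drift metric; as one illustration of "the right metrics to compute", the sketch below implements the Population Stability Index (PSI), a common statistic that compares a feature's serving-time distribution against its training baseline. The bucketing scheme and thresholds are illustrative assumptions, not the speaker's method.

```python
import numpy as np

def psi(baseline, current, bins=10):
    """Population Stability Index between a training-time baseline
    sample and a production sample of the same feature."""
    # Bucket both samples using quantiles of the baseline distribution.
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range values
    base_frac = np.histogram(baseline, bins=edges)[0] / len(baseline)
    curr_frac = np.histogram(current, bins=edges)[0] / len(current)
    # Floor empty buckets at a tiny probability to avoid log(0).
    base_frac = np.clip(base_frac, 1e-6, None)
    curr_frac = np.clip(curr_frac, 1e-6, None)
    return float(np.sum((curr_frac - base_frac) * np.log(curr_frac / base_frac)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 10_000)    # training-time feature values
stable = rng.normal(0.0, 1.0, 10_000)   # fresh data, same distribution
shifted = rng.normal(0.8, 1.0, 10_000)  # drifted data
print(psi(train, stable))   # small value: no action needed
print(psi(train, shifted))  # large value: investigate / retrain
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift, though the right threshold depends on the feature and the cost of acting.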

Thursday, October 27, 2022

- PDT
OPEN TALK (AI): Scaling AIaaS: from DALL-E to Uber
Daniel Siryakov
Comet, Senior Product Manager

As companies begin to embrace AI in key parts of their businesses, they want to explore and scale AI at minimal cost. However, developing in-house AI-based solutions for every problem is a complex process and requires huge capital investment. The industry is now embracing AI as a service, wherein third-party tools can fill in the gaps. In this talk, Daniel will walk through the current landscape, trends, and technical challenges. He will also feature a few customer stories and a proposed modular solution to help your team jumpstart this journey.

- PDT
OPEN TALK (AI): Level Up Your Data Lake - to ML and Beyond
Vinodhini SD
Treeverse, Developer Advocate

A data lake is primarily two things: an object store and the objects being stored. Even with the most basic setup, data lakes are capable of supporting BI, Machine Learning, and operational analytics use cases. This flexibility speaks to the strength of object stores, particularly their flexibility in integrating with a diverse set of data processing engines.

As data lakes exploded in adoption, a number of improvements were made to the first architectures. The first and most obvious improvement was to file formats, which led to the development of analytics-optimized formats like Parquet, and eventually modern table formats.

An even newer improvement has been the emergence of data source control tools that bring new levels of manageability across an entire lake! In this talk, we'll cover how to incorporate these technologies into your data lake, and how they simplify workflows critical to ML experimentation, deployment of datasets, and more! 
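The abstract does not name a specific tool (Treeverse's lakeFS is one example of this category). As a toy illustration of the idea behind data source control, the sketch below stores objects by content hash, so each commit is an immutable snapshot of the lake and an old experiment can still read exactly the data it trained on. All names here are invented for illustration.

```python
import hashlib

class ToyLakeVersioning:
    """Minimal sketch of Git-style version control over data objects:
    objects are stored by content hash, and each commit is an immutable
    snapshot mapping paths to hashes."""
    def __init__(self):
        self.objects = {}   # content hash -> raw bytes
        self.commits = []   # list of {path: hash} snapshots
        self.staged = {}    # working snapshot being built

    def put(self, path, data: bytes):
        h = hashlib.sha256(data).hexdigest()
        self.objects[h] = data          # dedup: same bytes, same hash
        self.staged[path] = h

    def commit(self):
        self.commits.append(dict(self.staged))
        return len(self.commits) - 1    # commit id

    def read(self, path, commit_id):
        """Read a file as of a past commit -- reproducible ML inputs."""
        return self.objects[self.commits[commit_id][path]]

lake = ToyLakeVersioning()
lake.put("train/data.parquet", b"v1 rows")
c1 = lake.commit()
lake.put("train/data.parquet", b"v2 rows")   # dataset updated later
c2 = lake.commit()
print(lake.read("train/data.parquet", c1))   # old snapshot still readable
```

The point is the manageability win the talk describes: a commit id pins an entire lake state, so ML experiments stay reproducible even as datasets are overwritten.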

- PDT
OPEN TALK (AI): Reducing Latency and Resource Consumption for Offline Feature Generation
Dhaval Patel
Netflix, Machine Learning Infrastructure

Personalization is one of the key pillars of Netflix as it enables each member to experience the vast collection of content tailored to their interests. Our personalization system is powered by various machine learning models. We constantly innovate by adding new features to our personalization models and running A/B tests to improve recommendations for our members. We also continue to see that providing larger training sets to our models helps make better predictions. Our ML fact store has enabled us to provide larger training sets where the training set spans over a long time window. While a great success, the ML fact store architecture has its limitations. For example, features computed while generating recommendations must be recomputed by offline feature generation pipelines. This talk is about those limitations and how we enhanced our architecture to run optimized offline feature generation pipelines. 
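The abstract notes that features computed at recommendation time were being recomputed by offline pipelines. One common remedy for that class of problem is to log serving-time features so training pipelines can reuse them; the sketch below illustrates that pattern in a toy form. The store, keys, and feature names are invented for illustration and are not Netflix's actual architecture.

```python
import json

class FeatureLogStore:
    """Toy sketch of logging serving-time features so offline training
    pipelines can reuse them instead of recomputing them."""
    def __init__(self):
        self._log = {}  # (member_id, item_id, ts) -> serialized features

    def log_at_serving(self, member_id, item_id, ts, features):
        # Write-once record of exactly what the online model saw.
        self._log[(member_id, item_id, ts)] = json.dumps(features)

    def read_for_training(self, member_id, item_id, ts):
        # Offline pipeline: reuse the logged snapshot if present; a miss
        # means falling back to (expensive) recomputation.
        entry = self._log.get((member_id, item_id, ts))
        return json.loads(entry) if entry else None

store = FeatureLogStore()
store.log_at_serving("m1", "title42", 1000,
                     {"watch_hours": 3.5, "row_rank": 2})
print(store.read_for_training("m1", "title42", 1000))
```

Reusing logged features also avoids training/serving skew, since the training set contains the exact feature values the live model acted on.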

Wednesday, November 2, 2022

- PDT
[#VIRTUAL] PRO TALK (AI): ML Drift Monitoring: What to Observe, How to Analyze & When to Act
Kumaran Ponnambalam
Cisco, Principal Engineer

Deploying a new ML model to production is a great achievement, but it is also the beginning of a persistent challenge: keeping the model performing at expected levels. Models in production will drift and decay, and the value they provide to the business will drop. ML drift monitoring is a challenging task, from identifying the right data to collect and the right metrics to compute, to the right trends to analyze and the right actions to take. This session will explore the process of model drift monitoring, from model instrumentation to determining the next best action. Real-life challenges will be explored, and best practices and recommendations will be discussed.

- PDT
[#VIRTUAL] PRO Workshop (AI): Deploying Machine Learning Models with Pulsar Functions
David Kjerrumgaard
StreamNative, Developer Advocate

In this talk I will present a technique for deploying machine learning models to provide real-time predictions using Apache Pulsar Functions. In order to provide a prediction in real-time, the model usually receives a single data point from the caller, and is expected to provide an accurate prediction within a few milliseconds. 

Throughout this talk, I will demonstrate the steps required to deploy a fully trained ML model that predicts the delivery time for a food delivery service based upon real-time traffic information, the customer's location, and the restaurant that will be fulfilling the order.
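As a hedged sketch of the pattern the talk describes: a function receives a single message and must return a prediction within the same call. In a real deployment the class below would extend `pulsar.Function` from the Pulsar Functions Python SDK and load a trained model; here a hand-coded linear model and invented feature names stand in.

```python
import json

class DeliveryTimeFunction:
    """Sketch of a Pulsar-Function-style predictor. In an actual Pulsar
    deployment this class would subclass pulsar.Function and be packaged
    with the trained model artifact."""
    # Illustrative coefficients -- a real model would be loaded from disk.
    WEIGHTS = {"distance_km": 3.0, "traffic_index": 5.0, "prep_minutes": 1.0}
    BASE_MINUTES = 10.0

    def process(self, input_msg, context=None):
        # Pulsar delivers one message per invocation; the latency budget
        # for real-time predictions is a few milliseconds.
        features = json.loads(input_msg)
        minutes = self.BASE_MINUTES + sum(
            self.WEIGHTS[name] * features[name] for name in self.WEIGHTS
        )
        return json.dumps({"eta_minutes": round(minutes, 1)})

fn = DeliveryTimeFunction()
msg = json.dumps({"distance_km": 4.0, "traffic_index": 1.2, "prep_minutes": 15})
print(fn.process(msg))  # {"eta_minutes": 43.0}
```

The `process(input, context)` signature mirrors the Pulsar Functions interface, which keeps the model-serving code free of any messaging boilerplate.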

Thursday, November 3, 2022

- PDT
[#VIRTUAL] PRO TALK (AI): Avoid Mistakes Building AI Products
Karol Przystalski
Codete, CTO

Based on Gartner's research, 85% of AI projects fail. In this talk, we present the most typical mistakes made by managers, developers, and data scientists that can cause a product to fail. We draw on ten case studies of failed products and explain the reasons behind each failure. We then show how to avoid such mistakes by introducing a few lifecycle changes that make an AI product more likely to succeed.

- PDT
[#VIRTUAL] OPEN TALK (AI): Scaling AIaaS: from DALL-E to Uber
Daniel Siryakov
Comet, Senior Product Manager

As companies begin to embrace AI in key parts of their businesses, they want to explore and scale AI at minimal cost. However, developing in-house AI-based solutions for every problem is a complex process and requires huge capital investment. The industry is now embracing AI as a service, wherein third-party tools can fill in the gaps. In this talk, Daniel will walk through the current landscape, trends, and technical challenges. He will also feature a few customer stories and a proposed modular solution to help your team jumpstart this journey.

- PDT
[#VIRTUAL] OPEN TALK (AI): Level Up Your Data Lake - to ML and Beyond
Oz Katz
Treeverse, CTO & Co-Founder

A data lake is primarily two things: an object store and the objects being stored. Even with the most basic setup, data lakes are capable of supporting BI, Machine Learning, and operational analytics use cases. This flexibility speaks to the strength of object stores, particularly their flexibility in integrating with a diverse set of data processing engines.

As data lakes exploded in adoption, a number of improvements were made to the first architectures. The first and most obvious improvement was to file formats, which led to the development of analytics-optimized formats like Parquet, and eventually modern table formats.

An even newer improvement has been the emergence of data source control tools that bring new levels of manageability across an entire lake! In this talk, we'll cover how to incorporate these technologies into your data lake, and how they simplify workflows critical to ML experimentation, deployment of datasets, and more! 

- PDT
[#VIRTUAL] OPEN TALK (AI): Reducing Latency and Resource Consumption for Offline Feature Generation
Dhaval Patel
Netflix, Machine Learning Infrastructure

Personalization is one of the key pillars of Netflix as it enables each member to experience the vast collection of content tailored to their interests. Our personalization system is powered by various machine learning models. We constantly innovate by adding new features to our personalization models and running A/B tests to improve recommendations for our members. We also continue to see that providing larger training sets to our models helps make better predictions. Our ML fact store has enabled us to provide larger training sets where the training set spans over a long time window. While a great success, the ML fact store architecture has its limitations. For example, features computed while generating recommendations must be recomputed by offline feature generation pipelines. This talk is about those limitations and how we enhanced our architecture to run optimized offline feature generation pipelines.