MLOps / AIOps
Tuesday, October 26, 2021
Kubernetes has become the de facto tool for orchestrating containerized workloads, and AI workloads are no different. Built to provide isolated environments and simplify reproducibility and portability, it’s an obvious choice for data science, and an ecosystem of data science tools has been built around containers and K8s. But can an orchestrator built for services meet the needs of research experimentation? Can IT easily incorporate K8s into their workflows? Join Guy Salton of Run:AI for a crash course in Kubernetes for AI. Learn what’s working, what’s not, and some fixes for supporting research environments with K8s.
Wednesday, October 27, 2021
Kubeflow is a popular open source project that delivers a composable software foundation for those who need to build and maintain a scalable ML platform with best-in-class KPIs. This presentation and demonstration will review the streamlined ML workflows and simplified operating patterns in Kubeflow 1.4, which is the Community's 11th release since 2018. In this session, Josh Bottum, who is a Kubeflow Community Product Manager, will lead a review of the latest end-to-end machine learning workflows and discuss how market leaders are using Kubeflow to deliver their ML platforms with native Kubernetes efficiencies and portability.
AI teams invest a lot of rigor in defining new project guidelines. But the same is not true for killing existing projects. In the absence of clear guidelines, teams let infeasible projects drag on for months. By streamlining the process to fail fast on infeasible projects, teams can significantly increase their overall success with AI initiatives. This talk covers how to fail fast on AI projects. AI projects have a lot more unknowns compared to traditional software projects: availability of the right datasets, model training to meet the required accuracy threshold, fairness and robustness of recommendations in production, and many more. In order to fail fast, we manage AI initiatives as a conversion funnel analogous to marketing and sales funnels. Projects start at the top of the five-stage funnel and can drop off at any stage, either to be temporarily put on ice or permanently suspended and added to the AI graveyard. Each stage of the AI funnel defines a clear set of unknowns to be validated with a list of time-bound success criteria. In the talk, we cover details of the five-stage funnel and experiences building a fail-fast culture where the AI graveyard is celebrated!
One of the main issues with ML and DL deployment is finding the right way to train and operationalize the model within the company. A serverless approach to deep learning provides a simple, scalable, affordable, yet reliable architecture. The challenge of this approach is to work within certain limitations on CPU, GPU and RAM while organizing the training and inference of your model. My presentation will show how to use services like Amazon SageMaker, AWS Batch, AWS Fargate, AWS Lambda, AWS Step Functions and SageMaker Pipelines to organize deep learning workflows. My talk will be beneficial for machine learning engineers and platform engineers.
Thursday, October 28, 2021
Too often, “AI-capable” refers to marketing claims instead of practical value add. For this reason, developers tend to be skeptical about AI-driven development. Slapdash application of AI ends up diminishing developers’ creativity and effectiveness. When implemented in inventive, unique ways, AI dramatically improves the productivity of developers and opens up new opportunities for creativity – especially when applied to cloud app development. Beyond the initial development process, AI has the potential to completely transform the entire application lifecycle by eliminating guesswork and repetitive tasks. AI ensures teams are better equipped to manage application dependencies and ensure that regardless of what changes are made, applications never break and are able to seamlessly adapt to inevitable change. AI-supported development democratizes access to advanced tech, making it possible for any IT team – even the lean, mean ones – to build serious apps. Essentially, AI in the DevOps cycle enables developers to shift quality assurance left in a more guided and automated way by assisting them at critical phases in the application building process. Instead of finding problems in production, developers are able to identify them while in the midst of the development lifecycle, so they can remain focused on innovating the best solution rather than the intricacies of hand-coding. Pairing AI with visual, model-driven development allows guidance to be both more powerful and less obtrusive and can compress CI/CD pipelines into days or even hours, instead of weeks. As the Head of AI at OutSystems, António has seen firsthand how quickly developers can change their minds after experiencing the speed and creativity AI enables as a complement to traditional development.
In this session, he will provide insight on the three most fundamental design decisions regarding integrating AI into an application platform, based on OutSystems’ experience analyzing models built from tens of millions of application graphs and flows, and explore the implications for improving cloud development productivity by 100x. OutSystems serves enterprise customers like Deloitte, which developed a voice-to-text tool with deep analysis integrated to capture more accurate notes between advisors and their clients.
In this talk, Aparna Dhinakaran, Founder of Arize AI (Ex-Uber ML), will highlight common model failure modes, including model drift, data quality issues, and performance degradation. The talk will also show how ML Observability can address these challenges by monitoring for failures, providing tools to troubleshoot and identify the root cause, and playing an important part in the feedback loop for improving models. The talk will highlight best practices and share examples from across the industry.
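As a rough illustration of what monitoring for model drift can look like, the sketch below computes a population stability index (PSI) between a reference sample and a production sample of one feature. It is not taken from the talk: the function name, the bin count, and the alert threshold are all assumptions chosen for the example.

```python
import math
from typing import List, Sequence

# Illustrative alert level (an assumption, not a value from the talk).
DRIFT_THRESHOLD = 0.2


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """Return the PSI between a reference sample and a production sample."""
    lo, hi = min(expected), max(expected)
    # Equal-width bins over the reference range; fall back to 1.0 if the range is empty.
    width = (hi - lo) / bins or 1.0

    def bucket_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e = bucket_shares(expected)
    a = bucket_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


reference = [i / 100 for i in range(100)]
production = [x + 0.5 for x in reference]  # simulated shift in the feature
drifted = population_stability_index(reference, production) > DRIFT_THRESHOLD
```

In this setup, identical samples give a PSI of zero, and the score grows as the production distribution moves away from the reference.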
This session will focus on defining Machine Learning (ML) operational models and how enterprises can leverage them through a framework of governance and model risk management to unlock value. Operationalization is essential to realizing the business value of ML models. We will also overlay the DevOps paradigm on ML lifecycle management, including automated model validation, bias removal, and measurement using KPIs. An example framework and architecture of an ML operational model in action will be showcased, including a starter toolkit.
KEYNOTE (AI): Modzy -- Crossing the AI Valley of Death: Deploying and Monitoring Models in Production at Scale
It’s happened again. You built another AI model that will never see the light of day because it won’t make it past the AI “valley of death” – the crossover of model development to model deployment across your enterprise. The handoff between data science and engineering teams is fraught with friction, outstanding questions around governance and accountability, and uncertainty about who is responsible for different parts of the pipeline and process. Even worse? The patchwork approach to building an AI pipeline leaves many organizations open to risks because of a lack of a holistic approach to security and monitoring. Join us to learn about approaches and solutions for configuring an MLOps pipeline that’s right for your organization. You’ll discover why it’s never too early to plan for operationalization of models, regardless of whether your organization has 1, 10, 100, or 1,000 models in production. The discussion will also reveal the merits of an open container specification that allows you to easily package and deploy models in production from anywhere. Finally, new approaches for monitoring model drift and explainability will be revealed that will help manage expectations with business leaders, all through a centralized AI software platform called Modzy®.
Anyone building enterprise-level machine learning pipelines understands how challenging managing dependencies can be, and that's exactly where Conda works its magic. However, these dependencies can come with security vulnerabilities that are increasingly exploited with malware as hackers target popular open source libraries. In this session, we'll cover the most common next-generation cyber attacks, like the crypto-mining typo-squatting on Matplotlib, as well as the tools and best practices you can put in place to protect your MLOps pipelines from cybersecurity attacks.
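One simple defense against typo-squatting is to flag dependency names that are close to, but not the same as, a package you trust. The sketch below uses Python's standard `difflib` module for this. It is only an illustration, not a tool from the session: the allowlist, the function name `suspicious_names`, and the similarity cutoff are hypothetical.

```python
import difflib
from typing import Dict, Iterable, Set

# Hypothetical allowlist of packages the team has vetted.
TRUSTED: Set[str] = {"matplotlib", "numpy", "pandas", "scikit-learn"}


def suspicious_names(
    requirements: Iterable[str],
    trusted: Set[str] = TRUSTED,
    cutoff: float = 0.85,  # assumed similarity threshold for a near miss
) -> Dict[str, str]:
    """Map each near-miss requirement name to the trusted name it resembles."""
    flagged: Dict[str, str] = {}
    for name in requirements:
        normalized = name.lower().replace("_", "-")
        if normalized in trusted:
            continue  # exact match with a vetted package
        close = difflib.get_close_matches(
            normalized, sorted(trusted), n=1, cutoff=cutoff
        )
        if close:
            flagged[name] = close[0]
    return flagged


flagged = suspicious_names(["matplotlib", "matplatlib", "requests"])
```

A check like this only catches look-alike names. Packages that are simply absent from the allowlist, such as `requests` above, pass through and need a separate review step.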
You know the AI models deployed in production will need to be monitored and updated. It probably does not surprise you that not everyone does so, and that some large bank with thousands of production models doesn’t quite know where all its AI models are, let alone monitor them. But MLOps goes beyond monitoring models, extending to data engineering and to driving business objectives. In this session, we will see how Big Tech Cloud and AI players like Azure and AWS enable MLOps today, and what more we can expect to see.