Friday, October 29, 2021
Today we understand that to create good products that leverage AI, we need to run machine learning algorithms on massive amounts of data. To do so, we can leverage existing distributed machine learning frameworks, such as Spark MLlib, which simplifies the development of large-scale machine learning training and serving. The typical machine learning workflow is:
* Loading data (data ingestion)
* Preparing data (data cleanup)
* Extracting features (feature extraction)
* Fitting model (model training)
* Serving the model
* Scoring (prediction) / using in production
With the Apache Spark libraries we can cover this entire basic machine learning workflow. As software and data engineers, it is important to understand the flow, how we can leverage what already exists, and how to build richer products on top of it. As tech leads and architects, understanding the workflow and the options available helps us design better architecture and software.
Join this session to learn how you can use the Apache Spark ecosystem to develop your machine learning pipeline end to end.