This document describes Amazon SageMaker's capabilities for end-to-end machine learning model development and deployment. It discusses how SageMaker provides pre-built algorithms and frameworks, managed training and hosting services, and the ability to customize models with user-provided algorithms or frameworks like fast.ai. The document provides an example workflow of using SageMaker to build, train, and deploy a fast.ai model for inference.
Good afternoon everyone, and welcome to the Loft session, Training and Deploying Custom Algorithms with Amazon SageMaker. My name is Chaitanya Hazarey and I am a Solutions Architect with AWS, working with our emerging partners across the US.
What is the most interesting application of AI that you have come across?
My session will be in two parts: a presentation giving you an overview of the SageMaker service and showing how you can build, train, and deploy a custom algorithm or framework with Amazon SageMaker, and a demo showing the steps applied to bringing a fast.ai-based algorithm to SageMaker via the console.
Did anyone attend the previous SageMaker sessions? This talk will be more advanced than those, going into more detail about how to bring custom frameworks to the platform.
Each of these steps was very involved in the beginning.
Amazon SageMaker removes the complexity that holds back developers and data scientists at each of these steps. Amazon SageMaker includes modules that can be used together or independently to build, train, and deploy your machine learning models.
SageMaker makes it easy to build ML models and get them ready for training by providing everything you need to quickly connect to your training data, and to select and optimize the best algorithm and framework for your application. Amazon SageMaker includes hosted Jupyter notebooks that make it easy to explore and visualize your training data stored in Amazon S3. You can connect directly to data in S3, or use AWS Glue to move data from Amazon RDS, Amazon DynamoDB, and Amazon Redshift into S3 for analysis in your notebook.
To help you select your algorithm, Amazon SageMaker includes the 10 most common machine learning algorithms which have been pre-installed and optimized to deliver up to 10 times the performance you’ll find running these algorithms anywhere else. Amazon SageMaker also comes pre-configured to run TensorFlow and Apache MXNet, two of the most popular open source frameworks, or you have the option of using your own framework.
You can begin training your model with a single click in the Amazon SageMaker console. The service manages all of the underlying infrastructure for you and can easily scale to train models at petabyte scale. To make the training process even faster and easier, Amazon SageMaker can automatically tune your model to achieve the highest possible accuracy.
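Under the hood, that single click in the console ultimately issues a CreateTrainingJob API request. The sketch below assembles such a request in Python; the bucket, role ARN, and image URI are illustrative placeholders, not values from this talk.

```python
# Sketch of the request the console builds for you when you start training.
# The bucket, role ARN, and image URI below are illustrative placeholders.

def build_training_job_request(job_name, image_uri, role_arn, bucket):
    """Assemble a CreateTrainingJob-style request body."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [{
            "ChannelName": "train",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": f"s3://{bucket}/train/",
            }},
        }],
        "OutputDataConfig": {"S3OutputPath": f"s3://{bucket}/output/"},
        "ResourceConfig": {
            "InstanceType": "ml.m4.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }

request = build_training_job_request(
    "demo-job",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/demo:latest",
    "arn:aws:iam::123456789012:role/DemoRole",
    "demo-bucket")
# With credentials in place, boto3.client("sagemaker").create_training_job(**request)
# would submit it; SageMaker then provisions and tears down the instances for you.
```

Scaling out training is just a matter of raising InstanceCount in ResourceConfig; the service handles the cluster.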
Once your model is trained and tuned, SageMaker makes it easy to deploy in production so you can start generating predictions on new data (a process called inference). Amazon SageMaker deploys your model on an auto-scaling cluster of Amazon EC2 instances that are spread across multiple availability zones to deliver both high performance and high availability. It also includes built-in A/B testing capabilities to help you test your model and experiment with different versions to achieve the best results.
For maximum versatility, we designed Amazon SageMaker in three modules – Build, Train, and Deploy – that can be used together or independently as part of any existing ML workflow you might already have in place.
SageMaker is built on 4 independent architectural components.
▪ Notebook instances where you do exploratory data analysis
▪ Amazon-provided algorithms to get you kick-started with ML
▪ The managed service for training models
▪ And the hosting service where you deploy models and we provide the API for you.
There are no dependencies between these components. If you only want to host models you’ve created on premises you can do that. If you only want to train and deploy elsewhere you can do that. You never need to open a notebook to use SageMaker. However, that is the principal means of data exploration and every feature is callable from notebooks.
Creating powerful data exploration notebooks is very easy: you choose your machine size, hit return, and bang, it's there. In one click you can do nearly everything in ML. The notebook is backed by an attached EBS storage volume, and you have access to CPUs or GPUs as you prefer. We include more than 30 sample notebooks in every instance that provide cut-and-paste ease of use to kick-start your workflow. The sample notebooks provide business-focused solutions from churn prediction to demand forecasting, topic and object classification, log processing, and anomaly detection.
There’s a free tier available for exploration. Launch it, go through the examples and discover which samples apply to your needs.
Back to architecture. You can introduce algorithms into SageMaker in three basic ways. First, use one of our highly optimized built-in algorithms. Second, bring your own script built on MXNet or TensorFlow, or on a framework you build into the container yourself; you can also use the Spark/SageMaker integration, where Spark pre-processes your data and hands it off for training in SageMaker, as demonstrated at Intuit. Third, bring your own model trained elsewhere, perhaps on premises: you build the model locally, build your own container, and ship the container to SageMaker for deployment.
The SageMaker architecture is built on a foundation of Docker containers. If you use the built-in algorithms or MXNet/TensorFlow, the Docker images are maintained and managed by the SageMaker team. If you bring your own algorithm or framework, then you need to build your own training image and upload it to Amazon ECR. You can use the containers that make the most sense.
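A training image you bring yourself has to follow SageMaker's container contract: input channels are mounted and output artifacts collected under fixed `/opt/ml` paths, and the image is referenced by an account- and region-scoped ECR URI. A minimal sketch of both conventions (the helper function name is ours):

```python
# SageMaker's container contract: training data, the output model, and failure
# information live under fixed paths inside the training container.
SM_INPUT_DIR = "/opt/ml/input/data"   # input channels are mounted here
SM_MODEL_DIR = "/opt/ml/model"        # save artifacts here; SageMaker tars them to S3
SM_OUTPUT_DIR = "/opt/ml/output"      # a failure reason can be written here

def ecr_image_uri(account_id, region, repo, tag="latest"):
    """Build the ECR image URI a training job references (helper name is ours)."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"

uri = ecr_image_uri("123456789012", "us-west-2", "fastai-demo")
```

Anything your training code writes under `/opt/ml/model` is what SageMaker packages and uploads as the model artifacts.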
Data scientists pull training data from S3 as well as a Docker container that we provide. Data scientists may also build their own models and import those containers into the training system. Once the model is trained, we push the results back to S3; we call these the model artifacts.
Developers and Operations take those model artifacts along with inference engines to serve predictions at an endpoint via REST. The endpoint is highly configurable providing the ability to evaluate several models at once and provide continuous training and deployment. Alternatively you may push your model to edge devices with the AWS Greengrass service.
Putting a model into production is where the rubber hits the road. If your model is intended for an IoT or edge device you can use AWS Greengrass. AWS Greengrass is software that lets you run local compute, messaging, data caching, sync, and ML inference capabilities for connected devices in a secure way. Greengrass ML Inference is a feature of AWS Greengrass that makes it easy to perform ML inference locally on Greengrass Core devices using models that are built and trained in the cloud.
SageMaker makes deploying in the cloud as simple as filling out a form. You start with defining a production variant. Production variants are units of hardware that you specify to host your predictive system. At AWS hardware is code, and as such is as flexible as code. Here we specify our instance type as a c3.4xlarge with an initial instance count of 3. Importantly, we specify the initial variant weight as 100%.
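In API terms, that form corresponds to a ProductionVariant entry passed to CreateEndpointConfig. Note the API takes a relative float weight rather than a percentage, so the slide's "100%" is a lone variant with weight 1.0; the names below are placeholders, and the instance type is written in the `ml.`-prefixed form the API expects.

```python
# One production variant as passed to CreateEndpointConfig. Traffic share for a
# variant is its weight divided by the sum of all variant weights, so a single
# variant with weight 1.0 receives 100% of traffic.
variant = {
    "VariantName": "primary",
    "ModelName": "demo-model",          # placeholder model name
    "InstanceType": "ml.c4.4xlarge",    # API instance types carry an "ml." prefix
    "InitialInstanceCount": 3,
    "InitialVariantWeight": 1.0,
}
endpoint_config = {
    "EndpointConfigName": "demo-config",
    "ProductionVariants": [variant],
}
# boto3.client("sagemaker").create_endpoint_config(**endpoint_config) would create it.
```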
Build 1:
Your model object is a connector between your model artifacts and your inference container image. When you deploy that model initially you’re sending 100% of the traffic to that endpoint and collecting ground truth. That might be the end of it, for a while. As your customers use the model you invariably find that the ground truth data from real world experience is different from the historical data used to build the model. As you retrain your model you will make updated model objects and need to test them before launching into production.
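That connector is, concretely, a CreateModel request: it names an inference image in ECR and the model.tar.gz artifacts in S3, plus an execution role. A sketch with placeholder names:

```python
def build_model_request(name, image_uri, model_data_url, role_arn):
    """Assemble a CreateModel-style request tying the inference container image
    to the trained model artifacts. All argument values here are placeholders."""
    return {
        "ModelName": name,
        "PrimaryContainer": {
            "Image": image_uri,              # inference container image in ECR
            "ModelDataUrl": model_data_url,  # model.tar.gz produced by training
        },
        "ExecutionRoleArn": role_arn,
    }

model_request = build_model_request(
    "demo-model",
    "123456789012.dkr.ecr.us-east-1.amazonaws.com/inference:latest",
    "s3://demo-bucket/output/model.tar.gz",
    "arn:aws:iam::123456789012:role/DemoRole")
# boto3.client("sagemaker").create_model(**model_request) would register it.
```

Retraining produces a new ModelDataUrl, and registering it as a new model object is what lets you test it side by side with the old one.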
Build 2:
With endpoint configuration we can divert a percentage of traffic for A/B testing, or what we call blue/green testing. Unlike the old days, where continuous integration on enterprise systems meant release cycles of weeks or months, training models from refreshed ground truth often takes place within hours. As you gain confidence in the updated model object you can divert all traffic to the new object, and the continuous integration and deployment process continues. This enables rapid model development in near real-time on ground truth derived from actual customer experiences.
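The traffic split between variants follows directly from their relative weights: each variant receives its weight divided by the sum of all weights. A minimal sketch of that arithmetic for a blue/green rollout:

```python
def traffic_shares(weights):
    """Convert relative variant weights into traffic fractions, mirroring how
    an endpoint splits requests across its production variants."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Blue/green rollout: start the new "green" variant on 10% of traffic,
# then shift the weights as confidence in the new model grows.
shares = traffic_shares({"blue": 9.0, "green": 1.0})
```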
Here is the example architecture we will go through to show how to use a custom framework with SageMaker. We will first spin up a SageMaker notebook to do the ML model build and training. The reason for using the SageMaker notebook is that we can iterate on our models quickly, as the data is not too large and we will use a single machine to train our models. We won't be using the built-in SageMaker algorithms nor the SageMaker training facility. Once our model is trained on our notebook, we will save the model locally on the notebook instance and then upload the model artifacts to S3 as a tar.gz file. The next phase is deployment: we build our own inference container, push it to Amazon ECR, and create a SageMaker model and endpoint from the uploaded artifacts.
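The save-and-upload step can be sketched with the standard library: pack the trained artifacts into the model.tar.gz layout SageMaker expects, then push the archive to S3. The file names below are illustrative, and the upload itself is left as a comment since it needs credentials and a bucket.

```python
import tarfile
from pathlib import Path

def package_model(artifact_path, out_path="model.tar.gz"):
    """Pack a trained artifact file into the model.tar.gz archive SageMaker
    expects to find at the model's S3 location."""
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(artifact_path, arcname=Path(artifact_path).name)
    return out_path

# Example with a toy weights file; a real fast.ai export would go here instead.
Path("weights.pth").write_bytes(b"demo")
archive = package_model("weights.pth")
with tarfile.open(archive) as tar:
    members = tar.getnames()

# The upload to S3 would then be, with credentials in place:
#   import boto3
#   boto3.client("s3").upload_file(archive, "demo-bucket", "fastai/model.tar.gz")
```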