
Deploy Custom Models to SageMaker Endpoints

This report is a guide on how to set up automated deployments of custom models to SageMaker Endpoints with Weights & Biases.
Created on December 5|Last edited on December 20
Amazon SageMaker Endpoints are a feature of AWS's SageMaker service, designed for deploying machine learning models into a production environment. These endpoints allow for real-time inference, meaning they can receive input data, process it, and return predictions. SageMaker has built-in support for several model types (e.g. TensorFlowModel and PyTorchModel), but if you would like to use a custom model, or have other more advanced requirements, then extra steps are required.
In this report we will see how we can use W&B in a pipeline covering model training, inference container building, and model deployment to SageMaker Endpoints. To support this workflow, there are three key components.
1. Building a custom container for inference. This should support custom inference code, system and python packages required for the inference. Often we want to be able to link each specific container build to specific model deployments.
2. A templated deployment job. This should take model artifacts and deploy them to a SageMaker endpoint, using the container from the previous step. Often we want to be able to connect these specific artifacts to the rest of the pipeline that produced them, such as training code, hyperparameters, training data, etc.
3. Automated model deployment. Often we want to be able to seamlessly and automatically deploy models on specific conditions (such as tagging a model as a production model).


Building a Custom Container for Inference

In order to support custom models on SageMaker Endpoints, we need to supply a container that will run the code for the inference. There are two key steps here.
The first is to adapt your inference code. We need two key functions to support this: the first handles invocations and will be the entry point for inference requests; the second handles the ping request that SageMaker uses to determine whether the endpoint is healthy. As these are done via POST and GET requests, respectively, the container must also have a web server listening on port 8080.
You can browse the example inference code below.

Inference Code

We also have a simple wsgi.py script to help the web server find our inference code.
You can see the complete directory for our inference here.
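As a minimal sketch of what such inference code can look like, here is a plain WSGI app serving the two required routes. The handler names and the JSON payload shape are illustrative assumptions, not the report's exact code:

```python
import io    # used to build file-like request bodies when testing locally
import json

def predict(payload):
    # Placeholder model: replace with real model loading and inference.
    return {"prediction": sum(payload.get("inputs", []))}

def app(environ, start_response):
    """WSGI entry point serving SageMaker's two required routes."""
    path = environ.get("PATH_INFO", "")
    method = environ.get("REQUEST_METHOD", "GET")
    if path == "/ping" and method == "GET":
        # Health check: SageMaker marks the endpoint healthy on a 200 response.
        start_response("200 OK", [("Content-Type", "application/json")])
        return [b"{}"]
    if path == "/invocations" and method == "POST":
        # Entry point for inference requests (POST body carries the input data).
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = environ["wsgi.input"].read(length)
        result = predict(json.loads(body or b"{}"))
        start_response("200 OK", [("Content-Type", "application/json")])
        return [json.dumps(result).encode("utf-8")]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found"]
```

Because this is a standard WSGI callable, it can be served on port 8080 by any WSGI server (e.g. gunicorn via the wsgi.py script mentioned above).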


Dockerfile

We also need a Dockerfile for the container. Here is an example we used. Key parts include the installation of relevant packages, copying the inference directory to /opt/program, which is where SageMaker expects the inference code to be, and the final command, which is the entrypoint that starts the web server.
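A sketch along these lines covers those key parts. The base image and package choices here are assumptions, not the report's exact Dockerfile:

```dockerfile
# Sketch of a custom inference container (package choices are assumptions)
FROM python:3.10-slim

# Python packages required by the inference code and web server
RUN pip install --no-cache-dir flask gunicorn

# SageMaker expects the inference code under /opt/program
COPY inference/ /opt/program/
WORKDIR /opt/program
ENV PATH="/opt/program:${PATH}"

# The container must listen for ping/invocation requests on port 8080
EXPOSE 8080

# Entrypoint: start the web server, pointing at the app in wsgi.py
ENTRYPOINT ["gunicorn", "--bind", "0.0.0.0:8080", "wsgi:app"]
```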

Building the Container and Pushing to ECR

Commonly the container build step is a shell script. However, below I have wrapped this in Python so we can track the build process within W&B! This is the step that will build the custom container and push it to AWS for use in SageMaker Endpoints. Note that we log the Dockerfile at the end, with the name of the container image we are using. This lets us reference it later in the pipeline, so we have traceable lineage from the model deployment step back to the specific container it is deployed with.
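A hypothetical sketch of what such a build script can look like. The function names, project name, and artifact layout are assumptions for illustration, not the report's exact build_and_push.py:

```python
# Sketch of a Python-wrapped container build, tracked as a W&B run.
import subprocess

def ecr_image_uri(account_id, region, image_name, tag="latest"):
    """Compose the fully qualified ECR URI for a pushed image."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{image_name}:{tag}"

def build_and_push(account_id, region, image_name, context_dir="."):
    """Build the inference container and push it to ECR; return its URI."""
    uri = ecr_image_uri(account_id, region, image_name)
    subprocess.run(["docker", "build", "-t", uri, context_dir], check=True)
    subprocess.run(["docker", "push", uri], check=True)
    return uri

if __name__ == "__main__":
    import wandb
    run = wandb.init(project="sagemaker-deploy", job_type="build-container")
    uri = build_and_push("<account_id>", "<region>", "<image_name>")
    # Log the Dockerfile as an artifact tied to this image, so deployments
    # later in the pipeline can trace their lineage back to this exact build.
    artifact = wandb.Artifact(name="inference-container", type="container")
    artifact.add_file("Dockerfile")
    run.log_artifact(artifact)
    run.finish()
```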


A Templated Deployment Job

Now that we have the steps required to build and push our custom inference container for use in SageMaker endpoints, let's write the code that we can use to deploy models and use the container. Here is an example deployment job.
At a high level, when executed it will download a specific model artifact from W&B (and try to link it to a container version from an earlier step if possible). It then packages the model artifact into a tar.gz file as required by SageMaker Endpoints, and deploys a custom SageMaker Model, specifying this file as the model_data, and the image_uri, which is a URI to a container in AWS ECR that we have previously uploaded. Once finished, the model has been deployed to our container and is ready for inference!
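A sketch of those steps, under stated assumptions: the artifact name, bucket, role, and image URI placeholders are illustrative, and the sagemaker/wandb calls show the general shape rather than the report's exact deploy.py:

```python
# Sketch of a deployment job: pull a model from W&B, package it, deploy it.
import os
import tarfile

def package_model(model_dir, output_path="model.tar.gz"):
    """Pack the model files into the flat tar.gz layout SageMaker expects."""
    skip = os.path.basename(output_path)
    with tarfile.open(output_path, "w:gz") as tar:
        for name in os.listdir(model_dir):
            if name == skip:
                continue  # don't add the archive to itself
            tar.add(os.path.join(model_dir, name), arcname=name)
    return output_path

if __name__ == "__main__":
    import sagemaker
    import wandb
    from sagemaker.model import Model

    run = wandb.init(project="sagemaker-deploy", job_type="deploy")
    # Download the model artifact to deploy; "clf:latest" is an example name.
    model_dir = run.use_artifact("clf:latest").download()
    archive = package_model(model_dir)
    session = sagemaker.Session()
    model_data = session.upload_data(archive, bucket="<sagemaker_bucket>",
                                     key_prefix="models")
    model = Model(
        image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<image>:latest",
        model_data=model_data,
        role="<sagemaker_execution_role_arn>",
    )
    model.deploy(initial_instance_count=1, instance_type="ml.m5.large")
    run.finish()
```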


Deploying Manually

Now that we have the code written, we can execute it and try it out, and see that our model has been successfully deployed.
To do a deployment, we need to do the following:
  1. Run python build_and_push.py --image-name <insert_name_here>, optionally providing --wandb-project and --inference-code-dir to specify the W&B project to log to and point the build process to a different directory if required.
  2. Run python deploy.py --image-uri <insert_container_uri_from_previous_step_here> --artifact <insert_W&B_model_artifact_to_deploy_here> along with the --wandb-project you want to log to, and SageMaker-specific configuration, --role and --sagemaker-bucket, which are the ARN of the role required to spin up the resource and the SageMaker bucket name to store data in.
Because we have used W&B for this, we have tracked each deployment. We can see the training run that trained the model, and the specific deployment that this model was used in. We can click on any of the boxes below and see all of the tracked information about the run. This lets us answer questions like "what code was used to train this model currently in deployment?" or "what were the hyperparameters of the model and when was it trained?".

We can also explore the wider graph. In the panel below, change the "Style" from "Direct Lineage" to "Complete", and we will see how other components of our tracked pipeline fit into this. For example, we will now be able to see the specific container building process that took place and associated metadata, as well as the Dockerfile used, and the specific container. This lets us easily see which models have been deployed with which container.

[Lineage panel (direct lineage view): model artifact clf:v25, produced by training run vital-sweep-24]
These are the two key steps in the process. Often, however, we will run step 2 much more often than step 1, because our models change much more frequently than our inference code. Given that, let's make the deployment step much easier to run, ultimately automating it.


Automated Model Deployment

Ultimately, where we would like to end up is that we can tag a specific model to be deployed and, behind the scenes, the model is deployed for us. To get there, we need to use two W&B features. The first is W&B Launch, which lets us execute arbitrary code (or containers), either from the UI or the command line. While often used for deploying training runs to compute infrastructure, we can also use it to deploy our model to SageMaker Endpoints for inference. The second is W&B Automations, which let us trigger such jobs based on events in the Model Registry.

Jobs

The first component to understand is a W&B job. These are created whenever you log code to W&B (wandb.run.log_code(".")) or when you manually create the job from the command line (wandb job create --project "<project-name>" -e "<your-entity>" --name "<name-for-job>" code "<path-to-script/code.py>"). They will appear in the Jobs section of a W&B project (https://wandb.ai/<entity>/<project_name>/jobs) and can be launched from there.

What does it mean to Launch a Job in W&B? The process is shown here.


So far we have covered how to create a job, but not what a queue is, or how the job is executed (via an agent).

Queues

Queues are the pipes where you submit your job to a particular compute target resource, such as Amazon SageMaker or a Kubernetes cluster. As jobs are added to the queue, one or more launch agents will poll that queue and execute the job on the system targeted by the queue.
You can create a queue for your team at https://wandb.ai/launch. For the purposes of this guide, we can use a Docker queue, which is also the easiest to configure an agent for!

Agents

Launch agents are lightweight, persistent programs that periodically check Launch queues for jobs to execute. When a launch agent receives a job, it first builds or pulls the image from the job definition then runs it on the target resource. One agent may poll multiple queues, however the agent must be configured properly to support all of the backing target resources for each queue it is polling.
You can launch a Docker agent with the following command (this assumes access to the wandb package):
wandb launch-agent -e <the_entity_queue_in> -q <the_name_of_queue> -j <int_max_number_concurrent_jobs>
For SageMaker deployments we need to pass some credentials to the agent doing the actual deployment. One such way of doing this is via environment variables. You can pass these to docker agents as follows
AWS_DEFAULT_REGION="<valid_region>" AWS_ACCESS_KEY_ID="<valid_key>" AWS_SECRET_ACCESS_KEY="<valid_secret_key>" AWS_SESSION_TOKEN="<valid_session_token>" wandb launch-agent -e <the_entity_queue_in> -q <the_name_of_queue> -j <int_max_number_concurrent_jobs>
You also need to configure the Docker Launch queue to pass these environment variables along. Go to the Launch queue, select "View details" and then "Config", and enter the following:
env:
- AWS_ACCESS_KEY_ID
- AWS_SESSION_TOKEN
- AWS_SECRET_ACCESS_KEY
- AWS_DEFAULT_REGION
Once set up, run the wandb launch-agent command above and you should see your queue successfully spin up.


Launching a Deployment Job

You can now launch your deployment job in one of three ways.
  1. As before, executing the deploy.py script directly. This will pull model artifacts from W&B and deploy them to SageMaker Endpoints automatically.
  2. From the W&B UI you can launch W&B Jobs. Once you have a queue and agent set up, you can execute the deployment job, following this guide here. You can also launch jobs from your terminal.
  3. As a W&B Automation!


Setting up a Model Registry Automation

Within the W&B model registry you can create a registered model. You can think of this as a central system of record for your best models, standardised and organised across projects and teams.
From the model registry page select "New registered model" in the top right, and fill in the details. Once you have this set up, you should see a screen like this.

Click on the three dots on the right, as shown in the screenshot, and select the "New automation" button. Next, follow the steps to create the automation, defining the action you want to occur based on an event trigger. An event is a change that takes place in the W&B ecosystem. The Model Registry supports two event types: linking a new artifact to a registered model, and adding a new alias to a version of the registered model.
In the former case, any time a model is linked to the registry, it will trigger the automation. In the latter case, we can specify regular expressions that must also match an alias on the registered model version to trigger the automation (common aliases include "production" or "staging", which may trigger different automations).
We must also select the type of action to occur (either a webhook or a job). In this case, we will use a job, and select our deploy job from earlier. In the next step we will specify the configuration for the deployment job. Here we can template specific aspects of the configuration used by the run. For the deployment, an essential one will be the artifact (model) to be deployed. For our deployment job this can be "${artifact_version_string}". You can also define where the run will be logged, and the queue it will run on. You can use the queue created earlier.
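As a hypothetical sketch, such a templated configuration might look like the following. The key names here are assumptions matching the deploy script's flags, not an exact schema:

```yaml
# Hypothetical templated run config for the deployment job
artifact: ${artifact_version_string}  # replaced with the triggering model version
wandb-project: sagemaker-deploy       # assumed project name
role: <sagemaker_execution_role_arn>
sagemaker-bucket: <sagemaker_bucket_name>
```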
Once this is set up, you can test it out by linking a new model to this registered model in the Model Registry. Go to your project's Artifacts tab, select a model, click the "Link to registry" button in the top right, and add the model to the newly registered model.


Given the automation we created, it will now deploy this version of the model to SageMaker Endpoints by triggering the deployment job. You can go to the Launch queue you created to check and, after a few minutes, be able to send requests to the endpoint!
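A sketch of sending such a request with boto3. The endpoint name and the JSON payload shape are assumptions for illustration; the invoke_endpoint call itself is the standard sagemaker-runtime API:

```python
# Sketch: query the deployed SageMaker endpoint.
import json

def build_payload(inputs):
    """Serialize inputs into the JSON body the inference code expects
    (the {"inputs": ...} shape is an assumption)."""
    return json.dumps({"inputs": inputs}).encode("utf-8")

if __name__ == "__main__":
    import boto3
    client = boto3.client("sagemaker-runtime")
    response = client.invoke_endpoint(
        EndpointName="<your_endpoint_name>",
        ContentType="application/json",
        Body=build_payload([1.0, 2.0, 3.0]),
    )
    print(response["Body"].read())
```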



You can see a video of this workflow in action here: https://www.loom.com/share/13e3d743387f4a0cb7e568d6fdfdaaa4
