Deployment Guides

BentoML provides a set of APIs and CLI commands for automating the cloud deployment workflow. They get your BentoService API server up and running in the cloud and let you easily update and monitor the service. BentoML has currently implemented this workflow for AWS Lambda and AWS SageMaker. More platforms, such as AWS EC2, Kubernetes clusters, and Azure Virtual Machines, are on our roadmap.

You can also manually deploy the BentoService API server or its Docker image to cloud platforms, and we’ve created a few step-by-step tutorials for doing that.


This documentation is about deploying online serving workloads, that is, deploying an API server that serves prediction calls via HTTP requests. For offline serving (also called batch serving or batch inference), see the Model Serving Guide.
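
To make "prediction calls via HTTP requests" concrete, here is a minimal client sketch using only the Python standard library. It is an illustration under assumptions, not BentoML's own client code: the base URL, the API name `predict`, and the JSON input shape are all hypothetical and depend on how your BentoService is defined and where it is deployed.

```python
import json
import urllib.request


def build_prediction_request(base_url, api_name, rows):
    """Build a POST request carrying JSON input for one prediction call.

    base_url and api_name are placeholders (assumptions); substitute the
    address of your deployed API server and the name of your service's API.
    """
    url = f"{base_url.rstrip('/')}/{api_name}"
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, api_name, rows, timeout=10):
    """Send the request and decode the response, assuming it is JSON."""
    request = build_prediction_request(base_url, api_name, rows)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With an API server running, a call such as `predict("http://localhost:5000", "predict", [[5.1, 3.5, 1.4, 0.2]])` would send one row of input and return the decoded response. The address and the input row here are examples only.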

If you are on a small team with limited DevOps support, BentoML provides fully automated deployment management for AWS EC2, AWS Lambda, AWS SageMaker, and Azure Functions. It provides basic model deployment functionality with minimal setup. Here are the detailed guides for each platform:

BentoML also makes it easy to deploy your models on any cloud platform or on your own in-house infrastructure. Here are deployment guides for popular cloud services and open source platforms: