Deploying Bento#

Deployment Overview#

The three most common deployment options with BentoML are:

  • 🐳 Generate container images from Bento for custom docker deployment

  • 🦄️ Yatai: Model Deployment at scale on Kubernetes

  • 🚀 bentoctl: Fast model deployment on any cloud platform

Containerize Bentos#

Containerizing Bentos as Docker images allows users to easily distribute and deploy them. Once services are built as Bentos and saved to the Bento store, you can containerize them with the CLI command bentoml containerize.

Start the Docker engine and verify it is running with docker info:

> docker info

Run bentoml list to view available bentos in the store.

> bentoml list

Tag                               Size        Creation Time        Path
iris_classifier:ejwnswg5kw6qnuqj  803.01 KiB  2022-05-27 00:37:08  ~/bentoml/bentos/iris_classifier/ejwnswg5kw6qnuqj
iris_classifier:h4g6jmw5kc4ixuqj  644.45 KiB  2022-05-27 00:02:08  ~/bentoml/bentos/iris_classifier/h4g6jmw5kc4ixuqj

Run bentoml containerize to start the containerization process.

> bentoml containerize iris_classifier:latest

INFO [cli] Building docker image for Bento(tag="iris_classifier:ejwnswg5kw6qnuqj")...
[+] Building 21.2s (20/20) FINISHED
INFO [cli] Successfully built docker image "iris_classifier:ejwnswg5kw6qnuqj"
For Macs with Apple Silicon

Specify the --platform option to avoid potential compatibility issues with some Python libraries:

bentoml containerize --platform=linux/amd64 iris_classifier:latest

View the built Docker image:

> docker images

REPOSITORY          TAG                 IMAGE ID       CREATED         SIZE
iris_classifier     ejwnswg5kw6qnuqj    669e3ce35013   About a minute ago   1.12GB

Run the generated docker image:

> docker run -p 3000:3000 iris_classifier:ejwnswg5kw6qnuqj
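With the container running, you can send a test request to the service. A minimal smoke-test sketch, assuming the iris_classifier service from the quickstart exposes a classify endpoint on port 3000 (the endpoint name and input shape are assumptions based on the quickstart, not shown in this guide):

```shell
# Hypothetical smoke test; assumes the container started above is listening on
# localhost:3000 and that the service exposes a `classify` endpoint taking a
# batch of feature vectors, as in the BentoML quickstart.
URL="http://localhost:3000/classify"
curl -X POST \
    -H "Content-Type: application/json" \
    -d '[[5.1, 3.5, 1.4, 0.2]]' \
    "$URL"
```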


  • Add sample code for working with GPUs and the --gpu flag

  • Add a further reading section

  • Explain the buildx requirement

  • Explain multi-platform builds

Deploy with Yatai#

Yatai helps ML teams deploy large-scale model serving workloads on Kubernetes. It standardizes BentoML deployment on Kubernetes, provides a UI and APIs for managing all your ML models and deployments in one place, and enables advanced GitOps and CI/CD workflows.

Yatai is Kubernetes-native and integrates well with other cloud-native tools in the K8s ecosystem.

To get started, get an API token from the Yatai web UI and log in from the bentoml CLI:

bentoml yatai login --api-token {YOUR_TOKEN_GOES_HERE} --endpoint {YOUR_YATAI_ENDPOINT}

Push your local Bentos to Yatai:

bentoml push iris_classifier:latest


Yatai automatically starts building container images for each newly pushed Bento.
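Once a Bento is in Yatai, teammates can fetch it into their local Bento store from another machine with bentoml pull. A short sketch, reusing the tag from the examples above:

```shell
# Pull a Bento previously pushed to Yatai into the local Bento store.
TAG="iris_classifier:latest"
bentoml pull "$TAG"
```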

Deploy via Web UI#

Although not always recommended for production workloads, Yatai offers an easy-to-use web UI for quickly creating deployments. This is convenient for data scientists to test out Bento deployments end-to-end from a development or testing environment:

Yatai Deployment creation UI

The web UI is also very helpful for viewing system status, monitoring services, and debugging issues.

Yatai Deployment Details UI

For production workloads, we recommend using the APIs or Kubernetes CRD objects to automate the deployment pipeline.

Deploy via API#

Yatai’s REST API specification can be found under the /swagger endpoint; if you have Yatai deployed locally with minikube, open the /swagger path on your local Yatai endpoint. The Swagger API spec covers all core Yatai functionality, from model and Bento management to cluster management and deployment automation.


Python APIs for creating deployments on Yatai are on our roadmap; see #2405. The current proposal looks like this:

yatai_client = bentoml.YataiClient.from_env()

bento = yatai_client.get_bento('my_svc:v1')
assert bento and bento.status.is_ready()

yatai_client.create_deployment('my_deployment', bento.tag, ...)

# For updating a deployment:
yatai_client.update_deployment('my_deployment', bento.tag)

# check deployment_info.status
deployment_info = yatai_client.get_deployment('my_deployment')

Deploy via kubectl and CRD#

For DevOps managing production model serving workloads along with other kubernetes resources, the best option is to use kubectl and directly create BentoDeployment objects in the cluster, which will be handled by the Yatai deployment CRD controller.

# my_deployment.yaml
kind: BentoDeployment
metadata:
  name: demo
spec:
  bento_tag: iris_classifier:3oevmqfvnkvwvuqj
  resources:
    limits:
      cpu: 1000m
    requests:
      cpu: 500m

Apply the manifest to the cluster:

kubectl apply -f my_deployment.yaml
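After applying the manifest, the deployment status can be inspected through the CRD object itself. A sketch, assuming the resource name matches the name field in the manifest above:

```shell
# The resource name must match the name declared in my_deployment.yaml.
NAME="demo"
kubectl get bentodeployment "$NAME"
kubectl describe bentodeployment "$NAME"
```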

Deploy with bentoctl#

bentoctl is a CLI tool for deploying Bentos to run on any cloud platform. It supports all major cloud providers, including AWS, Azure, Google Cloud, and many more.

Under the hood, bentoctl is powered by Terraform. bentoctl applies the required modifications to the Bento or service configuration, then generates Terraform templates for the target deployment platform for easy deployment.

The bentoctl deployment workflow is optimized for CI/CD and GitOps. It is highly customizable: users can fine-tune every configuration option provided by the cloud platform. It is also extensible, allowing users to attach additional Terraform templates to a deployment.

Quick Tour#

As an example, install the aws-lambda operator for bentoctl:

bentoctl operator install aws-lambda

Initialize a bentoctl project. This starts an interactive mode that asks for the relevant deployment configuration:

> bentoctl init

Bentoctl Interactive Deployment Config Builder

deployment config generated to: deployment_config.yaml
✨ generated template files.
  - bentoctl.tfvars

The deployment config is saved to ./deployment_config.yaml:

api_version: v1
name: quickstart
operator:
    name: aws-lambda
template: terraform
spec:
    region: us-west-1
    timeout: 10
    memory_size: 512

Now we are ready to build the deployable artifacts required for this deployment. In most cases, this step produces a new Docker image specific to the target deployment configuration:

bentoctl build -b iris_classifier:btzv5wfv665trhcu -f ./deployment_config.yaml

Next, use the Terraform CLI to apply the generated deployment configs to AWS. This requires AWS credentials to be set up in the environment.

> terraform init
> terraform apply -var-file=bentoctl.tfvars --auto-approve

base_url = ""
function_name = "quickstart-function"
image_tag = ""

Test the deployed endpoint:

URL=$(terraform output -json | jq -r .base_url.value)classify
curl -i \
    --header "Content-Type: application/json" \
    --request POST \
    --data '[5.1, 3.5, 1.4, 0.2]' \
    "$URL"
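When you are done, the same Terraform state can be used to tear down the cloud resources. A sketch using the standard Terraform workflow, with the same var-file generated by bentoctl init:

```shell
# Destroy all cloud resources created by `terraform apply` for this deployment.
VAR_FILE="bentoctl.tfvars"
terraform destroy -var-file="$VAR_FILE" --auto-approve
```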

Supported Cloud Platforms#


  • Explain limitations of each platform, e.g. GPU support

  • Explain how to customize the terraform workflow

About Horizontal Auto-scaling#

Auto-scaling is one of the most sought-after features when deploying models. It helps optimize resource usage and cost by automatically scaling up and down in response to incoming traffic.

Among the deployment options introduced in this guide, Yatai on Kubernetes is the recommended approach if auto-scaling and resource efficiency are required for your team’s workflow. Yatai lets users fine-tune resource requirements and the auto-scaling policy at the Runner level, which improves interoperability between auto-scaling and the data aggregated in real time at the Runner’s adaptive batching layer.

Many of bentoctl’s deployment targets also come with some level of auto-scaling capability, including AWS EC2 and AWS Lambda.