Building Bentos#

What is a Bento?#

A Bento 🍱 is a file archive with all the source code, models, data files, and dependency configurations required for running a user-defined bentoml.Service, packaged into a standardized format.

While bentoml.Service standardizes the inference API definition, including the serving logic, runner initialization, and API input/output types, a Bento standardizes how to reproduce the required environment for running a bentoml.Service in production.


“Bento Build” is essentially the build process in traditional software development, where source code files are converted into standalone artifacts that are ready to deploy. BentoML reimagines this process for machine learning model delivery, optimizing the workflow both for interactive model development and for working with automated training pipelines.

The Build Command#

A Bento can be created with the bentoml build CLI command with a bentofile.yaml build file. Here’s an example from the tutorial:

service: "service:svc"  # Same as the argument passed to `bentoml serve`
labels:
    owner: bentoml-team
    stage: dev
include:
- "*.py"  # A pattern for matching which files to include in the bento
python:
    packages:  # Additional pip packages required by the service
    - scikit-learn
    - pandas
» bentoml build

Building BentoML service "iris_classifier:dpijemevl6nlhlg6" from build context "/home/user/gallery/quickstart"
Packing model "iris_clf:zy3dfgxzqkjrlgxi"
Locking PyPI package versions..


Successfully built Bento(tag="iris_classifier:dpijemevl6nlhlg6")

Similar to saving a model, a unique version tag will be automatically generated for the newly created Bento.
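A Bento tag follows a name:version format, where ":latest" resolves to the most recent build. The split can be sketched with a small illustrative helper (parse_tag is hypothetical, not part of BentoML's API):

```python
def parse_tag(tag: str):
    # Split a Bento tag of the form "<name>:<version>".
    # A bare name without a version is treated as "<name>:latest",
    # mirroring how `iris_classifier:latest` resolves to the newest build.
    name, _, version = tag.partition(":")
    return name, version or "latest"

print(parse_tag("iris_classifier:dpijemevl6nlhlg6"))  # ('iris_classifier', 'dpijemevl6nlhlg6')
print(parse_tag("iris_classifier"))                   # ('iris_classifier', 'latest')
```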

It is also possible to customize the Bento version string by passing it to the --version CLI argument. However, this is generally not recommended; only use it if your team has a very specific naming convention for deployable artifacts, e.g.:

» bentoml build --version 1.0.1


The Bento build process requires importing the bentoml.Service object being built. This means the build environment must have all of the service's dependencies installed. Support for building from a docker environment is on the roadmap; see #2495.

Advanced Project Structure#

For projects that are part of a larger codebase and interact with other local Python modules, or for projects containing multiple Bentos/Services, it may not be possible to put all service definition code and bentofile.yaml under the project's root directory.

BentoML allows placing the service definition file and bentofile anywhere in the project directory. In this case, the user needs to provide the build_ctx and bentofile arguments to the bentoml build CLI command.


The build context is your Python project's working directory: the directory from which you start the Python interpreter during development, so that your local Python modules can be imported properly. It defaults to the directory where bentoml build is run.


bentofile is a .yaml file that specifies the Bento Build Options. It defaults to the bentofile.yaml file under the build context.

They can also be customized via the CLI command, e.g.:

» bentoml build -f ./src/my_project_a/bento_fraud_detect.yaml ./src/
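The defaulting behavior described above can be sketched as follows (a simplified illustration; find_bentofile is a hypothetical helper, not BentoML's actual resolution code):

```python
from pathlib import Path
from typing import Optional

def find_bentofile(build_ctx: str = ".", bentofile: Optional[str] = None) -> Path:
    # An explicit -f/--bentofile path wins when given; otherwise look for
    # bentofile.yaml under the build context (which defaults to the
    # current directory where `bentoml build` runs).
    if bentofile is not None:
        return Path(bentofile)
    return Path(build_ctx) / "bentofile.yaml"

# Mirrors: bentoml build -f ./src/my_project_a/bento_fraud_detect.yaml ./src/
print(find_bentofile("./src/", "./src/my_project_a/bento_fraud_detect.yaml"))
# Mirrors: bentoml build  (run from the project root)
print(find_bentofile())
```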

Managing Bentos#

Bentos are the unit of deployment in BentoML and one of the most important artifacts to keep track of in your model deployment workflow.

Local Bento Store#

Similar to Models, Bentos built locally can be managed via the bentoml CLI commands:

» bentoml list

Tag                               Size        Creation Time        Path
iris_classifier:nvjtj7wwfgsafuqj  16.99 KiB   2022-05-17 21:36:36  ~/bentoml/bentos/iris_classifier/nvjtj7wwfgsafuqj
iris_classifier:jxcnbhfv6w6kvuqj  19.68 KiB   2022-04-06 22:02:52  ~/bentoml/bentos/iris_classifier/jxcnbhfv6w6kvuqj
» bentoml get iris_classifier:latest

service: service:svc
name: iris_classifier
version: nvjtj7wwfgsafuqj
bentoml_version: 1.0.0
creation_time: '2022-05-17T21:36:36.436878+00:00'
labels:
  owner: bentoml-team
  project: gallery
models:
- tag: iris_clf:nb5vrfgwfgtjruqj
  module: bentoml.sklearn
  creation_time: '2022-05-17T21:36:27.656424+00:00'
runners:
- name: iris_clf
  runnable_type: SklearnRunnable
  models:
  - iris_clf:nb5vrfgwfgtjruqj
  resource_config:
    cpu: 4.0
    nvidia_gpu: 0.0
apis:
- name: classify
  input_type: NumpyNdarray
  output_type: NumpyNdarray
» bentoml delete iris_classifier:latest -y

Bento(tag="iris_classifier:nvjtj7wwfgsafuqj") deleted

Import and Export#

Bentos can be exported to a standalone archive file outside of the store, for sharing Bentos between teams or moving between different deployment stages. For example:

> bentoml export iris_classifier:latest .

INFO [cli] Bento(tag="iris_classifier:nvjtj7wwfgsafuqj") exported to ./iris_classifier-nvjtj7wwfgsafuqj.bento
> bentoml import ./iris_classifier-nvjtj7wwfgsafuqj.bento

INFO [cli] Bento(tag="iris_classifier:nvjtj7wwfgsafuqj") imported


Bentos can be exported to or imported from AWS S3, GCS, FTP, Dropbox, etc. For example, with S3:

pip install fs-s3fs  # Additional dependency required for working with s3
bentoml import s3://
bentoml export iris_classifier:latest s3://my_bucket/my_prefix/

To see the list of plugins available for other storage protocols, refer to the list provided by the PyFilesystem library.


Test Bentos#

After you build a Bento, it’s essential to test it locally before containerizing it or pushing it to BentoCloud for production deployment. Local testing ensures that the Bento behaves as expected and helps identify any potential issues. Here are two methods to test a Bento locally.

Via BentoML CLI#

You can easily serve a Bento using the BentoML CLI. Replace BENTO_TAG with your specific Bento tag (for example, iris_classifier:latest) in the following command.

bentoml serve BENTO_TAG

Via bentoml.Server API#

For those working within scripting environments or running Python-based tests where using the CLI might be difficult, the bentoml.Server API offers a more programmatic way to serve and interact with your Bento. It gives you detailed control over the server lifecycle, especially useful for debugging and iterative testing.

The following example uses the Bento iris_classifier:latest created in the quickstart (Deploy an Iris classification model with BentoML) to create an HTTP server. Note that GrpcServer is also available.

from bentoml import HTTPServer
import numpy as np

# Initialize the server with the Bento
server = HTTPServer("iris_classifier:latest", production=True, port=3000, host='0.0.0.0')

# Start the server (non-blocking by default)
server.start(blocking=False)

# Get a client to make requests to the server
client = server.get_client()

# Send a request using the client
result = client.classify(np.array([[4.9, 3.0, 1.4, 0.2]]))

# Stop the server to free up resources
server.stop()

Alternatively, you can manage the server’s lifecycle using a context manager. This ensures that the server is automatically stopped once you exit the with block.

from bentoml import HTTPServer
import numpy as np

server = HTTPServer("iris_classifier:latest", production=True, port=3000, host='0.0.0.0')

with server.start() as client:
    result = client.classify(np.array([[4.9, 3.0, 1.4, 0.2]]))

Push and Pull#

Yatai provides a centralized Bento repository that comes with flexible APIs and a Web UI for managing all Bentos created by your team. It can be configured to store Bento files on cloud blob storage such as AWS S3, MinIO or GCS, and to automatically build docker images when a new Bento is pushed.

» bentoml push iris_classifier:latest

Successfully pushed Bento "iris_classifier:nvjtj7wwfgsafuqj"
» bentoml pull iris_classifier:nvjtj7wwfgsafuqj

Successfully pulled Bento "iris_classifier:nvjtj7wwfgsafuqj"
Yatai Bento Repo UI

Bento Management API#

Similar to Managing Models, equivalent Python APIs are also provided for managing Bentos:

import bentoml
bento = bentoml.get("iris_classifier:latest")

import bentoml
bentos = bentoml.list()

import bentoml
bentoml.export_bento('my_bento:latest', '/path/to/folder/my_bento.bento')


Bentos can be exported to or imported from AWS S3, GCS, FTP, Dropbox, etc. For example: bentoml.export_bento('my_bento:latest', 's3://my_bucket/folder')

If your team has Yatai set up, you can also push local Bentos to Yatai. It provides APIs and a Web UI for managing all Bentos created by your team, stores Bento files on cloud blob storage such as AWS S3, MinIO or GCS, and automatically builds docker images when a new Bento is pushed.

import bentoml
bentoml.push("iris_classifier:latest")

import bentoml
bentoml.pull("iris_classifier:nvjtj7wwfgsafuqj")

What’s inside a Bento#

It is possible to view the generated files in a specific Bento. Simply use the -o/--output option of the bentoml get command to find the file path to the Bento archive directory.

» cd $(bentoml get iris_classifier:latest -o path)
» tree
├── apis
│   └── openapi.yaml
├── bento.yaml
├── env
│   ├── docker
│   │   ├── Dockerfile
│   │   └── entrypoint.sh
│   └── python
│       ├── requirements.lock.txt
│       ├── requirements.txt
│       └── version.txt
├── models
│    └── iris_clf
│       ├── latest
│       └── nb5vrfgwfgtjruqj
│           ├── model.yaml
│           └── saved_model.pkl
└── src
  • src directory contains files specified under the include field in the bentofile.yaml. These files are relative to the user Python code’s CWD (current working directory), which makes importing relative modules and file paths inside user code possible.

  • models directory contains all models required by the Service. This is automatically determined from the bentoml.Service object’s runners list.

  • apis directory contains all API definitions. This directory contains API specs that are generated from the bentoml.Service object’s API definitions.

  • env directory contains all environment-related files which help bootstrap the Bento 🍱. The files in this directory are generated from the Bento Build Options specified in bentofile.yaml.


Warning: users should never change files in the generated Bento archive, unless it’s for debugging purposes.

Bento Build Options#

Build options are specified in a .yaml file, which customizes the final Bento produced.

By convention, this file is named bentofile.yaml.

In this section, we will go over all the build options, including defining dependencies, configuring files to include, and customizing docker image settings.


Service#

service is a required field which specifies where the bentoml.Service object is defined.

In the tutorial, we defined service: "service:svc", which can be interpreted as:

  • service refers to the Python module (the service.py file)

  • svc refers to the bentoml.Service object created in service.py, with svc = bentoml.Service(...)


This is synonymous to how the bentoml serve command specifies a bentoml.Service target.

    # in bentofile.yaml
    service: "service:svc"

» bentoml serve service:svc


Description#

The description field allows users to customize the documentation for any given Bento.

The description contents must be plain text, optionally in Markdown format. The description can be specified either inline in bentofile.yaml, or via a file path to an existing text file:

service: ""
description: |
    ## Description For My Bento 🍱

    Use **any markdown syntax** here!

    > BentoML is awesome!

service: ""
description: "file: ./"


When pointing to a description file, the path can be either absolute or relative. The file must exist at the given path when the bentoml build command runs; a relative path is resolved against the build_ctx, which defaults to the directory where bentoml build was executed.
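The relative-path resolution described above can be sketched like this (an illustrative helper; resolve_description_file and the README.md name are hypothetical, and the "file: " prefix handling is simplified rather than BentoML's actual implementation):

```python
from pathlib import Path

def resolve_description_file(description: str, build_ctx: str) -> Path:
    # A "file: <path>" description points at an existing text file;
    # relative paths are resolved against the build context directory,
    # absolute paths are used as-is.
    path = Path(description.removeprefix("file: "))
    if not path.is_absolute():
        path = Path(build_ctx) / path
    return path

print(resolve_description_file("file: ./README.md", "/home/user/project"))
# /home/user/project/README.md
```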


Labels#

labels are key-value pairs that are attached to an object.

In BentoML, both Bento and Model can have labels attached to them. Labels are intended to be used to specify identifying attributes of Bentos/Models that are meaningful and relevant to users, but do not directly imply semantics to the rest of the system.

Labels can be used to organize models and Bentos in Yatai, which also allows users to add or modify labels at any time.

labels:
  owner: bentoml-team
  stage: not-ready

Files to include#

In the example above, the *.py pattern includes every Python file under build_ctx. You can also use other wildcard and directory pattern matching:

include:
  - "data/"
  - "**/*.py"
  - "config/*.json"
  - "path/to/a/file.csv"

If the include field is not specified, BentoML will include all files under the build_ctx directory, besides the ones explicitly set to be excluded, as will be demonstrated in Files to exclude.

See also

Both include and exclude fields support gitignore-style pattern matching.

Files to exclude#

If there are a lot of files under the working directory, another approach is to specify only which files should be ignored.

The exclude field specifies the pathspecs (similar to .gitignore files) of files to be excluded from the final Bento build. The pathspecs are relative to the build_ctx directory.

exclude:
- "tests/"
- "secrets.key"

Users can also opt to place a .bentoignore file in the build_ctx directory. For example, a typical .bentoignore file looks like this:

__pycache__/
*.py[cod]
*$py.class
.ipynb_checkpoints/
training_data/

exclude is always applied after include.
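The "exclude after include" order can be illustrated with a rough sketch (using Python's fnmatch as a stand-in for real gitignore-style matching, whose semantics differ slightly; select_files is a hypothetical helper, not BentoML's implementation):

```python
from fnmatch import fnmatch

def select_files(paths, include, exclude):
    # First keep files matching any include pattern, then drop those
    # matching any exclude pattern: exclude always wins over include.
    kept = [p for p in paths if any(fnmatch(p, pat) for pat in include)]
    return [p for p in kept if not any(fnmatch(p, pat) for pat in exclude)]

files = ["service.py", "tests/test_api.py", "secrets.key", "data/train.csv"]
print(select_files(files, include=["*.py", "data/*"], exclude=["tests/*", "secrets.key"]))
# ['service.py', 'data/train.csv']
```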

Python Packages#

Required Python packages for a given Bento can be specified under the python.packages field.

When a package name is left without a version, BentoML will lock the package to the version available in the current environment when running bentoml build. Users can also specify a desired version, install from a custom PyPI source, or install from a GitHub repo:

python:
  packages:
    - "numpy"
    - "matplotlib==3.5.1"
    - "package>=0.2,<0.3"
    - "torchvision==0.9.2 --extra-index-url"
    - "git+"


There’s no need to specify bentoml as a dependency here, since BentoML adds the current version of BentoML to the Bento’s dependency list by default. Users can override this by specifying a different BentoML version.

To use a variant of BentoML with additional features such as gRPC, tracing exporters, or pydantic validation, specify the desired variant under the python.packages field:

python:
  packages:
  - "bentoml[grpc]"
  - "bentoml[aws]"
  - "bentoml[io-json]"
  - "bentoml[io-image]"
  - "bentoml[io-pandas]"
  - "bentoml[tracing-jaeger]"
  - "bentoml[tracing-zipkin]"
  - "bentoml[tracing-otlp]"

If you already have a requirements.txt file that defines the Python packages for your project, you may also supply its path directly:

python:
    requirements_txt: "./project-a/ml-requirements.txt"

Pip Install Options#

Additional pip install arguments can also be provided.

Note that these arguments will be applied to all packages defined in python.packages, as well as the requirements_txt file, if provided.

python:
    requirements_txt: "./requirements.txt"
    index_url: ""
    no_index: False
    trusted_host:
    - ""
    - ""
    find_links:
    - ""
    extra_index_url:
    - "https://<other api token>"
    - ""
    pip_args: "--pre -U --force-reinstall"


BentoML by default will cache pip artifacts across all local image builds to speed up the build process.

If you want to force a re-download instead of using the cache, specify the pip_args: "--no-cache-dir" option in your bentofile.yaml, or use the --no-cache option with the bentoml containerize command, e.g.:

» bentoml containerize my_bento:latest --no-cache

PyPI Package Locking#

By default, BentoML automatically locks all package versions, as well as all packages in their dependency graph, to the version found in the current build environment, and generates a requirements.lock.txt file. This process uses pip-compile under the hood.

If you have already specified a version for all packages, you can optionally disable this behavior by setting the lock_packages field to False:

python:
    requirements_txt: "requirements.txt"
    lock_packages: false
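Conceptually, locking pins every unversioned requirement to whatever is installed in the build environment, roughly like this (a simplified sketch with a hypothetical lock_requirements helper; the real process uses pip-compile and also locks transitive dependencies):

```python
from importlib import metadata

def lock_requirements(requirements):
    # Keep explicit pins as-is; pin bare package names to the version
    # installed in the current environment, as `bentoml build` does.
    locked = []
    for req in requirements:
        if any(op in req for op in ("==", ">=", "<=", "<", ">", "~=")):
            locked.append(req)  # an explicit version spec is respected
        else:
            locked.append(f"{req}=={metadata.version(req)}")
    return locked

# "pip" is used here only because it is present in most environments
print(lock_requirements(["pip", "matplotlib==3.5.1"]))
```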

Python Wheels#

Python .whl files are also supported as a type of dependency to include in a Bento. Simply provide the paths to your .whl files under the wheels field.

python:
    wheels:
    - ./lib/my_package.whl

If the wheel is hosted on a local network without TLS, you can indicate that the domain is safe to pip with the trusted_host field.

Python Options Table#




Field              Description

requirements_txt   The path to a custom requirements.txt file
packages           Packages to include in this bento
lock_packages      Whether to lock the packages or not
index_url          Inputs for the --index-url pip argument
no_index           Whether to include the --no-index pip argument
trusted_host       List of trusted hosts used as inputs using the --trusted-host pip argument
find_links         List of links to find as inputs using the --find-links pip argument
extra_index_url    List of extra index urls as inputs using the --extra-index-url pip argument
pip_args           Any additional pip arguments that you would like to add when installing a package
wheels             List of paths to wheels to include in the bento


Models#

You can specify the models to be used for building a Bento using a string model tag or a dictionary; these will be written to the bento.yaml file in the Bento package. When you start from an existing project, you can download models from Yatai to your local model store with these configurations by running bentoml models pull. Note that you need to log in to Yatai first by running bentoml yatai login.

See the following example for details. If you don’t define models in bentofile.yaml, the model specified in the service is used to build the Bento.

models:
  - "iris_clf:latest" # A string model tag
  - tag: "iris_clf:version1" # A dictionary
    filter: "label:staging"
    alias: "iris_clf_v1"
  • tag: The name and version of the model, separated by a colon.

  • filter: This field uses the same filter syntax as in Yatai. You use a filter to list specific models, such as models with the same label. You can add multiple comma-separated filters to a model.

  • alias: An alias for the model. If this is specified, you can use it directly in code like bentoml.models.get(alias).

Conda Options#

Conda dependencies can be specified under the conda field. For example:

conda:
    channels:
    - default
    dependencies:
    - h2o
    pip:
    - "scikit-learn==1.2.0"

When the channels field is left unspecified, BentoML will use the community-maintained conda-forge channel as the default.

Optionally, you can export all dependencies from a preexisting conda environment to an environment.yml file, and provide this file in your bentofile.yaml config:

Export conda environment:

» conda env export > environment.yml

In your bentofile.yaml:

conda:
    environment_yml: "./environment.yml"


Unlike Python packages, BentoML does not support locking conda package versions automatically. It is recommended that users specify versions in the config file.

See also

When conda options are provided, BentoML will select a docker base image that comes with Miniconda pre-installed in the generated Dockerfile. Note that only the debian and alpine distros support conda. Learn more at the Docker Options section below.

Conda Options Table#




Field              Description

environment_yml    Path to a conda environment file to copy into the bento. If specified, this file will overwrite any additional option specified
channels           Custom conda channels to use. If not specified will use conda-forge
dependencies       Custom conda dependencies to include in the environment
pip                The specific pip conda dependencies to include

Docker Options#

BentoML makes it easy to deploy a Bento to a Docker container. This section discusses the available options for customizing the docker image generated from a Bento.

Here’s a basic Docker options configuration:

docker:
    distro: debian
    python_version: "3.8.12"
    cuda_version: "11.6.2"
    system_packages:
      - libblas-dev
      - liblapack-dev
      - gfortran
    env:
      FOO: value1
      BAR: value2


BentoML leverages BuildKit, a cache-efficient builder toolkit, to containerize Bentos 🍱.

BuildKit ships with Docker 18.09 and newer. This means that if you are using Docker via Docker Desktop, BuildKit is available by default.

However, if you are using a standalone version of Docker, you can install BuildKit by following the instructions here.

OS Distros#

The following OS distros are currently supported in BentoML:

  • debian: default, similar to Ubuntu

  • alpine: A minimal Docker image based on Alpine Linux

  • ubi8: Red Hat Universal Base Image

  • amazonlinux: Amazon Linux 2

Some of the distros may not support using conda or specifying CUDA for GPU. Here is the support matrix for all distros:


Distro        Available Python Versions   Conda Support   CUDA Support (GPU)

debian        3.7, 3.8, 3.9, 3.10         Yes             Yes
alpine        3.7, 3.8, 3.9, 3.10         Yes             No
ubi8          3.8, 3.9                    No              Yes
amazonlinux   3.7, 3.8                    No              No


GPU support#

The cuda_version field specifies the target CUDA version to install on the generated docker image. Currently, the following CUDA versions are supported:

  • "11.6.2"

  • "11.4.3"

  • "11.2.2"

BentoML will also install additional packages required for the given target CUDA version.

docker:
    cuda_version: "11.6.2"

If you need a different CUDA version that is not currently supported in BentoML, it is possible to install it by specifying it in system_packages or via the setup_script.

Installing custom CUDA version with conda

The following demonstrates how to install a custom CUDA version via conda.

Add the following to your bentofile.yaml:

  - conda-forge
  - nvidia
  - defaults
  - cudatoolkit-dev=10.1
  - cudnn=7.6.4
  - cxx-compiler=1.0
  - mpi4py=3.0 # installs cuda-aware openmpi
  - matplotlib=3.2
  - networkx=2.4
  - numba=0.48
  - pandas=1.0

Then proceed with bentoml build and bentoml containerize respectively:

» bentoml build

» bentoml containerize <bento>:<tag>

Setup Script#

For advanced Docker customization, you can also use a setup_script to inject an arbitrary user-provided script during the image build process. For example, with NLP projects you can pre-download NLTK data in the image:

In your bentofile.yaml:

python:
  packages:
    - nltk
docker:
  setup_script: "./"

In the setup script file:

#!/bin/bash
set -euxo pipefail

echo "Downloading NLTK data.."
python -m nltk.downloader all

Now build a new Bento and then run bentoml containerize MY_BENTO --progress plain to view the docker image build progress. The newly built docker image will contain the pre-downloaded NLTK datasets.


When working with bash scripts, it is recommended to add set -euxo pipefail at the beginning. Without set -e in particular, the script may fail silently without raising an exception during bentoml containerize. Learn more about the Bash set builtin.
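The difference set -e makes can be demonstrated from Python, assuming bash is available on the system:

```python
import subprocess

def run(script):
    # Run a short bash script and capture its output and exit status
    return subprocess.run(["bash", "-c", script], capture_output=True, text=True)

# Without set -e, the failing `false` is ignored and the script "succeeds"
loose = run("false\necho reached")
# With set -e, execution aborts at the first failing command
strict = run("set -e\nfalse\necho reached")

print(loose.returncode, "reached" in loose.stdout)    # 0 True
print(strict.returncode, "reached" in strict.stdout)  # 1 False
```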

It is also possible to provide a Python script for initializing the docker image. Here’s an example:

In bentofile.yaml:

python:
  packages:
    - nltk
docker:
  setup_script: "./"

In the setup script file:

#!/usr/bin/env python

import nltk

print("Downloading NLTK data..")
nltk.download('treebank')


Pay attention to #!/bin/bash and #!/usr/bin/env python in the first lines of the example scripts above. These are known as shebangs, and they are required in a setup script provided to BentoML.

The setup script is always executed after the specified Python packages, conda dependencies, and system packages are installed. Thus, users can import and utilize those libraries in their setup script for the initialization process.

Enable features for your Bento#

Users can optionally pass in the --enable-features flag to bentoml containerize to enable additional features for the generated Bento container image.




Feature            Description

aws                adding AWS interop (currently file upload to S3)
grpc               enable gRPC functionalities in BentoML
grpc-channelz      enable gRPC Channelz for debugging purposes
grpc-reflection    enable gRPC Reflection
io-image           adding Pillow dependencies to Image IO descriptor
io-json            adding Pydantic validation to JSON IO descriptor
io-pandas          adding Pandas dependencies to PandasDataFrame descriptor
tracing-jaeger     enable Jaeger Exporter for distributed tracing
tracing-otlp       enable OTLP Exporter for distributed tracing
tracing-zipkin     enable Zipkin Exporter for distributed tracing
monitor-otlp       enable Monitoring feature

Advanced Options#

For advanced customization for generating docker images, see Advanced Containerization:

  1. Using base image

  2. Using dockerfile template

Docker Options Table#




Field                 Description

distro                The OS distribution on the Docker image. Default to debian.
python_version        Specify which python to include on the Docker image [3.7, 3.8, 3.9, 3.10]. Default to the Python version in build environment.
cuda_version          Specify the cuda version to install on the Docker image [11.6.2].
system_packages       Declare system packages to be installed in the container.
env                   Declare environment variables in the generated Dockerfile.
setup_script          A python or shell script that executes during docker build time.
base_image            A user-provided docker base image. This will override all other custom attributes of the image.
dockerfile_template   Customize the generated dockerfile by providing a Jinja2 template that extends the default dockerfile.