Getting Started

There are three parts to the BentoML workflow.

  1. Save Models

  2. Define and Debug Services

  3. Build and Deploy Bentos

Save Models

We start by saving a trained model instance to BentoML’s local model store. Models that are already saved to a file can also be brought into BentoML with the import APIs.

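import bentoml.sklearn
from sklearn import svm
from sklearn import datasets

# Load training data
iris = datasets.load_iris()
X, y = iris.data, iris.target

# Model Training
clf = svm.SVC(gamma='scale')
clf.fit(X, y)

# Save model to BentoML local model store
bentoml.sklearn.save("iris_clf", clf)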
# [INFO] Scikit-learn model 'iris_clf:yftvuwkbbbi6zcphca6rzl235' is successfully saved to BentoML local model store under "~/bentoml/models/iris_clf/yftvuwkbbbi6zcphca6rzl235"

The ML framework specific API, bentoml.sklearn.save(), saves the Iris Classifier to a local model store managed by BentoML. The load_runner() API can then be used to load this model into a Runner:

import numpy as np
iris_clf_runner = bentoml.sklearn.load_runner("iris_clf:latest")
iris_clf_runner.run(np.array([5.9, 3. , 5.1, 1.8]))

Models can also be managed with the bentoml models CLI command; see bentoml models --help for more.

> bentoml models list iris_clf

TAG                                FRAMEWORK    CREATED
iris_clf:yftvuwkbbbi6zcphca6rzl235 ScikitLearn  2021/9/19 10:13:35
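
The tags in this listing follow a name:version pattern, and "latest" can stand in for the version, as in "iris_clf:latest". The sketch below splits such a tag with plain string handling. Note that split_tag and TagParts are hypothetical names used for illustration only; they are not part of the BentoML API.

```python
from typing import NamedTuple, Optional


class TagParts(NamedTuple):
    name: str
    version: Optional[str]  # None when the tag carries no version


def split_tag(tag: str) -> TagParts:
    """Split a "name:version" string into its two parts (hypothetical helper)."""
    name, sep, version = tag.partition(":")
    if not name:
        raise ValueError(f"tag has no name: {tag!r}")
    # A missing or empty version is reported as None.
    return TagParts(name, version if sep and version else None)


print(split_tag("iris_clf:yftvuwkbbbi6zcphca6rzl235"))
print(split_tag("iris_clf:latest"))
```

Keeping the name and version apart makes it easy to tell a pinned version from a moving reference such as "latest".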

Define and Debug Services

Services are the core components of BentoML, where the serving logic is defined. With the model saved in the model store, we can define the service by creating a Python file in the working directory with the following contents. In the example below, we define numpy.ndarray as the input and output type. Other IO types, such as pandas.DataFrame and PIL.Image, are also supported; see API and IO Descriptors.

import bentoml
import bentoml.sklearn
import numpy as np

from bentoml.io import NumpyNdarray

# Load the runner for the latest ScikitLearn model we just saved
iris_clf_runner = bentoml.sklearn.load_runner("iris_clf:latest")

# Create the iris_classifier service with the ScikitLearn runner
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

# Create API function with pre- and post- processing logic
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def predict(input_ndarray: np.ndarray) -> np.ndarray:
    # Define pre-processing logic
    result = iris_clf_runner.run(input_ndarray)
    # Define post-processing logic
    return result
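
The post-processing step in predict is left open. As one possible sketch, the class indices returned by the classifier could be mapped to species names. The label order is an assumption taken from scikit-learn's Iris dataset (its target_names), and to_species is a hypothetical helper, not a BentoML function.

```python
from typing import Iterable, List

# Class order assumed to match scikit-learn's Iris dataset target_names.
IRIS_SPECIES = ["setosa", "versicolor", "virginica"]


def to_species(class_indices: Iterable[int]) -> List[str]:
    """Map predicted class indices (0, 1, 2) to species names."""
    names = []
    for index in class_indices:
        index = int(index)  # also accepts NumPy integer scalars
        if not 0 <= index < len(IRIS_SPECIES):
            raise ValueError(f"unknown class index: {index}")
        names.append(IRIS_SPECIES[index])
    return names


print(to_species([2, 0, 1]))  # ['virginica', 'setosa', 'versicolor']
```

Returning names instead of an array changes the return type of the API function, so the output descriptor would need to be adjusted to match.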

We now have everything needed to serve our first request. Launch the server in debug mode by running the bentoml serve command in the current working directory. The --reload option makes the server pick up changes to the module without a restart.

> bentoml serve ./ --reload

(Press CTRL+C to quit)
[INFO] Starting BentoML API server in development mode with auto-reload enabled
[INFO] Serving BentoML Service "iris_classifier" defined in ""
[INFO] API Server running on

We can send requests to the newly started service with any HTTP client.

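import requests

# The address below is an assumption; use the one printed by `bentoml serve`.
response = requests.post(
    "http://127.0.0.1:5000/predict",
    headers={"content-type": "application/json"},
    data="[5.9, 3.0, 5.1, 1.8]",
)
print(response.text)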
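
Where the requests package is not available, a similar call can be sketched with the Python standard library. The base URL here is an assumption (use the address printed by bentoml serve), the /predict path assumes the endpoint is exposed under the API function's name, and build_predict_request is a hypothetical helper.

```python
import json
import urllib.request
from typing import Sequence


def build_predict_request(base_url: str, features: Sequence[float]) -> urllib.request.Request:
    """Build a JSON POST request for the predict endpoint (hypothetical helper)."""
    body = json.dumps(list(features)).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed address; replace it with the one printed by `bentoml serve`.
request = build_predict_request("http://127.0.0.1:5000", [5.9, 3.0, 5.1, 1.8])
print(request.get_method(), request.full_url)

# Sending the request needs a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it lets the payload and headers be inspected before any network traffic occurs.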

Build and Deploy Bentos

Once we are happy with the service definition, we can build the model and service into a bento. A bento is the distribution format of a service: it can be deployed and contains everything required to run the service, from the models to the dependencies.

To build a Bento, first create a bentofile.yaml in your project directory:

# bentofile.yaml
service: "iris_classifier:svc"
include:
  - "*.py"
python:
  - scikit-learn

Next, use the bentoml build CLI command in the same directory to build a bento.

> bentoml build

[INFO] Building BentoML Service "iris_classifier" with models "iris_clf:yftvuwkbbbi6zcphca6rzl235"
[INFO] Bento is successfully built and saved to ~/bentoml/bentos/iris_classifier/v5mgcacfgzi6zdz7vtpeqaare

Built bentos are saved in the local bento store, which we can view with the bentoml list CLI command.

> bentoml list
TAG                                        CREATED
iris_classifier:v5mgcacfgzi6zdz7vtpeqaare  2021/09/19 10:15:50

We can serve bentos from the bento store using the bentoml serve CLI command. Adding the --production option serves the bento in production mode.

> bentoml serve iris_classifier:latest --production

(Press CTRL+C to quit)
[INFO] Starting BentoML API server in production mode
[INFO] Serving BentoML Service "iris_classifier"
[INFO] API Server running on

Lastly, we can containerize bentos as Docker images using the bentoml containerize CLI command and manage bentos at scale using the model and bento management service.