Getting Started

Installing BentoML

BentoML requires Python 3.6 or above. Install it via pip:

pip install bentoml

Instructions for installing from source can be found in the development guide.

Download Quickstart Notebook

Download and run the code in this quickstart locally:

pip install jupyter
git clone
jupyter notebook bentoml/guides/quick-start/bentoml-quick-start-guide.ipynb

Or download the notebook (right-click and then “Save Link As”) to your notebook workspace.

To build a model server docker image, you will also need to install docker for your system; refer to the official Docker installation instructions for your platform.

Alternatively, play with the notebook on Google Colab: BentoML Quickstart on Google Colab.

Hello World

The first step of creating a prediction service with BentoML is to write a prediction service class inheriting from bentoml.BentoService, specify the required model artifacts and PyPI dependencies, and write the service API function. Here is a minimal prediction service definition with BentoML:

import bentoml
from bentoml.adapters import DataframeInput
from bentoml.artifact import SklearnModelArtifact

@bentoml.env(auto_pip_dependencies=True)
@bentoml.artifacts([SklearnModelArtifact('model')])
class IrisClassifier(bentoml.BentoService):

    @bentoml.api(input=DataframeInput())
    def predict(self, df):
        # Optional pre-processing, post-processing code goes here
        return self.artifacts.model.predict(df)

The bentoml.api decorator defines a service API, which is the entry point for accessing the prediction service. The DataframeInput here denotes that this service API will convert an HTTP JSON request into a pandas.DataFrame object before passing it to the user-defined API function for inference.
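To illustrate what the adapter does, here is a rough sketch (not BentoML's actual implementation) of the conversion DataframeInput performs on the example payload used later in this guide:

```python
import pandas as pd

# A request body like '[[5.1, 3.5, 1.4, 0.2]]' is parsed into a
# pandas.DataFrame before reaching predict(); roughly equivalent to:
df = pd.DataFrame([[5.1, 3.5, 1.4, 0.2]])

print(df.shape)  # one row, four feature columns
```

Inside the API function, self.artifacts.model.predict(df) then receives this DataFrame just as it would during offline scoring.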

The bentoml.env decorator allows specifying the dependencies and environment settings for this prediction service. In this example, with the auto_pip_dependencies=True flag, BentoML automatically infers all the pip packages required by the prediction service code and pins down their versions.

Lastly, bentoml.artifacts defines the trained models required by this prediction service. Here it uses the built-in SklearnModelArtifact, simply named ‘model’. BentoML also provides model artifacts for other frameworks, such as PytorchModelArtifact, KerasModelArtifact, FastaiModelArtifact and XgboostModelArtifact.

From Model Training To Serving

The following code trains a scikit-learn model and bundles the trained model with an IrisClassifier instance. The IrisClassifier instance is then saved to disk in the BentoML SavedBundle format, which is a versioned file archive that is ready for production model serving deployment.

from sklearn import svm
from sklearn import datasets

from iris_classifier import IrisClassifier

if __name__ == "__main__":
    # Load training data
    iris = datasets.load_iris()
    X, y =,

    # Model Training
    clf = svm.SVC(gamma='scale'), y)

    # Create an IrisClassifier service instance
    iris_classifier_service = IrisClassifier()

    # Pack the newly trained model artifact
    iris_classifier_service.pack('model', clf)

    # Save the prediction service to disk for model serving
    saved_path =

By default, BentoML stores SavedBundle files under the ~/bentoml directory. Users can also customize BentoML to use a different directory or cloud storage like AWS S3. BentoML also comes with a model management component YataiService, which provides advanced model management features including a dashboard web UI:

[Screenshots: BentoML YataiService Bento Repository page and Bento Details page]


The {saved_path} in the following commands refers to the value returned by, which is the file path where the BentoService SavedBundle is stored. BentoML locally keeps track of all the BentoService SavedBundles you’ve created; you can also find the saved_path of your BentoService in the output of the bentoml list -o wide, bentoml get IrisClassifier -o wide and bentoml get IrisClassifier:latest commands.

A quick way of getting the saved_path from the command line is piping the output of bentoml get to the jq command:

saved_path=$(bentoml get IrisClassifier:latest -q | jq -r ".uri.uri")
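If jq is not available, the same extraction can be done with a few lines of Python. The JSON below mirrors the .uri.uri path used in the jq filter above; the example output string is illustrative, not real bentoml get output:

```python
import json

# Illustrative `bentoml get ... -q` output; only the .uri.uri field matters here
output = '{"name": "IrisClassifier", "uri": {"uri": "/home/user/bentoml/repository/IrisClassifier/20191126125258_4AB1D4"}}'

saved_path = json.loads(output)["uri"]["uri"]
print(saved_path)
```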

Model Serving via REST API

You can start a REST API server by specifying the BentoService’s name and version, or provide the file path to the saved bundle:

bentoml serve IrisClassifier:latest
# Assuming jq ( is installed, you can also
# extract the uri field from the `bentoml get` command's JSON output
saved_path=$(bentoml get IrisClassifier:latest -q | jq -r ".uri.uri")

bentoml serve $saved_path

The IrisClassifier model is now served at localhost:5000. Use the curl command to send a prediction request:

curl -i \
  --header "Content-Type: application/json" \
  --request POST \
  --data '[[5.1, 3.5, 1.4, 0.2]]' \
  http://localhost:5000/predict

Similarly, with Python and the requests library:

import requests
response ="http://localhost:5000/predict", json=[[5.1, 3.5, 1.4, 0.2]])
print(response.text)

The BentoML API server also provides a web UI for accessing predictions and debugging the server. Visit http://localhost:5000 in the browser and use the web UI to send prediction requests:

[Screenshot: BentoML API server web UI]

Batch Serving via CLI

For batch offline serving, or for testing your prediction service on batch test data, you can load the BentoService SavedBundle from the command line and run a prediction task on a given input dataset, e.g.:

bentoml run IrisClassifier:latest predict --input='[[5.1, 3.5, 1.4, 0.2]]'

bentoml run IrisClassifier:latest predict --input='./iris_test_data.csv'

Containerize Model API Server

BentoML provides a convenient way to containerize the model API server with Docker:

  1. Find the SavedBundle directory with bentoml get command

  2. Run docker build with the SavedBundle directory which contains a generated Dockerfile

  3. Run the generated docker image to start a docker container serving the model

saved_path=$(bentoml get IrisClassifier:latest -q | jq -r ".uri.uri")

docker build -t {docker_username}/iris-classifier $saved_path

docker run -p 5000:5000 -e BENTOML_ENABLE_MICROBATCH=True {docker_username}/iris-classifier

This makes it possible to deploy BentoML-bundled ML models with platforms such as Kubeflow, Knative and Kubernetes, which provide advanced model deployment features such as auto-scaling, A/B testing, scale-to-zero, canary rollout and multi-armed bandit.


Ensure docker is installed before running the commands above; see the official Docker installation instructions for your platform.

Deploy API server to the cloud

BentoML has a built-in deployment management tool called YataiService. YataiService can be deployed separately to manage all your team’s trained models, BentoService bundles, and active deployments in a central place. You can also create standalone model serving deployments with just the BentoML CLI, which launches a local YataiService backed by a SQLite database on your machine.

BentoML has built-in support for deploying to multiple cloud platforms. For demo purposes, let’s deploy the IrisClassifier service we just created to AWS Lambda as a serverless API endpoint.

First you need to install the aws-sam-cli package, which is required by BentoML to work with AWS Lambda deployment:

pip install -U aws-sam-cli==0.31.1


You will also need to configure your AWS account and credentials if you don’t have them configured on your machine. You can do this either via environment variables or through the aws configure command: install the AWS CLI via pip install awscli and follow the detailed instructions in the AWS documentation.

Now you can run the bentoml lambda deploy command to create an AWS Lambda deployment, hosting the BentoService you’ve created:

# replace the version here with the generated version string when creating the BentoService SavedBundle
bentoml lambda deploy quick-start-guide-deployment \
    -b=IrisClassifier:20191126125258_4AB1D4
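The -b flag takes a bento tag in name:version form, where the version string is generated when is called. Splitting a tag apart, for example to script deployments, is straightforward, as this small sketch shows:

```python
# Example bento tag from this guide; your generated version string will differ
bento_tag = "IrisClassifier:20191126125258_4AB1D4"

# A bento tag is "<service name>:<version>"; the version combines a
# timestamp with a short hash generated at save time
name, version = bento_tag.split(":")
print(name, version)
```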

Distribute BentoService as a PyPI package

The BentoService SavedBundle is pip-installable and can be directly distributed as a PyPI package if you plan to use the model in your python applications. You can install it as a system-wide python package with pip:

saved_path=$(bentoml get IrisClassifier:latest -q | jq -r ".uri.uri")

pip install $saved_path
# Your BentoML model class name will become the package name
import IrisClassifier

installed_svc = IrisClassifier.load()
installed_svc.predict([[5.1, 3.5, 1.4, 0.2]])

This also allows users to upload their BentoService to as a public python package, or to their organization’s private PyPI index, to share with other developers.

cd $saved_path && python sdist upload


You will have to configure the “.pypirc” file before uploading to a PyPI index. You can find more information about distributing python packages in the Python Packaging User Guide.

Interested in learning more about BentoML? Check out the BentoML Core Concepts and best practices walkthrough, a must-read for anyone who is looking to adopt BentoML.

Be sure to join the BentoML Slack channel to hear about the latest development updates and be part of the roadmap discussions.