Getting Started

Installing BentoML

BentoML requires Python 3.6 or above. Install it via pip:

pip install bentoml

Instructions for installing from source can be found in the development guide.
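
Before installing, it can help to confirm that the interpreter meets the stated minimum. The following is a minimal sketch using only the standard library; the helper name python_is_supported is hypothetical and not part of BentoML.

```python
import sys

# Minimum version stated in this guide.
MIN_PYTHON = (3, 6)


def python_is_supported(version_info=sys.version_info):
    """Return True if the interpreter meets the guide's minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    found = sys.version.split()[0]
    if python_is_supported():
        print("Python version OK:", found)
    else:
        print("BentoML needs Python 3.6 or above; found", found)
```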

Download Quickstart Notebook

Download and run the code in this quickstart locally:

pip install jupyter
git clone
jupyter notebook bentoml/guides/quick-start/bentoml-quick-start-guide.ipynb

To build the model server Docker image, you will also need to install Docker for your system; see the Docker installation instructions.

Alternatively, run the code in this guide on Google Colab:

Launch on Colab

Or download the quickstart Jupyter notebook and run it on your computer.

Hello World

The first step in creating a prediction service with BentoML is to write a prediction service class that inherits from bentoml.BentoService, declares the required model artifacts and environment dependencies, and defines the service API callback function. Here is what a simple prediction service looks like:

import bentoml
from bentoml.handlers import DataframeHandler
from bentoml.artifact import SklearnModelArtifact

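@bentoml.env(auto_pip_dependencies=True)
@bentoml.artifacts([SklearnModelArtifact('model')])
class IrisClassifier(bentoml.BentoService):

    @bentoml.api(DataframeHandler)
    def predict(self, df):
        # Delegate to the packed scikit-learn model
        return self.artifacts.model.predict(df)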

The bentoml.api decorator defines a service API, which is the entry point for sending prediction requests. The decorated function is user-defined code for processing those requests. The DataframeHandler tells BentoML that this service API expects a pandas.DataFrame object as its input.

The bentoml.env decorator specifies the dependencies and environment settings for this prediction service. Here we use BentoML’s auto_pip_dependencies feature, which automatically extracts and bundles all pip packages required by your prediction service and pins their versions.

Lastly, the bentoml.artifacts decorator declares the trained models to be bundled with this prediction service. Here it uses the built-in SklearnModelArtifact and simply names it ‘model’. BentoML also provides model artifacts for other frameworks, such as PytorchModelArtifact, KerasModelArtifact, FastaiModelArtifact, and XgboostModelArtifact.

Put the BentoService class definition in a separate file (iris_classifier.py, to match the import used in the next section), and you are ready to train a scikit-learn model and serve it.
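
To make the decorator idea concrete, here is a toy sketch of how a decorator can tag a method as an API entry point so that a framework can discover it later. It only illustrates the general pattern; it is not BentoML's implementation, and the names api, ToyService and list_apis are hypothetical.

```python
def api(handler_name):
    """Toy decorator: tag a method as an API entry point (not BentoML's code)."""
    def decorator(func):
        func.is_api = True
        func.handler_name = handler_name
        return func
    return decorator


class ToyService:
    @api("dataframe")
    def predict(self, rows):
        # Stand-in for a real model: sum the features of each row.
        return [sum(row) for row in rows]

    def helper(self):
        # Not decorated, so it is not exposed as an API.
        return None


def list_apis(service_cls):
    """Return the names of methods tagged by the toy @api decorator."""
    return [
        name
        for name, attr in vars(service_cls).items()
        if getattr(attr, "is_api", False)
    ]


print(list_apis(ToyService))
```

Only the decorated method is reported, which mirrors the idea that the decorated function is the user code a framework would route prediction requests to.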

From Model Training To Serving

Next, we train a classifier model on the Iris dataset, pack the trained model into an instance of the IrisClassifier BentoService defined above, and save the entire prediction service.

from sklearn import svm
from sklearn import datasets

from iris_classifier import IrisClassifier

if __name__ == "__main__":
    # Load training data
    iris = datasets.load_iris()
    X, y = iris.data, iris.target

    # Model Training
    clf = svm.SVC(gamma='scale')
    clf.fit(X, y)

    # Create an iris classifier service instance
    iris_classifier_service = IrisClassifier()

    # Pack the newly trained model artifact
    iris_classifier_service.pack('model', clf)

    # Save the prediction service to disk for model serving
    saved_path = iris_classifier_service.save()

With the BentoService#save call, you’ve just created a BentoML SavedBundle: a versioned file archive that is ready for model serving deployment. The archive directory contains the BentoService you defined, the trained model artifact, all the local Python code you imported, and the PyPI dependencies in a requirements.txt, all bundled in one place.
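
To see what ended up in the bundle, you can walk the directory with the standard library. This is a small sketch, assuming saved_path is the directory returned by the save call; the helper name list_bundle_files is hypothetical, and the exact files you see depend on your BentoML version.

```python
import os


def list_bundle_files(saved_path):
    """Return sorted relative paths of all files under a saved bundle directory."""
    files = []
    for root, _dirs, names in os.walk(saved_path):
        for name in names:
            full_path = os.path.join(root, name)
            files.append(os.path.relpath(full_path, saved_path))
    return sorted(files)


# Usage, once you have a saved bundle:
# for path in list_bundle_files(saved_path):
#     print(path)
```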


The {saved_path} in the following commands refers to the value returned by iris_classifier_service.save(). It is the file path where the BentoService SavedBundle is stored. BentoML keeps track locally of all the BentoService SavedBundles you’ve created, so you can also find the saved_path of your BentoService with the bentoml list -o wide or bentoml get IrisClassifier -o wide command.

Model Serving via REST API

You can start a REST API server by specifying the BentoService’s name and version, or by providing the file path to the saved bundle:

bentoml serve IrisClassifier:latest
# or
bentoml serve {saved_path}

The REST API server provides a web UI for testing and debugging the server. If you are running this command on your local machine, open the server's address in your browser and try sending API requests to it.

BentoML API Server Web UI Screenshot

You can also send prediction requests with curl from the command line:

curl -i \
  --header "Content-Type: application/json" \
  --request POST \
  --data '[[5.1, 3.5, 1.4, 0.2]]' \

Or with Python and the requests library:

import requests
response = requests.post("", json=[[5.1, 3.5, 1.4, 0.2]])
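
If you prefer not to add a third-party dependency, the same request can be built with the standard library. The sketch below mirrors the curl call above (POST, a JSON Content-Type header, and a JSON list of rows as the body). The helper name build_prediction_request and the api_url variable are hypothetical; substitute the address of your own server's prediction endpoint.

```python
import json
import urllib.request


def build_prediction_request(url, rows):
    """Build (but do not send) a JSON POST request for a prediction endpoint."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage (requires a running API server; api_url is your server's endpoint):
# request = build_prediction_request(api_url, [[5.1, 3.5, 1.4, 0.2]])
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect before any network call is made.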

Batch Serving via CLI

For offline batch serving, or for testing your prediction service on batch test data, you can load the BentoService SavedBundle from the command line and run the prediction task on a given input dataset. For example:

bentoml run IrisClassifier:latest predict --input='[[5.1, 3.5, 1.4, 0.2]]'

bentoml run IrisClassifier:latest predict --input='./iris_test_data.csv'
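
The second command reads its input from a CSV file. The following sketch writes such a file with the standard csv module. The column names are hypothetical, and whether your BentoML version expects a header row in the file is an assumption to verify against its handler documentation.

```python
import csv

# Hypothetical column names; use whatever matches your training data.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def write_test_csv(path, rows, columns=COLUMNS):
    """Write feature rows to a CSV file with a single header row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(rows)


# Usage:
# write_test_csv("iris_test_data.csv", [[5.1, 3.5, 1.4, 0.2], [6.2, 2.9, 4.3, 1.3]])
```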

Containerize Model API Server

The BentoService SavedBundle directory is structured to work as a Docker build context, which can be used directly to build an API server Docker container image:

docker build -t my_api_server {saved_path}

docker run -p 5000:5000 my_api_server


You will need to install Docker before running these commands. Follow the Docker installation instructions for your system.

Deploy API server to the cloud

BentoML has a built-in deployment management tool called YataiService. YataiService can be deployed separately to manage all your teams’ trained models, BentoService bundles, and active deployments in a central place. You can also create standalone model serving deployments with just the BentoML CLI, which launches a local YataiService backed by a SQLite database on your machine.

BentoML has built-in support for deploying to multiple cloud platforms. For demonstration purposes, let’s deploy the IrisClassifier service we just created to AWS Lambda as a serverless API endpoint.

First you need to install the aws-sam-cli package, which is required by BentoML to work with AWS Lambda deployment:

pip install -U aws-sam-cli==0.31.1


You will also need to configure your AWS account and credentials if they are not already configured on your machine. You can do this either via environment variables or through the aws configure command: install the AWS CLI with pip install awscli and follow the AWS CLI configuration instructions.
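
If you take the environment variable route, a quick check like the one below can tell you whether the two standard AWS credential variables are set. This is a minimal sketch; the helper name missing_aws_env_vars is hypothetical. An empty environment does not necessarily mean credentials are missing, since aws configure stores them in a file instead.

```python
import os

# Standard environment variable names read by the AWS CLI and SDKs.
REQUIRED_AWS_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env_vars(env=None):
    """Return the required AWS credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_ENV_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_env_vars()
    if missing:
        print("Not set in the environment:", ", ".join(missing))
    else:
        print("AWS credentials found in the environment.")
```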

Now you can run the bentoml lambda deploy command to create an AWS Lambda deployment hosting the BentoService you’ve created:

# replace the version here with the generated version string when creating the BentoService SavedBundle
bentoml lambda deploy quick-start-guide-deployment \
    -b=IrisClassifier:20191126125258_4AB1D4
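
The value passed to -b combines the service name and the generated version string. The sketch below splits such a tag in Python. Reading the leading digits of the version as a timestamp is an assumption based on the example value above, not a documented format, and both helper names are hypothetical.

```python
from datetime import datetime


def split_bundle_tag(tag):
    """Split a 'Name:version' tag into its name and version parts."""
    name, sep, version = tag.partition(":")
    if not sep or not name or not version:
        raise ValueError("expected a tag of the form 'Name:version', got %r" % tag)
    return name, version


def version_timestamp(version):
    """Parse the leading timestamp of a version string (format is an assumption)."""
    stamp = version.split("_", 1)[0]
    return datetime.strptime(stamp, "%Y%m%d%H%M%S")


name, version = split_bundle_tag("IrisClassifier:20191126125258_4AB1D4")
print(name, version)
print(version_timestamp(version))
```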

Distribute BentoService as a PyPI package

The BentoService SavedBundle is pip-installable and can be distributed directly as a PyPI package if you plan to use the model in your Python applications. You can install it as a system-wide Python package with pip:

pip install {saved_path}

# Your BentoService class name becomes the package name
import IrisClassifier

installed_svc = IrisClassifier.load()
installed_svc.predict([[5.1, 3.5, 1.4, 0.2]])
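
Before importing the installed service in an application, you may want to confirm that the package is actually available in the current environment. This is a small standard-library sketch; the helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package or module can be found for import."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # "IrisClassifier" is the package name used in this guide.
    if is_installed("IrisClassifier"):
        print("IrisClassifier is installed.")
    else:
        print("IrisClassifier not found; run pip install {saved_path} first.")
```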

This also allows users to upload their BentoService to PyPI as a public Python package, or to their organization’s private PyPI index, to share it with other developers.

cd {saved_path} && python setup.py sdist upload


You will have to configure a “.pypirc” file before uploading to a PyPI index. You can find more information about distributing Python packages in the Python packaging documentation.

Interested in learning more about BentoML? Check out the examples in the BentoML GitHub repository.

Be sure to join the BentoML Slack channel to hear about the latest development updates.