Getting Started

Installing BentoML

BentoML requires Python 3.6 or above. Install it via pip:

pip install bentoml

Instructions for installing from source can be found in the development guide.

Download Quickstart Notebook

Download and run the code in this quickstart locally:

pip install jupyter
git clone
jupyter notebook bentoml/guides/quick-start/bentoml-quick-start-guide.ipynb

To build a model server Docker image, you will also need to install Docker on your system. Read more about how to install Docker here.

Alternatively, run the code in this guide on Google Colab:

Launch on Colab

Hello World

The first step in creating a prediction service with BentoML is to write a prediction service class that inherits from bentoml.BentoService, declaratively lists its dependencies and model artifacts, and defines your service API callback function. Here is what a simple prediction service looks like:

import bentoml
from bentoml.handlers import DataframeHandler
from bentoml.artifact import SklearnModelArtifact

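@bentoml.env(pip_dependencies=["scikit-learn"])
@bentoml.artifacts([SklearnModelArtifact('model')])
class IrisClassifier(bentoml.BentoService):

    @bentoml.api(DataframeHandler)
    def predict(self, df):
        return self.artifacts.model.predict(df)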

The bentoml.api decorator with DataframeHandler tells BentoML that the function following it is the service API callback function, and that pandas.DataFrame is its expected input format.

The bentoml.env decorator allows users to specify the dependencies and environment settings for this prediction service. Here we are creating the prediction service based on a scikit-learn model, so we add scikit-learn to the list of pip dependencies.

Last but not least, bentoml.artifacts declares the trained model that must be bundled with this prediction service. Here it uses the built-in SklearnModelArtifact and simply names it ‘model’. BentoML also provides model artifacts for other frameworks, such as PytorchModelArtifact, KerasModelArtifact, FastaiModelArtifact, and XgboostModelArtifact.
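These decorators are declarative: they attach metadata to the class rather than running any serving logic when the class is defined. The pure-Python sketch below illustrates that general pattern with hypothetical env, artifacts and api decorators and a hypothetical ToyClassifier. It is not BentoML's implementation, only a way to see how such declarations can be recorded on a class and read back later.

```python
# Hypothetical decorators illustrating declarative class metadata;
# these are NOT BentoML's real implementations.

def env(pip_dependencies=None):
    """Record pip dependencies on the decorated class."""
    def decorator(cls):
        cls._env = {"pip_dependencies": list(pip_dependencies or [])}
        return cls
    return decorator


def artifacts(names):
    """Record the names of model artifacts the class expects to be packed."""
    def decorator(cls):
        cls._artifact_names = list(names)
        return cls
    return decorator


def api(input_format):
    """Mark a method as a service API and record its expected input format."""
    def decorator(func):
        func._api_input_format = input_format
        return func
    return decorator


def list_apis(cls):
    """Return {method_name: input_format} for every method marked with @api."""
    return {
        name: attr._api_input_format
        for name, attr in vars(cls).items()
        if callable(attr) and hasattr(attr, "_api_input_format")
    }


@env(pip_dependencies=["scikit-learn"])
@artifacts(["model"])
class ToyClassifier:
    @api("dataframe")
    def predict(self, rows):
        # Placeholder logic standing in for a trained model.
        return [0 for _ in rows]


print(ToyClassifier._env)             # {'pip_dependencies': ['scikit-learn']}
print(ToyClassifier._artifact_names)  # ['model']
print(list_apis(ToyClassifier))       # {'predict': 'dataframe'}
```

Keeping the declarations as plain metadata means a framework can inspect a class (its dependencies, artifacts and API endpoints) without instantiating it.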

Saving a versioned BentoService bundle

Next, we train a classifier model on the Iris dataset and pack the trained model with the IrisClassifier BentoService defined above:

from sklearn import svm
from sklearn import datasets

clf = svm.SVC(gamma='scale')
iris = datasets.load_iris()
X, y = iris.data, iris.target
clf.fit(X, y)

# Create an iris classifier service with the newly trained model
iris_classifier_service = IrisClassifier.pack(model=clf)

# Save the entire prediction service to a file bundle
saved_path = iris_classifier_service.save()

You’ve just created a BentoService SavedBundle: a versioned file archive that is ready for production deployment. It contains the BentoService you defined, along with the packed trained model artifacts, pre-processing code, dependencies and other configurations, all in a single file directory.
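To make the idea of a versioned bundle more concrete, here is a minimal stdlib-only sketch of what saving such a directory could involve: generating a version string, pickling the model, and writing a requirements file. The version format (a timestamp plus a six-character uppercase hex suffix) is an assumption inferred from the example version 20191126125258_4AB1D4 used in the deployment step of this guide; the helpers make_version and save_bundle and the file layout are hypothetical and do not describe BentoML's actual SavedBundle structure.

```python
import json
import os
import pickle
import tempfile
import uuid
from datetime import datetime


def make_version(now=None):
    """Build a version like 20191126125258_4AB1D4 (timestamp + random hex suffix).

    The format is inferred from an example version string; it is an assumption.
    """
    now = now or datetime.now()
    return now.strftime("%Y%m%d%H%M%S") + "_" + uuid.uuid4().hex[:6].upper()


def save_bundle(base_dir, service_name, model, pip_dependencies):
    """Write a hypothetical layout: <base>/<name>/<version>/{model.pkl, ...}."""
    version = make_version()
    bundle_dir = os.path.join(base_dir, service_name, version)
    os.makedirs(bundle_dir)
    # The trained model artifact, serialized with pickle.
    with open(os.path.join(bundle_dir, "model.pkl"), "wb") as f:
        pickle.dump(model, f)
    # Pip dependencies, one per line, in requirements.txt format.
    with open(os.path.join(bundle_dir, "requirements.txt"), "w") as f:
        f.write("\n".join(pip_dependencies) + "\n")
    # Basic metadata describing the bundle.
    with open(os.path.join(bundle_dir, "metadata.json"), "w") as f:
        json.dump({"name": service_name, "version": version}, f)
    return bundle_dir


# Usage: a plain dict stands in for a trained model here.
base = tempfile.mkdtemp()
saved_path = save_bundle(base, "IrisClassifier", {"weights": [0.1, 0.2]}, ["scikit-learn"])
print(sorted(os.listdir(saved_path)))  # ['metadata.json', 'model.pkl', 'requirements.txt']
```

Prefixing the version with a timestamp keeps versions sortable in creation order, while the random suffix avoids collisions between bundles saved in the same second.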

Model Serving with BentoML


The {saved_path} in the following commands refers to the value returned by iris_classifier_service.save(), which is the file path where the BentoService SavedBundle is stored. BentoML keeps track of all the BentoServices you’ve saved locally, so you can also find the saved_path of your BentoService via the bentoml get IrisClassifier -o wide command.

Model Serving via REST API

You can start a REST API server by specifying the BentoService’s name and version, or by providing the file path to the saved bundle:

bentoml serve IrisClassifier:latest
# or
bentoml serve {saved_path}
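The IrisClassifier:latest form combines a service name with either an explicit version or the keyword latest. The sketch below shows one way such a tag could be resolved against locally recorded versions. It assumes versions start with a sortable timestamp (as in 20191126125258_4AB1D4); the parse_tag and resolve_tag helpers, the in-memory registry, and the second version and paths in it are hypothetical illustration data, not BentoML's code or output.

```python
def parse_tag(tag):
    """Split 'Name:version' into (name, version); in this sketch a bare name means 'latest'."""
    name, sep, version = tag.partition(":")
    return name, (version if sep else "latest")


def resolve_tag(tag, registry):
    """Map a tag to a saved bundle path using a {name: {version: path}} registry."""
    name, version = parse_tag(tag)
    versions = registry.get(name)
    if not versions:
        raise KeyError(f"no saved bundles for {name!r}")
    if version == "latest":
        # Timestamp-prefixed versions sort chronologically as plain strings.
        version = max(versions)
    if version not in versions:
        raise KeyError(f"{name}:{version} not found")
    return versions[version]


# Hypothetical registry contents, for illustration only.
registry = {
    "IrisClassifier": {
        "20191126125258_4AB1D4": "/bundles/IrisClassifier/20191126125258_4AB1D4",
        "20191201090000_9F00AA": "/bundles/IrisClassifier/20191201090000_9F00AA",
    }
}
print(resolve_tag("IrisClassifier:latest", registry))
# /bundles/IrisClassifier/20191201090000_9F00AA
```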

The REST API server provides a web UI for testing and debugging the server. If you are running this command on your local machine, visit http://127.0.0.1:5000 in your browser and try sending API requests to the server.

BentoML API Server Web UI Screenshot

You can also send a prediction request with curl from the command line:

curl -i \
  --header "Content-Type: application/json" \
  --request POST \
  --data '[[5.1, 3.5, 1.4, 0.2]]' \
  localhost:5000/predict
Or with Python and the requests library:

import requests
response = requests.post("http://localhost:5000/predict", json=[[5.1, 3.5, 1.4, 0.2]])
print(response.text)
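If you would rather not depend on requests, the same JSON request can be sent with Python's standard library. This sketch relies only on what the curl example shows: a POST whose body is a JSON list of feature rows, sent with a Content-Type: application/json header. The endpoint URL is supplied by the caller, the helpers build_predict_request and predict are hypothetical, and decoding the reply as JSON assumes the server responds with JSON.

```python
import json
import urllib.request


def build_predict_request(url, rows):
    """Build a POST request whose body is the JSON-encoded list of feature rows."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(url, rows, timeout=10):
    """Send the request and decode the server's response as JSON (an assumption)."""
    with urllib.request.urlopen(build_predict_request(url, rows), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, assuming an API server is running at this address:
# print(predict("http://localhost:5000/predict", [[5.1, 3.5, 1.4, 0.2]]))
```

Separating request construction from sending makes it easy to inspect exactly what goes over the wire before pointing it at a live server.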

Model Serving via CLI

For testing purposes, you can load the BentoService SavedBundle from the command line and run the prediction task on a given input dataset:

bentoml run IrisClassifier:latest predict --input='[[5.1, 3.5, 1.4, 0.2]]'

# alternatively pass input data via CSV file:
bentoml run IrisClassifier:latest predict --input='./iris_test_data.csv'
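The iris_test_data.csv file referenced above is not provided in this guide. As an illustration, the sketch below writes one with Python's csv module, one sample per row, using the same sample as the JSON input. The header row and its column names are assumptions (pandas.read_csv treats the first line as a header by default, so a headerless file would lose its first sample if read that way); adjust them to whatever your service expects.

```python
import csv

# Hypothetical column names for the four Iris features.
COLUMNS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]

# The same sample used in the JSON examples above.
samples = [
    [5.1, 3.5, 1.4, 0.2],
]

with open("iris_test_data.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(COLUMNS)   # header row (an assumption, see above)
    writer.writerows(samples)  # one sample per row

# Read the file back to confirm it matches the JSON input.
with open("iris_test_data.csv", newline="") as f:
    rows = list(csv.reader(f))
print(rows[1])  # ['5.1', '3.5', '1.4', '0.2']
```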

Distribute BentoML SavedBundle as PyPI package

The BentoService SavedBundle is pip-installable and can be distributed directly as a PyPI package if you plan to use the model in your Python applications. You can install it as a system-wide Python package with pip:

pip install {saved_path}
# Your BentoService class name becomes the package name
import IrisClassifier

installed_svc = IrisClassifier.load()
installed_svc.predict([[5.1, 3.5, 1.4, 0.2]])

This also allows users to upload their BentoService to PyPI as a public Python package, or to their organization’s private PyPI index to share with other developers.

cd {saved_path} && python setup.py sdist upload


You will have to configure a “.pypirc” file before uploading to a PyPI index. You can find more information about distributing Python packages at:

Containerize REST API server with Docker

The BentoService SavedBundle directory is structured to work as a Docker build context, which can be used to build an API server Docker container image:

docker build -t my_api_server {saved_path}

docker run -p 5000:5000 my_api_server


You will need to install Docker before running this. Follow the instructions here:
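A freshly started container needs a moment before it accepts requests. When scripting the docker run step, a readiness check like the following stdlib sketch can help: it polls until a TCP connection to the published port succeeds (port 5000, matching the -p 5000:5000 mapping above). The wait_for_port helper is hypothetical, and it only confirms that the port is open, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Nothing is listening yet (or the connection was refused); retry.
            time.sleep(interval)
    return False


# Usage, after starting the container:
# if wait_for_port("localhost", 5000):
#     print("API server is accepting connections")
```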

Deploy REST API server to the cloud

BentoML has a built-in deployment management tool called YataiService. YataiService can be deployed separately to manage all of your team’s trained models, BentoService bundles, and active deployments in a central place. You can also create standalone model serving deployments with just the BentoML CLI, which launches a local YataiService backed by a SQLite database on your machine.
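To illustrate what keeping deployment records in a local SQLite database can look like, here is a hypothetical sketch using Python's sqlite3 module. The table name, columns, status values and helper functions are assumptions chosen for illustration; YataiService's actual schema is not described in this guide.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a SQLite database with a hypothetical deployments table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS deployments (
               name     TEXT PRIMARY KEY,
               bento    TEXT NOT NULL,  -- e.g. 'IrisClassifier:<version>'
               platform TEXT NOT NULL,  -- e.g. 'aws-lambda'
               status   TEXT NOT NULL
           )"""
    )
    return conn


def record_deployment(conn, name, bento, platform, status="PENDING"):
    """Insert a deployment record, or replace the existing one with the same name."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO deployments (name, bento, platform, status) "
            "VALUES (?, ?, ?, ?)",
            (name, bento, platform, status),
        )


def list_deployments(conn, platform=None):
    """Return (name, bento, status) rows, optionally filtered by platform."""
    query = "SELECT name, bento, status FROM deployments"
    params = ()
    if platform is not None:
        query += " WHERE platform = ?"
        params = (platform,)
    return conn.execute(query + " ORDER BY name", params).fetchall()


conn = open_store()
record_deployment(conn, "quick-start-guide-deployment",
                  "IrisClassifier:20191126125258_4AB1D4", "aws-lambda")
record_deployment(conn, "quick-start-guide-deployment",
                  "IrisClassifier:20191126125258_4AB1D4", "aws-lambda", status="RUNNING")
print(list_deployments(conn, platform="aws-lambda"))
# [('quick-start-guide-deployment', 'IrisClassifier:20191126125258_4AB1D4', 'RUNNING')]
```

An embedded database like SQLite needs no separate server process, which suits a single-machine setup; a shared, team-wide service would typically point at a networked database instead.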

BentoML has built-in support for deploying to multiple cloud platforms. For demonstration purposes, let’s now deploy the IrisClassifier service we just created to AWS Lambda as a serverless API endpoint.

First you need to install the aws-sam-cli package, which is required by BentoML to work with AWS Lambda deployment:

pip install -U aws-sam-cli==0.31.1


You will also need to configure your AWS account and credentials if you haven’t already done so on your machine. You can do this either via environment variables or through the aws configure command: install the AWS CLI via pip install awscli and follow the detailed instructions here.
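Before running the deployment, a quick pre-flight check can save a failed attempt. The sketch below only tests whether the standard AWS credential environment variables are set. It does not validate the credentials, and it ignores credentials saved by aws configure in the AWS CLI's config files, which are an equally valid source. The missing_aws_env helper is hypothetical.

```python
import os

# Standard AWS credential environment variables read by the AWS CLI and SDKs.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_env(environ=None):
    """Return the names of required AWS credential variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not environ.get(name)]


missing = missing_aws_env()
if missing:
    print("Not set via environment:", ", ".join(missing),
          "(they may still be configured with `aws configure`)")
else:
    print("AWS credentials found in the environment")
```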

Now you can run the bentoml lambda deploy command to create an AWS Lambda deployment hosting the BentoService you’ve created:

# replace the version here with the generated version string when creating the BentoService SavedBundle
bentoml lambda deploy quick-start-guide-deployment \
    -b=IrisClassifier:20191126125258_4AB1D4 \
    --platform=aws-lambda \

Interested in learning more about BentoML? Check out the examples in the BentoML GitHub repository.

Be sure to join the BentoML Slack channel to hear about the latest development updates.