Tutorial: Intro to BentoML#
BentoML is a Python-first, efficient and flexible framework for machine learning model serving. It lets data scientists save and version trained models in a standardized format and unifies how a saved model can be accessed for serving. This enables ML engineers to easily use the saved models for building online prediction services or batch inference jobs.
BentoML also helps with defining the APIs, environments and dependencies for running a model, providing a build tool that encapsulates all model artifacts, source code and dependencies into a self-contained format called a Bento, which is designed to be DevOps friendly and ready for production deployment - just like Docker, but designed for ML models.
What are we building#
In this tutorial, we will focus on online model serving with BentoML, using a classification model trained with Scikit-Learn and the Iris dataset. By the end of this tutorial, we will have an HTTP endpoint for handling inference requests and a Docker container image for deployment.
Note
You might be tempted to skip this tutorial because you are not using scikit-learn, but give it a chance. The concepts you will learn here are fundamental to model serving with any ML framework using BentoML, and mastering them will give you a deep understanding of BentoML.
Setup for the tutorial#
There are three ways to complete this tutorial:
Run with Google Colab in your browser
Open the Tutorial Notebook on Colab side by side with this guide. As you go through this guide, you can simply run the sample code from the Colab notebook.
You will be able to try out most of the content in the tutorial on Colab, except for the Docker container part towards the end, because Google Colab currently does not support Docker.
Run the tutorial notebook from Docker
If you have Docker installed, you can run the tutorial notebook from a pre-configured Docker image with:
docker run -it --rm -p 8888:8888 -p 3000:3000 bentoml/quickstart:latest
Local Development Environment
Download the source code of this tutorial from bentoml/gallery:
git clone --depth=1 git@github.com:bentoml/gallery.git
cd gallery/quickstart/
BentoML supports Linux, Windows and macOS. You will need Python 3.7 or above to run this tutorial. We recommend using a virtual environment to create an isolated local environment; however, this is not required.
Install all dependencies required for this tutorial:
pip install --pre bentoml
pip install scikit-learn pandas
Saving Models with BentoML#
To begin with BentoML, you will need to save your trained models with the BentoML API in its model store (a local directory managed by BentoML). The model store is used for managing all your trained models locally as well as accessing them for serving.
import bentoml
from sklearn import svm
from sklearn import datasets
# Load training data set
iris = datasets.load_iris()
X, y = iris.data, iris.target
# Train the model
clf = svm.SVC(gamma='scale')
clf.fit(X, y)
# Save model to the BentoML local model store
bentoml.sklearn.save_model("iris_clf", clf)
# INFO [cli] Using default model signature `{"predict": {"batchable": False}}` for sklearn model
# INFO [cli] Successfully saved Model(tag="iris_clf:2uo5fkgxj27exuqj", path="~/bentoml/models/iris_clf/2uo5fkgxj27exuqj/")
The model is now saved under the name iris_clf with an automatically generated version. The name and version pair can then be used for retrieving the model. For instance, the original model object can be loaded back into memory for testing via:
model = bentoml.sklearn.load_model("iris_clf:2uo5fkgxj27exuqj")
# Alternatively, use `latest` to find the newest version
model = bentoml.sklearn.load_model("iris_clf:latest")
The bentoml.sklearn.save_model API is built specifically for the Scikit-Learn framework and uses its native saved model format under the hood for best compatibility and performance. The same applies to other ML frameworks, e.g. bentoml.pytorch.save_model; see the Framework-specific Guides to learn more.
See also
It is possible to use pre-trained models directly with BentoML or import existing trained model files to BentoML. Learn more about it from Preparing Models.
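For illustration, here is a minimal sketch of importing an already-trained model file into the model store, assuming a hypothetical my_pretrained_iris_clf.pkl saved earlier with joblib:
import joblib
import bentoml
# Load an existing trained model from disk (hypothetical file path)
clf = joblib.load("my_pretrained_iris_clf.pkl")
# Save it into the BentoML model store under a name and auto-generated version
bentoml.sklearn.save_model("iris_clf", clf)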
Saved models can be managed via the bentoml models CLI command or the Python API; learn about it here: Managing Models.
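As a quick sketch, assuming the bentoml.models Python API described in Managing Models, listing and inspecting saved models could look like:
import bentoml
# List all models currently in the local model store
for model in bentoml.models.list():
    print(model.tag)
# Retrieve a specific model entry and see where it is stored on disk
bento_model = bentoml.models.get("iris_clf:latest")
print(bento_model.tag, bento_model.path)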
Creating a Service#
Services are the core components of BentoML, where the serving logic is defined. Create a file service.py with:
import numpy as np
import bentoml
from bentoml.io import NumpyNdarray
iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
def classify(input_series: np.ndarray) -> np.ndarray:
    result = iris_clf_runner.predict.run(input_series)
    return result
Run it live:
> bentoml serve service:svc --reload
INFO [cli] Starting development BentoServer from "service:svc" running on http://127.0.0.1:3000 (Press CTRL+C to quit)
INFO [dev_api_server] Service imported from source: bentoml.Service(name="iris_classifier", import_str="service:svc", working_dir="/home/user/gallery/quickstart")
INFO [dev_api_server] Will watch for changes in these directories: ['/home/user/gallery/quickstart']
INFO [dev_api_server] Started server process [25915]
INFO [dev_api_server] Waiting for application startup.
INFO [dev_api_server] Application startup complete.
About the command bentoml serve service:svc --reload
In the example above:
service refers to the Python module (the service.py file)
svc refers to the object created in service.py, with svc = bentoml.Service(...)
The --reload option watches for local code changes and automatically restarts the server. This is for development use only.
Tip
This syntax also applies to projects with nested directories. For example, if you have a ./src/foo/bar/my_service.py file where a service object is defined with my_bento_service = bentoml.Service(...), the command will be:
bentoml serve src.foo.bar.my_service:my_bento_service
# Or
bentoml serve ./src/foo/bar/my_service.py:my_bento_service
Send prediction requests with an HTTP client:
import requests
requests.post(
    "http://127.0.0.1:3000/classify",
    headers={"content-type": "application/json"},
    data="[[5.9, 3, 5.1, 1.8]]",
).text
curl \
-X POST \
-H "content-type: application/json" \
--data "[[5.9, 3, 5.1, 1.8]]" \
http://127.0.0.1:3000/classify
Open http://127.0.0.1:3000 in your browser and send a test request from the web UI.
Using Models in a Service#
In this example, bentoml.sklearn.get creates a reference to the saved model in the model store, and to_runner creates a Runner instance from the model. The Runner abstraction gives the BentoServer more flexibility in terms of how to schedule the inference computation, how to dynamically batch inference calls, and how to better take advantage of all available hardware resources.
You can test out the Runner interface this way:
import bentoml
iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
iris_clf_runner.init_local()
iris_clf_runner.predict.run([[5.9, 3., 5.1, 1.8]])
Note
For custom Runners and advanced runner options, see Using Runners and Adaptive Batching.
Service API and IO Descriptor#
The svc.api decorator adds a function to the bentoml.Service object's APIs list. The input and output parameters take an IO Descriptor object, which specifies the API function's expected input/output types and is used for generating HTTP endpoints.
In this example, both input and output are defined with bentoml.io.NumpyNdarray, which means the API function being decorated takes a numpy.ndarray as input and returns a numpy.ndarray as output.
Note
More options, such as pandas.DataFrame, JSON, and PIL.Image, are also supported. An IO Descriptor object can also be configured with a schema or a shape for input/output validation. Learn more about them in API IO Descriptors.
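For illustration, here is a sketch of the same classify endpoint with input validation turned on, assuming NumpyNdarray accepts the dtype/shape arguments and enforcement flags described in API IO Descriptors:
import numpy as np
import bentoml
from bentoml.io import NumpyNdarray

iris_clf_runner = bentoml.sklearn.get("iris_clf:latest").to_runner()
svc = bentoml.Service("iris_classifier", runners=[iris_clf_runner])

# Declare the expected dtype and shape so malformed requests are rejected early
input_spec = NumpyNdarray(
    dtype="float64",
    shape=(-1, 4),
    enforce_dtype=True,
    enforce_shape=True,
)

@svc.api(input=input_spec, output=NumpyNdarray())
def classify(input_series: np.ndarray) -> np.ndarray:
    return iris_clf_runner.predict.run(input_series)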
Inside the API function, users can define any business logic, feature fetching, and feature transformation code. Model inference calls are made directly through the runner objects that are passed into the bentoml.Service(name=.., runners=[..]) call when creating the service object.
Tip
BentoML supports both sync and async endpoints. For performance sensitive use cases, especially when working with IO-intensive workloads (e.g. fetching features from multiple sources) or when composing multiple models, you may consider defining an async API instead.
Here's an example of the same endpoint above defined with async:
@svc.api(input=NumpyNdarray(), output=NumpyNdarray())
async def classify(input_series: np.ndarray) -> np.ndarray:
    result = await iris_clf_runner.predict.async_run(input_series)
    return result
Building a Bento 🍱#
Once the service definition is finalized, we can build the model and service into a Bento. A Bento is the distribution format for a service. It is a self-contained archive that contains all the source code, model files and dependency specifications required to run the service.
To build a Bento, first create a bentofile.yaml file in your project directory:
service: "service:svc" # Same as the argument passed to `bentoml serve`
labels:
owner: bentoml-team
stage: dev
include:
- "*.py" # A pattern for matching which files to include in the bento
python:
packages: # Additional pip packages required by the service
- scikit-learn
- pandas
Tip
BentoML provides lots of build options in bentofile.yaml for customizing the Python dependencies, CUDA installation, Docker image distro, etc. Read more about them in the Building Bentos page.
Next, run the bentoml build CLI command from the same directory:
> bentoml build
INFO [cli] Building BentoML service "iris_classifier:dpijemevl6nlhlg6" from build context "/home/user/gallery/quickstart"
INFO [cli] Packing model "iris_clf:7drxqvwsu6zq5uqj" from "/home/user/bentoml/models/iris_clf/7drxqvwsu6zq5uqj"
INFO [cli] Locking PyPI package versions..
INFO [cli] Successfully built Bento(tag="iris_classifier:dpijemevl6nlhlg6") at "~/bentoml/bentos/iris_classifier/dpijemevl6nlhlg6/"
🎉 You've just created your first Bento, and it is now ready for serving in production!
For starters, you can now serve it with the bentoml serve CLI command:
> bentoml serve iris_classifier:latest --production
INFO [cli] Service loaded from Bento store: bentoml.Service(tag="iris_classifier:dpijemevl6nlhlg6", path="~/bentoml/bentos/iris_classifier/dpijemevl6nlhlg6")
INFO [cli] Starting production BentoServer from "service.py:svc" running on http://0.0.0.0:3000 (Press CTRL+C to quit)
INFO [iris_clf] Service loaded from Bento store: bentoml.Service(tag="iris_classifier:dpijemevl6nlhlg6", path="~/bentoml/bentos/iris_classifier/dpijemevl6nlhlg6")
INFO [api_server] Service loaded from Bento store: bentoml.Service(tag="iris_classifier:dpijemevl6nlhlg6", path="~/bentoml/bentos/iris_classifier/dpijemevl6nlhlg6")
INFO [iris_clf] Started server process [28761]
INFO [iris_clf] Waiting for application startup.
INFO [api_server] Started server process [28762]
INFO [api_server] Waiting for application startup.
INFO [api_server] Application startup complete.
INFO [iris_clf] Application startup complete.
Note
Even though the service definition code uses the model iris_clf:latest, the latest version is resolved against the local model store to find the exact model version iris_clf:7drxqvwsu6zq5uqj during the bentoml build process. This model is then bundled into the Bento, which makes sure the Bento always uses this exact model version, wherever it is deployed.
Bento is the unit of deployment in BentoML and one of the most important artifacts to keep track of in your model deployment workflow. BentoML provides CLI commands and APIs for managing Bentos and moving them around; see the Managing Bentos section to learn more.
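As a hedged sketch of the Python side, assuming the top-level bentoml.list, bentoml.export_bento and bentoml.import_bento helpers covered in Managing Bentos:
import bentoml
# List all Bentos in the local Bento store
for bento in bentoml.list():
    print(bento.tag)
# Export a Bento into a standalone archive file that can be moved or shared
bentoml.export_bento("iris_classifier:latest", "./iris_classifier.bento")
# Import the archive back into a Bento store, e.g. on another machine
# bentoml.import_bento("./iris_classifier.bento")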
Generate Docker Image#
A Docker image can be automatically generated from a Bento for production deployment via the bentoml containerize CLI command:
> bentoml containerize iris_classifier:latest
INFO [cli] Successfully built docker image "iris_classifier:dpijemevl6nlhlg6"
Note
You will need to install Docker before running this command.
Specify the --platform option to avoid potential compatibility issues with some Python libraries.
bentoml containerize --platform=linux/amd64 iris_classifier:latest
This creates a Docker image that includes the Bento and has all its dependencies installed. The Docker image tag will be the same as the Bento tag by default:
> docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
iris_classifier dpijemevl6nlhlg6 78e3d3b51205 10 seconds ago 1.05GB
Run the Docker image to start the BentoServer:
docker run -p 3000:3000 iris_classifier:dpijemevl6nlhlg6
Most of the deployment tools built on top of BentoML use Docker under the hood. It is recommended to test out serving from a containerized Bento Docker image first, before moving to a production deployment. This helps verify the correctness of all the Docker and dependency configs specified in bentofile.yaml.
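One quick way to verify a containerized Bento is to send the same prediction request used earlier against the running container; this is just a plain HTTP check, not a BentoML API:
import requests
# Assumes the container started with `docker run -p 3000:3000 ...` is running locally
response = requests.post(
    "http://127.0.0.1:3000/classify",
    headers={"content-type": "application/json"},
    data="[[5.9, 3, 5.1, 1.8]]",
)
print(response.status_code, response.text)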
Deploying Bentos#
BentoML standardizes the saved model format, service API definition and the Bento build process, which opens up many different deployment options for ML teams.
The Bento we built and the Docker image created in the previous steps are designed to be DevOps friendly and ready for deployment in a production environment. If your team has existing infrastructure for running Docker, it's likely that the Bento-generated Docker images can be directly deployed to your infrastructure without any modification.
Note
To streamline the deployment process, BentoServer follows most common best practices found in a backend service: it provides health check and Prometheus metrics endpoints for monitoring out of the box; it provides configurable distributed tracing and logging for performance analysis and debugging; and it can be easily integrated with other tools commonly used by Data Engineers and DevOps engineers.
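For instance, assuming the default /healthz and /metrics routes exposed by the BentoServer, those endpoints can be probed with a couple of plain HTTP requests:
import requests
# Health probe used by load balancers and orchestration tools such as Kubernetes
print(requests.get("http://127.0.0.1:3000/healthz").status_code)
# Prometheus metrics in text exposition format, ready to be scraped
print(requests.get("http://127.0.0.1:3000/metrics").text[:500])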
For teams looking for an end-to-end solution with more powerful deployment features specific to ML, the BentoML team has also created Yatai and bentoctl.
Learn more about different deployment options with BentoML from the Deploying Bento page.