Deploying to Knative

Knative is a Kubernetes-based platform for deploying and managing serverless workloads. It suits ML workloads that need more computing power: it abstracts away infrastructure management and avoids vendor lock-in.

This guide demonstrates how to serve a scikit-learn based iris classifier model with BentoML on a Knative cluster. The same deployment steps also apply to models trained with other machine learning frameworks; see more BentoML examples here.


Knative deployment with BentoML

Run the example project from the quick start guide to create the BentoML saved bundle for deployment:

git clone
pip install -r ./bentoml/guides/quick-start/requirements.txt
python ./bentoml/guides/quick-start/

Verify that the saved bundle was created:

$ bentoml get IrisClassifier:20200121141808_FE78B5

# Sample output

{
  "name": "IrisClassifier",
  "version": "20200121141808_FE78B5",
  "uri": {
    "type": "LOCAL",
    "uri": "/Users/bozhaoyu/bentoml/repository/IrisClassifier/20200121141808_FE78B5"
  },
  "bentoServiceMetadata": {
    "name": "IrisClassifier",
    "version": "20200121141808_FE78B5",
    "createdAt": "2020-01-21T22:18:25.079723Z",
    "env": {
      "condaEnv": "name: bentoml-IrisClassifier\nchannels:\n- defaults\ndependencies:\n- python=3.7.3\n- pip\n",
      "pipDependencies": "bentoml==0.5.8\nscikit-learn",
      "pythonVersion": "3.7.3"
    },
    "artifacts": [
      {
        "name": "model",
        "artifactType": "SklearnModelArtifact"
      }
    ],
    "apis": [
      {
        "name": "predict",
        "InputType": "DataframeInput",
        "docs": "BentoService API"
      }
    ]
  }
}
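
As a rough sketch, the metadata printed by bentoml get can also be read programmatically, assuming the command emits well-formed JSON with the fields shown in the sample output. The helper name summarize_bundle and the abbreviated SAMPLE_OUTPUT string are illustrative and not part of BentoML.

```python
import json

# Abbreviated, illustrative copy of the sample output above (assumed to be valid JSON).
SAMPLE_OUTPUT = """
{
  "name": "IrisClassifier",
  "version": "20200121141808_FE78B5",
  "uri": {
    "type": "LOCAL",
    "uri": "/Users/bozhaoyu/bentoml/repository/IrisClassifier/20200121141808_FE78B5"
  },
  "bentoServiceMetadata": {
    "apis": [{"name": "predict"}]
  }
}
"""


def summarize_bundle(raw_json):
    """Return the bundle tag, storage path, and API names from `bentoml get` JSON."""
    info = json.loads(raw_json)
    metadata = info.get("bentoServiceMetadata", {})
    return {
        "tag": f'{info["name"]}:{info["version"]}',
        "path": info["uri"]["uri"],
        "apis": [api["name"] for api in metadata.get("apis", [])],
    }


summary = summarize_bundle(SAMPLE_OUTPUT)
```

Checking that "predict" appears in summary["apis"] is a quick way to confirm the bundle exposes the endpoint used in the rest of this guide.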

The saved bundle can now be used to start a REST API server that hosts the BentoService and accepts test requests:

# Start BentoML API server:
bentoml serve IrisClassifier:latest
# Send test request:
curl -i \
  --header "Content-Type: application/json" \
  --request POST \
  --data '[[5.1, 3.5, 1.4, 0.2]]' \
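
The same test request can be sketched in Python with the standard library. The endpoint http://localhost:5000/predict is an assumption (port 5000 matches the containerPort used later in this guide, and /predict matches the API name); adjust it to wherever the local server is listening. The function names are illustrative.

```python
import json
import urllib.request

# Assumption: the local API server listens on port 5000 and exposes /predict.
LOCAL_ENDPOINT = "http://localhost:5000/predict"


def build_predict_request(url, rows):
    """Build a JSON POST request carrying a list of feature rows."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(url, rows, timeout=10):
    """Send the request and return the decoded JSON response."""
    request = build_predict_request(url, rows)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, predict(LOCAL_ENDPOINT, [[5.1, 3.5, 1.4, 0.2]]) sends the same payload as the curl command.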

Deploy BentoML model server to Knative

BentoML provides a convenient way to containerize the model API server with Docker:

  1. Find the SavedBundle directory with the bentoml get command

  2. Run docker build on the SavedBundle directory, which contains a generated Dockerfile

  3. Run the generated Docker image to start a Docker container serving the model

# Find the local path of the latest version IrisClassifier saved bundle
saved_path=$(bentoml get IrisClassifier:latest --print-location --quiet)

# Replace {docker_username} with your Docker Hub username
docker build -t {docker_username}/iris-classifier $saved_path
docker push {docker_username}/iris-classifier
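
For scripted builds, the commands above could be wrapped in Python roughly as follows. This is a sketch under the assumption that the bentoml and docker CLIs are on the PATH; the helper names and the example username are hypothetical.

```python
import subprocess


def build_and_push_commands(docker_username, saved_path, repo="iris-classifier"):
    """Return the docker build and push commands as argument lists."""
    image = f"{docker_username}/{repo}"
    return [
        ["docker", "build", "-t", image, saved_path],
        ["docker", "push", image],
    ]


def find_saved_path(bento_tag="IrisClassifier:latest"):
    """Ask the bentoml CLI for the saved bundle location."""
    result = subprocess.run(
        ["bentoml", "get", bento_tag, "--print-location", "--quiet"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def build_and_push(docker_username):
    """Run both docker commands, stopping at the first failure."""
    for command in build_and_push_commands(docker_username, find_saved_path()):
        subprocess.run(command, check=True)
```

Passing argument lists rather than a shell string avoids quoting problems when the saved bundle path contains spaces.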

Make sure the Knative Serving components are running:

$ kubectl get pods --namespace knative-serving

# Sample output

NAME                                READY   STATUS    RESTARTS   AGE
activator-845b77cbb5-thpcw          2/2     Running   0          4h33m
autoscaler-7fc56894f5-f2vqc         2/2     Running   0          4h33m
controller-7ffb84fd9c-699pt         2/2     Running   2          4h33m
networking-istio-7fc7f66675-xgfvd   1/1     Running   0          4h32m
webhook-8597865965-9vp25            2/2     Running   1          4h33m
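
A small check like the one below can flag pods that are not ready. It is a sketch that assumes the default kubectl table layout shown in the sample output (NAME, READY, STATUS as the first three columns); the function name is illustrative.

```python
SAMPLE_OUTPUT = """\
NAME                                READY   STATUS    RESTARTS   AGE
activator-845b77cbb5-thpcw          2/2     Running   0          4h33m
autoscaler-7fc56894f5-f2vqc         2/2     Running   0          4h33m
controller-7ffb84fd9c-699pt         2/2     Running   2          4h33m
networking-istio-7fc7f66675-xgfvd   1/1     Running   0          4h32m
webhook-8597865965-9vp25            2/2     Running   1          4h33m
"""


def unready_pods(kubectl_output):
    """Return names of pods that are not Running with all containers ready."""
    lines = [line for line in kubectl_output.splitlines() if line.strip()]
    problems = []
    for line in lines[1:]:  # skip the header row
        name, ready, status = line.split()[:3]
        ready_count, total = ready.split("/")
        if status != "Running" or ready_count != total:
            problems.append(name)
    return problems


problems = unready_pods(SAMPLE_OUTPUT)
```

An empty list means every listed component reports all of its containers as ready.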

Copy the following service definition into service.yaml and replace {docker_username} with your Docker Hub username. The Knative service directs both livenessProbe and readinessProbe to the /healthz endpoint on the BentoService.

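apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: iris-classifier
  namespace: bentoml
spec:
  template:
    spec:
      containers:
        - image: {docker_username}/iris-classifier
          ports:
          - containerPort: 5000
          livenessProbe:
            httpGet:
              path: /healthz
            initialDelaySeconds: 3
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: /healthz
            initialDelaySeconds: 3
            periodSeconds: 5
            failureThreshold: 3
            timeoutSeconds: 60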
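
The placeholder substitution can be automated with a short helper. This is a minimal sketch that treats the service definition as plain text; render_manifest is a hypothetical name, and the check for leftover placeholders is an added safety measure rather than something Knative requires.

```python
import re

PLACEHOLDER = "{docker_username}"


def render_manifest(template_text, docker_username):
    """Fill in the Docker Hub username and fail if any placeholder remains."""
    if not docker_username:
        raise ValueError("docker_username must not be empty")
    rendered = template_text.replace(PLACEHOLDER, docker_username)
    leftover = re.findall(r"\{[A-Za-z_]+\}", rendered)
    if leftover:
        raise ValueError(f"unfilled placeholders: {leftover}")
    return rendered


# Example with the image line from the service definition.
line = render_manifest("        - image: {docker_username}/iris-classifier\n", "alice")
```

Writing the rendered text to service.yaml before running kubectl apply keeps the template itself reusable across accounts.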

Create the bentoml namespace, then deploy the BentoService to Knative with the kubectl apply command:

$ kubectl create namespace bentoml
$ kubectl apply -f service.yaml

# Sample output created

View the status of the deployment with the kubectl get ksvc command:

$ kubectl get ksvc --all-namespaces

# Sample output

NAMESPACE   NAME              URL                                          LATESTCREATED           LATESTREADY             READY   REASON
bentoml     iris-classifier   iris-classifier-7k2dv   iris-classifier-7k2dv   True
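
Readiness can also be checked from structured output, for example from kubectl get ksvc iris-classifier -n bentoml -o json. The sketch below assumes the service status carries a conditions list with a Ready entry and a url field; the sample document and its URL are made up for illustration.

```python
import json

# Hypothetical, heavily abbreviated JSON for a ready Knative service.
SAMPLE_KSVC = """
{
  "metadata": {"name": "iris-classifier", "namespace": "bentoml"},
  "status": {
    "url": "http://iris-classifier.bentoml.example.com",
    "conditions": [
      {"type": "ConfigurationsReady", "status": "True"},
      {"type": "Ready", "status": "True"}
    ]
  }
}
"""


def ksvc_status(ksvc_json):
    """Return (is_ready, url) for a single Knative service JSON document."""
    status = json.loads(ksvc_json).get("status", {})
    ready = any(
        cond.get("type") == "Ready" and cond.get("status") == "True"
        for cond in status.get("conditions", [])
    )
    return ready, status.get("url")


ready, url = ksvc_status(SAMPLE_KSVC)
```

The host part of the returned URL is the value to send in the Host header when calling the service through the ingress gateway.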

Validate prediction server with sample data

Find the cluster IP address and the exposed port of the deployed Knative service. With minikube:

$ minikube ip

# Sample output

$ kubectl -n istio-system get service istio-ingressgateway -o jsonpath='{.spec.ports[?(@.name=="http2")].nodePort}'

# Sample output


With the IP address and port, use curl to send an HTTP request to the Knative deployment:

$ curl -v -i \
    --header "Content-Type: application/json" \
    --header "Host:" \
    --request POST \
    --data '[[5.1, 3.5, 1.4, 0.2]]' \

# Sample output

Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying
* Connected to ( port 31871 (#0)
> POST /predict HTTP/1.1
> Host:
> User-Agent: curl/7.58.0
> Accept: */*
> Content-Type: application/json
> Content-Length: 22
* upload completely sent off: 22 out of 22 bytes
< HTTP/1.1 200 OK
HTTP/1.1 200 OK
< content-length: 3
content-length: 3
< content-type: application/json
content-type: application/json
< date: Wed, 01 Apr 2020 01:24:58 GMT
date: Wed, 01 Apr 2020 01:24:58 GMT
< request_id: 0506467b-75d9-4fb5-9d7e-2d2855fc6028
request_id: 0506467b-75d9-4fb5-9d7e-2d2855fc6028
< server: istio-envoy
server: istio-envoy
< x-envoy-upstream-service-time: 12
x-envoy-upstream-service-time: 12

* Connection #0 to host left intact
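
The request through the ingress gateway can be sketched in Python as well. The gateway IP (192.0.2.10, a documentation-only address) and the host name below are placeholders, not values from this deployment; port 31871 is taken from the sample output above. The request targets the gateway address while the Host header tells Istio which Knative service to route to.

```python
import json
import urllib.request


def build_knative_request(gateway_ip, node_port, host, rows):
    """Build a POST to /predict on the gateway, routed by the Host header."""
    url = f"http://{gateway_ip}:{node_port}/predict"
    return urllib.request.Request(
        url,
        data=json.dumps(rows).encode("utf-8"),
        headers={"Content-Type": "application/json", "Host": host},
        method="POST",
    )


def predict_via_gateway(gateway_ip, node_port, host, rows, timeout=10):
    """Send the request and return the decoded JSON prediction."""
    request = build_knative_request(gateway_ip, node_port, host, rows)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder values; substitute the minikube IP, node port, and service host.
request = build_knative_request(
    "192.0.2.10", 31871, "iris-classifier.bentoml.example.com", [[5.1, 3.5, 1.4, 0.2]]
)
```

Substituting the output of minikube ip and the nodePort query for the placeholders should reproduce the curl call.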

Clean up deployment

kubectl delete namespace bentoml