Deploying to Kubernetes Cluster

Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications. It has become a de facto standard for deploying containerized applications. Machine learning services can also take advantage of Kubernetes' ability to deploy quickly and scale based on demand.

This guide demonstrates how to serve a scikit-learn based iris classifier model with BentoML on a Kubernetes cluster. The same deployment steps also apply to models trained with other machine learning frameworks; see the BentoML examples for more.

Prerequisites

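Before starting this guide, make sure you have the following: a Kubernetes cluster with kubectl configured to reach it (the curl example later in this guide assumes minikube), Docker and a Docker Hub account, and git and Python with pip for running the quick-start example.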

Kubernetes deployment with BentoML

Run the example project from the quick start guide to create the BentoML saved bundle for deployment:

git clone git@github.com:bentoml/BentoML.git
# git clone creates a directory named BentoML (case matters on Linux)
pip install -r ./BentoML/guides/quick-start/requirements.txt
python ./BentoML/guides/quick-start/main.py

Verify that the saved bundle was created:

$ bentoml get IrisClassifier:latest

# Sample output
{
  "name": "IrisClassifier",
  "version": "20200121141808_FE78B5",
  "uri": {
    "type": "LOCAL",
    "uri": "/Users/bozhaoyu/bentoml/repository/IrisClassifier/20200121141808_FE78B5"
  },
  "bentoServiceMetadata": {
    "name": "IrisClassifier",
    "version": "20200121141808_FE78B5",
    "createdAt": "2020-01-21T22:18:25.079723Z",
    "env": {
      "condaEnv": "name: bentoml-IrisClassifier\nchannels:\n- defaults\ndependencies:\n- python=3.7.3\n- pip\n",
      "pipDependencies": "bentoml==0.5.8\nscikit-learn",
      "pythonVersion": "3.7.3"
    },
    "artifacts": [
      {
        "name": "model",
        "artifactType": "SklearnModelArtifact"
      }
    ],
    "apis": [
      {
        "name": "predict",
        "InputType": "DataframeInput",
        "docs": "BentoService API"
      }
    ]
  }
}
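When a later step needs the bundle version (for example, to tag a Docker image), it can be pulled out of the JSON above. The following is a minimal sketch and not part of BentoML: bundle_version is a hypothetical helper that assumes the pretty-printed layout shown in the sample output, with the top-level "version" key appearing before any nested one. A JSON-aware tool would be more robust.

```shell
# bundle_version: read `bentoml get` JSON on stdin and print the value of the
# first "version" key. Hypothetical helper; relies on one key per line.
bundle_version() {
  sed -n 's/^[[:space:]]*"version":[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Example usage (requires the bentoml CLI and the saved bundle):
# bentoml get IrisClassifier:latest | bundle_version
```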

The saved bundle can now be used to start a REST API server that hosts the BentoService and accepts test requests:

# Start BentoML API server:
bentoml serve IrisClassifier:latest
# Send test request:
curl -i \
  --header "Content-Type: application/json" \
  --request POST \
  --data '[[5.1, 3.5, 1.4, 0.2]]' \
  http://localhost:5000/predict
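The API server can take a few seconds to load the model, so a request sent right after starting it may be refused. A small retry loop is one way to handle this. The sketch below is an assumption, not a BentoML feature: retry is a hypothetical helper, and the attempt count and delay in the usage example are arbitrary.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, trying at most MAX times
# and sleeping DELAY seconds between attempts. Hypothetical helper.
retry() {
  retry_max=$1
  retry_delay=$2
  shift 2
  retry_attempt=1
  while ! "$@"; do
    if [ "$retry_attempt" -ge "$retry_max" ]; then
      return 1
    fi
    retry_attempt=$((retry_attempt + 1))
    sleep "$retry_delay"
  done
}

# Example usage (requires the server started by `bentoml serve`):
# retry 10 2 curl --silent --fail --output /dev/null \
#   --header "Content-Type: application/json" \
#   --request POST \
#   --data '[[5.1, 3.5, 1.4, 0.2]]' \
#   http://localhost:5000/predict
```

The same helper can wrap the request sent to the Kubernetes service later in this guide, where the pods may also need time to start.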

Deploy BentoService to Kubernetes

BentoML provides a convenient way to containerize the model API server with Docker:

  1. Find the SavedBundle directory with the bentoml get command

  2. Run docker build on the SavedBundle directory, which contains a generated Dockerfile

  3. Run the resulting Docker image to start a container serving the model

# Find the local path of the latest version IrisClassifier saved bundle
saved_path=$(bentoml get IrisClassifier:latest --print-location --quiet)

# Replace {docker_username} with your Docker Hub username
# The path is quoted so that directories containing spaces still work
docker build -t {docker_username}/iris-classifier "$saved_path"
docker push {docker_username}/iris-classifier
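The commands above push the image without an explicit tag, which Docker treats as the latest tag. Tagging the image with the bundle version is one way to keep a Kubernetes deployment traceable to a specific saved bundle. The sketch below assumes that convention; image_ref is a hypothetical helper, and the version string in the usage example is the one from the sample output earlier in this guide.

```shell
# image_ref USER VERSION: print "USER/iris-classifier:VERSION".
# Hypothetical helper; fails if either argument is empty.
image_ref() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: image_ref DOCKER_USERNAME BUNDLE_VERSION" >&2
    return 1
  fi
  printf '%s/iris-classifier:%s\n' "$1" "$2"
}

# Example usage (requires docker and the bentoml CLI):
# saved_path=$(bentoml get IrisClassifier:latest --print-location --quiet)
# image=$(image_ref your-docker-username 20200121141808_FE78B5)
# docker build -t "$image" "$saved_path"
# docker push "$image"
```

If a versioned tag is used, the image field in the Kubernetes manifest needs to carry the same tag.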

The following example YAML file specifies the resources required to run and expose a BentoML model server in a Kubernetes cluster: a Service that exposes port 5000 and a Deployment that runs the container. Replace {docker_username} with your Docker Hub username and save the file as iris-classifier.yaml.

# iris-classifier.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app: iris-classifier
  name: iris-classifier
spec:
  ports:
  - name: predict
    port: 5000
    targetPort: 5000
  selector:
    app: iris-classifier
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: iris-classifier
  name: iris-classifier
spec:
  selector:
    matchLabels:
      app: iris-classifier
  template:
    metadata:
      labels:
        app: iris-classifier
    spec:
      containers:
      - image: {docker_username}/iris-classifier
        imagePullPolicy: IfNotPresent
        name: iris-classifier
        ports:
        - containerPort: 5000
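Instead of editing the placeholder by hand, the substitution can be scripted. This is a minimal sketch, assuming the manifest above is kept with its {docker_username} placeholder in a template file; render_manifest and the iris-classifier.template.yaml file name are hypothetical.

```shell
# render_manifest USER: copy stdin to stdout, replacing every
# {docker_username} placeholder with USER. Hypothetical helper; USER is
# assumed to contain no "|" or "&" characters, which sed would interpret.
render_manifest() {
  sed "s|{docker_username}|$1|g"
}

# Example usage:
# render_manifest your-docker-username \
#   < iris-classifier.template.yaml > iris-classifier.yaml
```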

Use the kubectl CLI to deploy the model server to the Kubernetes cluster:

kubectl apply -f iris-classifier.yaml
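kubectl apply returns once the resources are accepted, which can be before the pods are running. One way to block until the Deployment is ready is kubectl rollout status. The wrapper below is a sketch; wait_for_deployment and the 120s default timeout are assumptions chosen for illustration.

```shell
# wait_for_deployment NAME [TIMEOUT]: block until the named Deployment has
# finished rolling out, or fail after TIMEOUT (default 120s, an arbitrary
# choice). Hypothetical wrapper around `kubectl rollout status`.
wait_for_deployment() {
  kubectl rollout status "deployment/$1" --timeout="${2:-120s}"
}

# Example usage (requires kubectl and a reachable cluster):
# wait_for_deployment iris-classifier
# kubectl get service iris-classifier
```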

Make a prediction with curl:

# $(minikube ip) substitutes the IP address of the minikube node
curl -i \
  --request POST \
  --header "Content-Type: application/json" \
  --data '[[5.1, 3.5, 1.4, 0.2]]' \
  "http://$(minikube ip):5000/predict"
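On minikube, a LoadBalancer service usually has no external IP unless minikube tunnel is running, so the service may instead be reachable on the node port that Kubernetes assigns to it. The sketch below builds the prediction URL from the minikube IP and that node port. service_url is a hypothetical helper, and it assumes the service's first port is the predict port, as in the manifest above.

```shell
# service_url NAME: print the /predict URL for service NAME, using the
# minikube node IP and the node port of the service's first port.
# Hypothetical helper; returns non-zero if either lookup fails.
service_url() {
  svc_host=$(minikube ip) || return 1
  svc_port=$(kubectl get service "$1" \
    -o jsonpath='{.spec.ports[0].nodePort}') || return 1
  printf 'http://%s:%s/predict\n' "$svc_host" "$svc_port"
}

# Example usage (requires minikube and kubectl):
# curl -i \
#   --request POST \
#   --header "Content-Type: application/json" \
#   --data '[[5.1, 3.5, 1.4, 0.2]]' \
#   "$(service_url iris-classifier)"
```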

Monitor model server metrics with Prometheus

Remove deployment

kubectl delete -f iris-classifier.yaml