Monitoring with Prometheus

Monitoring stacks usually consist of a metrics collector, a time-series database to store metrics, and a visualization layer. A popular open-source stack is Prometheus, paired with Grafana as the visualization tool to create rich dashboards.

The BentoML API server comes with Prometheus support out of the box. When launching an API model server with BentoML, whether it is running the dev server locally or deployed with Docker in the cloud, a /metrics endpoint is always available. It exposes the essential metrics for model serving and supports creating and customizing new metrics based on your needs. This guide will introduce how to use Prometheus and Grafana to monitor your BentoService.

Preface

See also

Prometheus and Grafana docs for more in-depth topics.

Note

This guide requires users to have a basic understanding of Prometheus' concepts as well as its metric types. Please refer to Concepts for more information.

Note

Refer to PromQL basics for the Prometheus query language.

Note

Please refer to Prometheus' best practices for consoles and dashboards as well as histograms and summaries.

Note

Users can also create custom metrics for a BentoService by making use of the Prometheus metrics client, which can later be scraped by Prometheus.

# The imports below are assumed for this example (legacy BentoML 0.13 API);
# PickleModel and the preprocessing helper are defined elsewhere in the service code.
from tensorflow.keras.preprocessing.sequence import pad_sequences

from bentoml import BentoService, api, artifacts
from bentoml.adapters import JsonInput
from bentoml.frameworks.keras import KerasModelArtifact
from bentoml.configuration.containers import BentoMLContainer

# Retrieve BentoML's Prometheus metrics client and define a custom Summary metric.
metrics_client = BentoMLContainer.metrics_client.get()

REQUEST_TIME = metrics_client.Summary('request_processing_time', 'Time spent processing request')

@artifacts([KerasModelArtifact('model'), PickleModel('tokenizer')])
class TensorflowService(BentoService):

    @REQUEST_TIME.time()
    @api(input=JsonInput())
    def predict(self, parsed_json):
        raw = self.preprocessing(parsed_json['text'])
        input_data = [raw[: n + 1] for n in range(len(raw))]
        input_data = pad_sequences(input_data, maxlen=100, padding="post")
        return self.artifacts.model.predict(input_data)

Local Deployment

This section will walk you through setting up the stack locally, with an optional guide on using docker-compose for easy deployment of the stack.

Setting up Prometheus

It is recommended to run Prometheus with Docker. Please make sure that you have Docker installed on your system.

Users can take advantage of a prometheus.yml file for configuration. An example that monitors multiple BentoServices is shown below:

# prometheus.yml

global:
  scrape_interval:     15s
  evaluation_interval: 30s
  # scrape_timeout is set to the global default (10s).

scrape_configs:
- job_name: prometheus

  honor_labels: true
  static_configs:
  - targets:
    - localhost:5000  # metrics from SentimentClassifier service
    - localhost:6000  # metrics from IrisClassifier service

Note

In order to monitor multiple BentoServices, make sure to run each BentoService on a different port and add the corresponding targets under static_configs as shown above.
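
If you would rather have each service carry its own job label in Prometheus, a per-service layout can be used instead. The snippet below is only a sketch; the job names are arbitrary:

# prometheus.yml (alternative layout, one job per BentoService)
scrape_configs:
- job_name: sentiment_classifier
  static_configs:
  - targets:
    - localhost:5000
- job_name: iris_classifier
  static_configs:
  - targets:
    - localhost:6000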

We can then run Prometheus with the following:

# Bind-mount your prometheus.yml from the host by running:
» docker run --network=host -v path/to/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus

Note

When deploying, users can set up docker-compose and a shared network so that Prometheus can scrape metrics from your BentoService. Please refer to (Optional) Prometheus - Grafana - docker-compose stack.

Users can check :9090/status to make sure Prometheus is running. To verify that Prometheus is scraping our BentoService, :9090/targets should show:

../_images/prom-targets-running.png

Setting up Grafana

It is also recommended to use Grafana with Docker.

» docker run --network=host grafana/grafana

To log in to Grafana for the first time:

  1. Open your web browser and go to localhost:3000. The default HTTP port that Grafana listens to is :3000 unless you have configured a different port.

  2. On the login page, enter admin for username and password.

  3. Click Log in. If login is successful, you will see a prompt to change the password.

  4. Click OK on the prompt, then change your password.

Users can also explore the given BentoService metrics by importing the provided BentoService dashboard.

../_images/bentoml-grafana-dashboard.png

Warning

Make sure to set up Docker Swarm before proceeding.

(Optional) Prometheus - Grafana - docker-compose stack

Users can freely update the targets section of prometheus.yml to define what should be monitored by Prometheus.

grafana/provisioning provides both datasources and dashboards, allowing us to specify data sources and bootstrap our dashboards quickly, courtesy of the Provisioning feature introduced in Grafana v5.0.0.
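
As an illustration, a minimal datasource provisioning file could look like the sketch below. The file name and the Prometheus URL (which assumes Prometheus is reachable by its docker-compose service name on the shared network) are assumptions, not part of the original stack:

# grafana/provisioning/datasources/datasource.yml (illustrative sketch)
apiVersion: 1

datasources:
  - name: Prometheus                # name shown in the Grafana UI
    type: prometheus
    access: proxy
    url: http://prometheus:9090     # assumes the docker-compose service name 'prometheus'
    isDefault: true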

If you would like to automate the installation of additional dashboards, just copy the dashboard JSON files to grafana/provisioning/dashboards and they will be provisioned the next time you stop and start Grafana.
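
Dashboards are picked up through a dashboard provider file in the same directory. The sketch below is an assumption about how such a provider might be written, with an arbitrary provider name:

# grafana/provisioning/dashboards/dashboards.yml (illustrative sketch)
apiVersion: 1

providers:
  - name: 'BentoML'                 # arbitrary provider name
    orgId: 1
    folder: ''
    type: file
    disableDeletion: false
    options:
      path: /etc/grafana/provisioning/dashboards   # where the dashboard JSON files are mounted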

See also

Stack Implementation

.
├── deployment
├── grafana
│   ├── config.monitoring
│   └── provisioning
│       ├── dashboards
│       └── datasources
├── prometheus
│   ├── alert.rules
│   └── prometheus.yml
├── Makefile
├── docker-compose.yml
└── README.rst

The content of docker-compose.yml is shown below; a sample dashboard can be seen here:

version: '3.7'

volumes:
  prometheus_data:
  grafana_data:

networks:
  shared-network:

services:

  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus/:/etc/prometheus/
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
    ports:
      - 9090:9090
    networks:
      - shared-network
    deploy:
      placement:
        constraints:
          - node.role==manager
    restart: on-failure

  grafana:
    image: grafana/grafana
    depends_on:
      - prometheus
    ports:
      - 3000:3000
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning/:/etc/grafana/provisioning/
    env_file:
      - ./grafana/config.monitoring
    networks:
      - shared-network
    user: "472"
    deploy:
      mode: global
    restart: on-failure

  bentoml:
    image: bentoml/fashion-mnist-classifier:latest
    ports:
      - "5000:5000"
    networks:
      - shared-network
    deploy:
      mode: global
    restart: on-failure

See also

Alertmanager and cAdvisor to set up alerts as well as to monitor container resources.

See also

prom/node-exporter for exposing machine metrics.


Deploy on Kubernetes

Note

minikube and kubectl are required for this part of the tutorial. Users may also choose to install virtualbox in order to run minikube.

See also

Deploying to Kubernetes Cluster on how to deploy BentoService to Kubernetes.

Deploy Prometheus on K8s

Setting up a Prometheus stack on Kubernetes can be an arduous task. However, we can take advantage of the Helm package manager and make use of prometheus-operator through kube-prometheus:

  • The Operator uses standard configurations and dashboards for Prometheus and Grafana.

  • The Helm prometheus-operator chart allows you to get a full cluster monitoring solution up and running by installing the aforementioned components.

See also

kube-prometheus

Warning

Your local minikube cluster will be deleted in order to set up kube-prometheus-stack correctly.

Set virtualbox as the default driver for minikube:

» minikube config set driver virtualbox

Spin up our local K8s cluster:

# prometheus-operator/kube-prometheus
» minikube delete && minikube start \
    --kubernetes-version=v1.20.0 \
    --memory=6g --bootstrapper=kubeadm \
    --extra-config=kubelet.authentication-token-webhook=true \
    --extra-config=kubelet.authorization-mode=Webhook \
    --extra-config=scheduler.address=0.0.0.0 \
    --extra-config=controller-manager.address=0.0.0.0

Note

We allocate 6GB of memory via --memory for this K8s cluster. Change the value to fit your use case.

Then get helm repo:

» helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
» helm repo update

Search for available prometheus chart:

» helm search repo kube-prometheus

Once you have located the version, inspect the chart to modify the settings:

» helm inspect values prometheus-community/kube-prometheus-stack \
    > ./configs/deployment/kube-prometheus-stack.values

Next, we need to change the Prometheus server's service type from ClusterIP to NodePort so that we can access it from the browser. This makes the Prometheus server accessible on your machine at :30090.

## Configuration for Prometheus service
##
service:
  annotations: {}
  labels: {}
  clusterIP: ""

  ## Port for Prometheus Service to listen on
  ##
  port: 9090

  ## To be used with a proxy extraContainer port
  targetPort: 9090

  ## List of IP addresses at which the Prometheus server service is available
  ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
  ##
  externalIPs: []

  ## Port to expose on each node
  ## Only used if service.type is 'NodePort'
  ##
  nodePort: 30090

  ## LoadBalancer IP
  ## Only use if service.type is "LoadBalancer"
  loadBalancerIP: ""
  loadBalancerSourceRanges: []
  ## Service type
  ##
  type: NodePort # changed this line from ClusterIP to NodePort

By default, Prometheus discovers PodMonitors and ServiceMonitors within its namespace that are labeled with the same release tag as the prometheus-operator release. Since we want Prometheus to discover our BentoService (refer to Setting up your BentoService), we need to create a custom PodMonitor/ServiceMonitor to scrape metrics from our services. One way to do this is to allow Prometheus to discover all PodMonitors/ServiceMonitors within its namespace without applying label filtering. Set the following options (see the values snippet after the list):

- prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues: false
- prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues: false
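
In the generated kube-prometheus-stack.values file, these options correspond to keys along the lines of the sketch below (only the relevant keys are shown):

# kube-prometheus-stack.values (relevant keys only)
prometheus:
  prometheusSpec:
    serviceMonitorSelectorNilUsesHelmValues: false
    podMonitorSelectorNilUsesHelmValues: false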

Finally, deploy the Prometheus and Grafana pods using kube-prometheus-stack via Helm:

» helm install prometheus-community/kube-prometheus-stack \
    --create-namespace --namespace bentoml \
    --generate-name --values ./configs/deployment/kube-prometheus-stack.values
NAME: kube-prometheus-stack-1623502925
LAST DEPLOYED: Sat Jun 12 20:02:09 2021
NAMESPACE: bentoml
STATUS: deployed
REVISION: 1
NOTES:
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace bentoml get pods -l "release=kube-prometheus-stack-1623502925"

Visit https://github.com/prometheus-operator/kube-prometheus for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.

Note

You can also provide the chart values directly with helm --set, e.g.:

  • --set prometheus.service.type=NodePort

  • --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false

  • --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false

Check for Prometheus and Grafana pods:

» kubectl get pods -A
NAMESPACE     NAME                                                              READY   STATUS    RESTARTS   AGE
bentoml       kube-prometheus-stack-1623-operator-5555798f4f-nghl8              1/1     Running   0          4m22s
bentoml       kube-prometheus-stack-1623502925-grafana-57cdffccdc-n7lpk         2/2     Running   0          4m22s
bentoml       prometheus-kube-prometheus-stack-1623-prometheus-0                2/2     Running   1          4m5s

Check for service startup as part of the operator:

» kubectl get svc -A
NAMESPACE     NAME                                                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                        AGE
bentoml       alertmanager-operated                                       ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP     5m8s
bentoml       kube-prometheus-stack-1623-operator                         ClusterIP   10.106.5.23      <none>        443/TCP                        5m25s
bentoml       kube-prometheus-stack-1623-prometheus                       NodePort    10.96.241.205    <none>        9090:30090/TCP                 5m26s
bentoml       kube-prometheus-stack-1623502925-grafana                    ClusterIP   10.111.205.42    <none>        80/TCP                         5m25s
bentoml       prometheus-operated                                         ClusterIP   None             <none>        9090/TCP                       5m8s

We can observe that the Prometheus server is available at :30090. Thus, open your browser at http://<machine-ip-addr>:30090. By default, the Operator also enables users to monitor the Kubernetes cluster itself.

Using Grafana

Users can also launch the Grafana tools for visualization.

There are two ways to expose the Grafana port; choose whichever you prefer:

Patching Grafana Service

By default, every service in the Operator, including Grafana, uses ClusterIP to expose the port where the service is accessible. This can be changed to a NodePort instead, so the page is accessible from the browser, similar to what we did earlier with the Prometheus dashboard.

We can take advantage of kubectl patch to update the service API to expose a NodePort instead.

Modify the spec to change service type:

» cat << EOF | tee ./configs/deployment/grafana-patch.yaml
spec:
  type: NodePort
EOF

Use kubectl patch:

# This is how we get the Grafana service name

» _GRAFANA_SVC=$(kubectl get svc -n bentoml | grep grafana | cut -d " " -f1)
» kubectl patch svc "${_GRAFANA_SVC}" -n bentoml --patch "$(cat configs/deployment/grafana-patch.yaml)"

service/kube-prometheus-stack-1623502925-grafana patched

Verify that the service is now exposed at an external accessible port:

» kubectl get svc -A
NAMESPACE    NAME                                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
bentoml      kube-prometheus-stack-1623-prometheus         NodePort    10.96.241.205    <none>        9090:30090/TCP   35m
bentoml      kube-prometheus-stack-1623502925-grafana      NodePort    10.111.205.42    <none>        80:32447/TCP     35m

Open your browser at http://<machine-ip-addr>:32447 and log in with the default credentials:

  • login: admin

  • password: prom-operator.

Port Forwarding

Another method is to access Grafana with port-forwarding.

Notice that the Grafana service is exposed on port :80. We will forward an arbitrary port :36745 on our local machine to port :80 on the service (which in turn maps to :3000, where Grafana is listening):

» kubectl port-forward svc/kube-prometheus-stack-1623502925-grafana -n bentoml 36745:80

Forwarding from 127.0.0.1:36745 -> 3000
Forwarding from [::1]:36745 -> 3000
Handling connection for 36745

Note

If your cluster is set up on a cloud instance, e.g. AWS EC2, you might have to set up an SSH tunnel between your local workstation and the instance using port forwarding to view the Grafana tool in your own browser.

Point your browser to http://localhost:36745/ to see the Grafana login page, using the same credentials as above.

Setting up your BentoService

An example BentoService with custom ServiceMonitor on Kubernetes:

---
### BentoService
apiVersion: v1
kind: Service
metadata:
  labels:
    app: bentoml-service
  name: bentoml-service
  namespace: bentoml
spec:
  externalTrafficPolicy: Cluster
  ports:
    - name: predict
      nodePort: 32610
      port: 5000
      protocol: TCP
      targetPort: 5000
  selector:
    app: bentoml-service
  sessionAffinity: None
  type: NodePort

---
### BentoService ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: bentoml-service
  namespace: bentoml
spec:
  selector:
    matchLabels:
      app: bentoml-service
  endpoints:
  - port: predict

Note

Make sure that you also include a custom ServiceMonitor definition for your BentoService. For information on how to use the ServiceMonitor CRD, please see the documentation.
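
The Service above selects pods labeled app: bentoml-service. For completeness, a minimal Deployment that would provide such pods might look like the sketch below; it is an assumption rather than part of the original manifests, and it reuses the bentoml/fashion-mnist-classifier image from the docker-compose example:

---
### Illustrative BentoService Deployment (sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: bentoml-service
  namespace: bentoml
  labels:
    app: bentoml-service
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bentoml-service
  template:
    metadata:
      labels:
        app: bentoml-service
    spec:
      containers:
        - name: bentoml-service
          image: bentoml/fashion-mnist-classifier:latest   # assumed image
          ports:
            - containerPort: 5000   # BentoML API server, also serves /metrics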

Apply the changes to enable monitoring:

» kubectl apply -f configs/deployment/bentoml-deployment.yml --namespace=bentoml

Note

After logging into Grafana, import the provided Kubernetes dashboards under configs/grafana/provisioning/dashboards.

The final result: a deployed BentoML-Prometheus-Grafana stack on Kubernetes:

» minikube service list
|-------------|-----------------------------------------------------------|--------------|-----------------------------|
|  NAMESPACE  |                           NAME                            | TARGET PORT  |             URL             |
|-------------|-----------------------------------------------------------|--------------|-----------------------------|
| bentoml     | bentoml-service                                           | predict/5000 | http://192.168.99.103:32610 |
| bentoml     | kube-prometheus-stack-1623-prometheus                     | web/9090     | http://192.168.99.102:30090 |
| bentoml     | kube-prometheus-stack-1623502925-grafana                  | service/80   | http://192.168.99.102:32447 |
|-------------|-----------------------------------------------------------|--------------|-----------------------------|
../_images/k8s-bentoml.png ../_images/k8s-grafana.png ../_images/k8s-prometheus.png

Note

You might have to wait a few minutes for everything to spin up. In the meantime, here is an example dashboard on Kubernetes. You can check the health of the pods in your namespace with minikube dashboard:

../_images/k8s-minikube.png

Note

Mounting a PersistentVolume for Prometheus and Grafana on K8s is currently a work in progress.

(Optional) Exposing GPU Metrics on Kubernetes

Note

This part is currently a work in progress. If you have any questions related to this, please join the BentoML Slack community and ask in the bentoml-users channel.