Monitoring with Prometheus

Monitoring stacks usually consist of a metrics collector, a time-series database to store metrics, and a visualization layer. A popular open-source stack pairs Prometheus with Grafana as the visualization tool to create rich dashboards.

The BentoML API server comes with Prometheus support out of the box. When launching an API model server with BentoML, whether it is running as a dev server locally or deployed with Docker in the cloud, a /metrics endpoint is always available. It exposes the essential metrics for model serving and lets you create and customize new metrics based on your needs. This guide introduces how to use Prometheus and Grafana to monitor your BentoService.


See also

Prometheus and Grafana docs for more in-depth topics.


This guide requires a basic understanding of Prometheus' concepts as well as its metric types. Please refer to Concepts for more information.


Refer to PromQL basics for the Prometheus query language.


Please refer to Prometheus' best practices for consoles and dashboards, as well as for histograms and summaries.
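To make the histogram-versus-summary distinction concrete, here is a small plain-Python sketch (no Prometheus client required; the sample latencies are made up for illustration) of how the two metric types summarize the same data:

```python
import math

# Sample request latencies in seconds (illustrative data only)
latencies = [0.02, 0.05, 0.11, 0.30, 0.75, 1.20, 0.08, 0.45]

# Histogram: cumulative counts per upper bound, like Prometheus "le" buckets.
# The server stores only these counters; quantiles are estimated at query time
# and can be aggregated across instances.
buckets = [0.1, 0.5, 1.0, float("inf")]
histogram = {le: sum(1 for v in latencies if v <= le) for le in buckets}

# Summary: quantiles are computed on the client (nearest-rank method here),
# so they are exact for this instance but cannot be aggregated across instances.
ranked = sorted(latencies)
q90 = ranked[math.ceil(0.9 * len(ranked)) - 1]

print(histogram)
print(q90)
```

This is why Prometheus' best practices recommend histograms when you need to aggregate latency across multiple service replicas.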


Users can also create custom metrics for a BentoService, which can later be scraped by Prometheus.

from bentoml import BentoService, api, artifacts
from bentoml.adapters import JsonInput
from bentoml.frameworks.keras import KerasModelArtifact
from bentoml.service.artifacts.common import PickleArtifact
from prometheus_client import Summary
from tensorflow.keras.preprocessing.sequence import pad_sequences

REQUEST_TIME = Summary(name='request_processing_time', documentation='Time spent processing request', namespace='PREFIX')

@artifacts([KerasModelArtifact('model'), PickleArtifact('tokenizer')])
class TensorflowService(BentoService):

    @api(input=JsonInput())
    def predict(self, parsed_json):
        # Record the time spent in this block under the custom Summary metric
        with REQUEST_TIME.time():
            # preprocessing is assumed to be defined elsewhere on the service
            raw = self.preprocessing(parsed_json['text'])
            input_data = [raw[: n + 1] for n in range(len(raw))]
            input_data = pad_sequences(input_data, maxlen=100, padding="post")
            return self.artifacts.model.predict(input_data)
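For reference, the /metrics endpoint serves metrics in the Prometheus text exposition format. The following sketch (the helper function and sample values are our own illustration, not part of BentoML) renders roughly what a Summary like the one above looks like when scraped:

```python
def render_summary(namespace, name, documentation, count, total):
    """Render a Summary metric in the Prometheus text exposition format."""
    full_name = f"{namespace}_{name}"
    return "\n".join([
        f"# HELP {full_name} {documentation}",
        f"# TYPE {full_name} summary",
        f"{full_name}_count {count}",   # number of observations so far
        f"{full_name}_sum {total}",     # sum of all observed values
    ])

text = render_summary("PREFIX", "request_processing_time",
                      "Time spent processing request", 4, 0.52)
print(text)
```

The namespace you pass to prometheus_client becomes a prefix on the exported metric name, which is what you will see on the Prometheus targets and graph pages.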

Local Deployment

This section will walk you through how to set up the stack locally, with the optional guide on using docker-compose for easy deployment of the stack.

Setting up Prometheus

It is recommended to run Prometheus with Docker. Please make sure that you have Docker installed on your system.

Users can provide a prometheus.yml file for configuration. An example that monitors multiple BentoServices is shown below:

# prometheus.yml

global:
  scrape_interval:     15s
  evaluation_interval: 30s
  # scrape_timeout is set to the global default (10s).

scrape_configs:
- job_name: prometheus
  honor_labels: true
  static_configs:
  - targets:
    - localhost:5000  # metrics from SentimentClassifier service
    - localhost:6000  # metrics from IrisClassifier service


In order to monitor multiple BentoServices, make sure to assign a different port to each BentoService and add the correct targets under static_configs, as shown above.

We can then run Prometheus with the following:

# Bind-mount your prometheus.yml from the host by running:
» docker run --network=host -v path/to/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus


When deploying, users can set up docker-compose and a shared network so that Prometheus can scrape metrics from your BentoService. Please refer to (Optional) Prometheus - Grafana - docker-compose stack.

Users can check :9090/status to make sure Prometheus is running. To verify that Prometheus is scraping your BentoService, :9090/targets should list each service endpoint with state UP.
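For a quick scripted sanity check of what your service exposes, a minimal parser for the text exposition format can help. This parser and the sample payload are illustrative sketches, not part of BentoML or Prometheus:

```python
def parse_metrics(text):
    """Return {metric_name: value} for simple, unlabeled sample lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comment lines
        name, _, value = line.partition(" ")
        if "{" not in name:  # this sketch ignores labeled series
            samples[name] = float(value)
    return samples

# A tiny sample of what a /metrics response might contain
payload = """\
# HELP process_cpu_seconds_total Total user and system CPU time.
# TYPE process_cpu_seconds_total counter
process_cpu_seconds_total 12.5
"""
print(parse_metrics(payload))
```

In practice you would fetch the payload with an HTTP client against localhost:5000/metrics before feeding it to the parser.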


Setting up Grafana

It is also recommended to use Grafana with Docker.

» docker run --network=host grafana/grafana

To log in to Grafana for the first time:

  1. Open your web browser and go to localhost:3000. The default HTTP port that Grafana listens to is :3000 unless you have configured a different port.

  2. On the login page, enter admin for both the username and password.

  3. Click Log in. If login is successful, you will see a prompt to change the password.

  4. Click OK on the prompt, then change your password.

Users can also import the BentoService Dashboard and explore the given BentoService metrics.



(Optional) Prometheus - Grafana - docker-compose stack

Make sure to set up Docker Swarm before proceeding: the compose file in this section uses deploy options that only take effect in Swarm mode.

Users can freely update the targets section of prometheus.yml to define what Prometheus should monitor.

grafana/provisioning provides both datasources and dashboards directories, which let us specify data sources and bootstrap our dashboards quickly, courtesy of the provisioning feature introduced in Grafana v5.0.0.

If you would like to automate the installation of additional dashboards, copy the dashboard JSON files to grafana/provisioning/dashboards; they will be provisioned the next time you restart Grafana.
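As a sketch, a minimal datasource provisioning file might look like the following (the file name and datasource URL are assumptions based on the compose service layout; adjust them to your setup):

```yaml
# grafana/provisioning/datasources/datasource.yml (illustrative)
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
```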

See also

Stack Implementation

├── deployment
├── grafana
│   ├── config.monitoring
│   └── provisioning
│       ├── dashboards
│       └── datasources
├── prometheus
│   ├── alert.rules
│   └── prometheus.yml
├── Makefile
├── docker-compose.yml
└── README.rst

The content of docker-compose.yml is shown below; a sample dashboard is provided under grafana/provisioning/dashboards:

version: '3.7'

volumes:
  prometheus_data: {}
  grafana_data: {}

networks:
  shared-network:

services:
  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus/:/etc/prometheus/
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/usr/share/prometheus/console_libraries'
      - '--web.console.templates=/usr/share/prometheus/consoles'
    ports:
      - 9090:9090
    networks:
      - shared-network
    deploy:
      placement:
        constraints:
          - node.role==manager
    restart: on-failure

  grafana:
    image: grafana/grafana
    depends_on:
      - prometheus
    ports:
      - 3000:3000
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning/:/etc/grafana/provisioning/
    env_file:
      - ./grafana/config.monitoring
    networks:
      - shared-network
    user: "472"
    deploy:
      mode: global
    restart: on-failure

  bentoml:
    image: bentoml/fashion-mnist-classifier:latest
    ports:
      - "5000:5000"
    networks:
      - shared-network
    deploy:
      mode: global
    restart: on-failure

See also

Alertmanager to set up alerts and cAdvisor to monitor container resources.

See also

prom/node-exporter for exposing machine metrics.

Deploy on Kubernetes


minikube and kubectl are required for this part of the tutorial. Users may also choose to install virtualbox in order to run minikube.

See also

Deploying to Kubernetes Cluster on how to deploy BentoService to Kubernetes.

Deploy Prometheus on K8s

Setting up a Prometheus stack on Kubernetes can be an arduous task. However, we can take advantage of the Helm package manager and make use of prometheus-operator through kube-prometheus:

  • The Operator uses standard configurations and dashboards for Prometheus and Grafana.

  • The Helm prometheus-operator chart allows you to get a full cluster monitoring solution up and running by installing the aforementioned components.

See also



Your local minikube cluster will be deleted in order to set up kube-prometheus-stack correctly.

Set up virtualbox to be default driver for minikube:

» minikube config set driver virtualbox

Spin up our local K8s cluster:

# prometheus-operator/kube-prometheus
» minikube delete && minikube start \
    --kubernetes-version=v1.20.0 \
    --memory=6g --bootstrapper=kubeadm \
    --extra-config=kubelet.authentication-token-webhook=true \
    --extra-config=kubelet.authorization-mode=Webhook \
    --extra-config=scheduler.address=0.0.0.0

We allocate 6GB of memory via --memory for this K8s cluster. Change the value to fit your use case.

Then get helm repo:

» helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
» helm repo update

Search for available prometheus chart:

» helm search repo kube-prometheus

Once you have located the chart version, inspect the chart to modify its settings:

» helm inspect values prometheus-community/kube-prometheus-stack \
    > ./configs/deployment/kube-prometheus-stack.values

Next, we need to change the Prometheus server service type from ClusterIP to NodePort so that it can be accessed from the browser. This makes the Prometheus server accessible on your machine at port :30090:

## Configuration for Prometheus service
service:
  annotations: {}
  labels: {}
  clusterIP: ""

  ## Port for Prometheus Service to listen on
  port: 9090

  ## To be used with a proxy extraContainer port
  targetPort: 9090

  ## List of IP addresses at which the Prometheus server service is available
  ## Ref:
  externalIPs: []

  ## Port to expose on each node
  ## Only used if service.type is 'NodePort'
  nodePort: 30090

  ## LoadBalancer IP
  ## Only use if service.type is "LoadBalancer"
  loadBalancerIP: ""
  loadBalancerSourceRanges: []
  ## Service type
  type: NodePort # changed this line from ClusterIP to NodePort

By default, Prometheus discovers PodMonitors and ServiceMonitors within its own namespace that are labeled with the same release tag as the prometheus-operator release. Since we want Prometheus to discover our BentoService (see Setting up your BentoService), we need to create custom PodMonitors/ServiceMonitors to scrape metrics from our services. One way to do this is to allow Prometheus to discover all PodMonitors/ServiceMonitors within its namespace, without applying label filtering. Set the following options:

- prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues: false
- prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues: false

Finally deploy Prometheus and Grafana pods using kube-prometheus-stack via Helm:

» helm install prometheus-community/kube-prometheus-stack \
    --create-namespace --namespace bentoml \
    --generate-name --values ./configs/deployment/kube-prometheus-stack.values
NAME: kube-prometheus-stack-1623502925
LAST DEPLOYED: Sat Jun 12 20:02:09 2021
NAMESPACE: bentoml
STATUS: deployed
kube-prometheus-stack has been installed. Check its status by running:
  kubectl --namespace bentoml get pods -l "release=kube-prometheus-stack-1623502925"

Refer to the Prometheus Operator documentation for instructions on how to create & configure Alertmanager and Prometheus instances using the Operator.


You can also provide the chart values directly with helm --set, e.g.:

  • --set prometheus.service.type=NodePort

  • --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false

  • --set prometheus.prometheusSpec.podMonitorSelectorNilUsesHelmValues=false

Check for Prometheus and Grafana pods:

» kubectl get pods -A
NAMESPACE     NAME                                                              READY   STATUS    RESTARTS   AGE
bentoml       kube-prometheus-stack-1623-operator-5555798f4f-nghl8              1/1     Running   0          4m22s
bentoml       kube-prometheus-stack-1623502925-grafana-57cdffccdc-n7lpk         2/2     Running   0          4m22s
bentoml       prometheus-kube-prometheus-stack-1623-prometheus-0                2/2     Running   1          4m5s

Check for service startup as part of the operator:

» kubectl get svc -A
NAMESPACE     NAME                                                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                        AGE
bentoml       alertmanager-operated                                       ClusterIP   None             <none>        9093/TCP,9094/TCP,9094/UDP     5m8s
bentoml       kube-prometheus-stack-1623-operator                         ClusterIP      <none>        443/TCP                        5m25s
bentoml       kube-prometheus-stack-1623-prometheus                       NodePort    <none>        9090:30090/TCP                 5m26s
bentoml       kube-prometheus-stack-1623502925-grafana                    ClusterIP    <none>        80/TCP                         5m25s
bentoml       prometheus-operated                                         ClusterIP   None             <none>        9090/TCP                       5m8s

We can observe that the Prometheus server is available at :30090, so open a browser at http://<machine-ip-addr>:30090. By default, the Operator also enables monitoring of the Kubernetes cluster itself.
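Beyond the browser UI, you can also query the server programmatically via the Prometheus HTTP API. Here is a small sketch for building an instant-query URL; the base URL and PromQL expression are illustrative, so substitute your machine's address and the metric you care about:

```python
from urllib.parse import urlencode

def instant_query_url(base_url, promql):
    """Build a Prometheus HTTP API /api/v1/query URL for an instant query."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"

# e.g. check which scrape targets in the bentoml namespace are up
url = instant_query_url("http://localhost:30090", 'up{namespace="bentoml"}')
print(url)
```

Fetching that URL returns a JSON document whose data.result array holds one sample per matching series.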

Using Grafana

Users can also launch the Grafana tools for visualization.

There are two ways to expose the Grafana port; either works, depending on your preference:

Patching Grafana Service

By default, every service in the Operator uses ClusterIP to expose the ports where the service is accessible, including Grafana. This can be changed to a NodePort instead, so the page is accessible from the browser, similar to what we did earlier with the Prometheus dashboard.

We can take advantage of kubectl patch to update the service API to expose a NodePort instead.

Modify the spec to change service type:

» cat << EOF | tee ./configs/deployment/grafana-patch.yaml
spec:
  type: NodePort
  nodePort: 36745
EOF

Use kubectl patch:

# This is how we get the grafana service name

» _GRAFANA_SVC=$(kubectl get svc -n bentoml | grep grafana | cut -d " " -f1)
» kubectl patch svc "${_GRAFANA_SVC}" -n bentoml --patch "$(cat configs/deployment/grafana-patch.yaml)"

service/kube-prometheus-stack-1623502925-grafana patched

Verify that the service is now exposed at an external accessible port:

» kubectl get svc -A
NAMESPACE    NAME                                          TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
bentoml      kube-prometheus-stack-1623-prometheus         NodePort    <none>        9090:30090/TCP   35m
bentoml      kube-prometheus-stack-1623502925-grafana      NodePort    <none>        80:32447/TCP     35m

Open your browser at http://<machine-ip-addr>:32447 and log in with the credentials:

  • login: admin

  • password: prom-operator.

Port Forwarding

Another method is to access Grafana with port-forwarding.

Notice that Grafana is accessible at port :80. We will forward an arbitrary local port :36745 to port :80 on the service (which in turn maps to :3000, where Grafana is listening):

» kubectl port-forward svc/kube-prometheus-stack-1623502925-grafana -n bentoml 36745:80

Forwarding from 127.0.0.1:36745 -> 3000
Forwarding from [::1]:36745 -> 3000
Handling connection for 36745


If your cluster is set up on a cloud instance, e.g. AWS EC2, you might have to set up SSH tunnel between your local workstation and the instance using port forwarding to view Grafana tool in your own browser.

Point your browser to http://localhost:36745/ to see the Grafana login page, using the same credentials.

Setting up your BentoService

An example BentoService with custom ServiceMonitor on Kubernetes:

### BentoService
apiVersion: v1
kind: Service
metadata:
  labels:
    app: bentoml-service
  name: bentoml-service
  namespace: bentoml
spec:
  externalTrafficPolicy: Cluster
  ports:
    - name: predict
      nodePort: 32610
      port: 5000
      protocol: TCP
      targetPort: 5000
  selector:
    app: bentoml-service
  sessionAffinity: None
  type: NodePort
---
### BentoService ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: bentoml-service
  namespace: bentoml
spec:
  selector:
    matchLabels:
      app: bentoml-service
  endpoints:
    - port: predict


Make sure that you also include a custom ServiceMonitor definition for your BentoService. For information on how to use ServiceMonitor CRD, please see the documentation.

Apply the changes to enable monitoring:

» kubectl apply -f configs/deployment/bentoml-deployment.yml --namespace=bentoml


After logging into Grafana, import the provided Kubernetes dashboards under configs/grafana/provisioning/dashboards.

The final result: a deployed BentoML-Prometheus-Grafana stack on Kubernetes:

» minikube service list
|  NAMESPACE  |                           NAME                            | TARGET PORT  |             URL             |
| bentoml     | bentoml-service                                           | predict/5000 | |
| bentoml     | kube-prometheus-stack-1623-prometheus                     | web/9090     | |
| bentoml     | kube-prometheus-stack-1623502925-grafana                  | service/80   | |
(Screenshots: the BentoML service, Grafana, and Prometheus dashboards running on Kubernetes.)


You might have to wait a few minutes for everything to spin up. In the meantime, you can check the health of the pods in your namespace with minikube dashboard:



Mounting a PersistentVolume for Prometheus and Grafana on K8s is currently a work in progress.

(Optional) Exposing GPU Metrics on Kubernetes


This part is currently a work in progress. If you have any questions related to this, please join the BentoML Slack community and ask in the bentoml-users channel.