Collecting Metrics#

Yatai supports using Prometheus to collect metrics for BentoDeployment resources.

Note

This documentation covers only BentoDeployment metrics, not metrics for Yatai itself.

Prerequisites#

  • yatai-deployment

    Because the metrics collected come from BentoDeployment resources, this feature relies on yatai-deployment.

  • Kubernetes

    Kubernetes cluster with version 1.20 or newer

    Note

    If you do not have a production Kubernetes cluster and want to install Yatai for development and testing purposes, you can use minikube to set up a local Kubernetes cluster.

  • Dynamic Volume Provisioning

    Because Prometheus requires storage for metrics, you need to enable dynamic volume provisioning in your Kubernetes cluster (a quick check is shown after this list). For more detailed information, please refer to Dynamic Volume Provisioning.

  • Helm

    We use Helm to install Prometheus Stack.
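
If you are not sure whether dynamic volume provisioning is enabled, a quick heuristic is to check for a default StorageClass (minikube, for example, ships one out of the box):

kubectl get storageclass

A StorageClass marked (default) in the output means that the PersistentVolumeClaims created by Prometheus and Grafana can be provisioned automatically.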

Quick setup#

Note

This quick setup script is intended for development and testing purposes only.

This script will automatically install the following dependencies inside the yatai-monitoring namespace of the Kubernetes cluster:

  • Prometheus Operator

  • Prometheus

  • Grafana

  • Alertmanager

bash <(curl -s "https://raw.githubusercontent.com/bentoml/yatai/main/scripts/quick-setup-yatai-monitoring.sh")
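
Once the script finishes, you can optionally confirm that all components are running:

kubectl -n yatai-monitoring get pods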

Setup steps#

1. Install Prometheus Stack#

1. Create a namespace for Prometheus Stack#

kubectl create ns yatai-monitoring

2. Install prometheus-operator#

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update prometheus-community

cat <<EOF | helm install prometheus prometheus-community/kube-prometheus-stack -n yatai-monitoring -f -
grafana:
  enabled: false
  forceDeployDatasources: true
  forceDeployDashboards: true
EOF
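
Optionally, confirm that the Helm release was deployed successfully before moving on:

helm -n yatai-monitoring status prometheus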

3. Verify that Prometheus is running#

kubectl -n yatai-monitoring get pod -l release=prometheus

The output of the command above should look something like this:

NAME                                                   READY   STATUS    RESTARTS   AGE
prometheus-kube-prometheus-operator-6f5c99cd68-6kshn   1/1     Running   0          21h
prometheus-kube-state-metrics-668449846c-tm2nb         1/1     Running   0          21h
prometheus-prometheus-node-exporter-ljlxk              1/1     Running   0          20h
prometheus-prometheus-node-exporter-fnxs2              1/1     Running   0          20h
prometheus-prometheus-node-exporter-gqq8c              1/1     Running   0          20h

4. Verify that the CRDs of prometheus-operator have been established#

kubectl wait --for condition=established --timeout=120s crd/prometheuses.monitoring.coreos.com
kubectl wait --for condition=established --timeout=120s crd/servicemonitors.monitoring.coreos.com

The output of the command above should look something like this:

customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com condition met
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com condition met

5. Verify that the Prometheus service is running#

kubectl -n yatai-monitoring get pod -l app.kubernetes.io/instance=prometheus-kube-prometheus-prometheus

The output of the command above should look something like this:

NAME                                                 READY   STATUS    RESTARTS   AGE
prometheus-prometheus-kube-prometheus-prometheus-0   2/2     Running   0          15m

6. Verify that the Alertmanager service is running#

kubectl -n yatai-monitoring get pod -l app.kubernetes.io/instance=prometheus-kube-prometheus-alertmanager

The output of the command above should look something like this:

NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0          18m

7. Install Grafana#

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update grafana

cat <<EOF | helm install grafana grafana/grafana -n yatai-monitoring -f -
adminUser: admin
adminPassword: $(LC_ALL=C tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 20)
persistence:
  enabled: true
sidecar:
  dashboards:
    enabled: true
  datasources:
    enabled: true
  notifiers:
    enabled: true
EOF

8. Verify that the Grafana service is running#

kubectl -n yatai-monitoring get pod -l app.kubernetes.io/name=grafana

The output of the command above should look something like this:

NAME                       READY   STATUS    RESTARTS   AGE
grafana-796c6947b7-r7gr4   3/3     Running   0          3m40s

9. Visit the Prometheus web UI#

You can create an ingress for the prometheus-kube-prometheus-prometheus service (a sketch is shown below) or port-forward the service to port 9090:

kubectl -n yatai-monitoring port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090 --address 0.0.0.0

Then visit the Prometheus web UI via http://localhost:9090

Prometheus web UI
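
If you prefer an ingress over port-forwarding, the following is a minimal sketch. The host prometheus.example.com and the ingress class nginx are placeholders and need to be adapted to your cluster:

cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus
  namespace: yatai-monitoring
spec:
  ingressClassName: nginx
  rules:
    - host: prometheus.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: prometheus-kube-prometheus-prometheus
                port:
                  number: 9090
EOF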

10. Visit the Grafana web UI#

You can create an ingress for the grafana service or port-forward it to port 8888:

kubectl -n yatai-monitoring port-forward svc/grafana 8888:80 --address 0.0.0.0

Then visit the Grafana web UI via http://localhost:8888

Note

Use the following command to get the Grafana username:

kubectl -n yatai-monitoring get secret grafana -o jsonpath='{.data.admin-user}' | base64 -d

Use the following command to get the Grafana password:

kubectl -n yatai-monitoring get secret grafana -o jsonpath='{.data.admin-password}' | base64 -d

Grafana web UI
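
Because the kube-prometheus-stack chart was installed with forceDeployDatasources enabled, it ships the Prometheus datasource as a labeled ConfigMap that the Grafana datasource sidecar picks up. You can verify that it exists; the grafana_datasource=1 label is the chart's default and may differ if you customized the values:

kubectl -n yatai-monitoring get configmap -l grafana_datasource=1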

2. Collect BentoDeployment metrics#

1. Create PodMonitor for BentoDeployment#

kubectl apply -f https://raw.githubusercontent.com/bentoml/yatai/main/scripts/monitoring/bentodeployment-podmonitor.yaml
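
You can optionally confirm that the PodMonitor resource was created; listing across all namespaces avoids having to know which namespace the manifest uses:

kubectl get podmonitor --all-namespaces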

After some time, you should see on the service discovery page of the Prometheus web UI that the BentoDeployment pods have been discovered:

Prometheus service discovery header menu
Prometheus service discovery

Now you can auto-complete BentoML’s metrics in the Prometheus expression input box (an example query is shown below):

Prometheus metrics auto-complete
Prometheus BentoML metrics
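
For example, a query like the one below shows the per-second request rate per pod. The metric name bentoml_api_server_request_total is an assumption that depends on your BentoML version; replace it with whatever the auto-completion suggests:

sum(rate(bentoml_api_server_request_total[5m])) by (pod)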

3. Create Grafana Dashboard for BentoDeployment#

1. Download the BentoDeployment Grafana dashboard JSON files#

curl -L https://raw.githubusercontent.com/bentoml/yatai/main/scripts/monitoring/bentodeployment-dashboard.json -o /tmp/bentodeployment-dashboard.json
curl -L https://raw.githubusercontent.com/bentoml/yatai/main/scripts/monitoring/bentofunction-dashboard.json -o /tmp/bentofunction-dashboard.json

2. Create the Grafana dashboard configmaps#

kubectl -n yatai-monitoring create configmap bentodeployment-dashboard --from-file=/tmp/bentodeployment-dashboard.json
kubectl -n yatai-monitoring label configmap bentodeployment-dashboard grafana_dashboard=1

kubectl -n yatai-monitoring create configmap bentofunction-dashboard --from-file=/tmp/bentofunction-dashboard.json
kubectl -n yatai-monitoring label configmap bentofunction-dashboard grafana_dashboard=1
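
You can verify that both ConfigMaps carry the grafana_dashboard=1 label that the Grafana dashboard sidecar watches for:

kubectl -n yatai-monitoring get configmap -l grafana_dashboard=1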

3. Go to the Grafana web UI to check out the BentoDeployment dashboard#

Note

Wait a few minutes for Grafana to automatically pick up the new dashboard configuration.

Grafana BentoDeployment dashboard
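
If the dashboards do not show up, the logs of the dashboard sidecar container usually reveal whether the ConfigMaps were detected. The container name grafana-sc-dashboard is the Grafana chart's default and may differ in your installation:

kubectl -n yatai-monitoring logs deploy/grafana -c grafana-sc-dashboard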