Monday, March 6th 2023

Prometheus: Measure to Improve

Josh Adams

Monitoring your business is vital. As the management guru Peter Drucker once said, "If you can't measure it, you can't improve it." Fortunately, Prometheus is a cloud-native monitoring tool that can help you measure and improve your business operations. In this blog post, we'll explore the benefits of Prometheus, how it works, and how you can get started using it.

Overview

Prometheus is at the heart of the modern open-source observability stack. It's a time-series database that enables you to collect and analyze metrics from your entire infrastructure. With Prometheus, you can gather data across all of your services, finding new ones via service discovery as they are deployed. Nearly every data source that you want to scrape has some *-exporter package that you can run to connect to it and get its data into Prometheus. This includes the Kubernetes components, whatever database you use (PostgreSQL, MySQL, SQL Server, etc.), your key-value stores, your Slack workspace... seriously, if you use a thing, someone's likely written a Prometheus exporter for it.

[Diagram: Prometheus architecture]

One of the most significant benefits of Prometheus is the broad ecosystem of open-source tooling built for it that works together tightly:

Grafana is an open-source analytics and visualization platform. You can use it to see your Prometheus metrics associated with your Loki logs and your Tempo traces, and the UI makes it easy to query the data and share it or reuse it on custom dashboards. Additionally, there are pre-built Grafana dashboards for most of the data that you'll collect.
Alertmanager receives the alerts that Prometheus fires, then deduplicates, groups, and routes them to the systems that need to take action. Importantly, the alert conditions themselves are rules evaluated against the metrics tracked in Prometheus.
Loki is a log aggregation system that helps you search, filter, and analyze logs. It can derive metrics from logs as they flow through and report them to Prometheus.
Tempo is a distributed tracing system that helps you troubleshoot issues across your entire infrastructure.
OnCall is a service management platform that helps you manage on-call schedules, incidents, and communication.
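
To make the Alertmanager point concrete, here is a minimal sketch of an alerting rule, assuming the Prometheus Operator CRDs that kube-prometheus-stack installs (see Getting Started). The file name, the rule name, and the release: prometheus label are assumptions for illustration; the label is what a default install with the release name prometheus would typically use to pick up rules. The expression uses the built-in up metric, which Prometheus sets to 0 when a scrape fails.

```shell
# Write a hypothetical PrometheusRule manifest to the current directory.
# The quoted 'EOF' keeps the shell from expanding $labels in the template.
cat > example-alert-rule.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alerts            # hypothetical name
  namespace: monitoring
  labels:
    release: prometheus           # assumption: matches the Helm release name
spec:
  groups:
    - name: example.rules
      rules:
        - alert: TargetDown
          expr: up == 0           # the scrape of this target is failing
          for: 5m                 # must stay true for 5 minutes before firing
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.job }} target {{ $labels.instance }} is down"
EOF

# Once the stack is installed, apply it with:
#   kubectl apply -f example-alert-rule.yaml
```

When the rule fires, Prometheus hands the alert to Alertmanager, which decides where to route it based on labels such as severity.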

How it works

Prometheus uses a multi-dimensional time-series data model: every series is identified by a metric name plus a set of key-value labels, so you can slice the same metric by host, service, status code, and so on. For example, you can track the CPU usage, memory usage, and disk usage of your servers over time. You can also track the response time, error rate, and throughput of your services. This enables you to identify performance bottlenecks, troubleshoot issues, and optimize your infrastructure.
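
Here is a small sketch of what those dimensions look like in practice. The sample below is hand-written in the Prometheus text exposition format; the metric name, label names, and values are made up for illustration. The awk helper only mimics what a label filter plus a sum does, it is not how Prometheus itself evaluates queries.

```shell
# Hypothetical scrape output: one metric name, several label combinations.
samples() {
cat <<'EOF'
# HELP http_requests_total Total HTTP requests handled.
# TYPE http_requests_total counter
http_requests_total{service="checkout",status="200"} 1027
http_requests_total{service="checkout",status="500"} 3
http_requests_total{service="search",status="200"} 4410
EOF
}

# Sum the values of every series that carries the label pair given in $1.
sum_where() {
  samples | awk -v pair="$1" '
    /^#/ { next }                      # skip HELP/TYPE comment lines
    index($0, pair) { total += $NF }   # last field is the sample value
    END { print total + 0 }
  '
}

sum_where 'status="200"'         # all successful requests, across services
sum_where 'service="checkout"'   # all checkout requests, across status codes
```

The same three series answer both questions; only the label you slice by changes.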

Prometheus discovers services to gather metrics on, and it regularly scrapes metrics from those targets over HTTP. The Prometheus server stores the metrics in a time-series database, and you can query and visualize the data using Grafana. PromQL is the query language used by Prometheus, and it allows you to filter, aggregate, and transform your data. With Grafana, you can create beautiful visualizations of your data, such as line charts, bar charts, and heat maps.
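
As a rough illustration of filtering, transforming, and aggregating with PromQL, the sketch below defines three expressions and a small helper that sends one to the Prometheus HTTP API. It assumes a Prometheus server reachable at http://localhost:9090 (for example via the port-forward in Getting Started) and that node-exporter's node_cpu_seconds_total metric is being scraped; the variable and function names are my own.

```shell
# Base URL of the Prometheus server; override PROM_URL if yours differs.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Filter: keep only the idle-mode CPU counters.
Q_FILTER='node_cpu_seconds_total{mode="idle"}'
# Transform: turn the counter into a per-second rate over the last 5 minutes.
Q_RATE='rate(node_cpu_seconds_total{mode="idle"}[5m])'
# Aggregate: average that rate across all cores of each instance.
Q_AGG='avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))'

# Run an instant query ($1) and print the raw JSON response.
prom_query() {
  curl -sG "${PROM_URL}/api/v1/query" --data-urlencode "query=$1"
}
```

With a server running, prom_query "$Q_AGG" returns one value per instance. The same expressions can be pasted into the Prometheus web UI or a Grafana panel.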

Getting Started

Getting started with Prometheus is relatively straightforward. If you have a Kubernetes cluster, the easiest way is to use Helm to install kube-prometheus-stack:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
kubectl create ns monitoring
helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring

Once it has had a little time to install and start up, you can check its status by listing all the resources in the monitoring namespace:

kubectl get all -n monitoring

You should see some output like this:

NAME                                                         READY   STATUS            RESTARTS     AGE
pod/prometheus-prometheus-node-exporter-5dwfx                1/1     Running           0            30s
pod/prometheus-kube-prometheus-operator-6ddf989d68-rbddz     1/1     Running           0            30s
pod/prometheus-prometheus-kube-prometheus-prometheus-0       0/2     PodInitializing   0            17s
pod/prometheus-kube-state-metrics-5449b9d649-gp2gf           1/1     Running           0            30s
pod/prometheus-grafana-59bb456595-27hfj                      2/3     Running           0            30s
pod/alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running           1 (8s ago)   17s

NAME                                              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/prometheus-kube-prometheus-prometheus     ClusterIP   10.43.87.16     <none>        9090/TCP                     30s
service/prometheus-grafana                        ClusterIP   10.43.209.232   <none>        80/TCP                       30s
service/prometheus-kube-state-metrics             ClusterIP   10.43.29.1      <none>        8080/TCP                     30s
service/prometheus-kube-prometheus-alertmanager   ClusterIP   10.43.213.175   <none>        9093/TCP                     30s
service/prometheus-kube-prometheus-operator       ClusterIP   10.43.118.125   <none>        443/TCP                      30s
service/prometheus-prometheus-node-exporter       ClusterIP   10.43.22.58     <none>        9100/TCP                     30s
service/alertmanager-operated                     ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP   17s
service/prometheus-operated                       ClusterIP   None            <none>        9090/TCP                     17s

NAME                                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/prometheus-prometheus-node-exporter   1         1         1       1            1           <none>          30s

NAME                                                  READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/prometheus-grafana                    0/1     1            0           30s
deployment.apps/prometheus-kube-prometheus-operator   1/1     1            1           30s
deployment.apps/prometheus-kube-state-metrics         1/1     1            1           30s

NAME                                                             DESIRED   CURRENT   READY   AGE
replicaset.apps/prometheus-grafana-59bb456595                    1         1         0       30s
replicaset.apps/prometheus-kube-prometheus-operator-6ddf989d68   1         1         1       30s
replicaset.apps/prometheus-kube-state-metrics-5449b9d649         1         1         1       30s

NAME                                                                    READY   AGE
statefulset.apps/prometheus-prometheus-kube-prometheus-prometheus       0/1     17s
statefulset.apps/alertmanager-prometheus-kube-prometheus-alertmanager   1/1     17s

Now let's check out Prometheus:

# Port-forward the Prometheus web interface to localhost:9090
kubectl port-forward service/prometheus-kube-prometheus-prometheus -n monitoring 9090:http-web

Now if you visit http://localhost:9090/targets you can see all of the targets that Prometheus has discovered and is scraping. With the Prometheus Operator, you can publish a ServiceMonitor and your Prometheus instance will discover and begin scraping the specified service. That makes it easy to maintain your monitoring targets via GitOps.
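
As a rough sketch of what publishing a ServiceMonitor looks like, the snippet below writes a manifest for a hypothetical Service labeled app: my-app in the default namespace that exposes a port named metrics. Those names, the file name, and the release: prometheus label are assumptions; the label is what a default install with the release name prometheus would typically use to select ServiceMonitors, so check your chart values if yours differ.

```shell
# Write a hypothetical ServiceMonitor manifest to the current directory.
cat > my-app-servicemonitor.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app                    # hypothetical name
  namespace: monitoring
  labels:
    release: prometheus           # assumption: matches the Helm release name
spec:
  namespaceSelector:
    matchNames:
      - default                   # namespace where the Service lives
  selector:
    matchLabels:
      app: my-app                 # label on the Service to scrape
  endpoints:
    - port: metrics               # name of the Service port serving metrics
      path: /metrics
      interval: 30s
EOF

# Apply it, then look for the new target on the /targets page:
#   kubectl apply -f my-app-servicemonitor.yaml
```

Because the manifest is just a file, it can live in the same Git repository as the application it monitors.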

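You can visit http://localhost:9090/alerts to see the alerting rules that Prometheus is evaluating and whether each one is inactive, pending, or firing. Firing alerts are the ones that get forwarded to Alertmanager.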

Of course, the real fun with metrics is graphing them. At http://localhost:9090/graph you can enter a PromQL expression and plot it, for example each core's scaling frequency.
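
If you'd rather pull the same data from the command line, the sketch below asks the Prometheus HTTP API for the last hour of per-core scaling frequency. It assumes the port-forward above is still running and that your node-exporter exposes node_cpu_scaling_frequency_hertz, which depends on the host supporting CPU frequency scaling; the helper names are my own.

```shell
# Base URL of the Prometheus server; override PROM_URL if yours differs.
PROM_URL="${PROM_URL:-http://localhost:9090}"

# Print the start of a window of $2 seconds that ends at Unix time $1.
window_start() {
  echo "$(( $1 - $2 ))"
}

# Run range query $1 over the last $2 seconds (default 3600), returning one
# point every $3 seconds (default 60), and print the raw JSON response.
prom_query_range() {
  local end start
  end="$(date +%s)"
  start="$(window_start "$end" "${2:-3600}")"
  curl -sG "${PROM_URL}/api/v1/query_range" \
    --data-urlencode "query=$1" \
    --data-urlencode "start=${start}" \
    --data-urlencode "end=${end}" \
    --data-urlencode "step=${3:-60}"
}
```

For example, prom_query_range 'node_cpu_scaling_frequency_hertz' returns one series per CPU core, which is the same data the graph view plots.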

Bonus: You've also got a Grafana instance to play around with, as part of the kube-prometheus-stack. You can port-forward to it and dig around. We'll have another post digging into Grafana in detail soon.

# Get the Grafana admin password from its secret
kubectl get secrets -n monitoring prometheus-grafana -ojsonpath='{.data.admin-password}' | base64 -d
# Port-forward to Grafana on localhost:3000
kubectl port-forward service/prometheus-grafana -n monitoring 3000:http-web

Now you can open http://localhost:3000, log in with the username admin and the password you got in the previous step, and browse a bunch of great dashboards.

Summary

In conclusion, Prometheus is a powerful monitoring tool that can help you measure and improve your business operations. It allows you to gather data across all of your services, track multiple dimensions of your infrastructure, and visualize your data using Grafana.

However, setting up and managing Prometheus can be challenging. That's where managed Kubernetes services like Kolony can help. Kolony provides managed Kubernetes services that include Prometheus monitoring. We can help you get the most out of Prometheus by handling the setup, management, and monitoring of your infrastructure. Contact us today to learn more about how we can help you measure and improve your business operations.
