Practical Monitoring with Prometheus & Grafana (Part I)

Yitaek Hwang
4 min read · May 27, 2020

Installing Prometheus + Grafana via Helm in 5 Minutes

Prometheus at Scale: Architecture Considerations

In August 2018, Prometheus became the second project, after Kubernetes, to graduate from the CNCF, solidifying its position as the de facto standard open source monitoring tool for Kubernetes. From commercial offerings by Sysdig and Weaveworks to Prometheus operator charts from CoreOS and Bitnami, users now have more ways than ever to install and deploy Prometheus on Kubernetes. With so many options, how should one deploy Prometheus for scale?

Operators vs. Helm

If you aren’t familiar with operators, they are software extensions to Kubernetes that package together application-specific custom resources and configurations. For a monitoring bundle that often includes Prometheus (server, Alertmanager, Pushgateway) and Grafana, using a Prometheus operator from CoreOS or Bitnami provides preconfigured alerts and dashboards. Whether you use CoreOS’s kube-prometheus, which uses ksonnet, or the prometheus-operator Helm chart is a matter of preference.
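To make "application-specific custom resources" concrete, here is a minimal sketch of one resource type the Prometheus operator adds, a ServiceMonitor, built as a plain Python dict. The helper name `service_monitor`, the `app` label key, the `example-app` name, and the `web` port name are illustrative assumptions, not values from any particular chart.

```python
import json


def service_monitor(name, app_label, port_name, interval="30s"):
    """Build a ServiceMonitor manifest as a plain dict.

    The operator watches for objects of this kind and generates the
    matching Prometheus scrape configuration.
    """
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "labels": {"app": app_label}},
        "spec": {
            # Select the Services whose labels match (assumed label key: app).
            "selector": {"matchLabels": {"app": app_label}},
            # Scrape the named Service port at the given interval.
            "endpoints": [{"port": port_name, "interval": interval}],
        },
    }


if __name__ == "__main__":
    # Hypothetical application name and port name, for illustration only.
    manifest = service_monitor("example-app", "example-app", "web")
    print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON manifests, the printed output could be piped to `kubectl apply -f -` on a cluster where the operator's custom resource definitions are already installed.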

There are some known issues with using the prometheus-operator on a private GKE cluster, so if you don’t want to change firewall settings or prefer Bitnami’s charts (perhaps you run their Redis or Postgres charts), then Bitnami’s prometheus-operator chart works well too.

Personally, I found the operator pattern to be a bit heavy for small or workload-specific clusters (e.g. a central cluster to host Vault or ChartMuseum). Prometheus operators also come with a ton of default rules (e.g. etcd, kube-api-server) that you may already be getting via another tool (e.g. Stackdriver for GKE), so you may spend a bit more time up front refining what you actually need.
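As a rough illustration of that refining step, the sketch below drops unwanted groups from a rule set laid out like a standard Prometheus rule file (a top-level `groups` list, each group with a `name` and `rules`). The function name, group names, alert names, and expressions are made up for the example. In practice you would more likely switch rule groups off through the chart's own values than post-process them like this.

```python
def drop_rule_groups(rule_file, unwanted):
    """Return a copy of a rule-file dict without the groups named in `unwanted`."""
    kept = [g for g in rule_file.get("groups", []) if g["name"] not in unwanted]
    return {**rule_file, "groups": kept}


if __name__ == "__main__":
    # Hypothetical rule groups; real default rule sets are much larger.
    rules = {
        "groups": [
            {"name": "etcd", "rules": [{"alert": "EtcdDown", "expr": 'up{job="etcd"} == 0'}]},
            {"name": "kube-apiserver", "rules": [{"alert": "ApiServerDown", "expr": 'up{job="apiserver"} == 0'}]},
            {"name": "node", "rules": [{"alert": "NodeDown", "expr": 'up{job="node"} == 0'}]},
        ]
    }
    # Assume etcd and API server alerts are already covered by another tool.
    trimmed = drop_rule_groups(rules, {"etcd", "kube-apiserver"})
    print([g["name"] for g in trimmed["groups"]])
```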

Multi-Cloud or Multi-Cluster Setup

As Kubernetes clusters scale, a need may arise for a centralized multi-cluster monitoring solution. It could be to tie together clusters deployed in different regions; a multi-cloud setup to pull CloudWatch, Stackdriver, GKE, and EKS metrics; or a hierarchical architecture to pull distributed job metrics across different job-specific clusters. Prometheus is flexible in…
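One common way to build the hierarchical setup described above is Prometheus federation, where a central server scrapes selected series from each cluster's `/federate` endpoint. The excerpt does not name this mechanism, so treat it as an assumption. The sketch below only builds the URL a central server would scrape; the hostname and the `job="batch"` selector are hypothetical.

```python
from urllib.parse import urlencode


def federate_url(base_url, matchers):
    """Build a /federate URL that selects series via repeated match[] parameters."""
    query = urlencode([("match[]", m) for m in matchers])
    return f"{base_url.rstrip('/')}/federate?{query}"


if __name__ == "__main__":
    # Hypothetical per-cluster Prometheus address and series selector.
    url = federate_url("http://prom.cluster-a.example:9090", ['{job="batch"}'])
    print(url)
```

In a real deployment the central Prometheus would usually express the same thing in its scrape configuration rather than in hand-built URLs. The helper is only meant to show which request the central server ends up making.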
