📊 Kubernetes Monitoring and Observability

An in-depth guide to Kubernetes monitoring and observability. Learn how to install Prometheus and Grafana, expose custom metrics, build dashboards, and define alerts, then add log aggregation with Loki and distributed tracing with Jaeger. These are the building blocks for running production-grade Kubernetes clusters.

Posted Aug 6, 2025



Learn how to implement comprehensive monitoring and observability in your Kubernetes clusters to ensure optimal performance and quick troubleshooting.

What We’ll Cover

Setting up Prometheus and Grafana
Custom Metrics and Service Monitors
Alert Management
Log Aggregation
Distributed Tracing

Prerequisites

Working Kubernetes cluster
Helm installed
Basic understanding of monitoring concepts

Installing Prometheus Operator

First, install the kube-prometheus-stack chart with Helm. It bundles the Prometheus Operator together with Prometheus, Alertmanager, and Grafana:

  
    # Add the Prometheus community chart repository
    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo update

    # Install the Prometheus stack as a release named "monitoring"
    helm install monitoring prometheus-community/kube-prometheus-stack \
      --namespace monitoring \
      --create-namespace

Custom ServiceMonitor Configuration

  
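    apiVersion: monitoring.coreos.com/v1
    kind: ServiceMonitor
    metadata:
      name: app-monitor
      namespace: monitoring
      labels:
        # With default chart values, Prometheus only picks up ServiceMonitors
        # labelled with the Helm release name ("monitoring" above).
        release: monitoring
    spec:
      # Select Services labelled app=my-app ...
      selector:
        matchLabels:
          app: my-app
      # ... in the default namespace
      namespaceSelector:
        matchNames:
          - default
      endpoints:
        # "metrics" is the name of a port on the Service, not a number
        - port: metrics
          interval: 15s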
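The selection rules in this ServiceMonitor can be sketched in a few lines of Go. This is only an illustration of the matching semantics (namespace must be listed in matchNames, and every matchLabels pair must be present on the Service), not the operator's actual code; the `service` type, the `selected` function, and the sample Services are hypothetical.

```go
package main

import "fmt"

// service is a hypothetical, pared-down view of a Kubernetes Service.
type service struct {
	name      string
	namespace string
	labels    map[string]string
}

// selected reports whether svc is in one of the allowed namespaces and
// carries every key/value pair listed in matchLabels (extra labels are fine).
func selected(svc service, matchNames []string, matchLabels map[string]string) bool {
	inNamespace := false
	for _, ns := range matchNames {
		if ns == svc.namespace {
			inNamespace = true
			break
		}
	}
	if !inNamespace {
		return false
	}
	for k, want := range matchLabels {
		if got, ok := svc.labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	matchNames := []string{"default"}
	matchLabels := map[string]string{"app": "my-app"}
	services := []service{
		{"my-app", "default", map[string]string{"app": "my-app", "tier": "web"}},
		{"my-app-staging", "staging", map[string]string{"app": "my-app"}},
		{"other", "default", map[string]string{"app": "other"}},
	}
	for _, s := range services {
		fmt.Printf("%s/%s selected=%v\n", s.namespace, s.name, selected(s, matchNames, matchLabels))
	}
}
```

Only the first Service is selected: the second is in the wrong namespace and the third has the wrong app label.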

Creating Custom Metrics

Example of a custom metrics endpoint in Go:

  
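    package main

    import (
        "log"
        "net/http"

        "github.com/prometheus/client_golang/prometheus"
        "github.com/prometheus/client_golang/prometheus/promhttp"
    )

    // httpRequestsTotal counts handled requests, partitioned by method and endpoint.
    var httpRequestsTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests",
        },
        []string{"method", "endpoint"},
    )

    func init() {
        // Register with the default registry so promhttp.Handler() exposes it.
        prometheus.MustRegister(httpRequestsTotal)
    }

    func main() {
        // Count every request served by the example handler.
        http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
            httpRequestsTotal.WithLabelValues(r.Method, "/").Inc()
            w.Write([]byte("ok"))
        })
        // Serve the metrics endpoint that Prometheus scrapes.
        // Port 8080 is an example; expose it as the Service port named "metrics".
        http.Handle("/metrics", promhttp.Handler())
        log.Fatal(http.ListenAndServe(":8080", nil))
    }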
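To make concrete what a scrape of the metrics endpoint returns, here is a minimal standard-library sketch of a labeled counter rendered in the Prometheus text exposition format. It is not how client_golang is implemented: the `counterVec` type and its `inc` and `render` methods are hypothetical names used only for illustration, and a real service should keep using `prometheus.NewCounterVec` and `promhttp.Handler()`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// counterVec is a hypothetical, simplified stand-in for a labeled counter.
// Each distinct combination of label values becomes its own time series.
type counterVec struct {
	name   string
	labels []string
	values map[string]float64 // key: label values joined by "\x00"
}

func newCounterVec(name string, labels ...string) *counterVec {
	return &counterVec{name: name, labels: labels, values: map[string]float64{}}
}

// inc adds 1 to the series identified by the given label values.
func (c *counterVec) inc(labelValues ...string) {
	if len(labelValues) != len(c.labels) {
		panic("label value count does not match label names")
	}
	c.values[strings.Join(labelValues, "\x00")]++
}

// render returns all series in text exposition format, sorted for stable output.
func (c *counterVec) render() string {
	keys := make([]string, 0, len(c.values))
	for k := range c.values {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	fmt.Fprintf(&b, "# TYPE %s counter\n", c.name)
	for _, k := range keys {
		vals := strings.Split(k, "\x00")
		pairs := make([]string, len(c.labels))
		for i, label := range c.labels {
			pairs[i] = fmt.Sprintf("%s=%q", label, vals[i])
		}
		fmt.Fprintf(&b, "%s{%s} %g\n", c.name, strings.Join(pairs, ","), c.values[k])
	}
	return b.String()
}

func main() {
	c := newCounterVec("http_requests_total", "method", "endpoint")
	c.inc("GET", "/")
	c.inc("GET", "/")
	c.inc("POST", "/orders")
	fmt.Print(c.render())
}
```

Each line of the rendered output is one time series. Prometheus stores the value it sees at every scrape and derives rates from the differences between scrapes.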

Grafana Dashboard Configuration

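Example dashboard JSON, wrapped in the "dashboard" envelope that Grafana's HTTP API expects. The legend template assumes the method and endpoint labels from the Go counter, and the panel uses the timeseries type, which replaces the deprecated graph panel in current Grafana releases:

    {
      "dashboard": {
        "id": null,
        "title": "Application Overview",
        "panels": [
          {
            "title": "Request Rate",
            "type": "timeseries",
            "datasource": "Prometheus",
            "targets": [
              {
                "expr": "rate(http_requests_total[5m])",
                "legendFormat": "{{method}} {{endpoint}}"
              }
            ]
          }
        ]
      }
    }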
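The panel's `rate(http_requests_total[5m])` expression turns raw counter samples into requests per second. The sketch below shows the core arithmetic under simplifying assumptions: it handles counter resets the way Prometheus does (a drop in value means the process restarted), but it skips the extrapolation to the window boundaries that the real `rate()` performs. The `sample` type, `simpleRate` function, and sample values are hypothetical.

```go
package main

import "fmt"

// sample is one scraped counter value at a timestamp in seconds.
type sample struct {
	ts    float64
	value float64
}

// simpleRate returns the average per-second increase across the samples.
// A drop in value is treated as a counter reset, so the post-reset value
// counts as new increase. It returns 0 when fewer than two samples exist.
func simpleRate(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].value - samples[i-1].value
		if delta < 0 {
			delta = samples[i].value // counter restarted from zero
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].ts - samples[0].ts
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// Scrapes every 15s; the counter resets between t=30 and t=45.
	window := []sample{{0, 100}, {15, 130}, {30, 160}, {45, 10}, {60, 40}}
	fmt.Printf("%.2f requests/s\n", simpleRate(window))
}
```

Because resets are absorbed, a pod restart does not show up as a negative spike on the dashboard.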

Alert Configuration

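PrometheusRule example. The expression aggregates with sum() so that the ratio compares all 5xx requests against all requests, and it assumes http_requests_total carries a status label, which the Go counter above would need to add:

    apiVersion: monitoring.coreos.com/v1
    kind: PrometheusRule
    metadata:
      name: application-alerts
      namespace: monitoring
      labels:
        # With default chart values, Prometheus only loads rules
        # labelled with the Helm release name.
        release: monitoring
    spec:
      groups:
        - name: application
          rules:
            - alert: HighErrorRate
              expr: |
                sum(rate(http_requests_total{status=~"5.."}[5m]))
                  /
                sum(rate(http_requests_total[5m])) > 0.1
              for: 5m
              labels:
                severity: critical
              annotations:
                description: Error rate is above 10% for 5 minutes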

Log Aggregation with Loki

Installing Loki stack:

  
    helm repo add grafana https://grafana.github.io/helm-charts
    helm install loki grafana/loki-stack \
      --namespace monitoring \
      --set grafana.enabled=false

Distributed Tracing with Jaeger

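Jaeger deployment. The all-in-one image runs every Jaeger component in a single container with in-memory storage, so it suits testing rather than production:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: jaeger
    spec:
      selector:
        matchLabels:
          app: jaeger
      template:
        metadata:
          labels:
            app: jaeger
        spec:
          containers:
            - name: jaeger
              # Pin a specific version instead of latest for reproducible deploys
              image: jaegertracing/all-in-one:latest
              ports:
                - containerPort: 16686  # query UI
                - containerPort: 14268  # collector HTTP endpoint for spans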

Best Practices

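Metric Collection:
- Use meaningful labels that match how you query (for example method and endpoint)
- Follow Prometheus naming conventions, such as base units and a _total suffix for counters
- Keep cardinality under control by avoiding unbounded label values such as user IDs
Alerting:
- Define clear severity levels
- Avoid alert fatigue by alerting only on conditions that need action
- Link a runbook from each alert
Dashboard Design:
- Start with an overview, then drill down
- Use consistent layouts
- Include documentation in panel descriptions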
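The cardinality advice is easier to act on with numbers. Each unique combination of label values is a separate time series, so the worst case for one metric is the product of the distinct values per label. The sketch below does that arithmetic; the `worstCaseSeries` function and all the counts are hypothetical figures chosen for illustration.

```go
package main

import "fmt"

// worstCaseSeries returns the upper bound on time series for one metric:
// the product of the number of distinct values each label can take.
func worstCaseSeries(distinctValues map[string]int) int {
	total := 1
	for _, n := range distinctValues {
		total *= n
	}
	return total
}

func main() {
	// Hypothetical counts of distinct values per label.
	bounded := map[string]int{"method": 5, "endpoint": 20, "status": 10}
	fmt.Println("bounded labels:", worstCaseSeries(bounded))

	// Adding one unbounded label multiplies the total.
	withUserID := map[string]int{"method": 5, "endpoint": 20, "status": 10, "user_id": 50000}
	fmt.Println("with user_id:", worstCaseSeries(withUserID))
}
```

With these assumed counts, the bounded label set tops out at 1,000 series, while adding a user_id label with 50,000 values raises the ceiling to 50 million for a single metric.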



This post is licensed under CC BY 4.0 by the author.