📊 Kubernetes Monitoring and Observability

Learn how to set up metrics, alerting, log aggregation, and distributed tracing in your Kubernetes clusters so you can spot performance problems early and troubleshoot them quickly.

What We'll Cover

  1. Setting up Prometheus and Grafana
  2. Custom Metrics and Service Monitors
  3. Alert Management
  4. Log Aggregation
  5. Distributed Tracing

Prerequisites

  • Working Kubernetes cluster
  • Helm installed
  • Basic understanding of monitoring concepts

Installing Prometheus Operator

First, let's set up the Prometheus Operator using Helm. The kube-prometheus-stack chart bundles the Operator with Prometheus, Alertmanager, and Grafana:

# Add Prometheus community charts
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install Prometheus Stack
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace

Custom ServiceMonitor Configuration

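A ServiceMonitor tells the Operator which Services to scrape. This one targets Services labeled app: my-app in the default namespace and scrapes their port named metrics every 15 seconds:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: app-monitor
  namespace: monitoring
  labels:
    # With the chart defaults, Prometheus only picks up ServiceMonitors
    # labeled with the Helm release name ("monitoring" in the install above).
    release: monitoring
spec:
  selector:
    matchLabels:
      app: my-app
  endpoints:
  - port: metrics   # must match a named port on the Service
    interval: 15s
  namespaceSelector:
    matchNames:
    - default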

Creating Custom Metrics

Example of a custom metrics endpoint in Go:

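package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    // httpRequestsTotal counts handled requests, partitioned by method and endpoint.
    httpRequestsTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests",
        },
        []string{"method", "endpoint"},
    )
)

func init() {
    // Register with the default registry so promhttp.Handler() exposes it.
    prometheus.MustRegister(httpRequestsTotal)
}

func main() {
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        // Use a fixed endpoint label rather than the raw URL to keep cardinality bounded.
        httpRequestsTotal.WithLabelValues(r.Method, "/").Inc()
        w.Write([]byte("ok"))
    })
    // Serve all registered metrics for Prometheus to scrape.
    http.Handle("/metrics", promhttp.Handler())
    // Port 8080 is an arbitrary choice for this example.
    log.Fatal(http.ListenAndServe(":8080", nil))
}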

Grafana Dashboard Configuration

Example dashboard JSON, wrapped in the "dashboard" envelope that Grafana's dashboard HTTP API expects:

{
  "dashboard": {
    "id": null,
    "title": "Application Overview",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "datasource": "Prometheus",
        "targets": [
          {
            "expr": "rate(http_requests_total[5m])",
            "legendFormat": "{{method}} {{endpoint}}"
          }
        ]
      }
    ]
  }
}
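
The rate(http_requests_total[5m]) expression in the panel is easier to reason about with a worked calculation. The sketch below is a simplified approximation, not Prometheus's actual implementation. It divides the counter increase by the time between the first and last sample and treats any decrease as a counter reset, but it skips the extrapolation to the window boundaries that Prometheus performs. The sample type and simpleRate function are hypothetical names.

```go
package main

import "fmt"

// sample is one scraped counter value (hypothetical type for this sketch).
type sample struct {
	t float64 // scrape time in seconds
	v float64 // counter value at that time
}

// simpleRate returns the average per-second increase across the samples.
// A value lower than its predecessor is treated as a counter reset, so the
// new value is counted as an increase from zero.
func simpleRate(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].v - samples[i-1].v
		if delta < 0 {
			delta = samples[i].v // counter restarted from zero
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].t - samples[0].t
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// Scrapes every 15 seconds; the process restarts before the fourth scrape.
	window := []sample{{0, 100}, {15, 130}, {30, 160}, {45, 10}, {60, 40}}
	fmt.Printf("%.2f requests/second\n", simpleRate(window))
}
```

Because resets are handled inside the calculation, a pod restart does not show up as a negative spike on the graph.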

Alert Configuration

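PrometheusRule example. The expression assumes http_requests_total also carries a status label holding the HTTP response code:

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: application-alerts
  namespace: monitoring
  labels:
    # With the chart defaults, only rules labeled with the Helm release name are loaded.
    release: monitoring
spec:
  groups:
  - name: application
    rules:
    - alert: HighErrorRate
      # Aggregate with sum() so the ratio compares 5xx traffic against all traffic.
      # Dividing the raw series would only pair up identical label sets.
      expr: |
        sum(rate(http_requests_total{status=~"5.."}[5m]))
        /
        sum(rate(http_requests_total[5m])) > 0.1
      for: 5m
      labels:
        severity: critical
      annotations:
        description: Error rate is above 10% for 5 minutes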

Log Aggregation with Loki

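Install the Loki stack (Loki plus Promtail). Grafana stays disabled here because the kube-prometheus-stack release already provides one:

helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install loki grafana/loki-stack \
  --namespace monitoring \
  --set grafana.enabled=false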

Distributed Tracing with Jaeger

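A minimal Jaeger deployment using the all-in-one image, which keeps traces in memory and is meant for development and testing rather than production: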

apiVersion: apps/v1
kind: Deployment
metadata:
  name: jaeger
spec:
  selector:
    matchLabels:
      app: jaeger
  template:
    metadata:
      labels:
        app: jaeger
    spec:
      containers:
      - name: jaeger
        image: jaegertracing/all-in-one:latest
        ports:
        - containerPort: 16686  # Web UI and query API
        - containerPort: 14268  # Collector HTTP endpoint for incoming spans

Best Practices

  1. Metric Collection:
    • Use meaningful labels
    • Follow naming conventions
    • Keep cardinality under control
  2. Alerting:
    • Define clear severity levels
    • Avoid alert fatigue
    • Include runbooks
  3. Dashboard Design:
    • Start with overview
    • Use consistent layouts
    • Include documentation
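
Keeping cardinality under control mostly comes down to never putting unbounded values (user IDs, raw URLs, request IDs) into labels. Here is a minimal sketch of one approach, which collapses variable path segments before the path is used as an endpoint label. The normalizePath and isVariable functions and the :id placeholder are assumptions for illustration. If your router exposes the matched route template, prefer that instead.

```go
package main

import (
	"fmt"
	"strings"
)

// isVariable reports whether a path segment looks like an identifier:
// either all digits, or a 36-character UUID-shaped hex string.
func isVariable(seg string) bool {
	if seg == "" {
		return false
	}
	allDigits := true
	for _, r := range seg {
		if r < '0' || r > '9' {
			allDigits = false
			break
		}
	}
	if allDigits {
		return true
	}
	if len(seg) != 36 {
		return false
	}
	for i, r := range seg {
		if i == 8 || i == 13 || i == 18 || i == 23 {
			if r != '-' {
				return false
			}
			continue
		}
		isHex := (r >= '0' && r <= '9') || (r >= 'a' && r <= 'f') || (r >= 'A' && r <= 'F')
		if !isHex {
			return false
		}
	}
	return true
}

// normalizePath replaces identifier-like segments with ":id" so that
// /users/1 and /users/2 map to the same label value.
func normalizePath(path string) string {
	segs := strings.Split(path, "/")
	for i, s := range segs {
		if isVariable(s) {
			segs[i] = ":id"
		}
	}
	return strings.Join(segs, "/")
}

func main() {
	for _, p := range []string{"/users/123/orders/456", "/healthz", "/items/123e4567-e89b-12d3-a456-426614174000"} {
		fmt.Printf("%s -> %s\n", p, normalizePath(p))
	}
}
```

With this in place the number of endpoint label values is bounded by the number of routes, not by the number of users or orders.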

Written on August 6, 2025