Kubernetes Observability - Health Probes Guide

What are Kubernetes Probes?

Probing the Container

The kubelet checks containers periodically using probes
Probes determine the health and readiness of containers
Three types of probes: Startup, Readiness, and Liveness
Each probe can use different action types to check health
Essential for maintaining application reliability

Kubelet → Probe → Container → Response

Startup Probe: indicates when a container has started
Readiness Probe: indicates when a container is ready to accept traffic
Liveness Probe: indicates whether the container is still healthy or needs to be restarted

Probe Types Explained

Startup Probe

Used for slow-starting containers to determine when the application has successfully started.

Disables liveness and readiness checks until it succeeds
Useful for legacy applications with long startup times
Prevents killing containers during initialization

Readiness Probe

Determines if a container is ready to serve requests.

A failing readiness probe removes the pod from Service endpoints, so it stops receiving traffic
The container keeps running but receives no traffic until the probe succeeds again
Essential for rolling updates and load balancing

Liveness Probe

Determines if the container is running properly.

A failing liveness probe causes the kubelet to restart the container
Detects deadlocks and hung applications
Ensures application remains responsive

Important: A failing readiness probe will stop the application from receiving traffic. A failing liveness probe will restart the container.

Probe Action Types

ExecAction

Execute a command inside the container

exec:
  command:
  - cat
  - /app/healthy

TCPSocketAction

Check if a TCP socket port is open

tcpSocket:
  port: 8080

HTTPGetAction

Performs an HTTP GET against a specific port and path

httpGet:
  path: /healthz
  port: 8080

HTTPGet Additional Options

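httpGet:
  path: /health
  port: 8080
  # host defaults to the pod IP; 127.0.0.1 is only appropriate for hostNetwork pods
  host: 127.0.0.1
  # scheme defaults to HTTP; with HTTPS the kubelet skips certificate verification
  scheme: HTTPS
  httpHeaders:
  - name: Custom-Header
    value: Awesome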

Probe Configuration Parameters

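initialDelaySeconds: Seconds to wait after the container starts before the first probe (default 0)
periodSeconds: How often to run the probe, in seconds (default 10)
timeoutSeconds: Seconds after which a probe attempt times out (default 1)
successThreshold: Consecutive successes needed after a failure to count as successful (default 1; must be 1 for liveness and startup probes)
failureThreshold: Consecutive failures after which the probe is considered failed (default 3)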
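
As a rough illustration of how these parameters combine, the fragment below annotates a readiness probe with the timing each field implies. The /ready path, port 8080, and the chosen values are illustrative assumptions, not recommendations from this guide.

```yaml
# Hypothetical readiness probe; path, port and values are illustrative only.
readinessProbe:
  httpGet:
    path: /ready             # assumed endpoint
    port: 8080               # assumed port
  initialDelaySeconds: 5     # first probe runs about 5 s after the container starts
  periodSeconds: 10          # then one probe every 10 s
  timeoutSeconds: 2          # no response within 2 s counts as a failure
  successThreshold: 1        # one success marks the container ready again
  failureThreshold: 3        # 3 failures in a row (roughly 3 x 10 s) mark it not ready
```

Under these assumptions, the worst-case time to notice a failure is approximately failureThreshold x periodSeconds, so lowering either value trades probe load for faster detection.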

Complete Probes Example

apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: registry.k8s.io/goproxy:0.1
    ports:
    - containerPort: 8080
    
    # Startup Probe - for slow starting containers
    startupProbe:
      httpGet:
        path: /healthz
        port: 8080
      failureThreshold: 3
      periodSeconds: 10
    
    # Readiness Probe - when container is ready for traffic
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    
    # Liveness Probe - if container is running properly
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20

Startup Probe

HTTP GET to /healthz on port 8080
Checks every 10 seconds
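After 3 consecutive failures (about 30 seconds), the container is killed and restarted according to its restart policy
Holds back the readiness and liveness probes until it succeeds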

Readiness Probe

TCP socket check on port 8080
Starts after 5 seconds
Checks every 10 seconds
If it fails, the pod stops receiving traffic

Liveness Probe

TCP socket check on port 8080
Starts after 15 seconds
Checks every 20 seconds
If it fails, the kubelet restarts the container

Advanced Probe Examples

Exec Action Example

apiVersion: v1
kind: Pod
metadata:
  name: postgres-db
spec:
  containers:
  - name: postgres
    image: postgres:13
    env:
    - name: POSTGRES_PASSWORD
      value: "secret"
    livenessProbe:
      exec:
        command:
        - sh
        - -c
        - exec pg_isready -U postgres
      initialDelaySeconds: 30
      periodSeconds: 10
    readinessProbe:
      exec:
        command:
        - sh
        - -c
        - exec pg_isready -U postgres
      initialDelaySeconds: 5
      periodSeconds: 5

HTTP Get with Headers Example

apiVersion: v1
kind: Pod
metadata:
  name: web-application
spec:
  containers:
  - name: webapp
    image: nginx:latest
    livenessProbe:
      httpGet:
        path: /health
        port: 80
        httpHeaders:
        - name: X-Custom-Auth
          value: "Bearer token123"
        - name: Accept
          value: application/json
      initialDelaySeconds: 10
      periodSeconds: 5
      timeoutSeconds: 2
    readinessProbe:
      httpGet:
        path: /ready
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5
      successThreshold: 2
      failureThreshold: 3

Database with All Three Probes

apiVersion: v1
kind: Pod
metadata:
  name: mysql-database
spec:
  containers:
  - name: mysql
    image: mysql:8.0
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "secret123"
    startupProbe:
      exec:
        command:
        - sh
        - -c
        - mysqladmin ping -h localhost -uroot -p"${MYSQL_ROOT_PASSWORD}"
      failureThreshold: 30   # up to 30 x 10 s = 5 minutes to start
      periodSeconds: 10
    readinessProbe:
      exec:
        command:
        - sh
        - -c
        - mysql -e 'SELECT 1' -uroot -p"${MYSQL_ROOT_PASSWORD}"
      initialDelaySeconds: 5
      periodSeconds: 5
    livenessProbe:
      tcpSocket:
        port: 3306
      initialDelaySeconds: 30
      periodSeconds: 10

Probes Best Practices

Configuration Guidelines

Set appropriate initial delays: Allow applications to start properly
Use conservative timeouts: Avoid false positives from slow responses
Configure realistic periods: Balance between responsiveness and load
Use startup probes for slow applications: Prevent unnecessary restarts
Make readiness probes lightweight: They run for the container's entire lifetime, including under load
Test failure scenarios: Ensure probes work as expected
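
One way to exercise a failure scenario is a throwaway pod whose health file disappears after a while, in the style of the liveness demo in the Kubernetes documentation. The pod name, image tag, and timings below are assumptions chosen for this sketch.

```yaml
# Sketch of a pod that deliberately starts failing its liveness probe.
apiVersion: v1
kind: Pod
metadata:
  name: liveness-failure-demo    # hypothetical name
spec:
  containers:
  - name: demo
    image: busybox:1.36          # assumed image; any image with sh and cat works
    args:
    - /bin/sh
    - -c
    # Healthy for 30 s, then the file is removed and the probe starts failing.
    - touch /tmp/healthy; sleep 30; rm -f /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
```

After applying it, kubectl describe pod liveness-failure-demo should list the probe failures in the pod events, and the container's restart count should grow over time.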

Application Design

Implement proper health endpoints: /healthz, /readyz, /livez
Make liveness checks independent: Don't depend on external services
Include dependency checks in readiness: Database, cache, etc.
Keep liveness checks simple: Basic "can the process still respond" check
Use different endpoints: Separate health, readiness, and liveness
Log probe failures: Help with debugging issues
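
The split between readiness and liveness endpoints might look like the container fragment below. The /readyz and /livez paths come from the list above; port 8080, the timings, and what each endpoint checks internally are assumptions about a hypothetical application.

```yaml
# Container fragment: one endpoint per probe (port and timings are assumed).
readinessProbe:
  httpGet:
    path: /readyz      # may check the database, cache, and other dependencies
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /livez       # checks only the process itself, no external calls
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
```

With this split, a database outage would take the pod out of rotation without restarting it, and a restart would only happen when the process itself stops answering.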

Probe Configuration Recommendations

Startup Probe

failureThreshold: 30
periodSeconds: 10
Gives the app up to 5 minutes (30 x 10 s) to start; raise failureThreshold for slower apps
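
The startup budget is roughly failureThreshold x periodSeconds. As a sketch, an application assumed to need up to 10 minutes could be given the budget below; the /healthz path and port 8080 are placeholders, not values prescribed by this guide.

```yaml
# Hypothetical startup probe for an app that may need up to 10 minutes.
startupProbe:
  httpGet:
    path: /healthz           # assumed endpoint
    port: 8080               # assumed port
  periodSeconds: 10
  failureThreshold: 60       # 60 x 10 s = 600 s startup budget
```

Because the budget is only an upper bound, a container that starts quickly is not slowed down: the other probes take over as soon as the startup probe succeeds once.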

Readiness Probe

periodSeconds: 5-10
timeoutSeconds: 1-3
failureThreshold: 3

Liveness Probe

periodSeconds: 10-30
initialDelaySeconds: 15-30
failureThreshold: 3

Warning: Avoid making liveness probes dependent on external services. If an external service fails, every container that checks it will fail its liveness probe and be restarted, turning one outage into a cascading failure.