Cloud-Native Patterns

Design principles and patterns for applications built to run on cloud infrastructure: containerised, dynamically orchestrated, microservices-oriented, and built for scale and resilience.

The Twelve-Factor App

I.   Codebase       — one repo, many deploys (dev/staging/prod from same code)
II.  Dependencies   — explicitly declared (requirements.txt, package.json), never system-installed
III. Config         — in environment variables, not code (DATABASE_URL, API_KEY)
IV.  Backing services — attached as resources (DB, Redis, S3 treated identically)
V.   Build/release/run — strictly separated stages; releases are immutable
VI.  Processes      — stateless and share-nothing; state in backing services
VII. Port binding   — app exports HTTP on a port; no external web server
VIII.Concurrency    — scale out by adding processes, not threads
IX.  Disposability  — fast startup (< 5s), graceful shutdown (SIGTERM handling)
X.   Dev/prod parity — keep environments as similar as possible
XI.  Logs           — treat as event streams, write to stdout/stderr
XII. Admin processes — one-off admin tasks in the same environment as the app
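
Factor III in practice: a minimal sketch of reading deploy-specific config from the environment (variable names and local-dev defaults here are illustrative, not a fixed convention):

```python
import os

def load_config(env=None):
    """Read deploy-varying config from the environment (factor III).

    DATABASE_URL and PORT are conventional names; the defaults are
    local-development fallbacks only.
    """
    env = os.environ if env is None else env
    return {
        "database_url": env.get("DATABASE_URL", "postgresql://localhost:5432/dev"),
        "port": int(env.get("PORT", "8000")),
    }
```

The same image then runs unchanged in every environment; only the injected variables differ.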

Health Checks — Liveness vs Readiness

# FastAPI health endpoints — Kubernetes probes
from fastapi import FastAPI, Response

app = FastAPI()

@app.get("/health/live")
async def liveness():
    """Kubernetes liveness probe: is the process alive?
    If this fails, Kubernetes restarts the pod."""
    return {"status": "alive"}

@app.get("/health/ready")
async def readiness(response: Response):
    """Kubernetes readiness probe: is the pod ready to serve traffic?
    If this fails, Kubernetes removes the pod from the load balancer."""
    # check_database / check_redis / check_payment_api are app-defined
    # dependency checks returning bool
    checks = {
        "database": await check_database(),
        "cache": await check_redis(),
        "external_api": await check_payment_api(),
    }
    all_healthy = all(checks.values())
    if not all_healthy:
        response.status_code = 503
    return {"status": "ready" if all_healthy else "not_ready", "checks": checks}

@app.get("/health/startup")
async def startup():
    """Kubernetes startup probe: has the app finished initialising?
    Only checked during startup; prevents liveness killing a slow-starting app."""
    return {"status": "started"}

# Kubernetes probe configuration
livenessProbe:
  httpGet:
    path: /health/live
    port: 8000
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3

readinessProbe:
  httpGet:
    path: /health/ready
    port: 8000
  initialDelaySeconds: 5
  periodSeconds: 5
  failureThreshold: 3

startupProbe:
  httpGet:
    path: /health/startup
    port: 8000
  failureThreshold: 30    # allow 5 minutes (30 * 10s) to start
  periodSeconds: 10

Graceful Shutdown

import signal
import asyncio

class GracefulShutdown:
    def __init__(self, app):
        self.app = app
        self.shutdown_event = asyncio.Event()

    def handle_sigterm(self, *args):
        print("SIGTERM received — beginning graceful shutdown")
        self.shutdown_event.set()

    async def run(self):
        loop = asyncio.get_running_loop()
        loop.add_signal_handler(signal.SIGTERM, self.handle_sigterm)

        # Start server (app is assumed to expose start/drain/shutdown)
        server_task = asyncio.create_task(self.app.start())

        # Wait for shutdown signal
        await self.shutdown_event.wait()

        print("Draining in-flight requests (max 30s)...")
        try:
            await asyncio.wait_for(self.app.drain(), timeout=30)
        except asyncio.TimeoutError:
            print("Drain timed out; forcing shutdown")
        await self.app.shutdown()
        server_task.cancel()
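
On Kubernetes, the 30-second drain budget above has to fit inside the pod's termination grace period. A sketch of the relevant pod-spec fields (values illustrative):

```yaml
spec:
  terminationGracePeriodSeconds: 45   # > the 30s drain timeout, with headroom
  containers:
  - name: myapp
    lifecycle:
      preStop:
        exec:
          command: ["sleep", "5"]     # give the LB time to stop routing here
```

Kubernetes runs the preStop hook, then sends SIGTERM, and sends SIGKILL only once the grace period expires.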

Sidecar Pattern

# Inject a proxy sidecar alongside the main container
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
      # Main application
      - name: myapp
        image: myapp:1.2.3
        ports:
        - containerPort: 8000

      # Sidecar: metrics exporter
      - name: prometheus-exporter
        image: prom/statsd-exporter:v0.26.0
        ports:
        - containerPort: 9102

      # Sidecar: log forwarder (Fluent Bit)
      - name: fluent-bit
        image: fluent/fluent-bit:3.0
        volumeMounts:
        - name: log-volume
          mountPath: /var/log/app
        env:
        - name: LOKI_URL
          value: "http://loki.monitoring:3100"

Circuit Breaker Pattern

# Prevent cascade failures when a downstream service is down
import asyncio
import time
from enum import Enum

class CircuitState(Enum):
    CLOSED = "closed"       # normal operation
    OPEN = "open"           # failing fast
    HALF_OPEN = "half_open" # testing recovery

class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60, success_threshold=2):
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.success_threshold = success_threshold
        self.failure_count = 0
        self.success_count = 0
        self.state = CircuitState.CLOSED
        self.opened_at: float | None = None

    async def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.opened_at > self.timeout:
                self.state = CircuitState.HALF_OPEN
                self.success_count = 0
            else:
                raise RuntimeError(f"Circuit OPEN — rejecting call to {func.__name__}")

        try:
            result = await func(*args, **kwargs)
            self._on_success()
            return result
        except Exception:
            self._on_failure()
            raise

    def _on_success(self):
        if self.state == CircuitState.HALF_OPEN:
            self.success_count += 1
            if self.success_count >= self.success_threshold:
                self.state = CircuitState.CLOSED
                self.failure_count = 0
        elif self.state == CircuitState.CLOSED:
            self.failure_count = max(0, self.failure_count - 1)

    def _on_failure(self):
        self.failure_count += 1
        # a failed half-open probe reopens the circuit immediately
        if self.state == CircuitState.HALF_OPEN or self.failure_count >= self.failure_threshold:
            self.state = CircuitState.OPEN
            self.opened_at = time.time()

# Usage
payment_breaker = CircuitBreaker(failure_threshold=5, timeout=60)

async def charge_payment(amount: float, token: str):
    return await payment_breaker.call(payment_gateway.charge, amount=amount, token=token)

Retry with Exponential Backoff

import asyncio
import random
from functools import wraps

import httpx  # third-party HTTP client used in the example below

def retry(max_attempts=3, base_delay=1.0, max_delay=30.0, exceptions=(Exception,)):
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return await func(*args, **kwargs)
                except exceptions as e:
                    if attempt == max_attempts - 1:
                        raise
                    delay = min(base_delay * (2 ** attempt), max_delay)
                    jitter = random.uniform(0, delay * 0.2)  # prevent thundering herd
                    await asyncio.sleep(delay + jitter)
        return wrapper
    return decorator

@retry(max_attempts=3, base_delay=1.0, exceptions=(httpx.TimeoutException, httpx.ConnectError))
async def fetch_product_data(product_id: str):
    async with httpx.AsyncClient() as client:
        return await client.get(f"https://api.products.internal/{product_id}", timeout=5.0)

Common Failure Cases

Liveness probe kills a pod during a legitimate slow startup
Why: initialDelaySeconds is shorter than the app's actual initialisation time (DB migrations, schema loading, model warm-up), so Kubernetes kills and restarts the pod in a loop before it ever becomes ready.
Detect: Pod status shows CrashLoopBackOff with exit code 0, or the liveness probe error appears in kubectl describe pod; the app logs show successful startup just before the kill signal.
Fix: Add a startupProbe with a generous failureThreshold (e.g., failureThreshold: 30, periodSeconds: 10 = 5-minute allowance) so the liveness probe does not activate until startup is confirmed complete.

Circuit breaker never recovers because the timeout is shorter than the upstream's recovery time
Why: If timeout (seconds before entering half-open) is shorter than the upstream's actual recovery time, every half-open probe fails, the circuit reopens immediately, and it never closes.
Detect: Circuit stays in OPEN state indefinitely; half-open attempts keep failing until well after the upstream is healthy again; all requests are rejected with "Circuit OPEN".
Fix: Set timeout to at least 2-3x the upstream's typical recovery SLA; use a success threshold greater than 1 so a single flaky probe does not prematurely close the circuit.

Retry decorator amplifies load during an outage instead of protecting the origin
Why: If all callers retry without jitter, they fire at the same instant after each backoff interval, creating a thundering herd that overwhelms a recovering service.
Detect: Upstream metrics show repeated traffic spikes at regular intervals (the backoff period) rather than gradual recovery; error rate stays high during what should be a quiet period.
Fix: The existing code already adds jitter = random.uniform(0, delay * 0.2) — ensure this is present; also cap max_attempts and combine with a circuit breaker so retries stop entirely when the circuit is open.
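
A minimal self-contained sketch of that combination: retries with backoff that stop as soon as the breaker opens. MiniBreaker is a deliberately simplified stand-in, not the CircuitBreaker class above.

```python
import asyncio
import random

class BreakerOpen(Exception):
    """Raised when the circuit is open and the call is rejected."""

class MiniBreaker:
    """Simplified breaker: opens after N consecutive failures (illustrative)."""
    def __init__(self, failure_threshold=3):
        self.failures = 0
        self.failure_threshold = failure_threshold

    @property
    def is_open(self):
        return self.failures >= self.failure_threshold

    def record(self, ok):
        self.failures = 0 if ok else self.failures + 1

async def retry_through_breaker(func, breaker, max_attempts=3, base_delay=1.0):
    """Retry with exponential backoff and jitter, but abandon retries
    immediately once the breaker opens."""
    for attempt in range(max_attempts):
        if breaker.is_open:
            raise BreakerOpen("circuit open, abandoning retries")
        try:
            result = await func()
            breaker.record(True)
            return result
        except Exception:
            breaker.record(False)
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            await asyncio.sleep(delay + random.uniform(0, delay * 0.2))
```

The key property: during a sustained outage, callers stop generating retry traffic after the first few failures instead of hammering the origin for the full retry budget.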

Environment-specific config baked into images, violating twelve-factor factor III
Why: Config values that vary between environments are hard-coded in Dockerfile ENV instructions or container images rather than injected at runtime, so the same image cannot behave consistently across environments.
Detect: Deploying the "same" image to staging produces different behaviour than production; docker inspect shows ENV contains non-default config values.
Fix: Remove all environment-specific ENV from Dockerfiles; inject config exclusively via Kubernetes ConfigMaps, Secrets, or platform environment variables at deployment time.
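
Runtime injection sketched as a Deployment container spec (ConfigMap and Secret names illustrative):

```yaml
containers:
- name: myapp
  image: myapp:1.2.3            # identical image in every environment
  envFrom:
  - configMapRef:
      name: myapp-config        # per-environment, non-secret config
  env:
  - name: DATABASE_URL
    valueFrom:
      secretKeyRef:
        name: myapp-secrets     # per-environment secret
        key: database-url
```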

Connections

cloud-hub · cloud/kubernetes-operators · cloud/serverless-patterns · cloud/service-mesh · cs-fundamentals/distributed-systems · cs-fundamentals/microservices-patterns · llms/ae-hub

Open Questions

  • What monitoring and alerting matter most when this is deployed in production?
  • At what scale or workload does this approach hit its practical limits?