CI/CD Pipelines

CI/CD pipeline design as a discipline — stage ordering (lint→build→test→scan→staging→prod), artifact promotion, Jenkins Declarative Pipeline, Azure DevOps YAML stages, and the four DORA metrics that measure delivery performance.

The vault covers GitHub Actions in cloud/github-actions, GitOps workflows in cloud/argocd and cloud/gitops-patterns, and deployment strategies in cloud/blue-green-deployment. This page covers pipeline design itself: which stages to include, how to structure them across tools, and how to measure delivery performance.


The Pipeline Contract

One artefact, all environments. Build the Docker image once; push it with a content-addressed SHA tag. Promote the same image through dev → staging → prod. Never rebuild per environment. Rebuilding introduces drift.

Build:    myapp:sha-abc123  →  push to registry
Staging:  tag as :staging, deploy
Prod:     tag as :prod, deploy same image

Environment-specific configuration comes from environment variables injected at deploy time, not baked into the image.
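
A minimal sketch of this contract in Python — the chart path, registry, and values filenames are illustrative assumptions, not a fixed convention:

```python
# Sketch of the promotion contract: one content-addressed image, with
# per-environment Helm values injected at deploy time. The registry, chart
# path, and values filenames here are hypothetical.
def deploy_command(env: str, sha: str) -> str:
    """Render the deploy command for one environment — same image tag everywhere."""
    return (
        f"helm upgrade --install myapp ./charts/myapp "
        f"--set image.tag=sha-{sha} -f values.{env}.yaml"
    )

staging = deploy_command("staging", "abc123")
prod = deploy_command("prod", "abc123")

# Both commands reference the identical artefact; only the values file differs.
assert "image.tag=sha-abc123" in staging and "image.tag=sha-abc123" in prod
```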

Full run target: under 15 minutes for a production-ready pipeline.


Stage Ordering

The standard pipeline progression:

Stage                          Tool Examples                     Purpose                                      Fast-Fail?
1. Lint / Static Analysis      ruff, ESLint, mypy                Catch style + type errors before building    Yes
2. Build                       Docker, Maven, npm                Compile, package, push artefact              Yes
3. Unit Tests                  pytest, Jest                      Isolated tests, no external deps             Yes
4. Integration Tests           pytest + testcontainers           Real databases, queues, APIs                 Yes
5. Security Scan               Trivy, Semgrep, detect-secrets    SAST, CVEs, exposed secrets                  Warn or fail
6. Deploy to Staging           Helm, kubectl                     Deploy artefact to staging                   Yes
7. E2E / Smoke Tests           Playwright, httpx                 Validate staging is functional               Yes
8. Approval Gate               Manual or automated               Confirm readiness for prod                   Block
9. Deploy to Production        Helm, kubectl                     Deploy same artefact to prod                 Yes
10. Post-Deploy Verification   synthetic monitoring              Confirm prod is healthy                      Alert

Stages 1, 3, and 5 (lint, unit tests, security scan) are independent of each other and can run in parallel; integration tests need the built artefact, and stages 6 onward are strictly sequential.
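
The fan-out/fan-in shape can be sketched with the standard library; the stage bodies here are stand-ins for the real tools:

```python
# Sketch: run the independent early stages concurrently and surface the first
# failure as soon as it completes. Stage bodies are hypothetical stand-ins —
# a real pipeline shells out to ruff, pytest, trivy, etc.
from concurrent.futures import ThreadPoolExecutor, as_completed

def lint():          return "lint ok"
def unit_tests():    return "unit ok"
def security_scan(): return "scan ok"

def run_parallel_stages(stages):
    """Run stages concurrently; the first stage that raises fails the run.
    (Already-started siblings still finish here; Jenkins failFast aborts them.)"""
    results = {}
    with ThreadPoolExecutor() as pool:
        futures = {pool.submit(fn): fn.__name__ for fn in stages}
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()  # re-raises a stage failure
    return results

results = run_parallel_stages([lint, unit_tests, security_scan])
assert set(results) == {"lint", "unit_tests", "security_scan"}
```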

Engineering Tradeoffs — when to gate a deployment vs accept risk, rollback cost vs velocity, and how pipeline decisions compound into production reliability.


Jenkins Declarative Pipeline

Jenkins uses a Jenkinsfile checked into the repo. Declarative Pipeline syntax (recommended over Scripted):

pipeline {
    agent none  // critical: avoids executor starvation — agents allocated per stage

    environment {
        REGISTRY = "registry.example.com"
        IMAGE    = "${REGISTRY}/myapp"
    }

    stages {
        stage('Lint & Security') {
            parallel {
                stage('Lint') {
                    agent { label 'docker' }
                    steps {
                        sh 'ruff check . && mypy src/'
                    }
                }
                stage('Security Scan') {
                    agent { label 'docker' }
                    steps {
                        sh 'trivy fs --exit-code 1 --severity HIGH,CRITICAL .'
                    }
                }
            }
        }

        stage('Build') {
            agent { label 'docker' }
            steps {
                sh """
                    docker build -t ${IMAGE}:${GIT_COMMIT} .
                    docker push ${IMAGE}:${GIT_COMMIT}
                """
                writeFile file: 'image-tag.txt', text: "${GIT_COMMIT}"
                stash name: 'image-tag', includes: 'image-tag.txt'
            }
        }

        stage('Test') {
            parallel {
                stage('Unit') {
                    agent { label 'docker' }
                    steps { sh 'pytest tests/unit -x --junitxml=unit-results.xml' }
                    post { always { junit 'unit-results.xml' } }
                }
                stage('Integration') {
                    agent { label 'docker' }
                    steps { sh 'pytest tests/integration --junitxml=int-results.xml' }
                    post { always { junit 'int-results.xml' } }
                }
            }
        }

        stage('Deploy Staging') {
            agent { label 'k8s' }
            steps {
                sh "helm upgrade --install myapp ./charts/myapp --set image.tag=${GIT_COMMIT} -f values.staging.yaml"
                sh 'pytest tests/smoke --base-url https://staging.example.com'
            }
        }

        stage('Deploy Production') {
            when {
                branch 'main'
            }
            input {
                message 'Deploy to production?'
                ok 'Yes, deploy now'
            }
            agent { label 'k8s' }
            steps {
                sh "helm upgrade --install myapp ./charts/myapp --set image.tag=${GIT_COMMIT} -f values.prod.yaml"
            }
        }
    }

    post {
        failure { slackSend channel: '#deploys', color: 'danger', message: "Pipeline failed: ${env.BUILD_URL}" }
        success { slackSend channel: '#deploys', color: 'good', message: "Deployed ${GIT_COMMIT[0..7]} to prod" }
    }
}

Critical: agent none at top level. Without this, the parent pipeline holds an executor while waiting for parallel children — classic deadlock (executor starvation).

parallel {} runs stages concurrently. Add failFast true inside parallel to abort siblings when one fails.


Azure DevOps YAML Pipelines

# azure-pipelines.yml
trigger:
  branches:
    include: [main]
  paths:
    exclude: [docs/**, '*.md']

variables:
  imageTag: $(Build.SourceVersion)
  containerRegistry: registry.example.com   # referenced in the build steps below

stages:
  - stage: Build
    displayName: 'Build & Test'
    jobs:
      - job: BuildTest
        pool: { vmImage: 'ubuntu-latest' }
        steps:
          - script: |
              docker build -t $(containerRegistry)/myapp:$(imageTag) .
              docker push $(containerRegistry)/myapp:$(imageTag)
            displayName: 'Build and push image'
          - script: pytest tests/ --junitxml=results.xml
          - task: PublishTestResults@2
            inputs: { testResultsFiles: 'results.xml' }

  - stage: Staging
    dependsOn: Build
    displayName: 'Deploy to Staging'
    jobs:
      - deployment: DeployStaging
        pool: { vmImage: 'ubuntu-latest' }
        environment: staging           # environment tracks deployment history
        strategy:
          runOnce:
            deploy:
              steps:
                - script: |
                    helm upgrade --install myapp ./charts/myapp \
                      --set image.tag=$(imageTag) \
                      -f values.staging.yaml

  - stage: Production
    dependsOn: Staging
    condition: |
      and(
        succeeded(),
        eq(variables['Build.SourceBranch'], 'refs/heads/main')
      )
    displayName: 'Deploy to Production'
    jobs:
      - deployment: DeployProd
        pool: { vmImage: 'ubuntu-latest' }
        environment: production        # approval gates configured per environment in portal
        strategy:
          canary:
            increments: [10, 50, 100]  # 10% → 50% → 100% traffic
            deploy:
              steps:
                - script: |
                    helm upgrade --install myapp ./charts/myapp \
                      --set image.tag=$(imageTag) \
                      -f values.prod.yaml

Azure DevOps hierarchy: Pipeline → Stages → Jobs → Steps → Tasks.

environment objects in Azure DevOps track deployment history and host approval policies. A production environment can require specific reviewers, a business-hours window, or a linked work item before a deployment proceeds.

dependsOn creates DAG-style stage dependencies. condition controls whether a stage runs.
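
The dependsOn relationships form a DAG that the scheduler walks in topological order — the same idea can be sketched with Python's stdlib graphlib (stage names mirror the YAML above):

```python
# Sketch: dependsOn as a DAG. A stage becomes eligible only after all of its
# predecessors succeed; graphlib yields a valid execution order.
from graphlib import TopologicalSorter

depends_on = {
    "Build": [],                    # no dependencies: runs first
    "Staging": ["Build"],           # dependsOn: Build
    "Production": ["Staging"],      # dependsOn: Staging
}

order = list(TopologicalSorter(depends_on).static_order())
assert order == ["Build", "Staging", "Production"]
```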


DORA Metrics

The four metrics from Google's DevOps Research and Assessment (DORA) programme are the empirical foundation of modern delivery performance measurement.

Metric                         What It Measures                 Elite          High           Medium           Low
Deployment Frequency           How often you deploy to prod     Multiple/day   Daily–weekly   Weekly–monthly   < Monthly
Lead Time for Changes          Commit → running in prod         < 1 hour       1 day–1 week   1 week–1 month   > 1 month
Change Failure Rate            % of deploys causing incidents   0–15%          16–30%         16–30%           > 30%
MTTR (Mean Time to Restore)    Time to recover from incident    < 1 hour       < 1 day        1 day–1 week     > 1 week

# Automated DORA tracking — calculate lead time
from datetime import datetime
import subprocess

def lead_time_for_change(commit_sha: str) -> float:
    """Time in hours from commit to production deploy."""
    # Get commit timestamp
    commit_ts = subprocess.check_output(
        ["git", "show", "-s", "--format=%ct", commit_sha]
    ).decode().strip()
    commit_dt = datetime.fromtimestamp(int(commit_ts))

    # Production deploy timestamp comes from your CD system (Argo, Helm, etc.)
    # Here we read it from a deploy log or annotation
    deploy_dt = get_prod_deploy_time(commit_sha)  # from your CD system

    return (deploy_dt - commit_dt).total_seconds() / 3600

Jenkins ships a DORA Metrics plugin (v2.8.1+) for automated collection. Datadog and Grafana both have DORA dashboards.
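
Deployment frequency and change failure rate can be derived from a deploy log in the same spirit as the lead-time function above. The record shape (timestamp plus an incident flag) is an assumption — feed it from whatever your CD system actually emits:

```python
# Sketch: two more DORA metrics from a deploy log. The record shape is
# hypothetical; a real pipeline would populate it from the CD system.
from datetime import datetime

deploys = [
    {"at": datetime(2024, 5, 1, 9, 0),  "caused_incident": False},
    {"at": datetime(2024, 5, 1, 15, 0), "caused_incident": True},
    {"at": datetime(2024, 5, 2, 11, 0), "caused_incident": False},
    {"at": datetime(2024, 5, 3, 10, 0), "caused_incident": False},
]

def deployment_frequency(deploys, days: int) -> float:
    """Deploys per day over the observation window."""
    return len(deploys) / days

def change_failure_rate(deploys) -> float:
    """Fraction of deploys that caused a production incident."""
    return sum(d["caused_incident"] for d in deploys) / len(deploys)

assert deployment_frequency(deploys, days=3) > 1   # more than one deploy/day
assert change_failure_rate(deploys) == 0.25        # 1 of 4 deploys failed
```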


Trunk-Based Development

The branching strategy that enables high deployment frequency.

Trunk-based: All developers commit to main (the trunk) at least daily. Feature branches live for < 1 day. Feature flags control whether new code is active in production.

Gitflow: Long-lived develop, release, and feature branches. Merging is expensive; integration delays are common. Slower but used in regulated industries requiring release sign-off.

Trunk-based: feature-flag controls visibility, code always ships to prod
Gitflow:     release branch accumulates changes, ships on a fixed cycle

Feature flags are the enabler: incomplete features can land in main behind a flag, keeping the trunk green while development continues. See cs-fundamentals/feature-flags.
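
A minimal sketch of the flag guard — the in-memory dict stands in for a real flag service (LaunchDarkly, Unleash, a config map), and the flag name is hypothetical:

```python
# Sketch: incomplete code lands on trunk behind a flag and ships to prod dark.
# FLAGS is a stand-in for a real flag store; "new-checkout-flow" is illustrative.
FLAGS = {"new-checkout-flow": False}  # merged to trunk, off in production

def legacy_checkout(cart):
    return f"legacy: {len(cart)} items"

def new_checkout(cart):
    return f"new: {len(cart)} items"

def checkout(cart, flags=FLAGS):
    if flags.get("new-checkout-flow", False):
        return new_checkout(cart)    # in-progress path, behind the flag
    return legacy_checkout(cart)     # current production behaviour

assert checkout(["a", "b"]) == "legacy: 2 items"   # flag off: old path runs
assert checkout(["a"], {"new-checkout-flow": True}) == "new: 1 items"
```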


Environment Promotion Pattern

┌─────────┐    ┌─────────┐    ┌──────────┐    ┌────────────┐
│  Build  │───▶│   Dev   │───▶│ Staging  │───▶│ Production │
└─────────┘    └─────────┘    └──────────┘    └────────────┘
  sha-abc123    auto-deploy    auto-deploy    approval gate
  pushed to      on merge       on merge        + canary
  registry        to main       to main

Same image (sha-abc123) at every stage. Environment-specific config via Helm values files or Kubernetes Secrets. Never baked in.


Pipeline Quality Gates

Gate          Check                                Action on Fail
Pre-commit    Lint, type check, secret scan        Block commit
PR            Unit tests, linting, coverage        Block merge
Post-merge    Integration tests, full test suite   Alert on-call, block staging deploy
Staging       E2E, performance baseline            Block prod deploy
Production    Smoke tests, synthetic monitoring    Rollback

See technical-qa/ci-cd-quality-gates for the full implementation with YAML examples.
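
A sketch of the PR-row coverage gate — the 80% threshold and the report shape are assumptions, not the vault's standard:

```python
# Sketch: block a merge when line coverage drops below a threshold.
# The report dict shape and the 80% default are hypothetical.
def coverage_gate(report: dict, threshold: float = 80.0) -> None:
    """Exit non-zero (blocking the merge) when coverage is below threshold."""
    covered = 100.0 * report["covered_lines"] / report["total_lines"]
    if covered < threshold:
        raise SystemExit(f"coverage {covered:.1f}% < {threshold}% — blocking merge")

coverage_gate({"covered_lines": 850, "total_lines": 1000})  # 85%: passes silently

try:
    coverage_gate({"covered_lines": 700, "total_lines": 1000})  # 70%: blocks
except SystemExit as exc:
    assert "blocking merge" in str(exc)
```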


Key Facts

  • Jenkins declarative: agent none at top level prevents executor starvation in parallel pipelines
  • Azure DevOps: environment objects carry approval gates and deployment history
  • DORA elite: deploy multiple times/day, lead time < 1 hour, MTTR < 1 hour
  • Trunk-based development is the branching strategy that enables high deployment frequency
  • Build once, promote everywhere — never rebuild per environment

Common Failure Cases

Jenkins executor starvation in parallel stages
Why: agent { label 'docker' } is declared on the parent pipeline instead of each stage; the parent holds an executor while waiting for parallel children.
Detect: builds queue indefinitely; Jenkins executor count shows all slots occupied by waiting parent pipelines.
Fix: set agent none at the top-level pipeline block; declare agent on each individual stage.

Different image deployed to production than tested in staging
Why: the pipeline rebuilt the Docker image for production instead of promoting the staging-tested image; environment variables or base image were different.
Detect: docker inspect shows different layer SHA between staging and production images; check the pipeline for duplicate docker build steps.
Fix: build once, tag with git commit SHA, promote the same image through all environments; never rebuild per environment.

Secret exposed in build logs
Why: a --build-arg or echo in a shell step printed a secret to stdout, and CI logs are broadly readable — sometimes publicly.
Detect: search build log output for credential patterns; run detect-secrets in the scan stage.
Fix: pass secrets via environment variables from the secret store, never as build args; add detect-secrets as a pipeline gate.
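
The kind of pattern scan detect-secrets performs can be sketched with two illustrative regexes (AWS access key ID shape, inline password assignment) — a real gate should use detect-secrets or gitleaks, not a homegrown list:

```python
# Sketch: scan CI log output for secret-shaped strings. The two patterns are
# illustrative; production scanning belongs to detect-secrets or gitleaks.
import re

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),        # AWS access key ID shape
    re.compile(r"(?i)password\s*=\s*\S+"),  # inline password assignment
]

def scan_log(text: str) -> list[str]:
    """Return every log line that matches a secret pattern."""
    return [
        line for line in text.splitlines()
        if any(p.search(line) for p in SECRET_PATTERNS)
    ]

log = "step 1: build ok\nexport PASSWORD=hunter2\nstep 2: push ok"
assert scan_log(log) == ["export PASSWORD=hunter2"]
```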

Flaky integration test blocks every PR
Why: integration test relies on a real external service or timing; it passes locally but fails ~20% of the time in CI due to network variability.
Detect: same test fails and passes on re-run without code changes; failure rate > 5% on a single test.
Fix: mock the external dependency with testcontainers or a mock server; add the test to a quarantine suite until it's made deterministic.
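
Flaky candidates can be flagged mechanically from rerun history using the >5% failure-rate signal above — the history format (test name mapped to pass/fail booleans) is an assumption:

```python
# Sketch: a test is flaky if it both passes and fails without code changes and
# its failure rate exceeds the threshold. Always-failing tests are real bugs,
# not flakes, so they are excluded. The history format is hypothetical.
def flaky_tests(history: dict[str, list[bool]], threshold: float = 0.05) -> list[str]:
    """Tests that sometimes fail, with a failure rate above the threshold."""
    flagged = []
    for name, runs in history.items():
        failures = runs.count(False)
        if 0 < failures < len(runs) and failures / len(runs) > threshold:
            flagged.append(name)
    return flagged

history = {
    "test_login":    [True] * 20,                # stable pass: not flagged
    "test_payments": [True] * 16 + [False] * 4,  # 20% failure rate: flaky
    "test_broken":   [False] * 10,               # always fails: a real bug
}
assert flaky_tests(history) == ["test_payments"]
```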

DORA lead time metric is inaccurate — commit timestamp is wrong
Why: git log returns the author date, not the committer date; rebased commits have author dates weeks in the past.
Detect: DORA dashboard shows lead times of days for commits that deployed in hours.
Fix: use git show -s --format=%ct for the committer timestamp, not author timestamp; validate against a known-good deploy.

Connections

Open Questions

  • When does Azure DevOps outperform GitHub Actions for enterprise teams? (Compliance controls? Legacy integration?)
  • At what team size does trunk-based development become harder to enforce than Gitflow?
  • How do DORA metrics change for LLM-based services where deployment includes model version changes?