NextGenBeing Founder
Last year, our team was deploying code manually to production. Every Friday at 5 PM (terrible idea, I know), someone would SSH into servers, pull the latest code, restart services, and pray nothing broke. When we scaled from 50k to 2M daily active users, this approach became completely unsustainable. We lost an entire weekend debugging a deployment that went sideways because someone forgot to run database migrations.
That's when I spent three months building out our CI/CD infrastructure with Jenkins and Docker. We went from manual deployments taking 45 minutes (with a 30% failure rate) to automated pipelines completing in 8 minutes with a 98% success rate. We now handle 500+ deployments per month across 12 microservices.
This isn't a beginner's tutorial on installing Jenkins. I'm assuming you understand Docker basics, know what CI/CD means, and are here because you need to build something that actually works in production. I'm going to share everything we learned the hard way—the configurations that broke at 3 AM, the security holes we discovered, the performance optimizations that cut our build times in half, and the architectural decisions I'd make differently if I started over today.
Why We Chose Jenkins and Docker (And What We Almost Picked Instead)
Before committing to Jenkins, I evaluated GitLab CI, CircleCI, and GitHub Actions for two weeks. Here's why we landed on Jenkins, and I'll be honest about where it falls short.
Jenkins won because:
We needed complete control over our build environment. Our application has dependencies on legacy C++ libraries that require specific compiler versions. CircleCI's Docker images didn't support our exact setup, and customizing them meant waiting for support tickets. With Jenkins, I could build exactly the environment we needed.
We already had on-premise servers with spare capacity. The CFO wasn't excited about adding another $500/month SaaS subscription when we had hardware sitting idle. Jenkins runs on our existing infrastructure for essentially zero additional cost.
Our security team required that source code never leave our network. This ruled out cloud-based CI/CD solutions immediately. Jenkins runs entirely within our VPC.
Where Jenkins hurts:
The UI looks like it's from 2010 (because it basically is). Our junior developers initially hated it compared to the sleek interfaces of modern CI/CD tools. I've spent hours explaining Jenkinsfile syntax that would be intuitive in GitLab CI.
Plugin management is a nightmare. We have 47 plugins installed, and I dread updating them because something always breaks. Last month, updating the Docker plugin broke our entire pipeline for 6 hours until I rolled back.
Initial setup took me three full days. Modern alternatives like GitHub Actions get you running in 20 minutes. But that upfront investment paid off in flexibility.
Docker's role in our architecture:
Docker solved our "works on my machine" problem permanently. Before Docker, we had developers running macOS, Linux, and Windows, with subtle environment differences causing bugs that only appeared in production. Now everyone—local development, CI pipeline, staging, and production—runs the exact same Docker images.
We also needed to support multiple runtime environments. We have services running Node.js 16, Node.js 18, Python 3.9, Python 3.11, and Go 1.21. Managing these dependencies on bare metal servers was chaos. Docker containers gave us perfect isolation.
The killer feature: immutable deployments. When a deployment fails, we roll back by simply pointing traffic to the previous Docker image tag. No complex rollback scripts, no database restoration procedures (well, usually). Just change a tag and redeploy.
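To make that concrete, here is a sketch of what a tag-flip rollback looks like. The service name, tag, namespace, and registry path are hypothetical placeholders, not our real values; the function prints the kubectl commands so you can review them before piping the output to `sh`:

```shell
# Sketch of a tag-based rollback. Service name, previous tag, namespace,
# and registry path are hypothetical placeholders.
rollback() {
  local service="$1" prev_tag="$2" ns="${3:-production}"
  local image="harbor.company.internal/production/${service}:${prev_tag}"
  # Repoint the deployment at the previous image tag...
  echo "kubectl set image deployment/${service} ${service}=${image} -n ${ns}"
  # ...and wait for the rollout to settle before calling it done.
  echo "kubectl rollout status deployment/${service} -n ${ns} --timeout=5m"
}

# Print the commands for review; pipe to `sh` to actually execute them.
rollback api 141-9f8e7d6 production
```

Because the previous image still exists in the registry, "rollback" is just a redeploy of a known-good tag—no rebuild, no special-case scripts.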
The Production Architecture That Actually Works
Let me show you the architecture we built after trying three different approaches that failed. This setup handles our current scale and has room to grow to 10M+ users.
Infrastructure overview:
We run three Jenkins controller nodes behind a load balancer for high availability. Each controller has 8 CPU cores and 32GB RAM. Initially, I tried running a single controller with 4GB RAM, and it would crash during peak deployment hours when 5+ pipelines ran simultaneously.
We have 12 Jenkins agent nodes that actually execute builds. These are beefy machines—16 cores, 64GB RAM each—because Docker builds are CPU and memory intensive. Each agent can run 4 concurrent builds maximum. I learned this limit the hard way when running 8 concurrent builds caused the agents to swap to disk and builds took 40+ minutes instead of 8.
Our Docker registry runs on a separate server cluster using Harbor. We initially used Docker Hub, but pulling images during deployments became a bottleneck. Harbor gives us a local registry with image scanning, replication, and we control the network bandwidth.
Network topology:
Jenkins controllers sit in a private subnet with no internet access. They communicate with agents through a VPN tunnel. This was a security team requirement after we had a close call with a compromised Jenkins plugin that tried to exfiltrate environment variables.
Agents have limited internet access through a NAT gateway for pulling dependencies (npm packages, pip packages, etc.). All outbound traffic goes through our corporate proxy with SSL inspection.
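For reference, routing the package managers through a proxy like ours is only a couple of config entries per agent. The proxy host and port below are placeholders, not our real endpoint:

```shell
# Hypothetical proxy host. Point npm and pip at the corporate proxy so
# dependency pulls go out through the NAT gateway / SSL inspection path.
npm config set proxy http://proxy.company.internal:3128
npm config set https-proxy http://proxy.company.internal:3128
pip config set global.proxy http://proxy.company.internal:3128
```

If your proxy does SSL inspection, you'll also need its CA certificate trusted on every agent, or TLS verification inside the builds will fail.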
The Docker registry is accessible from the agent subnet and the Kubernetes cluster subnet where our applications run. We use mutual TLS authentication between all components.
Storage architecture:
This is where I made expensive mistakes initially. Jenkins stores artifacts, logs, and workspace data. Our first controller filled a 500GB disk in two months. Now we use:
- 1TB NVMe SSD for Jenkins home directory (fast random I/O for job configs)
- 4TB SSD for workspace data (builds generate lots of temporary files)
- S3-compatible object storage for long-term artifact retention (anything older than 30 days)
We mount the workspace storage on agents using NFS. Initially, I had each agent use local storage, but this caused problems when builds failed and needed debugging—the workspace data was scattered across 12 machines.
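As a sketch, the agent-side mount looks like this—the server and export names are placeholders for our real ones:

```shell
# Hypothetical NFS server/export names. Mount the shared workspace on an agent:
sudo mkdir -p /var/jenkins/workspace
sudo mount -t nfs -o rw,hard,noatime \
  storage.company.internal:/export/jenkins-workspace /var/jenkins/workspace

# Equivalent /etc/fstab entry so the mount survives reboots:
# storage.company.internal:/export/jenkins-workspace  /var/jenkins/workspace  nfs  rw,hard,noatime  0  0
```

The `hard` option matters here: with a soft mount, an NFS hiccup mid-build can silently corrupt workspace files instead of blocking until the server recovers.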
The Jenkinsfile structure that scales:
Here's the pipeline structure we use for all our services. This took months to refine:
@Library('shared-pipeline-library@v2.1.0') _

pipeline {
    agent {
        kubernetes {
            yaml """
apiVersion: v1
kind: Pod
metadata:
  labels:
    jenkins: agent
spec:
  containers:
    - name: docker
      image: docker:24.0.5-dind
      securityContext:
        privileged: true
      volumeMounts:
        - name: docker-sock
          mountPath: /var/run
    - name: kubectl
      image: bitnami/kubectl:1.28
      command: ['cat']
      tty: true
  volumes:
    - name: docker-sock
      emptyDir: {}
"""
        }
    }
    options {
        buildDiscarder(logRotator(numToKeepStr: '30', artifactNumToKeepStr: '10'))
        disableConcurrentBuilds()
        timeout(time: 20, unit: 'MINUTES')
        timestamps()
    }
    environment {
        DOCKER_REGISTRY = 'harbor.company.internal'
        DOCKER_REPO = "${DOCKER_REGISTRY}/production"
        IMAGE_NAME = "${DOCKER_REPO}/${env.JOB_NAME}"
        BUILD_VERSION = "${env.BUILD_NUMBER}-${env.GIT_COMMIT.take(7)}"
        SLACK_CHANNEL = '#deployments'
    }
    stages {
        stage('Checkout') {
            steps {
                checkout scm
                script {
                    env.GIT_COMMIT_MSG = sh(
                        returnStdout: true,
                        script: 'git log -1 --pretty=%B'
                    ).trim()
                }
            }
        }
        stage('Build') {
            steps {
                container('docker') {
                    script {
                        dockerImage = docker.build(
                            "${IMAGE_NAME}:${BUILD_VERSION}",
                            "--build-arg BUILD_DATE=\$(date -u +'%Y-%m-%dT%H:%M:%SZ') " +
                            "--build-arg VCS_REF=${env.GIT_COMMIT} " +
                            "--build-arg VERSION=${BUILD_VERSION} " +
                            "--cache-from ${IMAGE_NAME}:latest " +
                            "--network=host ."
                        )
                    }
                }
            }
        }
        stage('Test') {
            parallel {
                stage('Unit Tests') {
                    steps {
                        container('docker') {
                            sh """
                                docker run --rm \
                                    -e CI=true \
                                    ${IMAGE_NAME}:${BUILD_VERSION} \
                                    npm run test:unit -- --coverage --maxWorkers=4
                            """
                        }
                    }
                }
                stage('Integration Tests') {
                    steps {
                        container('docker') {
                            sh """
                                docker run --rm \
                                    -e DATABASE_URL=postgresql://test:test@postgres:5432/testdb \
                                    -e REDIS_URL=redis://redis:6379/0 \
                                    --network test-network \
                                    ${IMAGE_NAME}:${BUILD_VERSION} \
                                    npm run test:integration
                            """
                        }
                    }
                }
                stage('Security Scan') {
                    steps {
                        container('docker') {
                            sh """
                                trivy image \
                                    --severity HIGH,CRITICAL \
                                    --exit-code 1 \
                                    --no-progress \
                                    ${IMAGE_NAME}:${BUILD_VERSION}
                            """
                        }
                    }
                }
            }
        }
        stage('Push') {
            when {
                branch 'main'
            }
            steps {
                container('docker') {
                    script {
                        docker.withRegistry("https://${DOCKER_REGISTRY}", 'harbor-credentials') {
                            dockerImage.push("${BUILD_VERSION}")
                            dockerImage.push("latest")
                        }
                    }
                }
            }
        }
        stage('Deploy to Staging') {
            when {
                branch 'main'
            }
            steps {
                container('kubectl') {
                    sh """
                        kubectl set image deployment/${env.JOB_NAME} \
                            ${env.JOB_NAME}=${IMAGE_NAME}:${BUILD_VERSION} \
                            -n staging
                        kubectl rollout status deployment/${env.JOB_NAME} \
                            -n staging \
                            --timeout=5m
                    """
                }
            }
        }
        stage('Smoke Tests') {
            when {
                branch 'main'
            }
            steps {
                script {
                    def response = sh(
                        returnStdout: true,
                        script: """
                            curl -f -s -o /dev/null -w '%{http_code}' \
                            https://staging.company.internal/health
                        """
                    ).trim()
                    if (response != '200') {
                        error("Smoke test failed: received ${response}")
                    }
                }
            }
        }
        stage('Deploy to Production') {
            when {
                branch 'main'
            }
            steps {
                input message: 'Deploy to production?', ok: 'Deploy'
                container('kubectl') {
                    sh """
                        kubectl set image deployment/${env.JOB_NAME} \
                            ${env.JOB_NAME}=${IMAGE_NAME}:${BUILD_VERSION} \
                            -n production
                        kubectl rollout status deployment/${env.JOB_NAME} \
                            -n production \
                            --timeout=10m
                    """
                }
            }
        }
    }
    post {
        success {
            slackSend(
                channel: "${SLACK_CHANNEL}",
                color: 'good',
                message: "✅ ${env.JOB_NAME} ${BUILD_VERSION} deployed: ${env.GIT_COMMIT_MSG}"
            )
        }
    }
}
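One detail from the environment block worth calling out: the image tag combines the Jenkins build number with a short commit hash, so every image in the registry is traceable to both a specific build and a specific commit. In shell terms (the values below are samples standing in for what Jenkins injects at runtime):

```shell
# Sample values standing in for Jenkins-injected environment variables.
BUILD_NUMBER=142
GIT_COMMIT=a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0

# Mirrors the Groovy expression "${env.BUILD_NUMBER}-${env.GIT_COMMIT.take(7)}"
BUILD_VERSION="${BUILD_NUMBER}-${GIT_COMMIT:0:7}"
echo "$BUILD_VERSION"   # → 142-a1b2c3d
```

This is also what makes the tag-flip rollback safe: given any running tag, you can find the exact commit it was built from.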