NextGenBeing Founder
Last year, our team was deploying code manually to production. Every Friday at 5 PM (terrible idea, I know), someone would SSH into servers, pull the latest code, restart services, and pray nothing broke. When we scaled from 50k to 2M daily active users, this approach became completely unsustainable. We lost an entire weekend debugging a deployment that went sideways because someone forgot to run database migrations.
That's when I spent three months building out our CI/CD infrastructure with Jenkins and Docker. We went from manual deployments taking 45 minutes (with a 30% failure rate) to automated pipelines completing in 8 minutes with a 98% success rate. We now handle 500+ deployments per month across 12 microservices.
This isn't a beginner's tutorial on installing Jenkins. I'm assuming you understand Docker basics, know what CI/CD means, and are here because you need to build something that actually works in production. I'm going to share everything we learned the hard way—the configurations that broke at 3 AM, the security holes we discovered, the performance optimizations that cut our build times in half, and the architectural decisions I'd make differently if I started over today.
Why We Chose Jenkins and Docker (And What We Almost Picked Instead)
Before committing to Jenkins, I evaluated GitLab CI, CircleCI, and GitHub Actions for two weeks. Here's why we landed on Jenkins, and I'll be honest about where it falls short.
Jenkins won because:
We needed complete control over our build environment. Our application has dependencies on legacy C++ libraries that require specific compiler versions. CircleCI's Docker images didn't support our exact setup, and customizing them meant waiting for support tickets. With Jenkins, I could build exactly the environment we needed.
We already had on-premise servers with spare capacity. The CFO wasn't excited about adding another $500/month SaaS subscription when we had hardware sitting idle. Jenkins runs on our existing infrastructure for essentially zero additional cost.
Our security team required that source code never leave our network. This ruled out cloud-based CI/CD solutions immediately. Jenkins runs entirely within our VPC.
Where Jenkins hurts:
The UI looks like it's from 2010 (because it basically is). Our junior developers initially hated it compared to the sleek interfaces of modern CI/CD tools. I've spent hours explaining Jenkinsfile syntax that would be intuitive in GitLab CI.
Plugin management is a nightmare. We have 47 plugins installed, and I dread updating them because something always breaks. Last month, updating the Docker plugin broke our entire pipeline for 6 hours until I rolled back.
Initial setup took me three full days. Modern alternatives like GitHub Actions get you running in 20 minutes. But that upfront investment paid off in flexibility.
Docker's role in our architecture:
Docker solved our "works on my machine" problem permanently. Before Docker, we had developers running macOS, Linux, and Windows, with subtle environment differences causing bugs that only appeared in production. Now everyone—local development, CI pipeline, staging, and production—runs the exact same Docker images.
We also needed to support multiple runtime environments. We have services running Node.js 16, Node.js 18, Python 3.9, Python 3.11, and Go 1.21. Managing these dependencies on bare metal servers was chaos. Docker containers gave us perfect isolation.
The killer feature: immutable deployments. When a deployment fails, we roll back by simply pointing traffic to the previous Docker image tag. No complex rollback scripts, no database restoration procedures (well, usually). Just change a tag and redeploy.
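To make that concrete, here is a sketch of what a tag-flip rollback looks like. The service name, tag, namespace, and registry path are hypothetical placeholders, not our real values; the function prints the kubectl commands so you can review them before piping the output to `sh`:

```shell
# Sketch of a tag-based rollback. Service name, previous tag, namespace,
# and registry path are hypothetical placeholders.
rollback() {
  local service="$1" prev_tag="$2" ns="${3:-production}"
  local image="harbor.company.internal/production/${service}:${prev_tag}"
  # Repoint the deployment at the previous image tag...
  echo "kubectl set image deployment/${service} ${service}=${image} -n ${ns}"
  # ...and wait for the rollout to settle before calling it done.
  echo "kubectl rollout status deployment/${service} -n ${ns} --timeout=5m"
}

# Print the commands for review; pipe to `sh` to actually execute them.
rollback api 141-9f8e7d6 production
```

Because the previous image still exists in the registry, "rollback" is just a redeploy of a known-good tag—no rebuild, no special-case scripts.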
The Production Architecture That Actually Works
Let me show you the architecture we built after trying three different approaches that failed. This setup handles our current scale and has room to grow to 10M+ users.
Infrastructure overview:
We run three Jenkins controller nodes behind a load balancer for high availability. Each controller has 8 CPU cores and 32GB RAM. Initially, I tried running a single controller with 4GB RAM, and it would crash during peak deployment hours when 5+ pipelines ran simultaneously.
We have 12 Jenkins agent nodes that actually execute builds. These are beefy machines—16 cores, 64GB RAM each—because Docker builds are CPU and memory intensive. Each agent can run 4 concurrent builds maximum. I learned this limit the hard way when running 8 concurrent builds caused the agents to swap to disk and builds took 40+ minutes instead of 8.
Our Docker registry runs on a separate server cluster using Harbor. We initially used Docker Hub, but pulling images during deployments became a bottleneck. Harbor gives us a local registry with image scanning, replication, and we control the network bandwidth.
Network topology:
Jenkins controllers sit in a private subnet with no internet access. They communicate with agents through a VPN tunnel. This was a security team requirement after we had a close call with a compromised Jenkins plugin that tried to exfiltrate environment variables.
Agents have limited internet access through a NAT gateway for pulling dependencies (npm packages, pip packages, etc.). All outbound traffic goes through our corporate proxy with SSL inspection.
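For reference, routing the package managers through a proxy like ours is only a couple of config entries per agent. The proxy host and port below are placeholders, not our real endpoint:

```shell
# Hypothetical proxy host. Point npm and pip at the corporate proxy so
# dependency pulls go out through the NAT gateway / SSL inspection path.
npm config set proxy http://proxy.company.internal:3128
npm config set https-proxy http://proxy.company.internal:3128
pip config set global.proxy http://proxy.company.internal:3128
```

If your proxy does SSL inspection, you'll also need its CA certificate trusted on every agent, or TLS verification inside the builds will fail.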
The Docker registry is accessible from the agent subnet and the Kubernetes cluster subnet where our applications run. We use mutual TLS authentication between all components.
Storage architecture:
This is where I made expensive mistakes initially. Jenkins stores artifacts, logs, and workspace data. Our first controller filled a 500GB disk in two months. Now we use:
- 1TB NVMe SSD for Jenkins home directory (fast random I/O for job configs)
- 4TB SSD for workspace data (builds generate lots of temporary files)
- S3-compatible object storage for long-term artifact retention (anything older than 30 days)
We mount the workspace storage on agents using NFS. Initially, I had each agent use local storage, but this caused problems when builds failed and needed debugging—the workspace data was scattered across 12 machines.
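As a sketch, the agent-side mount looks like this—the server and export names are placeholders for our real ones:

```shell
# Hypothetical NFS server/export names. Mount the shared workspace on an agent:
sudo mkdir -p /var/jenkins/workspace
sudo mount -t nfs -o rw,hard,noatime \
  storage.company.internal:/export/jenkins-workspace /var/jenkins/workspace

# Equivalent /etc/fstab entry so the mount survives reboots:
# storage.company.internal:/export/jenkins-workspace  /var/jenkins/workspace  nfs  rw,hard,noatime  0  0
```

The `hard` option matters here: with a soft mount, an NFS hiccup mid-build can silently corrupt workspace files instead of blocking until the server recovers.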
The Jenkinsfile structure that scales:
Here's the pipeline structure we use for all our services. This took months to refine:
@Library('shared-pipeline-library@v2.1.0') _

pipeline {
    agent {
        kubernetes {
            yaml """
apiVersion: v1
kind: Pod
metadata:
  labels:
    jenkins: agent
spec:
  containers:
    - name: docker
      image: docker:24.0.5-dind
      securityContext:
        privileged: true
      volumeMounts:
        - name: docker-sock
          mountPath: /var/run
    - name: kubectl
      image: bitnami/kubectl:1.28
      command: ['cat']
      tty: true
  volumes:
    - name: docker-sock
      emptyDir: {}
"""
        }
    }
    options {
        buildDiscarder(logRotator(numToKeepStr: '30', artifactNumToKeepStr: '10'))
        disableConcurrentBuilds()
        timeout(time: 20, unit: 'MINUTES')
        timestamps()
    }
    environment {
        DOCKER_REGISTRY = 'harbor.company.internal'
        DOCKER_REPO = "${DOCKER_REGISTRY}/production"
        IMAGE_NAME = "${DOCKER_REPO}/${env.JOB_NAME}"
        BUILD_VERSION = "${env.BUILD_NUMBER}-${env.GIT_COMMIT.take(7)}"
        SLACK_CHANNEL = '#deployments'
    }
    stages {
        stage('Checkout') {
            steps {
                checkout scm
                script {
                    env.GIT_COMMIT_MSG = sh(
                        returnStdout: true,
                        script: 'git log -1 --pretty=%B'
                    ).trim()
                }
            }
        }
        stage('Build') {
            steps {
                container('docker') {
                    script {
                        dockerImage = docker.build(
                            "${IMAGE_NAME}:${BUILD_VERSION}",
                            "--build-arg BUILD_DATE=\$(date -u +'%Y-%m-%dT%H:%M:%SZ') " +
                            "--build-arg VCS_REF=${env.GIT_COMMIT} " +
                            "--build-arg VERSION=${BUILD_VERSION} " +
                            "--cache-from ${IMAGE_NAME}:latest " +
                            "--network=host ."
                        )
                    }
                }
            }
        }
        stage('Test') {
            parallel {
                stage('Unit Tests') {
                    steps {
                        container('docker') {
                            sh """
                                docker run --rm \
                                    -e CI=true \
                                    ${IMAGE_NAME}:${BUILD_VERSION} \
                                    npm run test:unit -- --coverage --maxWorkers=4
                            """
                        }
                    }
                }
                stage('Integration Tests') {
                    steps {
                        container('docker') {
                            sh """
                                docker run --rm \
                                    -e DATABASE_URL=postgresql://test:test@postgres:5432/testdb \
                                    -e REDIS_URL=redis://redis:6379/0 \
                                    --network test-network \
                                    ${IMAGE_NAME}:${BUILD_VERSION} \
                                    npm run test:integration
                            """
                        }
                    }
                }
                stage('Security Scan') {
                    steps {
                        container('docker') {
                            sh """
                                trivy image \
                                    --severity HIGH,CRITICAL \
                                    --exit-code 1 \
                                    --no-progress \
                                    ${IMAGE_NAME}:${BUILD_VERSION}
                            """
                        }
                    }
                }
            }
        }
        stage('Push') {
            when {
                branch 'main'
            }
            steps {
                container('docker') {
                    script {
                        docker.withRegistry("https://${DOCKER_REGISTRY}", 'harbor-credentials') {
                            dockerImage.push("${BUILD_VERSION}")
                            dockerImage.push("latest")
                        }
                    }
                }
            }
        }
        stage('Deploy to Staging') {
            when {
                branch 'main'
            }
            steps {
                container('kubectl') {
                    sh """
                        kubectl set image deployment/${env.JOB_NAME} \
                            ${env.JOB_NAME}=${IMAGE_NAME}:${BUILD_VERSION} \
                            -n staging
                        kubectl rollout status deployment/${env.JOB_NAME} \
                            -n staging \
                            --timeout=5m
                    """
                }
            }
        }
        stage('Smoke Tests') {
            when {
                branch 'main'
            }
            steps {
                script {
                    def response = sh(
                        returnStdout: true,
                        script: """
                            curl -f -s -o /dev/null -w '%{http_code}' \
                            https://staging.company.internal/health
                        """
                    ).trim()
                    if (response != '200') {
                        error("Smoke test failed: received ${response}")
                    }
                }
            }
        }
        stage('Deploy to Production') {
            when {
                branch 'main'
            }
            steps {
                input message: 'Deploy to production?', ok: 'Deploy'
                container('kubectl') {
                    sh """
                        kubectl set image deployment/${env.JOB_NAME} \
                            ${env.JOB_NAME}=${IMAGE_NAME}:${BUILD_VERSION} \
                            -n production
                        kubectl rollout status deployment/${env.JOB_NAME} \
                            -n production \
                            --timeout=10m
                    """
                }
            }
        }
    }
    post {
        success {
            slackSend(
                channel: "${SLACK_CHANNEL}",
                color: 'good',
                message: "✅ ${env.JOB_NAME} ${BUILD_VERSION} deployed: ${env.GIT_COMMIT_MSG}"
            )
        }
    }
}
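One detail from the environment block worth calling out: the image tag combines the Jenkins build number with a short commit hash, so every image in the registry is traceable to both a specific build and a specific commit. In shell terms (the values below are samples standing in for what Jenkins injects at runtime):

```shell
# Sample values standing in for Jenkins-injected environment variables.
BUILD_NUMBER=142
GIT_COMMIT=a1b2c3d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0

# Mirrors the Groovy expression "${env.BUILD_NUMBER}-${env.GIT_COMMIT.take(7)}"
BUILD_VERSION="${BUILD_NUMBER}-${GIT_COMMIT:0:7}"
echo "$BUILD_VERSION"   # → 142-a1b2c3d
```

This is also what makes the tag-flip rollback safe: given any running tag, you can find the exact commit it was built from.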