Learn Orchestration Patterns
68 patterns across 16 categories. Each one shows the convention, a side-by-side example, and why it matters.
Start here
New to container orchestration? Start with the first five categories, in order.
Dockerfile Basics
FROM, RUN, COPY, ENTRYPOINT, and CMD instructions for building container images. You'll hit this when your container starts but runs the wrong command or ignores signals.
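The Before/After pair below covers ADD vs COPY; the signal problem mentioned above usually comes from the shell form of CMD or ENTRYPOINT. A minimal sketch (the `--port` argument is illustrative, not from the example app):

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY . /app
RUN npm install
# Shell form runs under /bin/sh, which becomes PID 1 and swallows SIGTERM:
#   CMD node server.js
# Exec form makes node PID 1, so it receives signals directly:
ENTRYPOINT ["node", "server.js"]
# CMD now supplies default arguments that ENTRYPOINT receives:
CMD ["--port", "3000"]
```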
Before:

FROM node:20-alpine
# Add application files
ADD . /app
WORKDIR /app
RUN npm install
CMD ["node", "server.js"]

After:

FROM node:20-alpine
# Copy application files
COPY . /app
WORKDIR /app
RUN npm install
CMD ["node", "server.js"]

Image Optimization
Multi-stage builds, layer caching, .dockerignore, and base image selection. You'll hit this when your image is 2 GB, builds take 10 minutes, or a small code change invalidates every layer.
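The pair below pins the base image; the multi-stage builds and layer caching mentioned above might look like this (a sketch that assumes a standard npm project whose `npm run build` emits to `dist/`):

```dockerfile
# Build stage: dev dependencies and source never reach the final image
FROM node:20.11-alpine AS build
WORKDIR /app
# Copy manifests first so the dependency layer caches across code changes
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage: production dependencies and built artifacts only
FROM node:20.11-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
CMD ["node", "dist/server.js"]
```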
Before:

FROM node:latest
WORKDIR /app
COPY . .
RUN npm ci
CMD ["node", "server.js"]

After:

FROM node:20.11-alpine
WORKDIR /app
COPY . .
RUN npm ci
CMD ["node", "server.js"]

Docker Compose
Service definitions, depends_on, profiles, and compose file structure. You'll hit this when your local dev stack has five services and you need them to start in the right order.
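The pair below handles start order; the profiles mentioned above gate optional services so they only start on request (the pgadmin service is illustrative):

```yaml
services:
  app:
    build: .
  # Only started with: docker compose --profile debug up
  pgadmin:
    image: dpage/pgadmin4
    profiles: ["debug"]
```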
Before:

services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secret
  app:
    build: .
    depends_on:
      - db

After:

services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 5
  app:
    build: .
    depends_on:
      db:
        condition: service_healthy

Volumes & Storage
Named volumes, bind mounts, tmpfs, and volume drivers. You'll hit this when container restarts lose your database data or your local file edits don't appear inside the container.
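The pair below covers named volumes; the bind mounts and tmpfs mentioned above solve the "local edits don't appear" problem (paths are illustrative):

```yaml
services:
  app:
    build: .
    volumes:
      # Bind mount: local source edits appear live inside the container
      - ./src:/app/src
    # tmpfs: in-memory scratch space, discarded when the container stops
    tmpfs:
      - /app/tmp
```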
Before:

services:
  db:
    image: postgres:16
    volumes:
      # Anonymous volume
      - /var/lib/postgresql/data

After:

services:
  db:
    image: postgres:16
    volumes:
      # Named volume
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

Networking
Bridge networks, overlay networks, port mapping, and DNS resolution. You'll hit this when containers can't reach each other or your app is exposed on the wrong port.
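The pair below covers DNS between containers; the port mapping mentioned above controls what the host can reach (ports are illustrative):

```yaml
services:
  app:
    build: .
    ports:
      # host:container - a browser reaches localhost:8080;
      # other services still use app:3000 via Docker DNS
      - "8080:3000"
```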
Before:

services:
  app:
    build: .
    environment:
      # Hardcoded IP, will break
      DB_HOST: "172.18.0.3"
  db:
    image: postgres:16

After:

services:
  app:
    build: .
    environment:
      # Docker DNS resolves service names
      DB_HOST: "db"
  db:
    image: postgres:16

Health Checks
HEALTHCHECK in Dockerfiles, readiness and liveness probes in Kubernetes. You'll hit this when your orchestrator routes traffic to a container that hasn't finished starting up.
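The pair below shows the Dockerfile side; the Kubernetes probes mentioned above look like this. A container-level fragment of a Pod or Deployment spec (the /health endpoint and port are assumptions carried over from the example):

```yaml
containers:
  - name: web
    image: myapp:1.0
    # Readiness gates traffic: pod gets no requests until it passes
    readinessProbe:
      httpGet:
        path: /health
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 10
    # Liveness restarts: repeated failures recreate the container
    livenessProbe:
      httpGet:
        path: /health
        port: 3000
      periodSeconds: 30
```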
Before:

FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm ci
# No health check defined
CMD ["node", "server.js"]

After:

FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm ci
HEALTHCHECK --interval=30s --timeout=3s \
  --start-period=10s --retries=3 \
  CMD wget --spider -q http://localhost:3000/health
CMD ["node", "server.js"]

Security
Running as non-root, read-only filesystems, secrets management, and image scanning. You'll hit this when a security audit flags your containers for running as root with write access everywhere.
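The pair below covers the non-root user; the read-only filesystems mentioned above can be sketched in Compose like this (the writable /tmp mount is an assumption about where the app writes):

```yaml
services:
  app:
    build: .
    # Root filesystem becomes immutable at runtime
    read_only: true
    # Drop all Linux capabilities the app doesn't need
    cap_drop:
      - ALL
    # Writable scratch space, since / is read-only
    tmpfs:
      - /tmp
```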
Before:

FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm ci
# Runs as root by default
CMD ["node", "server.js"]

After:

FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm ci
# Create and switch to non-root user
RUN addgroup -S appgroup && \
    adduser -S appuser -G appgroup
USER appuser
CMD ["node", "server.js"]

Environment & Config
Environment variables, .env files, ConfigMaps, and runtime configuration. You'll hit this when you hardcode a database URL and it breaks in staging because the host is different.
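The pair below moves variables into a .env file; Compose can also interpolate `${VAR}` from that same file at parse time. A sketch (the DB_URL composition is illustrative):

```yaml
# .env (never commit real credentials):
#   DB_HOST=db
#   DB_PASSWORD=changeme

services:
  app:
    build: .
    env_file:
      - .env
    environment:
      # ${...} is filled in from .env when Compose parses this file
      DB_URL: "postgres://admin:${DB_PASSWORD}@${DB_HOST}:5432/myapp"
```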
Before:

services:
  app:
    build: .
    environment:
      NODE_ENV: production
      DB_HOST: db
      DB_PORT: "5432"
      DB_USER: admin
      DB_PASSWORD: changeme
      DB_NAME: myapp
      REDIS_URL: redis://cache:6379

After:

services:
  app:
    build: .
    env_file:
      - .env
    environment:
      # Only overrides go here
      NODE_ENV: production

Pods & Deployments
Pod specs, Deployments, ReplicaSets, and rolling updates. You'll hit this when you need zero-downtime deploys or your pods keep crashing without clear reasons.
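The pair below moves from a bare Pod to a Deployment; the rolling updates mentioned above are tuned with an update strategy. A fragment of the Deployment spec (without `strategy`, Kubernetes defaults both values to 25%):

```yaml
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most one extra pod during a rollout
      maxUnavailable: 0  # never drop below desired capacity
```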
Before:

apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: web
      image: myapp:1.0
      ports:
        - containerPort: 8080

After:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: myapp:1.0
          ports:
            - containerPort: 8080

Services & Ingress
ClusterIP, NodePort, LoadBalancer, Ingress resources, and service mesh basics. You'll hit this when external traffic can't reach your pods or internal services can't find each other.
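The pair below keeps the Service internal; the Ingress mentioned above is how external HTTP traffic then reaches it (the hostname is illustrative, and a cluster needs an ingress controller installed):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: backend
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend
                port:
                  number: 8080
```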
Before:

apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  type: NodePort
  selector:
    app: backend
  ports:
    - port: 8080
      targetPort: 8080
      # Exposed on every node
      nodePort: 30080

After:

apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  type: ClusterIP
  selector:
    app: backend
  ports:
    - port: 8080
      targetPort: 8080

ConfigMaps & Secrets
ConfigMaps, Secrets, resource requests and limits, and pod scheduling. You'll hit this when your pod gets OOMKilled, can't read its config, or lands on the wrong node.
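The pair below covers ConfigMaps; the resource limits and Secrets mentioned above look like this container-level fragment (the Secret name and key are assumptions):

```yaml
containers:
  - name: app
    image: myapp:1.0
    resources:
      requests:
        memory: "128Mi"
        cpu: "250m"
      limits:
        # Exceeding this memory limit gets the container OOMKilled
        memory: "256Mi"
    env:
      - name: DB_PASSWORD
        valueFrom:
          secretKeyRef:
            name: app-secrets
            key: db-password
```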
Before:

apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: myapp:1.0
      env:
        - name: LOG_LEVEL
          value: "info"
        - name: MAX_RETRIES
          value: "3"
        - name: CACHE_TTL
          value: "300"

After:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: "info"
  MAX_RETRIES: "3"
  CACHE_TTL: "300"
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: myapp:1.0
      envFrom:
        - configMapRef:
            name: app-config

Helm Charts
Chart structure, values.yaml, templates, helpers, and chart dependencies. You'll hit this when you copy-paste Kubernetes manifests across environments instead of parameterizing them.
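The template below reads its values from the chart's values.yaml; a sketch matching the keys the template uses:

```yaml
# values.yaml - the defaults the template reads,
# overridable with: helm install -f my-values.yaml
replicaCount: 3
image:
  repository: myapp
  tag: "1.2.3"
resources:
  limits:
    memory: 256Mi
```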
Before:

# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: web
          image: myapp:1.2.3
          resources:
            limits:
              memory: 256Mi

After:

# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-web
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
        - name: web
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          resources:
            {{- toYaml .Values.resources | nindent 12 }}

Docker Swarm
Swarm mode, service definitions, stacks, replicas, and rolling updates. You'll hit this when you need simple container orchestration without the complexity of Kubernetes.
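The pair below uses `docker service create`; the stacks mentioned above declare the same thing in a Compose-format file (the file and stack names are illustrative):

```yaml
# stack.yml - deploy with: docker stack deploy -c stack.yml mystack
services:
  web:
    image: myapp:1.0
    deploy:
      replicas: 3
      update_config:
        parallelism: 1   # roll one replica at a time
        delay: 10s
    ports:
      - "80:8080"
```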
Before:

# Running containers directly
docker run -d --name web-1 myapp:1.0
docker run -d --name web-2 myapp:1.0
docker run -d --name web-3 myapp:1.0
# Manual management needed
# No auto-restart on failure
# No load balancing

After:

# Initialize swarm (once)
docker swarm init
# Create a service with replicas
docker service create \
  --name web \
  --replicas 3 \
  --publish 80:8080 \
  myapp:1.0
# Swarm handles scheduling,
# load balancing, and restarts

CI/CD Pipelines
Building images in CI, layer caching in pipelines, multi-platform builds, and registry tagging. You'll hit this when your CI pipeline rebuilds everything from scratch on every commit.
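The pair below adds layer caching; the multi-platform builds and registry tagging mentioned above extend the same step (the registry path is illustrative):

```yaml
# Steps fragment: cross-build and push with commit-pinned tags
- uses: actions/checkout@v4
- uses: docker/setup-qemu-action@v3
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v6
  with:
    context: .
    platforms: linux/amd64,linux/arm64
    push: true
    tags: |
      ghcr.io/me/myapp:latest
      ghcr.io/me/myapp:${{ github.sha }}
```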
Before:

# GitHub Actions
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t myapp:latest .
      # Builds from scratch every time
      # No layer caching

After:

# GitHub Actions
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v6
        with:
          context: .
          push: false
          tags: myapp:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max

Build Scripts
Makefiles, Ant build files, Gradle tasks, and shell scripts for container workflows. You'll hit this when your team needs a single command to build, test, and deploy containers.
Before:

#!/bin/bash
# build.sh - hard to discover, no help
docker build -t myapp:latest .
docker run --rm -p 3000:3000 myapp:latest
# Different scripts for different tasks
# No dependency tracking
# No tab completion

After:

# Makefile - self-documenting
.PHONY: build run test clean help
build: ## Build the container image
	docker build -t myapp:latest .
run: build ## Run the app locally
	docker run --rm -p 3000:3000 myapp:latest
test: build ## Run tests in container
	docker run --rm myapp:latest npm test
clean: ## Remove images and volumes
	docker compose down -v
	docker rmi myapp:latest
help: ## Show available targets
	@grep -E '^[a-zA-Z_-]+:.*?##' $(MAKEFILE_LIST) | sort | \
	awk 'BEGIN {FS = ":.*?## "}; {printf " \033[36m%-15s\033[0m %s\n", $$1, $$2}'

Common Mistakes
Misusing latest tags, ignoring .dockerignore, running as root, and other orchestration anti-patterns. You'll hit this when a deploy fails in production but works on your machine.
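The pair below fixes the latest-tag mistake; the missing .dockerignore mentioned above might look like this (entries are typical for a Node project, adjust to your repo):

```text
# .dockerignore - keeps the build context small and cache-friendly
node_modules
.git
*.log
.env
Dockerfile
```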
Before:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  template:
    spec:
      containers:
        - name: web
          image: myapp:latest
          imagePullPolicy: Always

After:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  template:
    spec:
      containers:
        - name: web
          image: myapp:1.2.3
          # Or use SHA:
          # image: myapp@sha256:abc123...