Learn Orchestration Patterns
68 patterns across 16 categories. Each one shows the convention, a side-by-side example, and why it matters.
Start here
New to container orchestration? Start with the first five categories, in order.
Dockerfile Basics
FROM, RUN, COPY, ENTRYPOINT, and CMD instructions for building container images. You'll hit this when your container starts but runs the wrong command or ignores signals.
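The Before/After pair below covers ADD vs COPY; the signal problem mentioned above usually comes from the shell form of CMD or ENTRYPOINT. A minimal sketch (the `--port` argument is illustrative, not from the example app):

```dockerfile
FROM node:20-alpine
WORKDIR /app
COPY . /app
RUN npm install
# Shell form runs under /bin/sh, which becomes PID 1 and swallows SIGTERM:
#   CMD node server.js
# Exec form makes node PID 1, so it receives signals directly:
ENTRYPOINT ["node", "server.js"]
# CMD now supplies default arguments that ENTRYPOINT receives:
CMD ["--port", "3000"]
```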
Before:

FROM node:20-alpine
# Add application files
ADD . /app
WORKDIR /app
RUN npm install
CMD ["node", "server.js"]

After:

FROM node:20-alpine
# Copy application files
COPY . /app
WORKDIR /app
RUN npm install
CMD ["node", "server.js"]

Image Optimization
Multi-stage builds, layer caching, .dockerignore, and base image selection. You'll hit this when your image is 2 GB, builds take 10 minutes, or a small code change invalidates every layer.
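The pair below pins the base image; the multi-stage builds and layer caching mentioned above might look like this (a sketch that assumes a standard npm project whose `npm run build` emits to `dist/`):

```dockerfile
# Build stage: dev dependencies and source never reach the final image
FROM node:20.11-alpine AS build
WORKDIR /app
# Copy manifests first so the dependency layer caches across code changes
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Runtime stage: production dependencies and built artifacts only
FROM node:20.11-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=build /app/dist ./dist
CMD ["node", "dist/server.js"]
```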
Before:

FROM node:latest
WORKDIR /app
COPY . .
RUN npm ci
CMD ["node", "server.js"]

After:

FROM node:20.11-alpine
WORKDIR /app
COPY . .
RUN npm ci
CMD ["node", "server.js"]

Docker Compose
Service definitions, depends_on, profiles, and compose file structure. You'll hit this when your local dev stack has five services and you need them to start in the right order.
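The pair below handles start order; the profiles mentioned above gate optional services so they only start on request (the pgadmin service is illustrative):

```yaml
services:
  app:
    build: .
  # Only started with: docker compose --profile debug up
  pgadmin:
    image: dpage/pgadmin4
    profiles: ["debug"]
```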
Before:

services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secret
  app:
    build: .
    depends_on:
      - db

After:

services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: secret
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      timeout: 3s
      retries: 5
  app:
    build: .
    depends_on:
      db:
        condition: service_healthy

Volumes & Storage
Named volumes, bind mounts, tmpfs, and volume drivers. You'll hit this when container restarts lose your database data or your local file edits don't appear inside the container.
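The pair below covers named volumes; the bind mounts and tmpfs mentioned above solve the "local edits don't appear" problem (paths are illustrative):

```yaml
services:
  app:
    build: .
    volumes:
      # Bind mount: local source edits appear live inside the container
      - ./src:/app/src
    # tmpfs: in-memory scratch space, discarded when the container stops
    tmpfs:
      - /app/tmp
```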
Before:

services:
  db:
    image: postgres:16
    volumes:
      # Anonymous volume
      - /var/lib/postgresql/data

After:

services:
  db:
    image: postgres:16
    volumes:
      # Named volume
      - pgdata:/var/lib/postgresql/data

volumes:
  pgdata:

Networking
Bridge networks, overlay networks, port mapping, and DNS resolution. You'll hit this when containers can't reach each other or your app is exposed on the wrong port.
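The pair below covers DNS between containers; the port mapping mentioned above controls what the host can reach (ports are illustrative):

```yaml
services:
  app:
    build: .
    ports:
      # host:container - a browser reaches localhost:8080;
      # other services still use app:3000 via Docker DNS
      - "8080:3000"
```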
Before:

services:
  app:
    build: .
    environment:
      # Hardcoded IP, will break
      DB_HOST: "172.18.0.3"
  db:
    image: postgres:16

After:

services:
  app:
    build: .
    environment:
      # Docker DNS resolves service names
      DB_HOST: "db"
  db:
    image: postgres:16

Health Checks
HEALTHCHECK in Dockerfiles, readiness and liveness probes in Kubernetes. You'll hit this when your orchestrator routes traffic to a container that hasn't finished starting up.
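The pair below shows the Dockerfile side; the Kubernetes probes mentioned above look like this. A container-level fragment of a Pod or Deployment spec (the /health endpoint and port are assumptions carried over from the example):

```yaml
containers:
  - name: web
    image: myapp:1.0
    # Readiness gates traffic: pod gets no requests until it passes
    readinessProbe:
      httpGet:
        path: /health
        port: 3000
      initialDelaySeconds: 5
      periodSeconds: 10
    # Liveness restarts: repeated failures recreate the container
    livenessProbe:
      httpGet:
        path: /health
        port: 3000
      periodSeconds: 30
```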
Before:

FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm ci
# No health check defined
CMD ["node", "server.js"]

After:

FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm ci
HEALTHCHECK --interval=30s --timeout=3s \
  --start-period=10s --retries=3 \
  CMD wget --spider -q http://localhost:3000/health
CMD ["node", "server.js"]

Security
Running as non-root, read-only filesystems, secrets management, and image scanning. You'll hit this when a security audit flags your containers for running as root with write access everywhere.
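The pair below covers the non-root user; the read-only filesystems mentioned above can be sketched in Compose like this (the writable /tmp mount is an assumption about where the app writes):

```yaml
services:
  app:
    build: .
    # Root filesystem becomes immutable at runtime
    read_only: true
    # Drop all Linux capabilities the app doesn't need
    cap_drop:
      - ALL
    # Writable scratch space, since / is read-only
    tmpfs:
      - /tmp
```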
Before:

FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm ci
# Runs as root by default
CMD ["node", "server.js"]

After:

FROM node:20-alpine
WORKDIR /app
COPY . .
RUN npm ci
# Create and switch to non-root user
RUN addgroup -S appgroup && \
    adduser -S appuser -G appgroup
USER appuser
CMD ["node", "server.js"]

Environment & Config
Environment variables, .env files, ConfigMaps, and runtime configuration. You'll hit this when you hardcode a database URL and it breaks in staging because the host is different.
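The pair below moves variables into a .env file; Compose can also interpolate `${VAR}` from that same file at parse time. A sketch (the DB_URL composition is illustrative):

```yaml
# .env (never commit real credentials):
#   DB_HOST=db
#   DB_PASSWORD=changeme

services:
  app:
    build: .
    env_file:
      - .env
    environment:
      # ${...} is filled in from .env when Compose parses this file
      DB_URL: "postgres://admin:${DB_PASSWORD}@${DB_HOST}:5432/myapp"
```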
Before:

services:
  app:
    build: .
    environment:
      NODE_ENV: production
      DB_HOST: db
      DB_PORT: "5432"
      DB_USER: admin
      DB_PASSWORD: changeme
      DB_NAME: myapp
      REDIS_URL: redis://cache:6379

After:

services:
  app:
    build: .
    env_file:
      - .env
    environment:
      # Only overrides go here
      NODE_ENV: production

Pods & Deployments
Pod specs, Deployments, ReplicaSets, and rolling updates. You'll hit this when you need zero-downtime deploys or your pods keep crashing without clear reasons.
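The pair below moves from a bare Pod to a Deployment; the rolling updates mentioned above are tuned with an update strategy. A fragment of the Deployment spec (without `strategy`, Kubernetes defaults both values to 25%):

```yaml
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # at most one extra pod during a rollout
      maxUnavailable: 0  # never drop below desired capacity
```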
Before:

apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
    - name: web
      image: myapp:1.0
      ports:
        - containerPort: 8080

After:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: myapp:1.0
          ports:
            - containerPort: 8080

Services & Ingress
ClusterIP, NodePort, LoadBalancer, Ingress resources, and service mesh basics. You'll hit this when external traffic can't reach your pods or internal services can't find each other.
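The pair below keeps the Service internal; the Ingress mentioned above is how external HTTP traffic then reaches it (the hostname is illustrative, and a cluster needs an ingress controller installed):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: backend
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend
                port:
                  number: 8080
```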
Before:

apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  type: NodePort
  selector:
    app: backend
  ports:
    - port: 8080
      targetPort: 8080
      # Exposed on every node
      nodePort: 30080

After:

apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  type: ClusterIP
  selector:
    app: backend
  ports:
    - port: 8080
      targetPort: 8080

ConfigMaps & Secrets
ConfigMaps, Secrets, resource requests and limits, and pod scheduling. You'll hit this when your pod gets OOMKilled, can't read its config, or lands on the wrong node.
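The pair below covers ConfigMaps; the resource limits and Secrets mentioned above look like this container-level fragment (the Secret name and key are assumptions):

```yaml
containers:
  - name: app
    image: myapp:1.0
    resources:
      requests:
        memory: "128Mi"
        cpu: "250m"
      limits:
        # Exceeding this memory limit gets the container OOMKilled
        memory: "256Mi"
    env:
      - name: DB_PASSWORD
        valueFrom:
          secretKeyRef:
            name: app-secrets
            key: db-password
```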
Before:

apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: myapp:1.0
      env:
        - name: LOG_LEVEL
          value: "info"
        - name: MAX_RETRIES
          value: "3"
        - name: CACHE_TTL
          value: "300"

After:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  LOG_LEVEL: "info"
  MAX_RETRIES: "3"
  CACHE_TTL: "300"
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: myapp:1.0
      envFrom:
        - configMapRef:
            name: app-config

Helm Charts
Chart structure, values.yaml, templates, helpers, and chart dependencies. You'll hit this when you copy-paste Kubernetes manifests across environments instead of parameterizing them.
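The template below reads its values from the chart's values.yaml; a sketch matching the keys the template uses:

```yaml
# values.yaml - the defaults the template reads,
# overridable with: helm install -f my-values.yaml
replicaCount: 3
image:
  repository: myapp
  tag: "1.2.3"
resources:
  limits:
    memory: 256Mi
```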
Before:

# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  template:
    spec:
      containers:
        - name: web
          image: myapp:1.2.3
          resources:
            limits:
              memory: 256Mi

After:

# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-web
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
        - name: web
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"
          resources:
            {{- toYaml .Values.resources | nindent 12 }}

Docker Swarm
Swarm mode, service definitions, stacks, replicas, and rolling updates. You'll hit this when you need simple container orchestration without the complexity of Kubernetes.
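The pair below uses `docker service create`; the stacks mentioned above declare the same thing in a Compose-format file (the file and stack names are illustrative):

```yaml
# stack.yml - deploy with: docker stack deploy -c stack.yml mystack
services:
  web:
    image: myapp:1.0
    deploy:
      replicas: 3
      update_config:
        parallelism: 1   # roll one replica at a time
        delay: 10s
    ports:
      - "80:8080"
```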
Before:

# Running containers directly
docker run -d --name web-1 myapp:1.0
docker run -d --name web-2 myapp:1.0
docker run -d --name web-3 myapp:1.0
# Manual management needed
# No auto-restart on failure
# No load balancing

After:

# Initialize swarm (once)
docker swarm init
# Create a service with replicas
docker service create \
  --name web \
  --replicas 3 \
  --publish 80:8080 \
  myapp:1.0
# Swarm handles scheduling,
# load balancing, and restarts

CI/CD Pipelines
Building images in CI, layer caching in pipelines, multi-platform builds, and registry tagging. You'll hit this when your CI pipeline rebuilds everything from scratch on every commit.
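The pair below adds layer caching; the multi-platform builds and registry tagging mentioned above extend the same step (the registry path is illustrative):

```yaml
# Steps fragment: cross-build and push with commit-pinned tags
- uses: actions/checkout@v4
- uses: docker/setup-qemu-action@v3
- uses: docker/setup-buildx-action@v3
- uses: docker/build-push-action@v6
  with:
    context: .
    platforms: linux/amd64,linux/arm64
    push: true
    tags: |
      ghcr.io/me/myapp:latest
      ghcr.io/me/myapp:${{ github.sha }}
```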
Before:

# GitHub Actions
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t myapp:latest .
      # Builds from scratch every time
      # No layer caching

After:

# GitHub Actions
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/build-push-action@v6
        with:
          context: .
          push: false
          tags: myapp:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max

Build Scripts
Makefiles, Ant build files, Gradle tasks, and shell scripts for container workflows. You'll hit this when your team needs a single command to build, test, and deploy containers.
Before:

#!/bin/bash
# build.sh - hard to discover, no help
docker build -t myapp:latest .
docker run --rm -p 3000:3000 myapp:latest
# Different scripts for different tasks
# No dependency tracking
# No tab completion

After:

# Makefile - self-documenting
.PHONY: build run test clean help
build: ## Build the container image
	docker build -t myapp:latest .
run: build ## Run the app locally
	docker run --rm -p 3000:3000 myapp:latest
test: build ## Run tests in container
	docker run --rm myapp:latest npm test
clean: ## Remove images and volumes
	docker compose down -v
	docker rmi myapp:latest
help: ## Show available targets
	@grep -E '^[a-zA-Z_-]+:.*?##' $(MAKEFILE_LIST) | sort | \
	awk 'BEGIN {FS = ":.*?## "}; {printf " \033[36m%-15s\033[0m %s\n", $$1, $$2}'

Common Mistakes
Misusing latest tags, ignoring .dockerignore, running as root, and other orchestration anti-patterns. You'll hit this when a deploy fails in production but works on your machine.
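The pair below fixes the latest-tag mistake; the missing .dockerignore mentioned above might look like this (entries are typical for a Node project, adjust to your repo):

```text
# .dockerignore - keeps the build context small and cache-friendly
node_modules
.git
*.log
.env
Dockerfile
```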
Before:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  template:
    spec:
      containers:
        - name: web
          image: myapp:latest
          imagePullPolicy: Always

After:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  template:
    spec:
      containers:
        - name: web
          image: myapp:1.2.3
          # Or use SHA:
          # image: myapp@sha256:abc123...