
Build cache optimization: what actually works in 2025

Your CI is slow because of the wrong cache layer, mode, or key, not because of hardware. Systematic fixes for Docker BuildKit, GitHub Actions, and Turborepo remote cache.


Most teams leave significant CI time on the table. The builds finish, the tests pass, the pipeline is green, but a large fraction of what ran didn’t need to run. The caching is configured, but it’s caching at the wrong granularity, in the wrong mode, or with keys that invalidate too eagerly.

This piece covers the three layers where caching most often goes wrong — Docker/BuildKit, GitHub Actions cache, and Turborepo remote cache — and how to diagnose each one before you reach for faster hardware.

Who this is for

Senior developers and DevOps engineers who already have caching in place and are hitting edge cases: builds that should be cache hits but aren’t, cache poisoning concerns, or CI that got slower after an infrastructure upgrade. If you haven’t set up caching at all, the official docs for each tool are a better starting point than this.

What “cache hit” means per tool

The word “cache” covers three architecturally different things, and conflating them is the root of most debugging confusion.

Docker layer cache: keyed on the instruction content and the hash of the parent layer (for COPY and ADD, the checksums of the copied files count as part of that content). A change at layer 7 invalidates layer 7 and all layers after it. The cache is stored on disk or in a registry, depending on which backend you configure.

GitHub Actions cache action: keyed on a string you construct, looked up against a 10 GB per-repository store. On a miss, nothing is restored and your build runs cold; the action then saves the path you specified under the primary key at the end of the job. On an exact hit, it restores a compressed tar archive to that path and skips the save, because an entry for that key already exists.

Turborepo task hash: keyed on the content hash of all inputs to a task — source files, env vars, turbo config. A hit means the task output is replayed from cache instead of the task running at all. The cache can live on local disk or on a remote store (Vercel Remote Cache or any S3-compatible endpoint).

These three caches don’t compose. Fixing Docker layer ordering doesn’t help Turborepo, and a perfect Turborepo cache hit doesn’t stop Docker from rebuilding a layer whose own inputs changed. Keep them separate in your mental model.
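To make the Docker keying rule concrete, here is a minimal Python sketch of a chained layer key. It is a toy model, not BuildKit's actual algorithm: the function names are hypothetical, and each instruction is hashed as a plain string, whereas real builds also hash the files behind COPY and ADD.

```python
import hashlib
from typing import Optional


def layer_keys(instructions: list[str], base: str = "scratch") -> list[str]:
    """Return one cache key per instruction, each chained to its parent's key."""
    keys = []
    parent = hashlib.sha256(base.encode()).hexdigest()
    for instruction in instructions:
        # The key depends on the parent key, so a change propagates downward.
        parent = hashlib.sha256(f"{parent}\n{instruction}".encode()).hexdigest()
        keys.append(parent)
    return keys


def first_miss(old: list[str], new: list[str]) -> Optional[int]:
    """Index of the first instruction that misses the cache, or None if all hit."""
    old_keys, new_keys = layer_keys(old), layer_keys(new)
    for index, key in enumerate(new_keys):
        if index >= len(old_keys) or key != old_keys[index]:
            return index
    return None
```

Changing one instruction in the middle of the list changes its key and every key after it, which is the cascade described above.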

Docker/BuildKit: the modes that matter

BuildKit has six cache backends: inline, registry, local, gha, s3, and azblob. The s3 and azblob backends are unreleased. The gha backend is beta. For most pipelines, the practical choice is inline, registry, or gha.

min vs max mode — the most misunderstood setting

Every external cache backend supports a mode parameter: min or max.

  • min (default): exports only the layers of the final image. Intermediate stages in a multi-stage build are not cached.
  • max: exports all layers, including every intermediate stage.

inline is structurally limited to min mode. Inline cache stores the cache metadata in the image manifest itself. As the official docs note: “It doesn’t scale with multi-stage builds as well as the other drivers do.”

If your Dockerfile has more than one FROM stage and you’re using inline cache, you’re getting no cache benefit on intermediate stages. This is why switching from inline to registry cache often produces the largest CI speedup of any single change.

Inline vs registry vs gha

Backend    Mode support   Cache location          Best for
inline     min only       Image manifest          Single-stage images, small teams
registry   min + max      Separate OCI artifact   Multi-stage builds, production use
gha        min + max      GitHub Actions cache    GitHub-only pipelines

Switching to registry mode with max:

- name: Build
  uses: docker/build-push-action@v6
  with:
    cache-from: type=registry,ref=ghcr.io/myorg/myimage:cache
    cache-to: type=registry,ref=ghcr.io/myorg/myimage:cache,mode=max

Note that the registry and gha backends are not available on the default docker driver unless the containerd image store is enabled. Otherwise, create a new builder using a different driver (e.g., docker-container or kubernetes). In GitHub Actions, docker/setup-buildx-action creates a docker-container builder by default.

COPY/RUN layer ordering

Layer ordering is the most discussed Docker optimization and still the most commonly wrong in practice.

Every layer that changes invalidates all subsequent layers. The ordering rule:

  1. Files that change rarely (system packages, language runtime config)
  2. Dependency manifests (package.json, requirements.txt, go.mod)
  3. Dependency install (RUN npm ci, RUN pip install -r requirements.txt)
  4. Application source code
  5. Build step

Applied to a multi-stage Node.js image:

FROM node:22-slim AS builder

WORKDIR /app

# Layers 2-3: dependency manifests + install (changes rarely)
COPY package.json package-lock.json ./
# Full install: the build step usually needs devDependencies
RUN npm ci

# Layers 4-5: source + build (changes on every commit)
COPY . .
RUN npm run build
# Drop devDependencies before node_modules is copied to the runtime stage
RUN npm prune --omit=dev

FROM node:22-slim AS runner
WORKDIR /app
COPY --from=builder /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules

A COPY . . before npm ci means any source change — including a comment in a test file — busts the install layer. That single reorder is the highest-ROI Docker optimization on most projects.
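One way to catch this ordering mistake before it ships is a small lint step in CI. The Python sketch below is illustrative only: the function name and the list of install commands are assumptions, and it does not handle line continuations or shell-form variations.

```python
import re

# Install commands worth protecting with their own cached layer (assumed list).
INSTALL_RE = re.compile(
    r"^RUN\s+.*\b(npm ci|npm install|pip install|go mod download)\b", re.IGNORECASE
)
# Matches `COPY . <dest>` and `ADD . <dest>`, with optional --flags.
BROAD_COPY_RE = re.compile(r"^(COPY|ADD)\s+(--\S+\s+)*\.\s+\S+", re.IGNORECASE)


def broad_copy_before_install(dockerfile_text: str) -> list[int]:
    """Return 1-based line numbers of `COPY . <dest>` lines that precede
    a dependency install within the same build stage."""
    flagged: list[int] = []
    pending: list[int] = []
    for lineno, raw in enumerate(dockerfile_text.splitlines(), start=1):
        line = raw.strip()
        if line.upper().startswith("FROM "):
            pending = []  # a new stage starts with a clean slate
        elif BROAD_COPY_RE.match(line):
            pending.append(lineno)
        elif INSTALL_RE.match(line):
            flagged.extend(pending)
            pending = []
    return flagged
```

A non-empty result means a source change will bust the install layer; failing the CI job on it keeps the ordering from regressing.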

February–March 2025 breaking change in gha cache

GitHub Actions Cache API v1 was sunset in phases: the legacy service ended on February 1, 2025, with the mandatory upgrade deadline on March 1, 2025. The gha backend now requires Cache API v2, which in turn requires:

  • Buildx ≥ v0.21.0
  • BuildKit ≥ v0.20.0

If you pinned a Buildx version before these dates and use the gha backend, your cache silently fell back to a cold build after March 1, 2025. The failure mode is subtle: the build succeeds, just without cache. Check your Buildx version:

docker buildx version
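If you want the pipeline to enforce that minimum, the check can be scripted. This Python sketch parses the text printed by docker buildx version. The helper names are hypothetical, and it assumes the output contains a version of the form v0.21.1 or 0.21.1.

```python
import re

MIN_BUILDX = (0, 21, 0)  # minimum Buildx for the gha backend, per the text above


def parse_buildx_version(output: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from `docker buildx version` output."""
    match = re.search(r"\bv?(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version found in: {output!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def supports_gha_cache_v2(output: str) -> bool:
    """True if the reported Buildx version meets the Cache API v2 minimum."""
    return parse_buildx_version(output) >= MIN_BUILDX
```

In a workflow you would feed it the command's stdout (for example via subprocess.run) and fail the job when it returns False, so an old pinned Buildx cannot quietly disable the cache.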

The gha backend also inherits GitHub’s branch scoping: cache entries are accessible from the current branch, the PR base branch, and the default branch only. Sibling feature branches can’t read or overwrite each other’s entries, but that also rules out cross-team cache warming on long-lived feature branches.

GitHub Actions cache: key design

The actions/cache action’s primary key is the only string that produces an exact match and saves a new cache entry. If it changes on every run, you never get a hit. If it never changes, you get stale data.

The right key for Node.js dependencies:

- uses: actions/cache@v4
  with:
    path: ~/.npm
    key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-node-

The hashFiles('**/package-lock.json') expression produces a stable key across runs where the lockfile didn’t change. On a lockfile change, the key changes and a new cache entry is saved. The restore-keys fallback matches by prefix and restores the most recent entry that starts with ${{ runner.os }}-node-, because a cache from yesterday’s lockfile is better than no cache at all.
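The lookup order is easier to reason about as code. The following Python sketch is a simplified model of how a primary key and restore-keys resolve against stored entries. It is not the action's implementation: resolve_cache and the entries mapping are hypothetical, and branch scoping is ignored.

```python
from typing import Optional


def resolve_cache(
    key: str, restore_keys: list[str], entries: dict[str, float]
) -> tuple[Optional[str], bool]:
    """Model a cache lookup.

    `entries` maps each saved key to its creation time (Unix seconds).
    Returns (matched_key, exact_hit). Only an exact hit skips the save step.
    """
    if key in entries:
        return key, True
    # Fall back to prefix matches: first the primary key, then each restore key.
    for prefix in [key, *restore_keys]:
        candidates = [saved for saved in entries if saved.startswith(prefix)]
        if candidates:
            # Among prefix matches, the most recently created entry wins.
            return max(candidates, key=lambda saved: entries[saved]), False
    return None, False
```

A prefix match restores an older archive but still counts as a miss for saving purposes, so the job writes a fresh entry under the new primary key when it finishes.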

Eviction policy

The GitHub Actions cache store is 10 GB per repository. When that limit is exceeded, GitHub evicts by last-access date, oldest first. Entries also expire after 7 days of no access.

Two practical consequences:

  1. Feature branches that run weekly or less often get evicted frequently. Add ${{ github.ref }} to the key only if you can tolerate cold starts on branches that go more than 7 days between runs.
  2. Keep one restore-keys fallback to a shorter prefix. A fallback of ${{ runner.os }}-node- gives you a warm cache from any recent run, even if the exact lockfile hash differs.
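The eviction rules can be simulated to estimate which entries survive. This Python sketch models only the two rules stated above (7-day idle expiry, then oldest-accessed-first eviction over 10 GB). The Entry type and function name are hypothetical, and GitHub's real eviction timing is not modelled.

```python
from dataclasses import dataclass

GIB = 1024 ** 3
LIMIT_BYTES = 10 * GIB
MAX_IDLE_SECONDS = 7 * 24 * 3600


@dataclass
class Entry:
    key: str
    size_bytes: int
    last_access: float  # Unix seconds


def surviving_entries(entries: list[Entry], now: float) -> list[Entry]:
    """Apply idle expiry, then evict least recently accessed entries over the limit."""
    fresh = [e for e in entries if now - e.last_access <= MAX_IDLE_SECONDS]
    fresh.sort(key=lambda e: e.last_access, reverse=True)  # most recent first
    kept: list[Entry] = []
    total = 0
    for entry in fresh:
        if total + entry.size_bytes > LIMIT_BYTES:
            break  # this entry and everything older is evicted
        kept.append(entry)
        total += entry.size_bytes
    return kept
```

Running it over the sizes and access times listed in Settings → Actions → Caches shows quickly whether a rarely built branch can realistically keep its entry.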

Intentional cache invalidation

When you update a major dependency version, you want a fresh cache, not a partial restore that silently retains old packages. The pattern is to version-stamp your cache key:

key: ${{ runner.os }}-node-v2-${{ hashFiles('**/package-lock.json') }}

Bump v2 to v3 when you want to force a cold start. A date like 2025-04-01 works just as well and self-documents when the invalidation happened.

If you’re evaluating whether GitHub Actions is the right CI platform, GitHub Actions vs CircleCI — which CI wins in 2026? covers the full cost and feature trade-off.

Turborepo remote cache

Turborepo’s task hash is computed from the content of your inputs — by default, all files in the package not excluded by .gitignore, plus global dependencies and env vars. A hash match replays the cached output and skips the task.

Explicit inputs configuration

The default input set is conservative: any file change in a package busts its task cache. For most packages, many of those files don’t affect the build output — test fixtures, README updates, internal comments. Narrow the inputs:

{
  "tasks": {
    "build": {
      "dependsOn": ["^build"],
      "inputs": ["src/**/*.ts", "src/**/*.tsx", "package.json", "tsconfig.json"],
      "outputs": ["dist/**"]
    }
  }
}

With explicit inputs, a change to README.md or a snapshot file doesn’t bust the build cache. This is the second most impactful Turborepo tuning after getting outputs globs right.
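To sanity-check an inputs list against a set of changed files, a rough matcher is enough. The Python sketch below uses a simplified glob translation (only * and ** are supported), which is an assumption and not Turborepo's own matcher; the function names are hypothetical.

```python
import re


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a simplified glob into an anchored regex.

    `**/` matches zero or more directories, `*` matches within one path segment.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            parts.append("(?:.*/)?")
            i += 3
        elif pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts) + r"\Z")


def busts_cache(changed_file: str, inputs: list[str]) -> bool:
    """True if the changed file matches any declared input glob."""
    return any(glob_to_regex(p).match(changed_file) for p in inputs)
```

Feeding it the output of git diff --name-only for a few recent PRs gives a quick read on how many of them would have left the build task's hash unchanged.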

Env vars that shouldn’t be inputs

By default, Turborepo includes env vars listed in globalEnv in the global cache key. Be deliberate about what you include:

{
  "globalEnv": ["CI", "NODE_ENV"],
  "globalPassThroughEnv": ["SENTRY_AUTH_TOKEN", "GITHUB_TOKEN"]
}

globalPassThroughEnv passes the variable through to tasks without including it in the cache key. A rotating token (the GitHub Actions GITHUB_TOKEN changes on every run) belongs in globalPassThroughEnv, not globalEnv. If it’s in globalEnv, every run gets a unique cache key and you never hit the remote cache.
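The effect of a rotating token on the key can be shown with a toy hash. This Python sketch is not Turborepo's hashing algorithm: task_hash and its payload layout are assumptions made for illustration, and only variables named in global_env contribute to the digest.

```python
import hashlib
import json


def task_hash(
    file_hashes: dict[str, str], env: dict[str, str], global_env: list[str]
) -> str:
    """Hash file contents plus only the env vars declared as cache inputs."""
    keyed_env = {name: env.get(name, "") for name in sorted(global_env)}
    payload = json.dumps({"files": file_hashes, "env": keyed_env}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()
```

With the token listed as an input, two runs over identical files produce different hashes; with it passed through instead, the hashes match and the second run can be a hit.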

Turborepo 2.0 breaking changes (June 2024)

Three changes landed in Turborepo 2.0 that silently break cache hit rates on upgrade:

  1. Workspace root is now an implicit dependency of every package. A change to any file in the repo root busts every package’s task cache. This is intentional — root-level tooling changes can affect all packages — but it means your CI ran cold after you updated the root .eslintrc that nobody touches.

  2. engines field in root package.json is included in the global cache key. If you have "engines": { "node": ">=20" } in your root package.json, a version bump there invalidates all caches. Check your root package.json before assuming a cache regression is a BuildKit issue.

  3. outputMode renamed to outputLogs. If your config still references outputMode, Turborepo silently ignores it. Run the provided codemod:

npx @turbo/codemod@latest migrate

Scoping with --filter

On a large monorepo, most PRs only touch one or two packages. Use --filter to skip unchanged packages entirely:

turbo build --filter='...[HEAD^1]'

[HEAD^1] selects the packages with changes since the previous commit, and the leading ... adds their dependents. On a 40-package monorepo where a PR touches 3 packages, this drops the task graph from 40 packages to 5–8. For multi-commit PRs, compare against the base branch (for example ...[origin/main]) rather than HEAD^1.
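The "changed packages plus their dependents" selection is a reverse-dependency walk. Here is a minimal Python sketch of that idea, assuming a hypothetical deps mapping from each package to the packages it depends on. It illustrates the graph logic only, not Turborepo's implementation.

```python
from collections import deque


def affected(changed: set[str], deps: dict[str, set[str]]) -> set[str]:
    """Return the changed packages plus everything that transitively depends on them."""
    # Invert the graph: dependency -> packages that depend on it.
    dependents: dict[str, set[str]] = {}
    for package, requires in deps.items():
        for dependency in requires:
            dependents.setdefault(dependency, set()).add(package)

    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

A change to a leaf utility package that everything imports still selects most of the repo, which is why filtering helps most on PRs that touch application packages rather than shared libraries.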

For the broader set of production pitfalls that surface 3–12 months into a Turborepo monorepo — including edge cases beyond cache configuration — see Turborepo monorepo pitfalls we learned the hard way.

Nx Cloud: when it’s worth it

Nx Cloud is Nx’s equivalent of Turborepo remote cache, with distributed task execution (DTE) layered on top. With DTE, tasks from a single build are sharded across multiple CI agents instead of all running on one machine.

The free tier gives 50,000 credits per month (each task-second consumes credits) and up to 5 contributors. The Team tier starts at $19 per contributor plus $5.50 per 10,000 credits. For most teams, the question isn’t the free tier — it’s whether DTE justifies the Team tier cost.

DTE pays off when you have tasks that can run in parallel but are currently bottlenecked on a single runner. A 20-minute test run with 4 independent test suites runs in ~5 minutes with 4 DTE agents. The math is straightforward.
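The estimate only holds when the suites are evenly sized, which a quick scheduling model makes visible. This Python sketch assigns tasks to agents greedily, longest first. It is a back-of-envelope model with a hypothetical function name, not how Nx Cloud actually schedules work, and it ignores agent startup and cache-transfer time.

```python
import heapq


def makespan(task_minutes: list[float], agents: int) -> float:
    """Estimate wall-clock minutes when tasks are spread across `agents` machines."""
    loads = [0.0] * agents
    heapq.heapify(loads)
    # Longest task first, always onto the currently least-loaded agent.
    for duration in sorted(task_minutes, reverse=True):
        lightest = heapq.heappop(loads)
        heapq.heappush(loads, lightest + duration)
    return max(loads)
```

Four 5-minute suites drop from 20 minutes to 5 on four agents, but a 14-minute suite alongside three 2-minute suites still takes 14 minutes: the longest single task sets the floor, so splitting that suite matters more than adding agents.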

For the full Turborepo vs Nx comparison at the architecture level — including when Nx’s DTE pricing makes sense — see Best monorepo tool in 2026 — pnpm + Turborepo or Nx?.

The downside: Nx’s configuration overhead is real. If you’re already running Turborepo with remote cache and your bottleneck isn’t agent parallelism, DTE adds complexity for no speedup. Self-hosted Turborepo remote cache (any S3-compatible bucket) covers the “share cache across CI machines” case without the Nx Cloud per-contributor pricing.

How to know your cache is working

Docker cache hit rate

Add --progress=plain to your BuildKit invocation:

docker build --progress=plain .

Look for CACHED on each step. Any step missing CACHED is a cache miss. The first miss in the layer sequence is your invalidation point.

Turborepo cache analytics

Pass --summarize to get a per-task breakdown:

turbo build --summarize

Turborepo writes .turbo/runs/<run-id>.json. Each task entry includes a cacheState of HIT, MISS, or SKIP. A run where 90% of tasks are HIT but CI is still slow usually means the 10% cold tasks are on the critical path.

For ongoing measurement, the Vercel Remote Cache dashboard shows hit rate over time. For self-hosted cache, add a CI step that parses the summary JSON and posts hit rate as a PR comment or metric.
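For the self-hosted case, the parsing step could look roughly like the Python sketch below. The field names are assumptions: it reads a top-level tasks list and looks for the status under cacheState as described above, falling back to a nested cache.status field in case your Turborepo version uses that layout. Check a real summary file before relying on it.

```python
import json
from collections import Counter
from pathlib import Path


def cache_status(task: dict) -> str:
    """Best-effort read of a task's cache status (field names are assumed)."""
    status = task.get("cacheState")
    if isinstance(status, dict):
        status = status.get("status")
    if status is None:
        status = task.get("cache", {}).get("status")
    return str(status or "UNKNOWN").upper()


def hit_rate(summary: dict) -> float:
    """Fraction of executed-or-replayed tasks that were cache hits."""
    counts = Counter(cache_status(task) for task in summary.get("tasks", []))
    considered = counts["HIT"] + counts["MISS"]
    return counts["HIT"] / considered if considered else 0.0


def load_summary(path: str) -> dict:
    """Load one run summary JSON file from disk."""
    return json.loads(Path(path).read_text())
```

From there, posting hit_rate(load_summary(path)) as a PR comment or a metric is a few lines in whatever reporting tool the pipeline already uses.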

GitHub Actions cache metrics

The cache action logs Cache restored from key on a hit and Cache not found for input keys on a miss. Search your job logs for these strings. GitHub also exposes cache usage in Settings → Actions → Caches — sort by last accessed to find entries that haven’t been hit in days.
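Counting those two strings across downloaded job logs gives a rough hit rate without any extra tooling. A minimal Python sketch, assuming the log text has already been fetched (the function name is hypothetical):

```python
HIT_MARKER = "Cache restored from key"
MISS_MARKER = "Cache not found for input keys"


def count_cache_events(log_text: str) -> dict[str, int]:
    """Count cache hit and miss messages in a job log."""
    hits = misses = 0
    for line in log_text.splitlines():
        if HIT_MARKER in line:
            hits += 1
        elif MISS_MARKER in line:
            misses += 1
    return {"hits": hits, "misses": misses}
```

Note that a restore through a restore-keys prefix also logs the hit message, so this count cannot tell an exact hit from a fallback restore.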

Common pitfalls

1. Non-deterministic timestamps in Docker layers

A RUN step that embeds the current timestamp in any output file produces a unique layer on every build, regardless of whether source files changed. Common culprits: date in build scripts, Date.now() baked into bundle output, or C compilation with __DATE__.

Fix: SOURCE_DATE_EPOCH. Set it to a fixed value (the git commit timestamp works well) and configure your build tool to respect it:

export SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)
docker build --build-arg SOURCE_DATE_EPOCH=$SOURCE_DATE_EPOCH .

Then in the Dockerfile:

ARG SOURCE_DATE_EPOCH
RUN SOURCE_DATE_EPOCH=$SOURCE_DATE_EPOCH npm run build
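What "respecting SOURCE_DATE_EPOCH" means inside a build tool can be sketched in a few lines of Python. The example packs files into a tar archive with every timestamp pinned to the given epoch, so identical inputs always produce identical bytes. The function names are hypothetical, and this stands in for whatever packaging step your build performs.

```python
import io
import os
import tarfile


def build_epoch(default: int = 0) -> int:
    """Read SOURCE_DATE_EPOCH from the environment, falling back to a fixed value."""
    return int(os.environ.get("SOURCE_DATE_EPOCH", default))


def deterministic_tar(files: dict[str, bytes], epoch: int) -> bytes:
    """Pack files into an uncompressed tar with sorted names and a pinned mtime."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name in sorted(files):  # stable ordering
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = epoch  # pinned instead of the current time
            archive.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()
```

Calling deterministic_tar(files, build_epoch()) twice gives byte-identical output, which is the property that keeps downstream layers stable across rebuilds.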

2. Secrets via build args

Docker build args are recorded in the image history. A --build-arg API_KEY=... leaks the key to anyone with docker history access on the image. This is distinct from the cache problem, but the two often appear together — teams pass secrets as build args to make them available at build time, not realizing the exposure.

Fix: use BuildKit secrets mounts. They’re available at build time but are not included in the image or its history:

RUN --mount=type=secret,id=api_key \
    API_KEY=$(cat /run/secrets/api_key) npm run build

Then pass the secret at build time:

docker build --secret id=api_key,env=API_KEY .

3. Lockfile not in cache key

A package-lock.json not included in the cache key means dependency installs are cached even when packages change. The classic mistake:

# Wrong: key never changes when dependencies update
key: ${{ runner.os }}-node

Fix: always hash the lockfile:

key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}

4. Parallel runner cache write race

When two CI jobs share a cache key and both experience a miss, both try to save to that key on completion. Cache entries are immutable, so the first save wins; the second fails to reserve the key and logs a warning instead of overwriting. If the two jobs differ in their output (different package version resolved, different test fixtures downloaded), which version ends up cached depends on timing, and you get a non-deterministic cache state.

Fix: scope the cache key to the job or include a job-unique component when parallel jobs produce different outputs. If parallel jobs produce identical outputs (same lockfile, same inputs), the race is harmless: whichever save lands first is equivalent.

5. Cache poisoning via branch access

Any workflow that can write to the Actions cache in the default branch’s scope can poison the cache used by every other branch, since GitHub’s cache access policy lets all branches read entries created on the default branch. A regular pull_request run writes to a cache scoped to the PR, so the risk comes from workflows that execute PR-controlled code in the default branch’s context (pull_request_target is the usual culprit): a compromised node_modules written there is restored by later builds on main and by every branch that falls back to it.

The practical mitigation: mark cache-write steps with if: github.ref == 'refs/heads/main' on sensitive keys, or use a separate cache key namespace for CI that doesn’t share a write path with PR workflows.

Verdict

Docker cache: switch from inline to registry mode with mode=max if you have more than one FROM stage. Audit your COPY order — it’s the highest-leverage single change. If you’re on the gha backend and on Buildx older than v0.21.0, upgrade before any other optimization.

GitHub Actions cache: hash the lockfile in your key, add a version prefix for intentional invalidation, and keep one restore-keys fallback. The 10 GB per-repo limit rarely matters if you evict by key prefix on lockfile changes.

Turborepo: define explicit inputs for every task, put rotating tokens in globalPassThroughEnv, and run the 2.0 codemod if you upgraded without doing so. Check --summarize hit rate before assuming a hardware upgrade will help.

Nx Cloud DTE: worth evaluating if you have independently parallelizable tasks and a Team-tier project. Not worth it if your bottleneck is sequential dependency chains, or if you haven’t yet exhausted the cache hit rate gains available from remote caching alone.

Start with measurements. A CI job that logs Cache not found on every run is a configuration problem, not a hardware problem.
