Reproducible Toolchains

Core Questions

  • How do we guarantee identical behavior across runs?
  • How do we eliminate snowflake laptops?

"Works on my machine" is annoying when humans say it. It's catastrophic when agents produce it. Reproducible toolchains mean identical behavior everywhere: your laptop, your colleague's laptop, the agent's environment, CI. Same input, same output, every time.

Why reproducibility is non-negotiable

When a human hits an environment issue, they debug it. They notice the error, google it, realize their Node version is wrong, update it, move on. It's annoying but manageable.

Agents can't do this well. They'll see an error, try to fix the code, fail, try again, and spiral. They don't have the intuition that says "wait, this looks like an environment problem, not a code problem." Hours of agent time (and your money) get burned on the wrong problem.

Environment drift failure mode

  1. Human writes code on Node 22. Tests pass.
  2. Agent runs in environment with Node 20. Tests fail.
  3. Agent assumes its code is broken. Rewrites it.
  4. New code still fails (because it's still Node 20).
  5. Agent keeps iterating on the wrong problem.
  6. Human reviews PR: "why did you change all this?"

Reproducible toolchains eliminate this class of problem. Everyone (humans, agents, CI) runs the exact same versions of everything.
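
One way to stop the loop above before it starts is a preflight check that compares the running toolchain against the repo's pins and fails with a message that names the environment, not the code. A minimal sketch, assuming the pinned and actual versions have already been collected into dicts; the function name `find_drift` and the exact-match rule are illustrative choices, not part of any particular tool:

```python
def find_drift(pinned: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Return one message per tool whose installed version differs from its pin."""
    problems = []
    for tool, want in sorted(pinned.items()):
        have = actual.get(tool)
        if have is None:
            problems.append(f"ENVIRONMENT: {tool} is pinned to {want} but is not installed")
        elif have.lstrip("v") != want.lstrip("v"):
            # Tools such as node print a leading "v"; ignore it when comparing.
            problems.append(f"ENVIRONMENT: {tool} is {have}, pinned version is {want}")
    return problems


if __name__ == "__main__":
    # Hypothetical values mirroring the Node 22 vs Node 20 scenario.
    for line in find_drift({"nodejs": "22.3.0"}, {"nodejs": "v20.11.1"}):
        print(line)
```

Running a check like this as the first step of every agent task turns step 2 of the failure mode into an immediate, clearly labeled stop instead of a rewrite spiral.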

What needs to be pinned

"Pin everything" is the principle. In practice, prioritize:

Pinning Priority

Language runtime

Node, Python, Go, Rust version. The foundation everything else builds on. Even minor-version differences can cause subtle bugs.

Critical

Package manager

npm, yarn, pnpm, pip, cargo version. Different versions resolve dependencies differently. Use lockfiles religiously.

Critical

Dependencies

All packages with exact versions in lockfiles. Transitive dependencies included. No floating versions.

Critical

System tools

git, make, curl, database CLIs. Versions matter less than presence, but still worth pinning for full reproducibility.

Important

OS / base image

Ubuntu version, Alpine version, macOS version. Affects system libraries, paths, available tools.

Important

Environment identity (for pull-local handoff)

In a pull-local workflow, humans don’t “download the agent’s VM.” They rehydrate locally using an environment identity: the minimal set of pins that makes your local execution behave like the agent’s execution.

Your platform should mint an identity for every agent task and attach it to logs, artifacts, and PRs. This makes drift visible and handoff fast.

What to record per task

Repo ref

Commit SHA (or branch + SHA) the agent ran against.

Toolchain identity

A pinned toolchain reference: Nix flake.lock, a container image digest, or a hash of your version files (e.g. .tool-versions).

Dependency inputs

Lockfile hashes (e.g. pnpm-lock.yaml, package-lock.json, Cargo.lock, go.sum).

Task snapshot ID (optional)

Small, semantic task state: workspace diff + key artifacts. Avoid full filesystem snapshots when possible.
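
The fields above can be folded into one short, comparable ID. This is a sketch under stated assumptions, not a prescribed format: the field names, the 16-character truncation, and the function `environment_identity` are hypothetical, and the caller is assumed to pass in the raw bytes of the toolchain file and lockfiles.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def environment_identity(commit_sha: str, toolchain: bytes, lockfiles: dict[str, bytes]) -> dict:
    """Build a small, shareable identity record from pinned inputs only."""
    record = {
        "repo_ref": commit_sha,
        "toolchain": sha256_hex(toolchain),
        "lockfiles": {name: sha256_hex(body) for name, body in sorted(lockfiles.items())},
    }
    # Hash the canonical JSON form so the same pins always yield the same ID.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    record["id"] = sha256_hex(canonical)[:16]
    return record


if __name__ == "__main__":
    # Hypothetical inputs; in practice these bytes come from files in the repo.
    identity = environment_identity("abc123", b"<flake.lock bytes>", {"pnpm-lock.yaml": b"<lockfile bytes>"})
    print(json.dumps(identity, indent=2))
```

Because the record contains only hashes and a commit SHA, it is safe to attach to logs and PRs, and two environments can be compared by ID alone.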

What not to record

  • Credentials, cookies, or any long-lived auth state
  • Mutable global home-directory state (treat it as runtime state)
  • Floating references (tags without digests, unpinned installer scripts)
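
Floating references are easy to lint for. As one illustration, the sketch below flags base images in a Dockerfile that are referenced by tag rather than digest. It is a simplified, line-based parser (the helper name `floating_base_images` is hypothetical), and it will also flag images supplied through build arguments such as `${BASE}`, which may or may not be what you want.

```python
import re

# A digest-pinned reference ends in @sha256:<64 hex characters>.
DIGEST = re.compile(r"@sha256:[0-9a-f]{64}$")


def floating_base_images(dockerfile_text: str) -> list[str]:
    """Return image references in FROM lines that are not pinned by digest."""
    stages = set()
    floating = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Skip flags such as --platform=linux/amd64 that may precede the image.
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        # Earlier build stages and the empty "scratch" image need no digest.
        exempt = image in stages or image == "scratch"
        # Remember "FROM image AS name" so a later "FROM name" is not flagged.
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2])
        if not exempt and not DIGEST.search(image):
            floating.append(image)
    return floating
```

An empty result means every external base image carries a digest, so the reference is stable enough to record in an identity.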

This pairs naturally with ephemeral runtimes: the runtime resets, but the identity stays stable and shareable. If you haven’t read it yet, see Ephemeral Runtimes for the pull-local “enter context” checklist and reset semantics.

Approaches to reproducibility

Several tools solve this problem with different tradeoffs. The best choice is the one that makes your environment identity easy to compute and hard to accidentally change.

For agentic platforms, a strong default is to lean into Nix Flakes. They can feel scary at first, but they buy you reproducibility, cross-platform environments, and a single “source of truth” that works for humans, agents, and CI.

Practical tip: let an LLM write the first draft of your flake.nix. Then treat it like any other infra change: review it, keep it minimal, and pin everything in flake.lock.

Default recommendation: Nix Flakes

  • Reproducible: hash-addressed dependencies in the Nix store and a committed lockfile.
  • Cross-platform: define environments per system/arch while sharing one intent.
  • Safer by default: immutable store + fewer “curl | bash” installs in the critical path.
  • Agent-friendly: consistent nix develop entry point everywhere.
  • Developer credibility: you’ll win a few hearts just for shipping Nix.

Reproducibility Tools

Nix / Nix Flakes

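The gold standard for reproducibility. Every dependency lives in the Nix store under a path derived from a hash of its build inputs. If the flake.lock is the same, everyone evaluates the same inputs and gets the same store paths; the resulting builds are usually, though not always, bit-for-bit identical. Steeper learning curve, maximum reproducibility.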

For pull-local, flake.lock is your toolchain identity. It’s also a great fit for agents: tools like Claude Code, Codex, and Cursor can generate and maintain flakes well, because a flake is just declarative code plus a lockfile.

nix develop # Enter reproducible shell
Default

Dev Containers

Docker-based dev environments defined in devcontainer.json. Works with VS Code, GitHub Codespaces, JetBrains. Familiar if you know Docker.

For reproducibility, pin base images by digest and treat the devcontainer config as part of the identity.

devcontainer open . # Open in container
Good

Docker / Compose

Plain Dockerfiles and docker-compose.yml. Pin base images by digest (not just tag) for full reproducibility. More manual than devcontainers.

If you’re doing pull-local, record the final image digest in the environment identity, not just the Dockerfile ref.

docker compose up -d
Okay

asdf / mise

Version managers that pin tool versions per-project. Lighter than containers. .tool-versions file specifies versions. Good balance of simplicity and control.

For pull-local, hash and record your version files (e.g. .tool-versions) alongside lockfiles.

mise install # Install pinned versions
Okay
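
If version files are your toolchain identity, it helps to hash their meaning and not their raw bytes, so that a reordered line or an edited comment doesn't look like drift. A minimal sketch, assuming the usual `.tool-versions` layout of one `tool version` pair per line with `#` comments; the function names are hypothetical:

```python
import hashlib


def parse_tool_versions(text: str) -> dict[str, str]:
    """Parse .tool-versions content into {tool: version}, ignoring comments and blanks."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        tool, *versions = line.split()
        if versions:
            # A line may list fallback versions; this sketch keeps only the first.
            pins[tool] = versions[0]
    return pins


def toolchain_hash(text: str) -> str:
    """Hash the normalized pins so whitespace or comment edits do not change the result."""
    pins = parse_tool_versions(text)
    canonical = "\n".join(f"{tool} {version}" for tool, version in sorted(pins.items()))
    return hashlib.sha256(canonical.encode()).hexdigest()
```

Hashing the raw file is simpler and also valid; normalizing first is a design choice that trades a little code for fewer false drift alarms.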

Nix Flakes for agent workflows

Nix Flakes deserve special attention for agent workflows. A flake.nix file in your repo defines the exact environment. Anyone (human or agent) who runs nix develop gets the identical setup.

The “scary” part of Nix is mostly upfront: expressing your environment as code. The payoff is that agents and humans stop debugging drift and start debugging product code.

This is also where LLMs shine. Asking an agent to “make a flake that installs Node 22, pnpm, and postgres 16” is a much more reliable workflow than asking it to “install things until tests pass.”

Example flake.nix

{
  description = "Development environment for my-project";

  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.05";
    flake-utils.url = "github:numtide/flake-utils";
  };

  outputs = { self, nixpkgs, flake-utils }:
    flake-utils.lib.eachDefaultSystem (system:
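      let
        # terraform ships under an unfree license in nixpkgs, so unfree
        # packages must be allowed explicitly or evaluation fails.
        pkgs = import nixpkgs {
          inherit system;
          config.allowUnfree = true;
        };
      in {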
        devShells.default = pkgs.mkShell {
          buildInputs = with pkgs; [
            # Language runtimes
            nodejs_22
            
            # Package managers
            nodePackages.pnpm
            
            # Tools
            git
            jq
            curl
            
            # Database
            postgresql_16
            
            # Infrastructure
            terraform
            doppler
          ];
          
          shellHook = ''
            echo "Dev environment ready"
            echo "Node: $(node --version)"
            echo "pnpm: $(pnpm --version)"
          '';
        };
      }
    );
}

The flake.lock file pins the exact revision of every flake input, and with it the version of every package in the shell. Commit it to git. When the agent runs nix develop, it gets exactly what the lockfile specifies, down to the specific git commit of nixpkgs.

A good operating rule: treat flake.lock changes like dependency upgrades. Review them intentionally. Don’t let them drift accidentally.
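
Reviewing a flake.lock diff by eye is tedious, so a small helper that summarizes which inputs moved can make the review intentional. The sketch below assumes the lock file layout I understand current Nix versions to write (a top-level `nodes` map, a `root` key naming the root node, and a `locked.rev` field on each input); treat those field names as assumptions to verify against your own flake.lock. The function names are hypothetical.

```python
import json


def locked_revs(flake_lock_text: str) -> dict[str, str]:
    """Map each direct input of the root flake to its locked revision."""
    lock = json.loads(flake_lock_text)
    nodes = lock["nodes"]
    root = nodes[lock["root"]]
    revs = {}
    for input_name, node_name in root.get("inputs", {}).items():
        if not isinstance(node_name, str):
            continue  # "follows" entries are lists of path segments, not node names
        locked = nodes[node_name].get("locked", {})
        revs[input_name] = locked.get("rev") or locked.get("narHash", "unknown")
    return revs


def changed_inputs(old_lock: str, new_lock: str) -> dict:
    """Return {input: (old_rev, new_rev)} for inputs whose pin differs."""
    old, new = locked_revs(old_lock), locked_revs(new_lock)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }
```

Posting the output of `changed_inputs` as a PR comment makes an accidental lock bump as visible as a deliberate one.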

CI/CD parity

The whole point of reproducibility is that everywhere is the same. If your dev environment uses Nix but CI uses a different setup, you still have environment drift.

A practical rule: CI should be able to print the same environment identity a developer sees locally. If you can’t name the identity, you can’t enforce parity.

GitHub Actions with Nix

# .github/workflows/ci.yml
name: CI
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-24.04  # pinned; ubuntu-latest moves to new releases over time
    steps:
      - uses: actions/checkout@v4
      
      - uses: cachix/install-nix-action@v26
        with:
          nix_path: nixpkgs=channel:nixos-24.05
          
      - uses: cachix/cachix-action@v14
        with:
          name: your-cache  # Optional: cache Nix builds
          
      - name: Install dependencies
        run: |
          nix develop --command pnpm install --frozen-lockfile

      - name: Run tests
        run: |
          nix develop --command pnpm test

      - name: Build
        run: |
          nix develop --command pnpm run build

Same environment definition, same results. If tests pass locally, they pass in CI. If they pass for a human, they pass for an agent.
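
Parity also depends on the runner image underneath Nix. A small lint can flag runner labels that float. This is a sketch that assumes workflows use a plain `runs-on: label` scalar (matrix expressions and list forms are not handled), and the function name `floating_runners` is hypothetical:

```python
import re

# Capture the value of each "runs-on:" key at the start of a line.
RUNS_ON = re.compile(r"^\s*runs-on:\s*(\S+)", re.MULTILINE)


def floating_runners(workflow_text: str) -> list[str]:
    """Return runs-on labels that end in -latest and therefore change over time."""
    labels = [label.strip("'\"") for label in RUNS_ON.findall(workflow_text)]
    return [label for label in labels if label.endswith("-latest")]
```

Run it over every file in .github/workflows and fail the check when the list is non-empty.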

Lockfiles are sacred

Whatever approach you use, lockfiles are the source of truth. They must be:

  • Committed to git. Always. No exceptions.
  • Updated intentionally. Not accidentally when someone runs install.
  • Reviewed in PRs. Lockfile changes should be visible and scrutinized.
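
The first rule is easy to enforce mechanically. The sketch below takes the set of tracked file names at the repo root (for example, from `git ls-files`) and reports manifests that have no lockfile next to them. The mapping is a heuristic and an assumption: a pyproject.toml only implies poetry.lock for Poetry projects, and a Go module with no dependencies has no go.sum.

```python
# Manifest file -> lockfiles that would satisfy it (any one is enough).
LOCKFILES = {
    "package.json": ["package-lock.json", "pnpm-lock.yaml", "yarn.lock"],
    "Cargo.toml": ["Cargo.lock"],
    "pyproject.toml": ["poetry.lock"],
    "go.mod": ["go.sum"],
    "flake.nix": ["flake.lock"],
    "Gemfile": ["Gemfile.lock"],
}


def missing_lockfiles(tracked_files: set[str]) -> list[str]:
    """Return manifests that are tracked without any matching lockfile beside them."""
    missing = []
    for manifest, candidates in LOCKFILES.items():
        if manifest in tracked_files and not any(c in tracked_files for c in candidates):
            missing.append(manifest)
    return missing


if __name__ == "__main__":
    # Hypothetical repo listing: package.json is tracked, its lockfile is not.
    print(missing_lockfiles({"package.json", "flake.nix", "flake.lock"}))
```

Because the input is the list of tracked files, a lockfile that exists on disk but is excluded by .gitignore is reported as missing, which is the case you want to catch.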

Lockfiles to commit

  • package-lock.json
  • pnpm-lock.yaml
  • yarn.lock
  • Cargo.lock
  • poetry.lock
  • go.sum
  • flake.lock
  • Gemfile.lock

Version files to commit

  • .nvmrc
  • .node-version
  • .python-version
  • .tool-versions
  • .ruby-version
  • rust-toolchain.toml

What goes wrong

Floating versions

package.json says "^18.0.0" and different environments resolve to different 18.x versions. Subtle behavior differences. Tests flake.

Missing lockfile

Lockfile not committed (or in .gitignore by mistake). Every install resolves fresh. Today's install is different from yesterday's.

CI environment drift

CI uses ubuntu-latest which quietly updates. Or a different Node version. Dev works, CI fails, nobody knows why.

Agent environment mismatch

Agent runs in a different environment than humans. Agent's fix works in its environment but breaks in yours. Merge → broken main.
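
The first of these failures, floating versions, can be surfaced with a short check. The sketch below lists every dependency in a package.json whose version spec is anything other than an exact version. The function name and the strictness of the rule are assumptions; with a committed lockfile and a frozen install, ranges in package.json are still resolved reproducibly, so treat this as a stricter, optional policy.

```python
import json
import re

# Exact semver such as 18.2.0 or 1.0.0-rc.1; anything else (^, ~, >=, *, tags) floats.
EXACT = re.compile(r"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$")


def floating_dependencies(package_json_text: str) -> dict[str, str]:
    """Return {package: spec} for dependencies not pinned to one exact version."""
    manifest = json.loads(package_json_text)
    floating = {}
    for section in ("dependencies", "devDependencies"):
        for name, spec in manifest.get(section, {}).items():
            if not EXACT.match(spec):
                floating[name] = spec
    return floating
```

A report like this is most useful in repos where the lockfile is missing or routinely regenerated, since that is where a range actually changes what gets installed.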

Summary

  • Pin everything: runtime, package manager, dependencies, system tools, base image.
  • Use Nix Flakes for maximum reproducibility, or devcontainers for familiarity.
  • Make environment identity a first-class artifact: ref + toolchain + lockfiles (+ snapshot when needed).
  • CI must use the same environment definition as local dev.
  • Lockfiles are sacred. Commit them. Review changes. Never ignore them.
