AI and worktrees are filling our disks: kache storage, measured
The way we build software is changing, and it is quietly filling our disks. A developer used to live in one checkout. Now work increasingly happens across many at once, and AI coding agents push it to an extreme: a worktree per task, several branches compiling in parallel, trees spun up and thrown away faster than anyone would by hand. Parallel worktrees are becoming the normal shape of a working day, not an occasional trick.
Every one of those worktrees is a full checkout with its own build output, and a large project's build output is not small. Firefox produces roughly 16 GB of objdir per checkout. Run a handful of agents and you are not duplicating a little. You are duplicating that whole tree again and again, the same compiled artifacts sitting in five places at once. That is the shift that turns a cache's storage behavior from a footnote into the thing that matters: the old worry was "can we avoid recompiling this crate," but the one that bites in an agent-heavy workflow is "why do we have five copies of the same artifact across five worktrees."
This post is about how well kache, the build cache we make at Kunobi, answers that second question, measured on the hardest real project we had. The short version: on a copy-on-write filesystem, a second Firefox worktree restored 13.5 GB of build artifacts without copying a single byte of them, by cloning blocks the cache already held. The worktree still added its own non-shared output (about 3 GB), but the shared bulk cost nothing to materialize. Across the whole run, reflink sharing avoided 27.4 GB of duplicate data. Here are the measured numbers and how it works.
Why Firefox
We could have measured this on something small and posted a tidier number. We picked Firefox because it is the hardest realistic target we could find, and storage claims are easy to make and hard to keep on a build that big.
Firefox is not a single-language project with a clean build graph. It mixes Rust and C/C++, drives them through its own build system rather than plain Cargo, generates a large amount of code during the build, links one enormous final library, and pulls in a dependency tree big enough that a cold build runs for the better part of an hour and a single objdir is around 16 GB. That scale is the point. If reflink-based sharing holds up across two checkouts of Firefox, it is strong evidence for builds in the same class, large native and mixed-language trees, even if a smaller project with a different build shape will not behave identically.
A lot of what makes kache behave correctly on a build this complex came from people who know that codebase far better than we do. Sascha Rosylight and Lars Eggert at Mozilla have been generous with issues, edge cases, and pointed questions, and the tool is better for it. Thank you both.
How we ran it
kache is the build cache we develop at Kunobi: a compiler launcher (a RUSTC_WRAPPER, and a wrapper for C/C++ compilers) that stores each compiler output in a local content-addressed store and restores it into later builds by reflink or hardlink. We build it for large projects, many worktrees, and increasingly fleets of agents that each want their own checkout, so we measure it the way it actually gets used: across separate checkouts rather than inside a single tree.
The run was on an Apple M5 Pro (18 cores, macOS 26.5) on APFS, which supports reflinks (copy-on-write file clones). It used kache 0.7.0 and Firefox at tag FIREFOX_151_0_RELEASE, built optimized and non-debug, with tests, LTO, and PGO all disabled, a choice that focuses the benchmark on compile-cacheable work rather than final-link or PGO behavior. The run validated clean (verdict: ok, key stability 100%) and followed a deliberate cross-clone shape:
- Build a fresh Firefox checkout with an empty cache. This is the cold build, which also populates the store.
- Build a second checkout, at a different path, against that store. This is the warm build.
Both checkouts are real git worktrees of the same repository at different paths, which is exactly the case we care about: a second worktree, an agent's task branch, or a fresh CI runner. The warm build does not reuse the cold build's directory. For storage, the question that matters is this: when the same artifacts are needed again somewhere else on disk, does the cache share the bytes or duplicate them? The whole run is driven by one command (just bench firefox), and the harness refuses to trust a run that fails its checks. This one reported verdict: ok with 100% key stability, which means no cache key leaked across the path change.
Every figure below comes from kache's own kache report output and the run's summary JSON, which are written to disk by that one command. One convention note: we quote sizes in decimal GB (1 GB = 10^9 bytes), while kache report itself renders binary GiB, so a value we call 13.5 GB shows up there as 12.5.
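If you want to line our decimal figures up against the report yourself, the conversion is a one-liner. This is a small sketch of that arithmetic only; the function name is ours for illustration and is not part of kache.

```python
GB = 10**9    # decimal gigabyte, the unit used in this post's prose
GIB = 2**30   # binary gibibyte, the unit kache report renders


def gb_to_gib(gb: float) -> float:
    """Convert a size in decimal GB to binary GiB."""
    return gb * GB / GIB


for size_gb in (13.5, 13.9):
    print(f"{size_gb} GB = {gb_to_gib(size_gb):.2f} GiB")
```

The 13.5 GB restore comes out at about 12.57 GiB, which the report displays as 12.5.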
For the record, the warm build also came in at a 10.75x speedup with a 98.5% hit rate, so kache is plenty fast on the time axis too. But speed is the expected payoff for any cache. The disk story is the one that compounds with every worktree, so that is what the rest of this post is about.
What it cost on disk
The mechanism is simple to state. kache stores each unique artifact once, addressed by a hash of its inputs. On a copy-on-write filesystem, it does not copy bytes in or out of that store. It clones blocks. The store entry shares physical blocks with the build that produced it, and a restore shares blocks with the store.
That shows up directly in the report.
| Phase | What happened | Bytes copied |
|---|---|---|
| Cold, populating the store | 13.9 GB reflinked into the store, 100% shared with build output | 0 B |
| Warm, restoring a second worktree | 13.5 GB restored, 100% reflink, 100% zero-copy | 0 B |
| Warm, adding to the store | only ~10 MB of genuinely new content | n/a |
Read the middle row again. A second full worktree pulled back 13.5 GB of compiled artifacts, and the filesystem wrote none of that payload as new data (beyond the directory entries any file needs). Those 13.5 GB are clones of blocks the store already held. The store itself, 10,725 blobs, was filled during the cold build the same way: by reflinking the build's own output rather than copying it, so the cache cost almost nothing to create.
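To make the mechanism concrete, here is a minimal sketch of a content-addressed store that prefers clones over copies. It is not kache's implementation: the function names, the SHA-256 key, and the single fallback order (reflink, then hardlink, then copy) are our assumptions for illustration, and the clone path covers only Linux's FICLONE ioctl. On macOS the equivalent system call is clonefile(2), which this sketch does not implement, so there it falls through to a hardlink.

```python
import hashlib
import os
import shutil
import sys
import tempfile

FICLONE = 0x40049409  # Linux ioctl request for a whole-file clone (linux/fs.h)


def cache_key(inputs: list[bytes]) -> str:
    """Hash the inputs, length-prefixed so item boundaries are unambiguous."""
    h = hashlib.sha256()
    for item in inputs:
        h.update(len(item).to_bytes(8, "little"))
        h.update(item)
    return h.hexdigest()


def try_reflink(src: str, dst: str) -> bool:
    """Try a copy-on-write clone of src at dst. Linux-only in this sketch."""
    if not sys.platform.startswith("linux"):
        return False
    import fcntl
    try:
        with open(src, "rb") as s, open(dst, "wb") as d:
            fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
        return True
    except OSError:
        # The filesystem cannot clone, or src and dst are on different ones.
        if os.path.exists(dst):
            os.unlink(dst)  # remove the empty file created above
        return False


def materialize(src: str, dst: str) -> str:
    """Place src's content at dst, preferring methods that share storage."""
    if try_reflink(src, dst):
        return "reflink"
    try:
        os.link(src, dst)
        return "hardlink"
    except OSError:
        shutil.copyfile(src, dst)
        return "copy"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        artifact = os.path.join(root, "libfoo.rlib")
        with open(artifact, "wb") as f:
            f.write(b"compiled bytes" * 1024)
        # Hypothetical inputs; a real key would cover far more than this.
        key = cache_key([b"compiler-version", b"--opt-level=3", b"source"])
        blob = os.path.join(root, key)
        print("store:  ", materialize(artifact, blob))
        print("restore:", materialize(blob, os.path.join(root, "restored.rlib")))
```

The returned method name is the same split the report tracks: a "reflink" or "hardlink" result means no payload bytes were duplicated, while "copy" means the file is a real second copy.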
Now the whole-run footprint:
| Measure | Firefox cold + warm (two worktrees + cache) |
|---|---|
| Apparent size (naive sum of both objdirs and the cache) | 47.2 GB |
| Saved by copy-on-write sharing | 27.4 GB (13.5 GB restore + 13.9 GB store) |
| Measured on disk after sharing | 19.9 GB |
Two complete Firefox build trees plus a 13.9 GB cache would naively cost 47.2 GB. Reflink sharing knocks 27.4 GB off that, because the cache shares blocks with the build that filled it and the second worktree shares blocks with the cache. The clearest way to see the win is the marginal cost of that second worktree: its objdir is 16.6 GB on paper, but 13.5 GB of it is reflinked from the cache, so it added only about 3 GB of genuinely new data to the disk instead of a full 16.6 GB.
One note on the totals. For this run the harness measured the actual free-space delta at 19.9 GB, which lines up almost exactly with the 19.8 GB it estimates for the three build pools after reflink sharing. Whole-disk deltas like that carry some run-to-run noise, so the figures we lean on are the directly-measured ones above: 13.5 GB restored with zero bytes copied, 13.9 GB stored with zero bytes copied, and 27.4 GB of duplication avoided.
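The arithmetic behind those totals is short enough to check by hand. The sketch below uses only figures quoted above; the variable names are ours.

```python
# All sizes in decimal GB, taken from the tables and text above.
apparent_total = 47.2    # naive sum of both objdirs and the cache
restore_shared = 13.5    # warm restore, reflinked from the store
store_shared = 13.9      # store contents, reflinked from the cold build
second_objdir = 16.6     # apparent size of the second worktree's objdir

saved = round(restore_shared + store_shared, 1)
estimated_on_disk = round(apparent_total - saved, 1)
marginal_second_worktree = round(second_objdir - restore_shared, 1)

print(f"saved by sharing:         {saved} GB")
print(f"estimated on disk:        {estimated_on_disk} GB")
print(f"second worktree adds:     {marginal_second_worktree} GB")
```

The estimate lands at 19.8 GB against the 19.9 GB free-space delta the harness measured, and the second worktree's marginal cost comes out at 3.1 GB.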
What this means for many worktrees
The run above is two worktrees, and the marginal-cost number is the one to carry forward. The second worktree cost about 3 GB of real disk rather than 16.6 GB, because everything it shared with the cache was cloned, not copied.
Now scale that. Without cross-checkout sharing, worktree number N costs a full objdir, every time. Five Firefox worktrees is roughly 80 GB of build output and climbing. With kache on a copy-on-write filesystem, the shared artifacts exist once in the store, and each additional worktree adds only the blocks that are genuinely unique to it. The marginal cost of one more worktree is the delta, not another full build tree.
How small that delta stays depends on how alike the worktrees are. The more branches diverge, with different compiler flags or changed generated inputs, the more each one recompiles and the larger its unique share grows. The saving is real, but it tracks how much the trees actually share, so treat the ~3 GB here as the well-aligned case, not a flat guarantee.
That is the property that makes kache fit an agent-driven workflow. The agents can churn through worktrees, and the storage tracks the size of one build plus the deltas, rather than multiplying with every branch.
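As a rough way to reason about that scaling, here is a back-of-the-envelope model. It is an extrapolation, not a measurement: it assumes every worktree has the 16.6 GB objdir we saw, that the store shares all of its blocks with the first build, and that each extra worktree adds the same 3.1 GB delta as the second one did in this run. Diverging branches will push that delta up.

```python
def disk_without_sharing(worktrees: int, objdir_gb: float = 16.6) -> float:
    """Every worktree pays for a full, independent objdir."""
    return round(worktrees * objdir_gb, 1)


def disk_with_sharing(worktrees: int, objdir_gb: float = 16.6,
                      delta_gb: float = 3.1) -> float:
    """First worktree pays in full; each later one pays only its delta.

    Assumes a constant per-worktree delta, which is the well-aligned case.
    """
    if worktrees == 0:
        return 0.0
    return round(objdir_gb + (worktrees - 1) * delta_gb, 1)


for n in (1, 2, 5, 10):
    print(f"{n:>2} worktrees: {disk_without_sharing(n):>6} GB independent, "
          f"{disk_with_sharing(n):>5} GB modeled with sharing")
```

Swap in your own objdir size and a delta measured on your branches; the shape of the curve is the point, not the specific numbers.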
The same machine, through sccache
To see how much of this is the filesystem and how much is kache, we ran the identical Firefox cross-clone shape through sccache on the same Mac. sccache (0.15.0, in its default configuration) is a widely used compile cache that stores compressed objects and decompresses them on a hit. It does not reflink.
That difference shows up in its own disk report, which describes the layout as three independent pools, where kache's report says three pools sharing blocks via reflinks. With sccache, the second worktree's objdir is a full, independent 16.7 GB. With kache, the same objdir is 16.6 GB on paper but only about 3 GB of new data, because the rest is reflinked from the store.
Two things are worth being precise about. First, the robust difference is per-worktree, and it compounds: every additional worktree adds a full independent objdir for sccache and roughly a delta for kache. We lean on that mechanism rather than the whole-run totals, which carry run-to-run noise. Second, in this benchmark setup sccache's keys were more sensitive to the path change, so on the warm build it recompiled 830 translation units against kache's 68, which is a reuse-efficiency point more than a storage one. We are not comparing wall-clock here: the two runs happened at different times under different machine load, and time is not the claim this post is making.
The honest boundaries
This is a single run, and it is the good case for kache on purpose. A few things to be clear about:
- It needs a copy-on-write filesystem. The zero-copy story here depends on reflinks, which you get on APFS, recent btrfs, and XFS with reflink enabled. On a filesystem without them, kache falls back to hardlinks for restores, which still avoid duplicating restored bytes (as long as the store and the build output live on the same filesystem), but the store has to be populated by copying rather than cloning, so the cache becomes a real second copy. Many CI runners are in exactly that situation, and there the storage numbers are a worst case. This run is the other end of that spectrum, on hardware that has what kache is designed to use. In either mode the store's blobs are content-addressed and marked read-only, so a reflinked or hardlinked restore that shares storage with the cache cannot silently corrupt the cached copy.
- Single sample. One run, Firefox 151. The directly-measured reflink numbers (13.5 GB zero-copy restore, 13.9 GB store, 27.4 GB saved) are the solid figures; whole-disk totals carry some run-to-run noise, so treat them as directionally right rather than exact.
- Not every artifact is cached. The final link is not, and a handful of build steps are deliberately passed through. The shared bytes are the compiled crates and objects, which is the bulk of an objdir but not all of it.
Seeing it yourself
These figures are not from a stopwatch or du. They come straight out of kache's own reporting, which tracks the reflink, hardlink, and copy split on every build.
- kache report produces the storage section above, including restored bytes by method, store blob count, and dedup saved.
- kache monitor shows hits, misses, and deduplicated bytes live as a build runs.
- Each run also emits Perfetto traces (trace-cold.json, trace-warm.json) you can open in the Perfetto UI to walk the hit and miss timeline event by event.
We are a vendor making a vendor claim, so here is the storage block from this run's warm report, values exactly as reported (kache renders GiB, so 13.5 GB decimal shows as 12.5):
#### Storage
| Metric | Value |
|---|---|
| Restored bytes | 12.5 GB total: 12.5 GB reflink, 0 B hardlink, 0 B copied |
| Zero-copy restores | 100.0% |
| Store footprint | 13.0 GB logical -> 12.9 GB blobs, 50.5 MB dedup saved |
| Store blobs | 10725 |
| Store ingest | 10.0 MB reflinked (CoW), 0 B copied, 100.0% shared with build output |
To reproduce the run on a copy-on-write filesystem:
just bench firefox
It writes the cold and warm reports and a summary under tmp/bench/bench-firefox/. The benchmarks documentation covers the report format and the cross-clone setup.
The takeaway
Time and space are two different reasons to use a build cache, and they do not always travel together. The speedup makes the case on time. This is the case on space: on the filesystem kache is built for, a second worktree of a project with a 16 GB objdir restored 13.5 GB of artifacts with zero bytes copied, and reflink sharing saved 27.4 GB across the run.
If you run a lot of worktrees, and especially if you run agents that spin them up all day, that is the number worth caring about.
If that sounds like your setup and you are on a reflink-capable filesystem (APFS, recent btrfs, or XFS with reflink), the quickest way to see it for yourself is to point kache at two real worktrees of your own project and read the reflink-versus-copy split in kache report.
GitHub: kunobi-ninja/kache

