phasespacecollapse

Notes on Collapse

A first post, which is also an excuse to make sure the typography holds.

This is the first post on this blog, and also the post I use to verify that every small typographic choice I made actually works when real words flow through it. If you’re reading this, one of two things has happened: either I forgot to delete it, or I decided it’s a fine enough opening that it can stay.

The phrase phase space is from dynamical systems. It’s the abstract space whose coordinates are every variable needed to specify a system’s state. The trajectory of a system through phase space is its history. Collapse, in this title, is mine — not a technical term, though the ear wants to make it one. I like that it sounds inevitable. I like that it isn’t quite accurate. The best names point at something and miss.
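To make the definition concrete, here is a minimal sketch, assuming a unit-mass, unit-stiffness harmonic oscillator as the system. The oscillator, the function name `trajectory`, and the step size are all my own illustrative choices. The state is just position and momentum, so the phase space is a plane and the history is a curve in it.

```python
def trajectory(x0: float, p0: float, dt: float = 0.01, steps: int = 1000):
    """Trace a unit-mass, unit-stiffness oscillator through phase space.

    The state is the pair (x, p): position and momentum. Those two
    coordinates are the whole phase space for this system.
    """
    x, p = x0, p0
    path = [(x, p)]
    for _ in range(steps):
        # Semi-implicit (symplectic) Euler: update momentum, then position.
        p -= x * dt
        x += p * dt
        path.append((x, p))
    return path


path = trajectory(x0=1.0, p0=0.0)
energies = [0.5 * (x * x + p * p) for x, p in path]
# The orbit stays close to the circle x^2 + p^2 = 1, so energy barely drifts.
print(max(energies) - min(energies))
```

The list `path` is the trajectory: each entry is one point in phase space, and reading it in order is reading the system's history.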

What this is for

I’m a software engineer, currently between jobs, writing mostly about machine learning infrastructure and the systems I find interesting. Sometimes I’ll write about other things. There is no content calendar. There is no newsletter. Posts appear when I finish them.

If you want to follow along, the RSS feed is the right way. It always has been.

What the prose should feel like

Body text is set in Source Serif 4 at weight 330, which is lighter than the browser default and, to my eye, reads better on screen. A default 400-weight serif looks smudged at body size — like a photocopy of a photocopy. 330 has air.

Italics are for emphasis and titles and the occasional aside. Bold is for when I mean it. Links look like this — one accent color, an oxidized red, used nowhere else. The dot in the masthead is the only other place it appears.

Here is a pull quote, which is really just a blockquote:

The best names point at something and miss. If they hit exactly, they become technical terms, and technical terms are inert. A name that is almost-right still has work to do; it pulls the reader toward the thing, and the pull is the meaning.

Ordered lists:

  1. First, identify the bottleneck.
  2. Then, measure it, because you will be wrong about where it is.
  3. Then, fix it, and watch the bottleneck move somewhere else.

Unordered lists:

  • A short item.
  • A longer item that runs past the end of the first line, which is useful because I want to see how list items wrap in the measure I chose and whether the left alignment on the second line reads as intended.
  • A final short one.

Code

Inline code looks like this — set in JetBrains Mono, on a subtle warm-gray background, slightly tinted so it reads as set apart from prose without shouting. Now for a block. Here’s some Python, which is the language most of the real posts will lean on:

import torch
from torch import nn

class KVCache(nn.Module):
    """A minimal KV cache for decoder-only inference.

    The cache grows one token at a time during autoregressive
    generation. Its memory footprint, not compute, is usually the
    binding constraint for long-context workloads.
    """

    def __init__(self, n_layers: int, n_heads: int, head_dim: int,
                 max_seq_len: int, dtype=torch.float16):
        super().__init__()
        # Axis 1 separates keys (index 0) from values (index 1).
        shape = (n_layers, 2, max_seq_len, n_heads, head_dim)
        self.register_buffer("store", torch.zeros(shape, dtype=dtype))
        # Each layer is written separately, so track a length per layer.
        self.lengths = [0] * n_layers

    def update(self, layer: int, k: torch.Tensor, v: torch.Tensor) -> None:
        # k and v have shape (new_tokens, n_heads, head_dim).
        start = self.lengths[layer]
        end = start + k.size(0)
        if end > self.store.size(2):
            raise ValueError("KV cache overflow: max_seq_len exceeded")
        self.store[layer, 0, start:end] = k
        self.store[layer, 1, start:end] = v
        # Advance the write position so get() sees the new tokens.
        self.lengths[layer] = end

    def get(self, layer: int) -> tuple[torch.Tensor, torch.Tensor]:
        n = self.lengths[layer]
        k = self.store[layer, 0, :n]
        v = self.store[layer, 1, :n]
        return k, v
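The docstring's claim about memory is easy to check with arithmetic. A rough sketch, assuming the same (layers, 2, tokens, heads, head_dim) layout as the buffer above and a hypothetical model shape picked only to make the numbers round; `kv_cache_bytes` is a name I made up for this:

```python
def kv_cache_bytes(n_layers: int, n_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes held by keys and values for one sequence of seq_len tokens."""
    # Factor of 2: one tensor for keys, one for values, per layer.
    return n_layers * 2 * seq_len * n_heads * head_dim * bytes_per_elem


# Hypothetical model shape, chosen only to make the arithmetic concrete.
size = kv_cache_bytes(n_layers=32, n_heads=32, head_dim=128, seq_len=4096)
print(f"{size / 2**30:.1f} GiB")  # 2.0 GiB at FP16
```

That is per sequence. The footprint scales linearly with both context length and batch size, which is why it tends to bind before compute does.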

A little Rust, because I’ve been playing with it:

fn phase_portrait(points: &[Point]) -> Vec<Trajectory> {
    points
        .iter()
        .map(|p| integrate(p, STEP, HORIZON))
        .filter(|t| !t.diverged())
        .collect()
}

And a shell one-liner, which should also look right:

$ rg -l 'TODO' --type rust | xargs -I{} echo 'fix: {}'

Math

Math is rendered with KaTeX. Inline: the Gaussian density is $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$. Display, for something more spacious:

$$\mathcal{L}(\theta) \;=\; \mathbb{E}_{x \sim \mathcal{D}} \Big[ -\log p_\theta(x) \Big] \;+\; \lambda \, \lVert \theta \rVert_2^2$$
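Both formulas are short enough to evaluate by hand. A small sketch, assuming the model is a single Gaussian with parameters θ = (μ, σ) and that the expectation over the data is approximated by a sample mean; those are choices made for this example, and the function names are mine:

```python
import math


def gaussian_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution with mean mu and std sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def regularized_nll(data: list[float], mu: float, sigma: float,
                    lam: float) -> float:
    """Mean negative log-likelihood plus lam * ||theta||_2^2.

    Assumes theta = (mu, sigma) and that the expectation over D is
    approximated by the sample mean; both are choices made for this sketch.
    """
    nll = -sum(math.log(gaussian_pdf(x, mu, sigma)) for x in data) / len(data)
    penalty = lam * (mu * mu + sigma * sigma)
    return nll + penalty


print(gaussian_pdf(0.0, 0.0, 1.0))  # 1 / sqrt(2*pi), roughly 0.3989
```

With `lam` set to zero the second function reduces to the plain average negative log-likelihood; the penalty term only pulls the parameters toward the origin.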

A small table

Technique        Peak mem.  Throughput  Caveat
FP16 KV cache    100%       1.0×        baseline
INT8 (per-head)  52%        1.3×        slight quality drop
INT4 (grouped)   28%        1.6×        noticeable on long ctx

Why “collapse”

Wavefunctions collapse. Civilizations collapse. The distinction between the two is partly one of time scale and partly one of whether you were watching. A phase-space trajectory doesn’t collapse in any standard usage — it converges, diverges, wraps around an attractor. But there is a sense, looking at the right system, where it feels like collapse: the dimensionality of what’s possible contracts, and contracts, and the future arrives looking like one particular thing rather than the cloud of things it could have been.

That feeling is what I’d like to write toward.

More soon.