Bandwidth Math: How HBM Feeds the GPU | ChipDocket

HBM's whole reason to exist is bytes per second. Intel's and Xilinx's 2026 grants on stacked memory and vertical connections show how the bandwidth is built.

Bytes per second, not just bits. The entire point of high-bandwidth memory (HBM) is throughput: instead of pushing a narrow bus to ever-higher clock speeds, you make the bus enormously wide and physically short. You do that by stacking DRAM dies on top of each other and running thousands of vertical connections (through-silicon vias, or TSVs) straight through the silicon. Bandwidth is the product of bus width and per-pin data rate, and HBM wins on width.

The stacking is the patentable part. Intel's grant US12653065B2, "Semiconductor package with stacked memory devices" (issued June 9, 2026; CPC includes G11C5/06 and H10B12), describes packaging multiple memory devices in a vertical stack, which is the structural prerequisite for the wide bus. You cannot get HBM-class bandwidth from memory sitting off to the side on a board; the dies have to be stacked vertically, placed close to the processor, and densely interconnected.

“The present disclosure is directed to semiconductor packages, and methods for making them, which includes a package substrate, an interposer with a redistribution layer positioned on the interposer.” (U.S. Patent No. 12,653,065)

Where do the bytes actually flow? Through vertical connections, which is exactly what Xilinx (now part of AMD) claims in US12653050B2, "Memory bandwidth through vertical connections" (issued June 9, 2026; XILINX, INC.). The title is the thesis. Bandwidth in a stacked-memory system is a function of how many vertical paths you can build and how fast each runs — and the patent is a claim on the structures that deliver it. The stack tells you the roadmap: more dies high, more vias through, more bandwidth out.

Do the math at a high level and the intuition holds. Peak bandwidth in bytes per second is the per-pin data rate in bits per second, times the number of data pins, divided by eight. A wide interface running at a modest per-pin rate across thousands of connections can therefore beat a narrow interface running fast, and stacking multiplies the pin count without lengthening the wires. Short wires also mean less energy per bit, which matters when an AI accelerator is moving terabytes per second all day.
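To make the width-versus-speed trade concrete, here is a minimal sketch of the pins-times-rate arithmetic. The function name and the pin counts and per-pin rates are illustrative assumptions chosen for round numbers; none of them come from either patent.

```python
def peak_bandwidth_gb_per_s(data_pins: int, gbit_per_s_per_pin: float) -> float:
    """Peak bandwidth in gigabytes per second.

    Multiplies pin count by per-pin rate (gigabits per second),
    then divides by 8 to convert bits to bytes.
    """
    return data_pins * gbit_per_s_per_pin / 8.0


# Illustrative figures only (assumptions, not taken from the patents):
# a very wide interface at a modest rate vs. a narrow one at a high rate.
wide_slow = peak_bandwidth_gb_per_s(data_pins=1024, gbit_per_s_per_pin=2.0)
narrow_fast = peak_bandwidth_gb_per_s(data_pins=32, gbit_per_s_per_pin=16.0)

print(f"wide, modest rate: {wide_slow:.0f} GB/s")
print(f"narrow, fast rate: {narrow_fast:.0f} GB/s")
```

With these assumed inputs, the wide interface comes out ahead even though each of its pins runs at one eighth the speed. This is a peak figure; sustained bandwidth in a real system would be lower.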

The yield catch, which coverage usually skips: every die you stack is another die that has to be good, and every via is another thing that can fail. Stacking multiplies your exposure to defects, which is why so much of the memory-and-packaging patent flow is really about making the stack manufacturable, not just imaginable. Bandwidth on a slide is free; bandwidth at yield is the hard part.
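The compounding can be sketched with a deliberately simplified model. It assumes every die and every via fails independently and that there is no pre-stack testing, repair, or spare vias, all of which real manufacturing flows may use to soften the effect. The function name and every input number below are hypothetical, not drawn from the patents.

```python
def stack_yield(die_yield: float, num_dies: int,
                via_yield: float, num_vias: int) -> float:
    """Probability that a whole stack is good under an independence assumption.

    Simplified model: the stack works only if every die and every via works,
    so the individual success probabilities multiply.
    """
    return (die_yield ** num_dies) * (via_yield ** num_vias)


# Hypothetical inputs chosen only to show the trend.
for dies in (4, 8, 12):
    y = stack_yield(die_yield=0.95, num_dies=dies,
                    via_yield=0.99999, num_vias=5000)
    print(f"{dies} dies high: stack yield about {y:.1%}")
```

Under these assumptions the yield drops with every die added, which is the pressure that manufacturability-focused patents are responding to.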

So when an accelerator spec quotes a headline bandwidth number, that number is a packaging achievement as much as a memory one. The 2026 grants from Intel and Xilinx describe the two halves — stack the dies, then route the bandwidth vertically through them. Bytes per second, not just bits, and the patents show where each byte comes from.

