Intel Optane gets a second AI life in a local trillion-parameter model
An Optane-packed workstation shows how local AI is still a memory game. 📷 AI-generated image / TECH&SPACE
- The system used 768 GB of Intel Optane PMem DIMMs and one GPU to run Kimi K2.5 locally.
- Tom's Hardware reports that the setup reached roughly four tokens per second.
- The experiment shows a cheaper route for AI testing, but it does not erase speed, bandwidth, and power limits.
Local AI usually breaks at the same point: memory. Models may be available, tooling may be open, but if the weights cannot fit into available VRAM or system RAM, the project quickly stops being a workstation experiment. That is why the build reported by Tom's Hardware is interesting without pretending it is a new performance baseline.
According to the report, a Reddit user managed to run Kimi K2.5, a trillion-parameter-class model, locally on a workstation with a single GPU. The key ingredient was not an exotic GPU cluster, but 768 GB of second-hand Intel Optane persistent memory DIMMs serving as a large memory pool. That does not turn a workstation into a data center, but it changes the entry calculation: instead of requiring a multi-GPU server from the first step, some of the burden can be pushed onto cheaper, slower, but very large memory.
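To make that entry calculation concrete, here is a rough sketch of how large a trillion-parameter set of weights is at different precisions, and whether it would fit in a 768 GB pool. The report does not state which precision or quantization the build used, so the bit widths below are illustrative assumptions. The estimate also ignores the KV cache, activations, and runtime overhead.

```python
def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 1e12   # "trillion-parameter-class", taken at face value
POOL_GB = 768   # Optane PMem pool size from the report

# Bit widths are assumptions for illustration, not details from the report.
for bits in (16, 8, 4):
    size = weight_footprint_gb(PARAMS, bits)
    verdict = "fits" if size <= POOL_GB else "does not fit"
    print(f"{bits}-bit weights: {size:.0f} GB, {verdict} in {POOL_GB} GB")
```

Under these assumptions, only the more aggressively compressed variants would fit in the pool, which is why capacity is the first gate before speed even enters the picture.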
The number that grounds the whole story is roughly four tokens per second. That is enough to prove the system works, but not enough to erase the difference between a demonstration and production use. With large language models, speed is not just a question of how much memory exists. Bandwidth, latency, weight loading, CPU-to-RAM-to-GPU movement, and the software layer that decides where computation happens all matter.
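A back-of-envelope model shows why bandwidth matters as much as capacity. If every generated token requires reading the active weights from memory once, throughput is capped at memory bandwidth divided by bytes read per token. The sketch below uses hypothetical figures: the 32 billion active parameters per token and the 4-bit weights are assumptions chosen for illustration, not numbers from the report. It also ignores caching, compute time, and CPU-to-GPU transfers.

```python
def max_tokens_per_second(bandwidth_gb_s: float,
                          active_params: float,
                          bits_per_weight: float) -> float:
    """Upper bound on decode speed when weight reads are the bottleneck."""
    bytes_per_token = active_params * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


def implied_bandwidth_gb_s(tokens_per_second: float,
                           active_params: float,
                           bits_per_weight: float) -> float:
    """Effective bandwidth needed to sustain a given decode speed."""
    bytes_per_token = active_params * bits_per_weight / 8
    return tokens_per_second * bytes_per_token / 1e9


# Hypothetical model shape: assumptions, not figures from the report.
ACTIVE_PARAMS = 32e9
BITS = 4

# The one measured number in the story: roughly four tokens per second.
needed = implied_bandwidth_gb_s(4, ACTIVE_PARAMS, BITS)
print(f"~4 tokens/s implies about {needed:.0f} GB/s of effective weight reads")
```

The same function run in the other direction, `max_tokens_per_second`, makes the tradeoff visible: doubling the memory pool changes nothing in this estimate, while doubling the effective bandwidth doubles the ceiling.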
An enthusiast used 768 GB of Intel Optane PMem DIMMs to run local Kimi K2.5 on a single-GPU system, reaching roughly four tokens per second.
The Optane PMem modules here are not an accelerator, but a large memory pool. 📷 AI-generated image / TECH&SPACE
Optane PMem is especially interesting here because it is now a technological remnant of a different strategy. Intel positioned it as a tier between classic DRAM and storage: denser than DRAM, persistent, but slower. After Optane was discontinued, used modules became niche hardware for people who know exactly why they want them. AI enthusiasts now see another use: a place to hold very large model weights when GPU memory is not enough.
Kimi K2.5 comes from the Moonshot AI ecosystem, and the fact that a model in this class can be started locally on such a configuration matters more than the raw speed. It does not mean that local trillion-parameter LLMs are suddenly accessible to everyone. It means the edge of experimentation is moving toward people who can assemble unusual memory layouts, track their constraints, and accept the tradeoff between cost and waiting.
Precision matters here: this is not proof that giant models will soon run comfortably on ordinary home PCs. One GPU plus 768 GB of Optane is still a very specific workstation, and four tokens per second is not a comfortable interactive experience for most work. But the experiment does break a useful psychological barrier. It shows that local execution of extremely large models does not always have to begin and end with the price of a modern GPU cluster.
The larger lesson is architectural. AI accessibility will not expand only through new models; it will also expand through better use of strange, written-off, or undervalued hardware. In this case, old Optane is not a magic accelerator. It is a large, cheap memory surface for testing the boundary between what is theoretically possible and what is actually useful.

