Z80-μLM squeezes “conversational AI” into 40KB on a vintage Z80
What’s notable here isn’t another chatbot demo; it’s the constraint. Z80-μLM runs an interactive, text-generating “conversational” engine in roughly 40KB on a Z80-era platform, showing that responsive language behavior is achievable without GPUs, gigabytes of RAM, or even floating point. Under the hood, anything at this footprint isn’t a modern transformer; expect compact statistical modeling (likely n-gram-style probability tables) and a hand-tuned runtime built around 8-bit integer arithmetic and a fixed memory budget. The result is less about competing with mainstream LLMs and more about showing how far careful data layout, tiny vocabularies, and deterministic inference can be pushed on retro hardware.
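To make that concrete, here is a minimal sketch of the kind of table-driven, integer-only sampling loop such a system might use. Everything in it is a hypothetical illustration (the table contents, the toy vocabulary size, the PRNG), not the project’s actual code: the point is that next-token selection can reduce to one random byte and a scan over a cumulative 8-bit probability row, with no multiplies and no floats.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical layout: for each current token, a row of
 * (next_token, cumulative_weight) pairs. Weights are 8-bit and
 * cumulative, so sampling needs only one random byte and a linear
 * scan. Table contents below are made up for illustration. */
#define VOCAB 4
#define ROW   4

typedef struct { uint8_t tok; uint8_t cum; } entry_t;

static const entry_t table[VOCAB][ROW] = {
    /* after token 0 */ {{1, 120}, {2, 200}, {3, 245}, {0, 255}},
    /* after token 1 */ {{2, 100}, {0, 180}, {3, 230}, {1, 255}},
    /* after token 2 */ {{3, 140}, {1, 210}, {0, 250}, {2, 255}},
    /* after token 3 */ {{0, 160}, {2, 220}, {1, 240}, {3, 255}},
};

/* Cheap 8-bit xorshift-style PRNG; illustrative, not a vetted
 * generator. Any one-byte source of randomness would do. */
static uint8_t rng = 0xA5;
static uint8_t rand8(void) {
    rng ^= (uint8_t)(rng << 3);
    rng ^= (uint8_t)(rng >> 5);
    rng ^= (uint8_t)(rng << 1);
    return rng;
}

/* Sample the next token: draw a byte, walk the cumulative row. */
static uint8_t next_token(uint8_t cur) {
    uint8_t r = rand8();
    const entry_t *row = table[cur];
    for (uint8_t i = 0; i < ROW - 1; i++)
        if (r < row[i].cum) return row[i].tok;
    return row[ROW - 1].tok;   /* last entry always has cum == 255 */
}

int main(void) {
    uint8_t t = 0;
    for (int i = 0; i < 8; i++) {   /* emit a short token stream */
        t = next_token(t);
        printf("%u ", t);
    }
    putchar('\n');
    return 0;
}
```

On an 8-bit CPU, each step in that loop is just indexed loads, a compare, and a branch, which is why this style of model maps onto the hardware so naturally.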
The bigger picture is practical: embedded and edge systems can gain lightweight, offline language interfaces without a cloud dependency, improving latency, power draw, and privacy. Worth noting: “conversational” here has real limits; this won’t handle complex reasoning or long context. But as an engineering artifact it spotlights techniques (tight quantization, table-driven probabilities, streaming tokenization) that are increasingly relevant for microcontrollers and low-cost devices. What’s new isn’t the idea of small models so much as the ambition of the target: fitting the entire interaction loop into tens of kilobytes on an 8-bit CPU. That’s a useful north star for developers thinking about on-device AI beyond the datacenter hype.
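For a flavor of what streaming tokenization can mean at this scale, here is a hedged sketch: a byte-at-a-time tokenizer with a tiny fixed vocabulary and constant memory. The vocabulary, names, and fallback scheme are all illustrative assumptions, not details from the project.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* Hypothetical tiny vocabulary; a real table would live in ROM. */
static const char *vocab[] = { "hello", "how", "are", "you", "the" };
#define NVOCAB  (sizeof vocab / sizeof vocab[0])
#define TOK_UNK 255           /* fallback for out-of-vocabulary words */
#define MAXWORD 12

/* Streaming state: only the current partial word is buffered, so
 * memory use stays constant regardless of input length. Overlong
 * words are flushed early in this toy version. */
static char    word[MAXWORD];
static uint8_t len;

static uint8_t lookup(const char *w) {
    for (uint8_t i = 0; i < NVOCAB; i++)
        if (strcmp(vocab[i], w) == 0) return i;
    return TOK_UNK;
}

/* Feed one input byte; calls emit() at each token boundary. */
static void feed(char c, void (*emit)(uint8_t tok)) {
    if (isalpha((unsigned char)c) && len < MAXWORD - 1) {
        word[len++] = (char)tolower((unsigned char)c);
        return;
    }
    if (len) {                /* delimiter closes the current word */
        word[len] = '\0';
        emit(lookup(word));
        len = 0;
    }
}

static void print_tok(uint8_t tok) { printf("%u ", tok); }

int main(void) {
    const char *input = "Hello how are you today ";
    for (const char *p = input; *p; p++)
        feed(*p, print_tok);           /* prints: 0 1 2 3 255 */
    putchar('\n');
    return 0;
}
```

The design choice worth noticing is that nothing ever buffers the whole input: tokens are emitted as bytes arrive, which keeps RAM use flat and lets the generator start responding before the user finishes typing.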