Maybe we do have “ChatGPT at home,” now.
OpenAI recently released two actually-open LLMs, suitable for running on local, consumer-grade PC hardware. gpt-oss:20b and gpt-oss:120b appear to be among the most capable, efficient, and reliable local LLMs I’ve tested so far. They’re no GPT-5-Thinking, of course, but even the 20b model has handled every logic puzzle I’ve thrown at it, including competently writing a C function to find the midpoint of a great-circle path anywhere on Earth.
Performance, at least for the 20b model, is quite good considering the high quality of the responses. Inference runs at about 12-13 tokens per second on a Core i9 system with an RTX 4070 GPU. (128GB system RAM; ~16GB total used, so it nearly fits in the 12GB of VRAM.) The 120b model runs at 4-5 tokens per second, which is fair considering it’s 6x larger. (I believe the 120b model uses a mixture-of-experts scheme to limit how much of the model is active at any one time.)
The ability to have local intelligent agents handling various tasks will open up a whole range of new, interesting projects. The next step is to get a sense of what kinds of tasks each LLM model size can handle. qwen3:0.6b is really fast, but usually loses the plot when asked anything beyond a basic question. gpt-oss:120b is very capable, but communication is so slow that it might as well happen via Morse code.
