Source: OpenAI’s New Models on RTX GPUs | NVIDIA Blog.

NVIDIA has optimized OpenAI’s new open-weight gpt-oss-20b and gpt-oss-120b models for its GPUs, enabling fast AI inference from cloud to PC. Both models use a mixture-of-experts architecture and provide chain-of-thought reasoning with adjustable reasoning effort levels, supporting advanced agentic AI applications such as web search and in-depth research. They were trained on NVIDIA H100 GPUs and support context lengths of up to 131,072 tokens, making them suitable for tasks such as coding assistance and document comprehension.
On the NVIDIA GeForce RTX 5090 GPU, the models reach up to 256 tokens per second, with optimizations for RTX AI PCs and workstations through tools such as Ollama, llama.cpp, and Microsoft AI Foundry Local. This underscores NVIDIA’s leadership in AI from training to inference and from cloud to AI PC.
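As a rough illustration of running the smaller model locally on an RTX AI PC, the sketch below uses the Ollama Python client. The model tag gpt-oss:20b and the system-prompt style of setting reasoning effort are assumptions to verify against the model card, not details from the NVIDIA post.

```python
# Minimal sketch: querying the 20B model through the Ollama Python client.
# Assumes the `ollama` package is installed (pip install ollama), the Ollama
# server is running locally, and the model has been pulled under the tag
# "gpt-oss:20b" (ollama pull gpt-oss:20b) -- that tag name is an assumption.
import ollama

response = ollama.chat(
    model="gpt-oss:20b",
    messages=[
        # Setting reasoning effort via a system hint is an assumption here;
        # check the model card for the supported low/medium/high mechanism.
        {"role": "system", "content": "Reasoning: medium"},
        {
            "role": "user",
            "content": "Summarize the trade-offs of a mixture-of-experts architecture.",
        },
    ],
)

# Print the generated reply text.
print(response["message"]["content"])
```

The same model can also be served through llama.cpp or Microsoft AI Foundry Local; the Ollama client is used here only because it keeps the example short.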
Read more: OpenAI’s New Models on RTX GPUs | NVIDIA Blog.