Google DeepMind Releases Gemini 2.5 Ultra — Tops Major Benchmarks
Gemini 2.5 Ultra achieves state-of-the-art scores on MMLU, HumanEval, and MATH benchmarks, with native audio and video understanding built in.
Google DeepMind released Gemini 2.5 Ultra, claiming state-of-the-art performance across a comprehensive suite of academic and real-world benchmarks. The model achieves top scores on MMLU (93.8%), HumanEval (91.2%), and MATH (94.5%), and demonstrates particularly strong performance on scientific reasoning tasks.
Unlike its predecessors, Gemini 2.5 Ultra was built natively multimodal from the ground up: audio, video, image, and text are processed through the same architecture rather than bolted together. The result is notably better cross-modal reasoning, such as answering questions about a video using information from an accompanying document.
Benchmark performance highlights
- MMLU: 93.8% (new state-of-the-art at release)
- HumanEval (code): 91.2%
- MATH: 94.5%
- GPQA Diamond (graduate-level science): 86.4%
- Video-MME (video understanding): 84.1%
"Gemini 2.5 Ultra is our most capable model yet, and the results on science and mathematics tasks show what is possible when multimodality is native rather than patched on," said Demis Hassabis, CEO of Google DeepMind.
The model is available through Google AI Studio and Vertex AI with a 1-million-token context window. A 2-million-token variant is in preview for enterprise customers.