
Google DeepMind Releases Gemini 2.5 Ultra — Tops Major Benchmarks

Gemini 2.5 Ultra achieves state-of-the-art scores on MMLU, HumanEval, and MATH benchmarks, with native audio and video understanding built in.

Google Blog, May 16, 2026

Google DeepMind has released Gemini 2.5 Ultra, which it says achieves state-of-the-art performance across a broad suite of academic and real-world benchmarks. The model posts top scores on MMLU (93.8%), HumanEval (91.2%), and MATH (94.5%) and is particularly strong on scientific reasoning tasks.

Unlike its predecessors, Gemini 2.5 Ultra was designed to be natively multimodal: audio, video, image, and text are processed through the same architecture rather than by separate components bolted together. According to Google, this yields notably better cross-modal reasoning, such as answering a question about a video using information from an accompanying document.

Benchmark performance highlights

  • MMLU: 93.8% (new state-of-the-art at release)
  • HumanEval (code): 91.2%
  • MATH: 94.5%
  • GPQA Diamond (graduate-level science): 86.4%
  • Video-MME (video understanding): 84.1%

"Gemini 2.5 Ultra is our most capable model yet, and the results on science and mathematics tasks show what is possible when multimodality is native rather than patched on," said Demis Hassabis, CEO of Google DeepMind.

The model is available via Google AI Studio and Vertex AI with a 1-million-token context window. A 2-million-token variant is in preview for enterprise customers.
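For developers deciding which tier a workload needs, the sketch below compares an input's size against the two advertised windows. It is an illustration under stated assumptions, not Google's API: the tier names, the helpers `estimate_tokens` and `pick_tier`, and the 4-characters-per-token heuristic are all hypothetical, and the limits take "1 million" and "2 million" literally. For real budgets, use the token-counting endpoint of the Gemini API rather than a character heuristic.

```python
from typing import Optional

# Hypothetical tiers based on the announced figures; the real API limits
# may differ slightly from these round numbers.
CONTEXT_TIERS = [
    ("standard", 1_000_000),
    ("enterprise preview", 2_000_000),
]

# Rough heuristic for English text; this is NOT the model's tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Roughly estimate token count by ceiling-dividing character length."""
    return -(-len(text) // CHARS_PER_TOKEN)


def pick_tier(prompt_tokens: int) -> Optional[str]:
    """Return the smallest tier whose window fits the prompt, or None if none does."""
    for name, limit in CONTEXT_TIERS:
        if prompt_tokens <= limit:
            return name
    return None


if __name__ == "__main__":
    document = "x" * 3_000_000  # stand-in for a ~3 MB plain-text corpus
    tokens = estimate_tokens(document)
    print(tokens, pick_tier(tokens))
```

Under this heuristic a 3,000,000-character corpus comes to about 750,000 tokens, which fits the standard window; anything between 1 and 2 million tokens would need the enterprise preview.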