Abstract digital representation of Google Gemini 3.1 Pro AI model capabilities. Photo by Markus Winkler on Pexels.

Google released Gemini 3.1 Pro on Friday, its newest large language model, which sets new records on key AI benchmarks. The model, built by teams at Google DeepMind, handles tough reasoning tasks better than its predecessor and is now available in the Gemini app and to developers. The release comes one week after Google added a special mode for science work, underscoring the company's fast push in AI tools.

Background

Google has been building its Gemini series of AI models since the first version launched in late 2023. These models process text, images, video, audio, and code in a single system, and updates have followed quickly. Gemini 3 Pro arrived late last year as a strong performer in chat and task work. Now, Gemini 3.1 Pro builds on it with bigger gains in logic and problem-solving.

The push comes as rivals like OpenAI and Anthropic release their own top models. GPT-5.1 from OpenAI and Claude Opus 4 from Anthropic set high bars on benchmarks. Google aims to match and beat them in areas like bug fixing, math problems, and novel logic puzzles. Just last week, Google added 'Deep Think' to help with scientific research. Gemini 3.1 Pro takes those ideas and brings them to everyday use.

Teams at Google DeepMind tested the model on dozens of benchmarks, the standard tests that check how well an AI reasons, codes, and recalls information from long texts. The results show Gemini 3.1 Pro ahead in most areas, though it lags top rivals in a few.

Key Details

Gemini 3.1 Pro shines on ARC-AGI-2, a tough test of reasoning on novel patterns the AI never saw in training. It scored 77.1%, more than double the 31.1% that Gemini 3 Pro posted in its standard configuration. With Deep Think, it climbs higher still. This suggests the model grasps the point of hard questions rather than just guessing the next word.

Benchmark Wins

On GPQA Diamond, a test of hard science questions, Gemini 3.1 Pro scored 91.9%, an edge of almost 4 points over GPT-5.1's 88.1%. Deep Think pushes it to 93.8%. In math, it hit 100% with code tools, matching GPT-5.1, and 95% without tools. On SWE-Bench, which measures code bug fixing, it scored 76.2%, close to Claude Sonnet 4.5's 77.2%.

For coding from scratch, it leads with an Elo rating of 2,439 on LiveCodeBench Pro, 200 points above GPT-5.1. On Humanity's Last Exam, one of the hardest reasoning tests, it reached 37.5%, almost 11 points above GPT-5.1. MathArena Apex saw a roughly 20-fold jump over past models.

It handles long texts well too. On MRCR v2 with 1 million tokens of context, it beat Gemini 2.5 Pro by 9.9 points. The model works across text, images, video, audio, and code with a 1-million-token context window.

Real-World Examples

Google showed several examples of what the model can do. In one, it coded a 3D simulation of starlings flying in flocks. Users control the birds with hand tracking and hear music that changes with their movements.

"It doesn’t just generate the visual code; it builds an immersive experience where users can manipulate the flock with hand-tracking and listen to a generative score that shifts based on the birds’ movement." – Google DeepMind demo

It also built an aerospace dashboard with live data streams and a full city simulation with traffic and terrain, and it turned flat designs into animated web graphics. In one case, it matched a book's tone to generate a personal portfolio site.

Availability

The model rolled out Friday in the Gemini app for all users. Google AI Pro and Ultra subscribers get higher limits. It's the main engine for NotebookLM, Google's note-taking tool. Developers can try it now via the Gemini API, Android Studio, and a new platform called Google Antigravity. This setup helps build agents that handle multi-step jobs like finance models or spreadsheet work.
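For developers curious what a first call might look like, here is a minimal sketch using the Python google-genai SDK. The model identifier string "gemini-3.1-pro" is an assumption for illustration only; the actual name Google publishes in the API may differ.

```python
# Minimal sketch of a Gemini API call with the google-genai Python SDK.
# The model name "gemini-3.1-pro" is assumed for illustration; check the
# official model list for the identifier Google actually publishes.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-3.1-pro",  # assumed identifier
    contents="Summarize the main risks in this quarterly finance report: ...",
)

print(response.text)
```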

What This Means

Gemini 3.1 Pro points to AI that tackles full workflows, not just quick answers. It plans steps, uses tools, and stays steady over long tasks. Developers can build agents for coding, data analysis, or automation without much help.
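As a rough illustration of that agent-style workflow, the sketch below uses the same SDK's function-calling support, where a Python function is passed as a tool the model can invoke. The helper function and the model name are hypothetical examples, not part of Google's announcement.

```python
# Sketch of a simple tool-using request with the google-genai SDK.
# The helper function and the model name are hypothetical examples.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

def monthly_total(rent: float, utilities: float, software: float) -> float:
    """Add three monthly cost line items (hypothetical tool)."""
    return rent + utilities + software

response = client.models.generate_content(
    model="gemini-3.1-pro",  # assumed identifier
    contents="Total these monthly costs and flag anything unusual: "
             "rent 1200, utilities 95, software 430.",
    config=types.GenerateContentConfig(tools=[monthly_total]),
)

print(response.text)
```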

For everyday users, it means smarter replies in the Gemini app. Ask for complex plans, and it breaks them down with real logic. In NotebookLM, it summarizes notes and runs deeper queries. Businesses get tools for finance, spreadsheets, and reports.

The benchmarks show Google closing gaps with rivals. It leads in reasoning without tools and in agent tasks, but it trails Claude Opus 4.6 in some areas, such as certain coding jobs. Still, the more-than-doubled score on ARC-AGI-2 suggests core changes in how the model reasons.

Google plans more updates soon. The fast release cycle since November means users see gains often. Pro and Ultra plans expand access, while free tiers get the basics. This could speed AI adoption in apps, at work, and in daily life. Agents that act on their own, like booking trips or debugging code, feel closer. The model also cuts token use, making it cheaper for big jobs.

Experts are watching how it holds up outside of tests. Real tasks mix many skills, and benchmarks don't capture everything. Google says the model excels in structured areas like finance and coding; users are trying it now to find out.

The release fits Google's bet on multimodal AI. Handling video and audio alongside text opens doors for apps in design, games, and analysis. As models grow, they move toward tools that think and act like specialists in narrow fields.

Author

  • Lauren Whitmore

    Lauren Whitmore is an evening news anchor and senior correspondent at The News Gallery. With years of experience in broadcast-style journalism, she provides authoritative coverage and thoughtful analysis of the day's top stories. Whitmore is known for her calm presence, clarity, and ability to guide audiences through complex news cycles.
