Google DeepMind's May 11 announcement of its AI co-mathematician agent, built atop Gemini 3.1 Pro, roughly doubled performance to 48% on FrontierMath Tier 4, a benchmark of 50 research-level math problems that stump even experts, against the base model's 19% raw score; Epoch AI flagged an ongoing review of the benchmark's problems the same day. The agentic scaffolding, with parallel workflows and self-review, outpaces OpenAI's GPT-5.5 Pro at 39.6%, but traders focus on raw large language model evals, where OpenAI's GPT-5.4 leads at 47.6%. With six weeks to June 30, sentiment hinges on potential Gemini updates or independent benchmark runs, as historical patterns show rapid scaling in AI math capabilities via new releases or compute boosts.
Experimental AI-generated summary based on Polymarket data. This is not trading advice and plays no role in the resolution of this market. · Updated
Google Gemini score on FrontierMath Benchmark by June 30?
$136,324 Vol.
40%+: 86%
45%+: 63%
50%+: 64%
60%+: 54%
This market will resolve according to Epoch AI's FrontierMath benchmarking leaderboard (https://epoch.ai/frontiermath) for Tiers 1-3. Studies not included in the leaderboard (e.g. https://x.com/EpochAIResearch/status/1945905796904005720) will not be considered.
The primary resolution source will be information from Epoch AI; however, a consensus of credible reporting may also be used.
Market opened: Feb 6, 2026, 6:03 PM ET
Resolver
0x65070BE91...
Be careful with external links.
Frequently asked questions