Anthropic's Claude Opus 4.6 recently achieved 40.7% on the FrontierMath benchmark—a rigorous test of advanced mathematical reasoning featuring unsolved research problems across four tiers—trailing OpenAI's GPT-5.4 Pro at around 50%, per independent leaderboards. This reflects steady gains, including quadrupling Tier 4 performance over prior versions, driven by scaled training and refined reasoning chains, though Claude lags competitors like DeepMind on the hardest tiers. Epoch AI's May 11 review flagged errors in a third of problems, meaning scores may have been inflated and could be adjusted upon re-evaluation. With six weeks until resolution, traders eye unannounced Claude 4.7 or Mythos updates amid intensifying AI math rivalries, where model releases often shift standings abruptly.
Experimental AI-generated summary based on Polymarket data. This is not trading advice and plays no role in resolving this market. · Updated
50%+
55%
$61,931 Vol.
This market will resolve according to Epoch AI's FrontierMath benchmarking leaderboard (https://epoch.ai/frontiermath) for Tiers 1–3. Studies not included in the leaderboard (e.g. https://x.com/EpochAIResearch/status/1945905796904005720) will not be considered.
The primary resolution source will be information from Epoch AI; however, a consensus of credible reporting may also be used.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Beware of external links.
Frequently asked questions