Anthropic’s latest Claude Opus 4.7 variant currently trails the leader on the FrontierMath leaderboard, scoring 43.8 percent against 52.4 percent for OpenAI’s GPT-5.5 Pro on this Epoch AI benchmark of original research-level math problems. Claude models continue to show relative strength on software-engineering tasks while underperforming on pure mathematical reasoning relative to their general capabilities index. Epoch researchers are still correcting roughly one-third of FrontierMath items, which could shift reported scores once the cleaned dataset is released. No major Claude math-specific update has been announced in the past month, leaving traders focused on whether an incremental training run or adaptive fine-tuning before June 30 can close the 8.6-point gap to the current leader.
Experimental AI summary based on Polymarket data. This is not trading advice and does not affect how this market resolves.
$61,944 volume
50%+
52%
This market will resolve according to Epoch AI’s FrontierMath benchmark leaderboard (https://epoch.ai/frontiermath) for Tiers 1-3. Studies that are not included in the leaderboard (e.g. https://x.com/EpochAIResearch/status/1945905796904005720) will not be considered.
The primary resolution source will be information from Epoch AI; however, a consensus of credible reporting may also be used.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Frequently Asked Questions