As of mid-May 2026, OpenAI’s GPT-5.5 Pro leads the FrontierMath leaderboard at 52.4 percent, while the latest Claude Opus 4.7 variant sits at 43.8 percent, an 8.6-point gap on Epoch AI’s set of unpublished, research-level math problems authored by expert mathematicians. Claude models have consistently shown relative weakness on pure mathematical reasoning benchmarks compared with their stronger performance on software-engineering tasks, according to Epoch’s Domain-specific Capabilities Index released this month. With June 30 approaching, traders are watching for any new model updates or extended-thinking techniques from Anthropic that could close the gap before resolution. Epoch AI’s ongoing human review of FrontierMath problems and upcoming open-problems workshops through early June may also produce revised scoring data that influences final outcomes.
Experimental AI-generated summary referencing Polymarket data. This is not trading advice and plays no role in how this market resolves.
Volume: $61,944
50%+: 52%
This market will resolve according to Epoch AI’s FrontierMath benchmarking leaderboard (https://epoch.ai/frontiermath) for Tiers 1-3. Studies that are not included in the leaderboard (e.g. https://x.com/EpochAIResearch/status/1945905796904005720) will not be considered.
The primary resolution source will be information from Epoch AI; however, a consensus of credible reporting may also be used.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Frequently Asked Questions