Anthropic's Claude Opus 4.7 model, evaluated on April 16, 2026, achieved a leading 43.8% score on FrontierMath Tiers 1-4 (the Epoch AI benchmark testing frontier mathematical reasoning on expert-vetted problems, including open research challenges), with 22.9% on the hardest Tier 4, tying or narrowly edging OpenAI's GPT-5.2 in trader-implied consensus on advanced AI capability. This marks sharp progress from prior Claude 4.x single-digit results, driven by iterative scaling and compute optimizations amid intense rivalry with OpenAI's GPT-5 series, which holds slight edges on some leaderboards at around 50%. No new evaluations in the past month have tempered momentum, and upcoming model previews such as Claude Mythos or Opus 4.8 before June 30 could push scores higher, potentially resolving market thresholds despite benchmark contamination risks and rapid capability shifts.
Experimental AI-generated summary based on Polymarket data. This is not trading advice and does not affect this market's resolution. · Updated
Anthropic Claude score on FrontierMath Benchmark by June 30?
$61,931 Vol.
50%+
55%
This market will resolve according to Epoch AI's FrontierMath benchmarking leaderboard (https://epoch.ai/frontiermath) for Tiers 1-3. Studies not included in the leaderboard (e.g., https://x.com/EpochAIResearch/status/1945905796904005720) will not be considered.
The primary resolution source will be information from Epoch AI; however, a consensus of credible reporting may also be used.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...