Anthropic’s April 2026 releases of Claude Opus 4.7 and the Mythos Preview have lifted frontier Claude models to leading scores of 35–65% on Humanity’s Last Exam, a 2,500-question benchmark of expert-level reasoning across math, science, and humanities. These gains stem from scaled oversight techniques, extended chain-of-thought prompting, and tool integration, which let the new models outperform earlier Claude versions and edge out OpenAI’s GPT-5 series and Google’s Gemini 3.1 Pro on the public leaderboard. With June 30 just weeks away, traders are focused on whether a new public Claude variant or a benchmark update will push any model past the market's resolution thresholds before competing releases or evaluation changes alter the picture.
Experimental AI summary based on Polymarket data. This is not trading advice and does not affect how this market resolves.
$283,400 volume
45%+: 18%
50%+: 9%
55%+: 4%
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Frequently asked questions