Anthropic’s recent releases of Claude Opus 4.7 and the Mythos Preview have driven Claude models to leading positions on Humanity’s Last Exam, a rigorous 2,500-question benchmark testing expert-level reasoning across math, science, and humanities. Current top Claude variants post scores between 36% and 65% depending on the leaderboard and tool-augmented setups, outpacing OpenAI’s GPT-5 series and Google’s Gemini 3.1 Pro in several evaluations. Traders focus on whether further scaling, improved chain-of-thought techniques, or an imminent Claude 5 preview can push any public model across specific thresholds before the June 30 cutoff. Competitive pressure from rapid frontier releases and ongoing benchmark updates at major AI events could shift results quickly, while evaluation variations in tool use and reasoning effort introduce uncertainty for near-term outcomes.
Experimental AI-generated summary based on Polymarket data. This is not trading advice and does not affect how this market resolves.
$283,406 volume
45%+: 18%
50%+: 9%
55%+: 5%
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Frequently Asked Questions