Anthropic’s latest Claude iterations, including the Opus 4.7 and Mythos Preview variants, currently lead or sit near the top of Humanity’s Last Exam leaderboards with scores ranging from 36% to 65% depending on evaluation settings and model configuration. This performance stems from recent scaling improvements in reasoning depth and multimodal handling on the 2,500-question benchmark spanning expert mathematics, sciences, and humanities, outpacing most OpenAI GPT-5 releases and Google Gemini models in direct comparisons. Competitive pressure from ongoing model updates by rival labs continues to drive rapid iteration, while the benchmark’s design as a saturation-resistant evaluation helps explain sustained trader focus on whether Anthropic can push Claude past key accuracy thresholds before the June 30 deadline. No major new releases have been confirmed for the immediate weeks ahead, leaving room for potential last-minute capability gains or evaluation adjustments to influence final outcomes.
Ringkasan eksperimental yang dihasilkan AI dengan referensi data Polymarket. Ini bukan saran trading dan tidak berperan dalam bagaimana pasar ini diselesaikan. · Diperbarui$283,400 Vol.
45%+
18%
50%+
9%
55%+
4%
$283,400 Vol.
45%+
18%
50%+
9%
55%+
4%
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Pasar Dibuka: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Resolver
0x65070BE91...Anthropic’s latest Claude iterations, including the Opus 4.7 and Mythos Preview variants, currently lead or sit near the top of Humanity’s Last Exam leaderboards with scores ranging from 36% to 65% depending on evaluation settings and model configuration. This performance stems from recent scaling improvements in reasoning depth and multimodal handling on the 2,500-question benchmark spanning expert mathematics, sciences, and humanities, outpacing most OpenAI GPT-5 releases and Google Gemini models in direct comparisons. Competitive pressure from ongoing model updates by rival labs continues to drive rapid iteration, while the benchmark’s design as a saturation-resistant evaluation helps explain sustained trader focus on whether Anthropic can push Claude past key accuracy thresholds before the June 30 deadline. No major new releases have been confirmed for the immediate weeks ahead, leaving room for potential last-minute capability gains or evaluation adjustments to influence final outcomes.
Ringkasan eksperimental yang dihasilkan AI dengan referensi data Polymarket. Ini bukan saran trading dan tidak berperan dalam bagaimana pasar ini diselesaikan. · Diperbarui
Hati-hati dengan link eksternal.
Hati-hati dengan link eksternal.
Pertanyaan yang Sering Diajukan