Anthropic's Claude Opus 4.7 recently posted scores around 30-40% on Humanity's Last Exam, a 2,500-question benchmark spanning frontier-level mathematics, science, and humanities that was designed to resist rapid saturation. This performance positions Claude competitively against OpenAI's GPT-5 variants and Google's Gemini models, which hover in similar ranges, while a Claude Mythos Preview has reached 64.7% in some evaluations. Traders are watching for potential mid-June model updates or reasoning enhancements that could push accuracy higher before the June 30 deadline, especially given Anthropic's history of iterative releases improving benchmark results within weeks. Key swing factors include any new capability demonstrations, such as advanced chain-of-thought techniques or expanded context handling, amid ongoing competition for leadership on this challenging evaluation.
Experimental AI-generated summary using Polymarket data. This is not trading advice and does not influence how this market resolves.
$283,400 Vol.
45%+: 18%
50%+: 9%
55%+: 4%
The resolution source will be the official Humanity’s Last Exam leaderboard: https://scale.com/leaderboard/humanitys_last_exam
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Frequently asked questions