OpenAI’s GPT-5.4 Pro and related variants currently post 41-44% accuracy on Humanity’s Last Exam, the 2,500-question expert-level benchmark spanning mathematics, sciences, and specialized domains, placing them behind Claude Fable 5 at 53% and near Gemini 3.1 previews around 45%. With resolution in twelve days and no confirmed major model release or undisclosed optimization from OpenAI on the horizon, trader consensus assigns near-certainty to thresholds at or below 40% while viewing jumps above 50% as improbable absent sudden capability gains. Stable leaderboard trends, competitive pressure from Anthropic and Google DeepMind, and the benchmark’s resistance to rapid saturation reinforce expectations that existing frontier large language model performance will hold through June 30.
Experimental AI-generated summary based on Polymarket data. This is not trading advice and plays no role in the resolution of this market.
$28,638 Vol.
50%+
2%
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Frequently asked questions