Trader consensus on Polymarket implies a 77.5% probability that no AI model will score ≥90% on the FrontierMath benchmark before 2027. Top scores on this Epoch AI test of research-level math problems remain near 50% despite accelerating progress. OpenAI's GPT-5.5 Pro leads third-party leaderboards at 52.4% as of mid-May 2026, up from GPT-5.4 Pro's 38-50% on Tiers 1-4 earlier this year, a gain attributed to enhanced reasoning scaffolds and scaling. However, Epoch AI's May 11 update reported that an AI review flagged fatal errors in about one-third of problems, so scores may be revised downward. DeepMind's multi-agent "co-mathematician" claimed 48% on Tier 4 but used non-standard 48-hour evaluations, which underscores how inconsistent evaluation conditions are. With seven months left, no confirmed next-generation release, such as GPT-6, signals a path to near-perfect performance on unsolved proofs.
Experimental AI-generated summary based on Polymarket data. This is not trading advice and does not affect how this market resolves. · Updated
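The 77.5% figure in the summary can be read as a share price. The sketch below is a minimal illustration, assuming a binary market in which each share pays $1 if its outcome occurs, and ignoring fees and the bid/ask spread. The function name `implied_probabilities` is hypothetical and is not part of any Polymarket API.

```python
def implied_probabilities(no_price: float) -> dict[str, float]:
    """Convert a binary market's "No" share price (dollars, 0 to 1) into implied probabilities.

    Assumption: each share pays $1 if its outcome occurs; fees and spread are ignored.
    """
    if not 0.0 <= no_price <= 1.0:
        raise ValueError("share price must be between 0 and 1")
    # The two outcomes are exhaustive, so their implied probabilities sum to 1.
    return {"no": no_price, "yes": round(1.0 - no_price, 4)}


# A "No" price of $0.775 corresponds to the 77.5% quoted in the summary.
probs = implied_probabilities(0.775)
print(probs)
```

Under these assumptions, a 77.5% implied probability against the ≥90% threshold leaves roughly 22.5% for "Yes".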
$66,262 Vol.
The primary resolution source will be information from Epoch AI; however, a consensus of credible reporting may also be used.
Market opened: Nov 12, 2025, 5:15 PM ET
Resolver
0x65070BE91...
Frequently asked questions