xAI's Grok 4, released in February 2026, currently sits below the 25% threshold on FrontierMath, a benchmark of unpublished, research-level math problems that tests reasoning well beyond standard evaluations such as AIME or GPQA. OpenAI's o3 crossed 25% in late 2025, and Google DeepMind models have reached up to 48% on harder tiers using multi-agent approaches, underscoring xAI's gap in frontier math despite its strong GPQA Diamond results. Traders are watching for near-term Grok updates, fine-tuning announcements, or test-time compute improvements before the June 30 deadline. However, xAI has not announced any new model releases for the next six weeks, and its historical release pattern suggests little movement without a major capability leap.
Experimental AI-generated summary based on Polymarket data. This is not trading advice and has no bearing on how this market resolves. · Updated
$20,870 volume
25%+: 57%
30%+: 49%
40%+: 40%
50%+: 18%
This market will resolve according to Epoch AI's FrontierMath benchmarking leaderboard (https://epoch.ai/frontiermath) for Tiers 1-3. Studies not included in the leaderboard (e.g. https://x.com/EpochAIResearch/status/1945905796904005720) will not be considered.
The primary resolution source will be information from Epoch AI; however, a consensus of credible reporting may also be used.
Market opened: Jan 30, 2026, 12:01 AM ET
Resolver: 0x65070BE91...
FAQ