Trader sentiment on whether Claude will reach a milestone score on Humanity’s Last Exam, a frontier benchmark of 2,500 expert-level questions testing AI reasoning across math, science, and the humanities, hinges on the Scale Labs leaderboard, which evaluates models without tools at temperature 0.0. Anthropic's Claude Opus 4.7, released April 16, 2026, scores around 36-47% there, trailing recent frontrunners such as xAI's Grok 4 at 50.7% as of early May. Self-reported aggregator scores run higher and credit the unreleased Claude Mythos Preview with 64.7%. Anthropic's rapid 2026 release cadence leaves room for Opus 4.8 or Claude 5 before June 30, but lagging leaderboard verification, competitive pressure from OpenAI's GPT-5.5 and Google's Gemini 3.1, slipping timelines, and benchmark discrepancies keep probabilities modest.
Experimental AI-generated summary referencing Polymarket data. This is not trading advice and has no bearing on how this market resolves.
$283,400 Vol.
45%+: 18%
50%+: 9%
55%+: 4%
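The outcome prices listed above can be read as a rough probability ladder. The sketch below is illustrative only: it assumes each quoted percentage is an implied probability that Claude's leaderboard score reaches at least that threshold by the resolution date, and that the outcomes are nested (a 50%+ score also satisfies 45%+). Neither assumption is stated on the market page, and the function name `bucket_probabilities` is hypothetical.

```python
# Quoted prices from the listing above, read as P(score >= threshold).
# Assumption: prices are implied probabilities and the outcomes are nested.
thresholds = {45: 0.18, 50: 0.09, 55: 0.04}


def bucket_probabilities(cumulative):
    """Convert P(score >= t) at ascending thresholds into per-interval probabilities."""
    levels = sorted(cumulative)
    buckets = {}
    # Probability mass below the lowest threshold.
    buckets[f"<{levels[0]}%"] = 1.0 - cumulative[levels[0]]
    # Mass between each pair of adjacent thresholds.
    for lo, hi in zip(levels, levels[1:]):
        buckets[f"{lo}-{hi}%"] = cumulative[lo] - cumulative[hi]
    # Mass at or above the highest threshold.
    buckets[f">={levels[-1]}%"] = cumulative[levels[-1]]
    return buckets


buckets = bucket_probabilities(thresholds)
for label, p in buckets.items():
    print(f"{label}: {p:.0%}")
```

Under these assumptions, most of the implied probability sits below the 45% threshold, and the intervals between thresholds each carry only a few percentage points. Real order-book prices include spreads and fees, so the differences are approximate.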
The resolution source will be the official Humanity’s Last Exam leaderboard https://scale.com/leaderboard/humanitys_last_exam.
Market opened: Jan 30, 2026, 12:00 AM ET
Resolver
0x65070BE91...
Frequently Asked Questions