
Did xAI lie about Grok 3’s benchmarks?

By MNK News, February 23, 2025


Debates over AI benchmarks — and how they’re reported by AI labs — are spilling out into public view.

This week, an OpenAI employee accused Elon Musk’s AI company, xAI, of publishing misleading benchmark results for its latest AI model, Grok 3. Igor Babuschkin, one of xAI’s co-founders, insisted that the company was in the right.

The truth lies somewhere in between.

In a post on xAI’s blog, the company published a graph showing Grok 3’s performance on AIME 2025, a collection of challenging math questions from a recent invitational mathematics exam. Some experts have questioned AIME’s validity as an AI benchmark. Nevertheless, AIME 2025 and older versions of the test are commonly used to probe a model’s math ability.

xAI’s graph showed two variants of Grok 3, Grok 3 Reasoning Beta and Grok 3 mini Reasoning, beating OpenAI’s best-performing available model, o3-mini-high, on AIME 2025. But OpenAI employees on X were quick to point out that xAI’s graph didn’t include o3-mini-high’s AIME 2025 score at “cons@64.”

What is cons@64, you might ask? It’s short for “consensus@64”: the model gets 64 tries at each problem in a benchmark, and the answer it generates most frequently for a given problem is taken as its final answer to that problem. As you can imagine, cons@64 tends to boost models’ benchmark scores quite a bit, and omitting it from a graph might make it appear as though one model surpasses another when in reality that isn’t the case.
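
To make the difference between "@1" and consensus scoring concrete, here is a minimal Python sketch. It is an illustration only, not the evaluation code used by xAI or OpenAI: the function names, the toy answers, and the small k of 5 (standing in for 64) are all assumptions made for this example.

```python
from collections import Counter


def consensus_answer(samples):
    """Return the answer that appears most often among the samples.

    Assumption: ties go to the answer seen first, which is how
    Counter.most_common orders equal counts. Real evaluations may
    break ties differently.
    """
    return Counter(samples).most_common(1)[0][0]


def score_at_1(samples_per_problem, answer_key):
    """Fraction of problems where the first sampled answer is correct."""
    hits = sum(s[0] == truth for s, truth in zip(samples_per_problem, answer_key))
    return hits / len(answer_key)


def score_cons_at_k(samples_per_problem, answer_key, k=64):
    """Fraction of problems where the majority answer over k samples is correct."""
    hits = sum(
        consensus_answer(s[:k]) == truth
        for s, truth in zip(samples_per_problem, answer_key)
    )
    return hits / len(answer_key)


# Hypothetical toy data: four problems, five sampled answers each.
samples = [
    [7, 42, 42, 42, 13],  # first try wrong, majority right
    [10, 10, 3, 10, 8],   # first try right, majority right
    [1, 99, 99, 2, 99],   # first try wrong, majority right
    [4, 4, 4, 8, 4],      # first try wrong, majority wrong
]
truth = [42, 10, 99, 8]

print(score_at_1(samples, truth))            # 0.25
print(score_cons_at_k(samples, truth, k=5))  # 0.75
```

On this made-up data the same model scores 25% at "@1" and 75% under consensus voting, which is why a chart that mixes the two settings can flatter one model over another. The last toy problem also shows that voting does not always help: a model that is consistently wrong stays wrong.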

Grok 3 Reasoning Beta and Grok 3 mini Reasoning’s scores for AIME 2025 at “@1” — meaning the first score the models got on the benchmark — fall below o3-mini-high’s score. Grok 3 Reasoning Beta also trails ever-so-slightly behind OpenAI’s o1 model set to “medium” computing. Yet xAI is advertising Grok 3 as the “world’s smartest AI.”

Babuschkin argued on X that OpenAI has published similarly misleading benchmark charts in the past, albeit charts comparing the performance of its own models. A more neutral party in the debate put together a more “accurate” graph showing nearly every model’s performance at cons@64.

But as AI researcher Nathan Lambert pointed out in a post, perhaps the most important metric remains a mystery: the computational (and monetary) cost it took for each model to achieve its best score. That just goes to show how little most AI benchmarks communicate about models’ limitations — and their strengths.

This article originally appeared on TechCrunch at https://techcrunch.com/2025/02/22/did-xai-lie-about-grok-3s-benchmarks/


