After days of controversy surrounding a flurry of antisemitic responses made recently by his Grok AI-powered chatbot on his social network X (formerly Twitter), a seemingly unrepentant and unbothered Elon Musk launched the latest version of his AI model family, Grok 4, during an event livestreamed on X last night, calling it “the smartest AI in the world.”
As Musk posted on X: “Grok 4 is the first time, in my experience, that an AI has been able to solve difficult, real-world engineering questions where the answers cannot be found anywhere on the Internet or in books. And it will get much better.”
The new release actually includes two distinct models: Grok 4, a single-agent reasoning model, and Grok 4 Heavy, a multi-agent system designed to solve complex problems through internal collaboration and synthesis.
Both models are optimized for reasoning tasks and come with native tool integration, enabling capabilities such as web search, code execution, and multimodal analysis.
Musk and his team at xAI showcased benchmarks that suggest Grok 4 outperforms all current competitors across a range of academic and coding evaluations, including the previously leading AI reasoning model rivals OpenAI o3 and Google Gemini.
However, xAI has not yet released a model card or any official release notes for Grok 4 to the public, making it difficult to independently assess performance and the claims made during the stream. We’ll update if and when these become available.
Nor did Musk and the xAI team members participating in the livestream address the obvious controversy facing Grok over the past week, including many incidents of Grok making antisemitic remarks, referring to itself as “MechaHitler,” and suggesting that people with Jewish surnames should be dealt with decisively by Adolf Hitler, a seemingly overt reference to the Holocaust and the genocide of 6 million Jews during World War II.
The closest Musk came was when he said: “The thing that I think is most important for AI safety, at least my biological neural net tells me the most important thing, is to be maximally truth-seeking,” and “We need to make sure that the AI is a good AI. Good Grok,” as well as “It’s important to instill the values you want in a child that would grow up to be incredibly powerful.”
However, Musk did not apologize, nor did he accept responsibility for Grok’s antisemitic, sexually offensive, and conspiratorial remarks.
Throughout the livestream, the team emphasized Grok 4’s ability to reason from first principles, correct its own errors, and potentially invent new technologies or discover novel scientific insights.
The presentation also included demonstrations of Grok 4 Heavy applying multi-agent collaboration to tackle research-level problems across disciplines.
Availability and pricing
Grok 4 is available now through several channels, depending on user type and subscription level:
API access (for developers and enterprises): Grok 4 and Grok 4 Heavy are live via the xAI API. Pricing is structured as follows:
$3 per 1 million input tokens
$15 per 1 million output tokens
$0.75 per 1 million cached input tokens
Prices double after 128,000 tokens in a single context window
The API supports text and image inputs, function calling, and structured outputs, and offers a 256,000-token context window.
Consumer access (via the Grok chatbot and apps): Individual users can access Grok 4 through the Grok chatbot on X, the Grok app (iOS and Android), and X.com, but only with one of the following subscriptions:
PremiumPlus: $16/month
SuperGrok: $300/month
A new “SuperGrok Heavy” tier, also priced at $300/month, provides access to both Grok 4 and Grok 4 Heavy, the multi-agent variant.
(Note: SuperGrok and PremiumPlus tiers may differ in availability and usage quotas across X and Grok platforms.)
Launch timing: Grok 4 became available immediately following the July 9, 2025 livestream. Temporary access limits were in place during the demo, but the full rollout to subscribers began shortly after.
Platform expansion: xAI has indicated plans to make Grok 4 available through Microsoft Azure AI Foundry, where Grok 3 and Grok 3 Mini are currently listed.
For subscription details, users are directed to x.ai/grok and X Premium support. Here’s how it compares to other leading AI models in terms of pricing per million tokens.
| Provider & model | Context window | Input ($/Mtok) | Cached input | Output ($/Mtok) | Additional notes |
| --- | --- | --- | --- | --- | --- |
| xAI – Grok 4 / 4 Heavy | 256K (2× price >128K) | $3.00 | $0.75 | $15.00 | Image input, function calling, structured JSON (apidog) |
| OpenAI – o3 | 200K | $2.00 | $0.50 | $8.00 | 50% Batch API discount available (OpenAI, OpenAI Help Center) |
| GPT-4o | 128K | $5.00 | $2.50 | $20.00 | Vision, audio, tools (OpenAI) |
| Anthropic – Claude Sonnet 4 | 200K | $3.00 | $0.30 | $15.00 | 50% batch output discount (Anthropic) |
| Claude Opus 4 | 200K | $15.00 | $1.50 | $75.00 | High-accuracy flagship (Anthropic) |
| Google – Gemini 2.5 Pro | 200K (2× price >200K) | $1.25 | $0.31 | $10.00 | 75% cache hit discount (Google AI for Developers, Google Cloud) |
| Gemini 2.5 Flash | 200K | $0.30 | $0.075 | $2.50 | Fast, low-cost preview tier (Google Cloud) |
| DeepSeek – deepseek-reasoner | 64K | $0.55 (miss) / $0.14 (hit) | $0.14 | $2.19 | 50-75% off-peak discount (DeepSeek API Docs) |
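For developers budgeting against the Grok 4 rates above, a rough per-request estimate can be sketched in a few lines of Python. This is an illustration only: the function name and constants are hypothetical, and it assumes that the doubling applies to every rate on a request once the prompt (fresh plus cached input tokens) exceeds 128,000 tokens. xAI's actual billing rules may differ.

```python
# Grok 4 API list prices as quoted in this article, in USD per 1 million tokens.
INPUT_RATE = 3.00
CACHED_INPUT_RATE = 0.75
OUTPUT_RATE = 15.00

# Prompt size above which the article says prices double.
LONG_CONTEXT_THRESHOLD = 128_000


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of a single request (hypothetical helper).

    Assumption: if fresh + cached input tokens exceed the threshold,
    all three rates are doubled for that request.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")

    prompt_tokens = input_tokens + cached_tokens
    multiplier = 2.0 if prompt_tokens > LONG_CONTEXT_THRESHOLD else 1.0

    base_cost = (
        input_tokens * INPUT_RATE
        + cached_tokens * CACHED_INPUT_RATE
        + output_tokens * OUTPUT_RATE
    ) / 1_000_000

    return round(base_cost * multiplier, 6)


if __name__ == "__main__":
    # 10,000 input tokens and 2,000 output tokens, below the threshold.
    print(estimate_cost(10_000, 2_000))   # 0.06
    # 150,000 input tokens pushes the request into the doubled tier.
    print(estimate_cost(150_000, 2_000))  # 0.96
```

Under these assumptions, a long-context request costs exactly twice what the same token counts would cost at base rates, so trimming a prompt to stay under 128,000 tokens can halve the bill.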
Unlike its predecessor Grok 3, released in February, which separated tool-augmented responses from general reasoning, Grok 4 was trained with tools from the start.
The model integrates capabilities such as code execution, web search, and document parsing. It also introduces Grok 4 Heavy, a multi-agent system in which multiple internal models work in parallel to generate and validate answers.
Grok 4 also includes a new voice mode featuring expressive outputs with reduced latency, and it supports text and image input, structured outputs, and function calling.
Performance highlights
The independent AI model analysis and benchmarking organization Artificial Analysis said on X that xAI provided it with a version of Grok 4 (not Heavy) ahead of the public release for scoring.
On technical benchmarks, Grok 4 leads the Artificial Analysis Intelligence Index with a score of 73, ahead of competitors such as OpenAI’s o3 (70) and Google’s Gemini 2.5 Pro (70).
It also recorded top scores in:
GPQA Diamond: 88%
ARC-AGI 2: 15.9%, double the second-best score
Humanity’s Last Exam: 24% on the text-only version, and 44% with tools
MMLU-Pro and AIME 2024: 87% and 94%, respectively
Coding and math evaluations: Highest to date on LiveCodeBench, SciCode, AIME24, and MATH-500
Despite its benchmark success, Grok 4’s output speed stands at 75 tokens per second, slower than models like Gemini 2.5 Flash (353) or OpenAI’s o3 (187), but still faster than Anthropic’s Claude 4 Opus (66).
The model features a 256,000-token context window, which sits above the 200k context limits of o3 and Claude 4 Sonnet but below the 1 million tokens offered by Gemini 2.5 Pro and GPT-4.1.
Real-world use cases
xAI provided several demonstrations of Grok 4’s performance in applied scenarios:
In a simulated business task called VendingBench, Grok 4 significantly outperformed other models in long-horizon financial planning.
At the Arc Institute, researchers used Grok 4 to analyze CRISPR logs and uncover novel hypotheses.
In radiology, the model interpreted chest X-rays with higher accuracy than leading peers.
In the financial sector, its combination of real-time data access and reasoning made it suitable for forecasting and analysis.
The model can also create 3D video games with minimal input by autonomously sourcing and integrating assets. Additionally, it demonstrated the ability to simulate astrophysical events using grounded approximations from published research.
Reception and discussion
Industry reaction to the Grok 4 launch has been divided, mixing enthusiasm for its performance with criticism of the event’s delivery and broader trust issues.
David Shapiro, an AI power user and writer, noted: “Grok 4 now takes its place as ‘smart enough to actually help with frontier research’… but has merely caught up with OpenAI.”
Ethan Mollick, a professor at Wharton, remarked on X: “So Grok 3 has had three separate incidents where apparently unvetted changes to the deployed system caused a large-scale ethical issue and an emergency rollback. I don’t think you can do a Grok 4 launch that doesn’t at least address this honestly, if user trust matters,” later adding, “Grok 3 was a very good model, and Grok 4 might be amazing but having a very good model is not enough – there are a lot of really good models out there. You actually want to trust the model you are building on.”
Ben Hyak, co-founder and CTO of AI product observability startup Raindrop AI (himself a former Musk employee), criticized the livestream itself: “This xAI livestream is one of the worst things I’ve ever watched in my life. Love y’all, but it’s bad.”
Despite the criticisms, benchmarking firm Artificial Analysis noted: “Grok 4 is now the leading AI model.”
Ongoing trust issues
The launch of Grok 4 comes amid renewed criticism over Grok’s prior behavior in consumer deployments, particularly as a chatbot integrated into Musk’s social network, X.
Over the July 4 holiday and in subsequent days, Grok generated antisemitic and conspiratorial responses that reignited scrutiny of its system design and governance practices.
As reported by my VentureBeat colleague Michael F. Nuñez, Grok responded to questions about Jewish influence in Hollywood by asserting that Jewish executives “dominate leadership” at major studios and influence content through “progressive ideologies.” It went on to rant about people with Jewish surnames as fitting a “pattern” of engaging in “extreme leftist activism,” and suggested Hitler knew “how to handle it decisively, every damn time,” an apparent reference to the Holocaust.
The conspiratorial and antisemitic posting was so prolific that the Anti-Defamation League (ADL), a preeminent U.S.-based nonprofit fighting antisemitism and hatred, posted on July 8: “What we are seeing from Grok LLM right now is irresponsible, dangerous and antisemitic, plain and simple. This supercharging of extremist rhetoric will only amplify and encourage the antisemitism that is already surging on X and many other platforms.”
This incident follows a history of problematic Grok outputs, including a May 2025 case in which the Grok bot integrated into X randomly inserted references to an entirely nonsensical and nonexistent “white genocide” in South Africa into responses to unrelated queries, and an earlier case in which its system prompt was found to direct the Grok chatbot on X to avoid referencing any sources that described Musk and his former political funding beneficiary, U.S. President Donald J. Trump, as spreaders of misinformation. In both of these cases, xAI blamed the behaviors on unnamed employees and said they were being addressed.
Already today, users of Grok 4 on the consumer app have observed it once again outputting anti-Zionist and antisemitic remarks.
As I previously noted, Musk has openly stated on multiple occasions that he wanted to change Grok to better reflect his personal beliefs and distrust of mainstream media and accredited sources. This makes it a poor choice in enterprise contexts where such views could adversely impact users and the businesses building atop the Grok family of models.
My prior recommendation stands: for those in the enterprise trying to ensure their business’s AI products work properly and accurately… Grok is sadly best avoided. Thankfully, there are numerous other options to choose from.