Tag: benchmarks

MemRL outperforms RAG on complex agent benchmarks without fine-tuning

A new technique developed by researchers at Shanghai Jiao Tong University…

Editorial Board

Artificial Analysis overhauls its AI Intelligence Index, replacing popular benchmarks with 'real-world' tests

The arms race to build smarter AI models has a measurement problem:…

Editorial Board

Ai2's new Olmo 3.1 extends reinforcement learning training for stronger reasoning benchmarks

The Allen Institute for AI (Ai2) recently released what…

Editorial Board

Google unveils Gemini 3 claiming the lead in math, science, multimodal and agentic AI benchmarks

After more than a month of rumors and feverish speculation, including…

Editorial Board

Writer launches a ‘super agent’ that actually gets sh*t done, outperforms OpenAI on key benchmarks

Writer, the enterprise artificial intelligence company valued at $1.9 billion, launched an…

Editorial Board

Moonshot AI’s Kimi K2 outperforms GPT-4 in key benchmarks — and it’s free

Moonshot AI, the Chinese artificial intelligence startup behind the popular Kimi…

Editorial Board