AI Pulse | AI Market Pulse

Market data: TradingView

Intelligence Terminal

Search AI News

Source-backed intelligence across model releases, research, policy, tools, funding, and companies.

24 signals found for "Benchmark"

companies | NVIDIA Blog (Shruti Koparkar) | 1d ago

65 SIG | 98 CONF

NVIDIA Blackwell Leads on First Agentic AI Infrastructure Benchmark

AgentPerf from Artificial Analysis, the industry’s first agentic AI benchmark, gives developers, enterprises and infr...

Tags: Nvidia, Research

companies | NVIDIA Developer Blog | 1d ago

65 SIG | 98 CONF

NVIDIA Achieves Leading Agentic Coding Performance on First Agentic AI Benchmark

AI agents have fundamentally changed the complexity of inference workloads. Until now, the industry has struggled to ...

Tags: Nvidia, Research

research | arXiv (Dian Zheng) | 2d ago

40 SIG | 98 CONF

InterleaveThinker: Reinforcing Agentic Interleaved Generation

Recent image generators have demonstrated impressive photorealism and instruction-following capabilities in single-im...

Tags: GPT, Models, OpenAI

funding | VentureBeat AI (Michael Nuñez) | Jan 22

42.5 SIG | 40.5 CONF

Railway secures $100 million to challenge AWS with AI-native cloud infrastructure

Railway, a San Francisco-based cloud platform that has quietly amassed two million developers without spending a doll...

Tags: Microsoft, Funding, OpenAI

models | VentureBeat AI (Michael Nuñez) | 25d ago

32.5 SIG | 36.5 CONF

Google just redesigned the search box for the first time in 25 years — here’s why it matters more than you think.

For a quarter century, the Google search box has been one of the most recognizable interfaces in computing: a thin wh...

Tags: Google DeepMind, Chips

research | GitHub (sergiunicoara) | 18h ago

55 SIG | 11 CONF

sergiunicoara/Generative-AI released an update

Production GraphRAG portfolio — knowledge graph platform (Neo4j, hybrid retrieval, GNN reranking, RAGAS-evaluated), R...

Tags: Research, Benchmark

research | GitHub (eeshansrivastava89) | 19h ago

55 SIG | 33 CONF

eeshansrivastava89/offgrid-ai released an update

Privacy-first CLI for running local LLMs — discover, configure, run, benchmark

Tags: Research, LLM

research | GitHub (EshaRana17) | 10h ago

55 SIG | 11 CONF

EshaRana17/rp-mentalbench-unified-evaluation released an update

Research Proposal: MentalBench - unified benchmark evaluating LLMs on clinical knowledge, empathy, and safety simulta...

Tags: Research, LLM, Benchmark

research | arXiv (Jundong Xu) | 2d ago

45 SIG | 90 CONF

EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments

Large language model (LLM) agents have achieved strong performance on a wide range of benchmarks, yet most evaluation...

Tags: Research

models | GitHub (Oaklight) | 23h ago

55 SIG | 35 CONF

Oaklight/openvino-meteor-lake-ai-inference released an update

AI inference benchmarks on Intel Meteor Lake (Core Ultra 7 155H) iGPU — OpenVINO embeddings, OpenVINO GenAI LLM, and ...

Tags: Meta AI, Models

research | GitHub (alvesmaia) | 1d ago

55 SIG | 35 CONF

alvesmaia/llm-benchmark released an update

Benchmark of LLMs as code agents (Correios CEP ETL challenge), inspired by Akita's methodology

Tags: Research

research | arXiv (Seokju Cho) | 2d ago

45 SIG | 90 CONF

SpatialClaw: Rethinking Action Interface for Agentic Spatial Reasoning

Spatial reasoning, the ability to determine where objects are, how they relate, and how they move in 3D, remains a fu...

Tags: Research

research | IEEE Spectrum AI (Edd Gent) | 4d ago

32.5 SIG | 32.5 CONF

AI Can Help Track the World’s Shrinking Glaciers

Tracking how fast glaciers are shrinking is crucial for measuring the pace of climate change and projecting future se...

Tags: Meta AI, Research

models | VentureBeat AI (Michael Nuñez) | Jan 19

47.5 SIG | 40.5 CONF

Claude Code costs up to $200 a month. Goose does the same thing for free.

The artificial intelligence coding revolution comes with a catch: it's expensive. Claude Code, Anthropic's terminal-b...

Tags: Anthropic, Models, OpenAI

research | The Verge AI | 1d ago

52.5 SIG | 52.5 CONF

Elon Musk is the world’s first trillionaire

Elon Musk's net worth has passed the trillion-dollar mark after SpaceX's IPO. His net worth, which was hovering aroun...

Tags: Research

models | arXiv (Zilin Xiao) | 2d ago

45 SIG | 90 CONF

Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning

Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowle...

Tags: Qwen, Models

models | arXiv (Junke Wang) | 2d ago

45 SIG | 90 CONF

RepWAM: World Action Modeling with Representation Visual-Action Tokenizers

This work presents RepWAM, a representation-centric world action model (WAM) built on representation visual-action to...

Tags: Models

research | arXiv (Daichi Azuma) | 2d ago

60 SIG | 90 CONF

NavWAM: A Navigation World Action Model for Goal-Conditioned Visual Navigation

Goal-conditioned visual navigation requires a robot to act under partial observability by anticipating how its motion...

Tags: Research

research | Axios (Amy Harder) | 2d ago

52.5 SIG | 52.5 CONF

Amazon touts water savings amid data center pushback

Amazon says its data centers use water more efficiently than the industry average and is urging others to improve as ...

Tags: Google, Research

research | arXiv (Yeongseo Jung) | 3d ago

60 SIG | 90 CONF

Context-Driven Incremental Compression for Multi-Turn Dialogue Generation

Modern conversational agents condition on an ever-growing dialogue history at each turn, incurring redundant attentio...

Tags: Perplexity, Research

models | arXiv (Kian R. Weihrauch) | 3d ago

45 SIG | 90 CONF

How Seemingly Inconsequential Design Choices Dictate Performance of LLMs in Pathology

General-purpose large language models (LLMs) are routinely used as baselines when evaluating specialized pathology mo...

Tags: Models

research | arXiv (Zefu Lin) | 3d ago

45 SIG | 90 CONF

World Pilot: Steering Vision-Language-Action Models with World-Action Priors

Vision-Language-Action (VLA) models inherit semantic grounding from large-scale pretraining and perform competently a...

Tags: Research

models | arXiv (Xingjian Diao) | 3d ago

45 SIG | 90 CONF

Doc-to-Atom: Learning to Compile and Compose Memory Atoms

Long input sequences are central to document understanding and multi-step reasoning in Large Language Models, yet the...

Tags: Models

research | arXiv (Haotao Xie) | 3d ago

45 SIG | 90 CONF

System Report for CCL25-Eval Task 5: New Dataset and LoRA-Fine-Tuned Qwen2.5

Recently, large language models (LLMs) have achieved promising progress in the fields of classical Chinese translatio...

Tags: Research
