Blog

Updates and research notes from the CocoaBench team.

CocoaBench v1.0
We release CocoaBench v1.0 — 153 human-authored tasks evaluating unified digital agents across vision, search, and coding — alongside the Cocoa-Agent scaffold and a full results analysis.
Towards General Agent with Compositional Cognitive Abilities
Why existing benchmarks fall short for general agents, and how CocoaBench addresses the gap by focusing on the cognitive abilities — perception, reasoning, and memory — that matter across domains.