April 7, 2026
CocoaBench v1.0
We release CocoaBench v1.0 — 153 human-authored tasks evaluating unified digital agents across vision, search, and coding — alongside the Cocoa-Agent scaffold and a full results analysis.
December 2, 2025
Towards General Agent with Compositional Cognitive Abilities
Why existing benchmarks fall short for general agents, and how CocoaBench addresses the gap by focusing on the cognitive abilities — perception, reasoning, and memory — that matter across domains.