RATS! Patches Talk Through Registers: Emergent Parts in Register Attention Transformers
When humans see a bird, they recognize far more than just "bird" -- they see a head, wings, and talons, a structured ...
Source Evidence
What Changed
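RATS (Register Attention Transformers) decomposes the classification token into N learnable register tokens and routes patch information through them in an L->N->N->L bottleneck, using a three-step compress-communicate-broadcast attention. The N registers are partitioned across the H attention heads, so registers assigned to different heads do not interact.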
Why It Matters
RATS introduces a minimal architectural prior that learns part-like structure without auxiliary losses or part annotations. According to the abstract, it improves segmentation results and yields a register dictionary that stays consistent at the part level across related categories, which may appeal to research and production teams looking for modular, interpretable vision models.
Confirmed Facts
When humans see a bird, they recognize far more than just "bird" -- they see a head, wings, and talons, a structured assembly of reusable parts that can be identified across every bird they have ever seen. We ask whether a self-supervised visual model can discover the same compositional structure on its own. To this end, we propose RATS (Register Attention Transformers), which decomposes the classification token into N learnable register tokens that route patch information through an L->N->N->L bottleneck via a three-step compress-communicate-broadcast attention. The N registers are partitioned across the H attention heads, so that registers assigned to different heads do not interact with each other. Without auxiliary losses or part annotations, each register spontaneously specializes into a proto-semantic region whose emerging structure resembles object parts. RATS surpasses all baselines by +12 mIoU on average across five segmentation benchmarks, with consistent gains on ADE20K (+1.11 mIoU) and COCO (+0.2 AP^m). Its register dictionary further exhibits part-level consistency and semantic proximity across related categories. Our results suggest that RATS may provide a useful architectural prior for structured and interpretable visual representation learning.
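To make the compress-communicate-broadcast routing concrete, here is a minimal sketch in plain Python (standard library only). It is not the authors' implementation. It assumes identity query/key/value projections, no residual connections, normalization, or MLP, and that each register lives in its head's d/H-dimensional channel slice. The names `attend` and `register_attention` are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(scores):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attend(queries, keys_values):
    """Scaled dot-product attention with identity projections (an assumption)."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    scores = [[s * scale for s in row]
              for row in matmul(queries, transpose(keys_values))]
    weights = softmax_rows(scores)
    return matmul(weights, keys_values), weights


def register_attention(patches, registers, num_heads):
    """Route L patches through N registers split evenly across heads.

    patches:   L rows of length d
    registers: N rows of length d // num_heads (assumed layout)
    Returns (updated patches, L x N routing weights).
    """
    num_patches, dim, num_regs = len(patches), len(patches[0]), len(registers)
    assert num_regs % num_heads == 0 and dim % num_heads == 0
    n_h, d_h = num_regs // num_heads, dim // num_heads
    out = [[] for _ in range(num_patches)]
    routing = [[] for _ in range(num_patches)]
    for h in range(num_heads):
        x = [row[h * d_h:(h + 1) * d_h] for row in patches]  # head's channel slice
        r = registers[h * n_h:(h + 1) * n_h]                 # head's own registers
        r, _ = attend(r, x)   # compress:    L patches -> N/H registers
        r, _ = attend(r, r)   # communicate: registers of this head only
        y, w = attend(x, r)   # broadcast:   N/H registers -> L patches
        for i in range(num_patches):
            out[i].extend(y[i])
            routing[i].extend(w[i])
    return out, routing
```

In this sketch patches never attend to one another directly: the three score matrices have shapes (N/H)×L, (N/H)×(N/H), and L×(N/H) per head. Each row of `routing` holds, for one patch, a separate softmax distribution over each head's registers. Taking the per-head argmax of that row gives one way to read off a patch-to-register assignment.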
Who Is Affected
- AI product teams
What To Watch Next
- Watch for independent replications, benchmark scrutiny, and whether labs turn this work into shipped systems.
- Look for corroboration from an official source or a second independent report.
Still Developing
- The claim is plausible but still developing.