RepSelect: Robust LLM Unlearning via Representation Selectivity

Making large language models (LLMs) deeply forget specific knowledge and values without sacrificing general capabilit...

Signal 59

Source Confidence 80%

Claim Status: verified

Source Evidence

Primary Source

Hugging Face (Filip Sondej)

huggingface.co

Source Type

research

Source Published

Jun 14, 2026, 20:00 UTC

AIFreshWire Pipeline

Ingested: 6 days ago / Jun 17, 2026, 11:16 UTC

Last checked: 6 days ago / Jun 17, 2026, 11:16 UTC

What Changed

Making large language models (LLMs) deeply forget specific knowledge and values without sacrificing general capabilit...

Why It Matters

RepSelect targets deep forgetting by isolating the representations specific to the forget set and leaving those shared with the retain set alone, so general capabilities are reported to stay intact. According to the source, the method resists fine-tuning (relearning) attacks far better than five popular baselines and is near-perfectly robust to few-shot prompting attacks. If those results hold up, operators would have a more practical way to remove sensitive or harmful knowledge from deployed LLMs and to support compliance and audit requirements.

Confirmed Facts

Making large language models (LLMs) deeply forget specific knowledge and values without sacrificing general capabilities remains a central challenge in unlearning. However, current methods are easily reversed by fine-tuning or few-shot prompting, suggesting their forgetting is only shallow. We identify the root cause. Existing methods target representations shared with both the retain set and the subspace recovered by a fine-tuning attacker, making unlearning both disruptive to general capabilities and easy to reverse. We propose RepSelect (Representation Selectivity), which isolates forget-set-specific representations by collapsing top principal components of weight gradients before each update, leaving general capabilities intact while limiting what fine-tuning can recover. We evaluate across two forget categories, biohazardous knowledge and abusive tendencies, and four model families spanning dense and Mixture-of-Experts architectures (Llama 3, Qwen 3.5, Gemma 4 E4B, DeepSeek V2 Lite). Compared to five popular baselines (GradDiff, NPO, SimNPO, RMU, UNDIAL), RepSelect achieves a 4-50x larger reduction in post-relearning answer accuracy than the strongest baseline, and is near-perfectly robust to few-shot prompting attacks. Targeting selective representations is thus an important step towards deep and robust LLM forgetting.
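
The abstract describes the core step only briefly: collapsing the top principal components of weight gradients before each update. The sketch below is one possible reading of that sentence, not the authors' implementation. It assumes the gradient of a single weight matrix is filtered with a truncated SVD and that the update is a plain gradient step; the function names `collapse_top_components` and `unlearning_step`, the parameter `k`, the learning rate, and the toy gradient are all hypothetical.

```python
import numpy as np


def collapse_top_components(grad: np.ndarray, k: int = 1) -> np.ndarray:
    """Zero the k largest singular directions of a 2-D gradient matrix.

    Assumption: "collapsing top principal components" is read here as
    removing the dominant singular directions via an SVD.
    """
    if grad.ndim != 2:
        raise ValueError("expected a 2-D weight-gradient matrix")
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    s = s.copy()
    s[:k] = 0.0  # drop the dominant (assumed shared) directions
    return (u * s) @ vt  # rebuild the gradient from what remains


def unlearning_step(weight, grad, lr=0.1, k=1):
    """One hypothetical update: filter the gradient, then take a plain step."""
    return weight - lr * collapse_top_components(grad, k)


# Toy gradient: a large rank-1 "shared" term plus a small "specific" term
# that lies in an orthogonal direction.
shared = 10.0 * np.outer([1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])
specific = 0.5 * np.outer([0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0])
grad = shared + specific

filtered = collapse_top_components(grad, k=1)
new_weight = unlearning_step(np.zeros((3, 4)), grad, lr=0.1, k=1)
```

In this toy case the dominant direction is the shared term, so filtering leaves only the small specific term to drive the update. How the paper chooses the number of components, which layers it filters, and which loss produces the gradient is not stated in the abstract.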

Who Is Affected

DeepSeek
Meta AI
Qwen
AI product teams

What To Watch Next

Watch for benchmark validation, API availability, pricing, limits, and early customer adoption.
Watch whether additional sources confirm the same claim.
