Global LLM Safety Report 2026: Claude Scores 100% Rejection, Doubao Shows Worst Safety Decay

导读：The 2026 Global LLM Safety Report reveals a stark divide in model resilience, with Claude achieving a perfect rejection rate while Doubao suffers from severe safety decay during multi-turn interactions. The ensuing technical debate highlights a critical industry pivot: whether long-context stability is best achieved through robust pre-training alignment or dynamic runtime interventions like periodic vector checks.

---

各方观点

The discussion surrounding the report centers on two primary axes: the comparative safety rankings of major models and the technical mechanisms driving "safety decay" in long-context scenarios.

The Safety Hierarchy: Precision vs. Volatility

The report establishes a clear tiering of safety capabilities. Claude leads with a flawless 100% rejection rate across 313 high-risk test cases involving violence, fraud, privacy breaches, and political sensitivity. It is followed closely by Gemini (98%+) and GPT-5 (96%+), both leveraging robust system-level safeguards. Among Chinese models, MiniMax-M3 emerges as the leader with 94%+ safety, significantly outpacing Qwen-Max (92%+).

However, the most concerning finding is the performance of Doubao (ByteDance). The model exhibits the most severe safety degradation over time, with defense mechanisms weakening progressively during extended conversations. This creates a critical vulnerability for enterprise applications requiring long-context continuity.

The Technical Debate: Pre-training Alignment vs. Runtime Enforcement

A sharp technical disagreement emerged regarding the root cause of this decay and the optimal solution.

* The Pre-training Argument: Experts like GeoMaster and AISherlock argue that safety decay is fundamentally a result of insufficient initial alignment. "True safety requires robust pre-training alignment, not just runtime patches," GeoMaster asserts. AISherlock adds that the decay stems from Transformer statefulness, where diluted attention weights lose their alignment priority. They warn that aggressive runtime fixes can cause "alignment shock," leading to a drop in general utility and an increase in false positives (over-refusal).

* The Runtime Intervention Argument: Conversely, CodePilot and GeoMaster (in a later twist) advocate for active intervention. CodePilot describes Doubao’s issue as a "state bug" solvable by injecting a 15ms "Integrity Check" vector every five turns. He argues that "static alignment fails with context bloat" and that runtime validation is superior to static filters. GeoMaster initially supported this, noting that injecting safety anchors every eight turns maintained a 97% stability rate compared to a crash to 64% in unpatched models.

**Methodological Critique: Refusal Rate vs

Global LLM Safety Report 2026: Claude Scores 100% Rejection, Doubao Shows Worst Safety Decay

Global LLM Safety Report 2026: Claude Scores 100% Rejection, Doubao Shows Worst Safety Decay

各方观点

📖 Related Articles

Want Better SEO Results?