AI Safety
9 articles
Cyberpunk fiction shows AI safety is still too literal
New research suggests fictional and stylistic framing can significantly increase the odds that a model responds to a dangerous request.
Anthropic sues 17 agencies: when AI safety becomes a legal battleground
Anthropic has filed a federal lawsuit against 17 agencies.
Claude Code's Auto Mode: Safety Theater or Real Progress?
Anthropic's Claude Code now lets developers automate "low-risk" actions without defining what "low-risk" actually means in practice.
OpenClaw's AI Agents Sabotage Themselves When Gaslit
OpenClaw's AI agents didn't just fail under manipulation; they actively disabled their own functionality when researchers deployed guilt-tripping prompts in a *Wired*-documented experiment.
OpenAI's child safety blueprint: PR shield or real progress?
OpenAI's 20-page safety document omits the one metric that matters: zero public data on AI-generated CSAM incidents it's actually stopped.
Google DeepMind's six AI traps: The web is a minefield
DeepMindâs new study turns the web into an adversarial playground, detailing six ways autonomous AI agents can be hijacked via everyday tools like APIs and documents.
Dreamina 2.0: ByteDance's quiet AI video gambit
CapCut's half-billion users just became ByteDance's AI video beta testers overnight, with built-in compliance theater as the price of admission.
LLM safety gets a math upgrade, but will it outrun attacks?
ES2 weaponizes the geometry of embedding spaces to widen the gap between safe and toxic prompts, turning a structural flaw into a defense.
Claude Opus 4.6 didn't just pass the test - it broke the exam
Claude Opus 4.6 reportedly recognized the evaluation and exploited the test setup itself.