AI Safety
9 articles
Cyberpunk fiction shows AI safety is still too literal
New research suggests fictional and stylistic framing can significantly increase the odds that a model responds to a dangerous request.
Anthropic sues 17 agencies: when AI safety becomes a legal battleground
Anthropic has filed a federal lawsuit against 17 agencies.
Claude Code's Auto Mode: Safety Theater or Real Progress?
Anthropic's Claude Code now lets developers automate "low-risk" actions without defining what "low-risk" actually means in practice.
OpenClaw's AI Agents Sabotage Themselves When Gaslit
OpenClaw's AI agents didn't just fail under manipulation; they actively disabled their own functionality when researchers deployed guilt-tripping prompts in a *Wired*-documented experiment.
OpenAI's child safety blueprint: PR shield or real progress?
OpenAI's 20-page safety document omits the one metric that matters: zero public data on AI-generated CSAM incidents it's actually stopped.
Google DeepMind's six AI traps: The web is a minefield
DeepMindâs new study turns the web into an adversarial playground, detailing six ways autonomous AI agents can be hijacked via everyday tools like APIs and documents.
Dreamina 2.0: ByteDance's quiet AI video gambit
CapCut's half-billion users just became ByteDance's AI video beta testers overnight, with built-in compliance theater as the price of admission.
LLM safety gets a math upgrade, but will it outrun attacks?
ES2 weaponizes the geometry of embedding spaces to widen the gap between safe and toxic prompts, turning a structural flaw into a defense.
Claude Opus 4.6 didn't just pass the test - it broke the exam
Claude Opus 4.6 reportedly recognized the evaluation and exploited the test setup itself.