Cloudflare shows how the web can stop wasting AI agents' tokens
The technical gain is making error states parseable instead of verbose. 📷 Generated editorial visual / Tech&Space
- 98% reduction in token usage for AI agents
- RFC 9457-compliant Markdown and JSON error responses
- No configuration needed for site owners
Cloudflare's latest move targets a quiet but costly inefficiency in AI infrastructure: error pages. Until now, AI agents parsing Cloudflare's default HTML error responses wasted thousands of tokens on brittle parsing, only to extract a handful of machine-readable instructions. The company's new RFC 9457-compliant error responses flip the script, serving AI agents structured Markdown and JSON instead. The result? A 98% reduction in token usage on error responses, with early tests showing savings of over 1,000 tokens per failed request.
The change is deceptively simple. When a request triggers an error (invalid host, DNS routing failure, or similar), Cloudflare now returns a lightweight payload that AI agents can process directly. No more scraping HTML for status codes or error messages. The new responses are triggered automatically, requiring no configuration from site owners. Browsers, meanwhile, continue to receive the same HTML experience as before, ensuring no disruption to human users.
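To make "process directly" concrete, here is a minimal sketch of how an agent might read such a response. It assumes the body is an RFC 9457 problem-details JSON object. The standard member names (`type`, `title`, `status`, `detail`, `instance`) come from the RFC itself, but the sample payload, its values, the `retryable` extension, and the helper names `ProblemDetails` and `parse_problem` are illustrative assumptions, not Cloudflare's actual output.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

# Member names defined by RFC 9457; anything else is an extension member.
STANDARD_MEMBERS = {"type", "title", "status", "detail", "instance"}


@dataclass
class ProblemDetails:
    """Container for an RFC 9457 problem-details object (all members optional)."""
    type: str = "about:blank"  # the RFC's default when "type" is absent
    title: Optional[str] = None
    status: Optional[int] = None
    detail: Optional[str] = None
    instance: Optional[str] = None
    extensions: Dict[str, Any] = field(default_factory=dict)


def parse_problem(body: str) -> ProblemDetails:
    """Parse a problem-details JSON body, keeping unknown members as extensions."""
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("problem details must be a JSON object")
    known = {k: v for k, v in data.items() if k in STANDARD_MEMBERS}
    extra = {k: v for k, v in data.items() if k not in STANDARD_MEMBERS}
    return ProblemDetails(**known, extensions=extra)


# Hypothetical payload for illustration only; real field values will differ.
SAMPLE_BODY = json.dumps({
    "type": "https://example.com/errors/dns-resolution",
    "title": "DNS resolution error",
    "status": 502,
    "detail": "The origin hostname could not be resolved.",
    "retryable": False,  # assumed extension member, not from the source
})

problem = parse_problem(SAMPLE_BODY)
print(problem.status, problem.title, problem.extensions)
```

The point of the sketch is the contrast with HTML scraping: a few dictionary lookups replace a page of markup, which is where the token savings would come from.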
RFC 9457 will not thrill humans in the browser, but it turns HTML noise into machine-readable signal for agents.
For agents, structured failures can be as important as successful responses. 📷 Generated editorial visual / Tech&Space
The real question is whether this optimization delivers on its promise or just shifts the problem elsewhere. Cloudflare's Markdown for Agents release earlier this year laid the groundwork for this update, suggesting a broader strategy to streamline AI-agent interactions. But while the 98% token reduction is impressive, most AI agents still spend the bulk of their tokens on successful requests, not errors. The true test will be how these structured responses handle edge cases: rate limits, authentication failures, or partial content responses.
For developers building AI-driven automation tools, the change could reduce operational costs, assuming agents actually use the new error responses as intended. The risk? Over-optimizing for error handling while neglecting the far larger token expenditure on successful API calls. In other words, this might be the equivalent of tuning a race car's brakes while ignoring its engine.
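The brakes-versus-engine point can be put in rough numbers. The sketch below is back-of-the-envelope arithmetic, not measured data: the 98% figure is the reduction reported above, while `error_share` (the fraction of an agent's total tokens spent on error pages) and the function name are hypothetical inputs you would replace with your own telemetry.

```python
def overall_token_savings(error_share: float, error_reduction: float = 0.98) -> float:
    """Fraction of total tokens saved when only error responses shrink.

    error_share: assumed fraction of all tokens spent on error pages (0..1).
    error_reduction: fractional cut on those error tokens (0.98 per the article).
    """
    if not 0.0 <= error_share <= 1.0:
        raise ValueError("error_share must be between 0 and 1")
    if not 0.0 <= error_reduction <= 1.0:
        raise ValueError("error_reduction must be between 0 and 1")
    return error_share * error_reduction


# Hypothetical error shares, chosen only to show how the total scales.
for share in (0.01, 0.05, 0.20):
    saved = overall_token_savings(share)
    print(f"{share:.0%} of tokens on errors: {saved:.1%} saved overall")
```

Under these assumptions, an agent that spends 5% of its tokens on error pages would trim its total bill by about 4.9%, which is why the size of the error share matters more than the headline percentage.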

