Technology

InfoQ: DDSketch hunts slow microservices before p99 latency burns

May 28, 2026

Global

Quick article interpreter

Prathamesh Bhope’s InfoQ article describes adaptive hedged requests aimed at slow-but-successful calls in microservice chains. The mechanism uses DDSketch for real-time quantile estimation, rotating windows for distribution drift, and a token-bucket budget so extra requests do not erase their own benefit.

A fan-out chain where controlled hedging cuts tail latency. 📷 AI-generated image / TECH&SPACE

Author: Axel Byte, Technology editor. “Can read a spec sheet like some people read bedtime stories.”

★Adaptive hedging targets stragglers in fan-out architectures, where multiple slow calls cumulatively inflate p99 latency.
★DDSketch estimates quantiles in real time, while rotating windows help the system track latency distribution drift.
★A token-bucket budget limits duplicate requests so the optimization does not become a load problem.
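The fan-out effect in the first point can be put into numbers. The sketch below is a back-of-the-envelope illustration, not a calculation from the InfoQ article: it assumes every downstream call is independent and has a 1 percent chance of landing in its own slow tail. The function name `prob_any_slow` is made up for this example.

```python
def prob_any_slow(fan_out: int, per_call_slow_prob: float = 0.01) -> float:
    """Probability that at least one of `fan_out` independent calls is slow.

    Assumption: calls are independent and each one is slow with the same
    probability (0.01 means "slower than that call's own p99").
    """
    return 1.0 - (1.0 - per_call_slow_prob) ** fan_out


for n in (1, 10, 100):
    print(f"fan-out {n:>3}: {prob_any_slow(n):.1%} of requests wait on a straggler")
```

Under those assumptions, a request that fans out to 100 calls waits on at least one straggler roughly 63 percent of the time, even though each individual call is slow only 1 percent of the time.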

InfoQ’s article “Adaptive Hedged Requests for Reducing p99 Latency”, by Prathamesh Bhope, starts from exactly this failure mode: calls that succeed but arrive slowly enough to drag out a fan-out chain. Hedged requests are not a new idea: send a backup request when the first one takes too long, then use whichever response arrives first. The problem is that the naive version can create its own load incident. If a system duplicates every suspiciously slow operation without restraint, p99 may improve on a chart while the infrastructure absorbs a new wave of work.
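The naive version described above fits in a few lines. This is a minimal sketch using Python's asyncio, not code from the InfoQ article; `hedged_call`, `make_request` and `hedge_delay` are names invented for the illustration, and the delay is a fixed, hand-picked number.

```python
import asyncio


async def hedged_call(make_request, hedge_delay: float):
    """Naive hedged request with a fixed delay (illustrative only).

    `make_request` is a hypothetical zero-argument coroutine function that
    performs one attempt. If the primary attempt has not finished after
    `hedge_delay` seconds, a single backup is sent and the first attempt
    to finish wins.
    """
    primary = asyncio.create_task(make_request())
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay)
    if done:
        # The primary answered in time, so no backup is needed.
        return primary.result()

    # The primary is a suspected straggler: send the backup and race them.
    backup = asyncio.create_task(make_request())
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()  # the slower attempt is no longer needed
    # If the first attempt to finish raised, the exception propagates here.
    return done.pop().result()
```

Nothing in this sketch limits how often the backup fires. Under a broad slowdown, every call crosses the fixed delay and the request volume roughly doubles.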

This is where “adaptive” matters. Instead of relying on a fixed timeout, the mechanism uses DDSketch for real-time latency quantile estimation. That means the hedging threshold does not have to be a manually guessed number; it can be tied to the current response distribution. When the system speeds up or slows down, the threshold moves with it. That matters in production, where latency is shaped by traffic, cache hit rates, deployments, regional issues, and external dependencies rather than a clean lab curve.
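To make the quantile idea concrete, here is a stripped-down sketch of the logarithmic bucketing that DDSketch is built on. It is neither the article's implementation nor the official DDSketch library: the class name `MiniDDSketch` is invented, it handles only positive values, and it has no bucket limit or merge support. Which quantile should drive the hedging threshold is also an assumption here.

```python
import math
from collections import Counter


class MiniDDSketch:
    """Minimal DDSketch-style quantile estimator (illustrative only)."""

    def __init__(self, relative_accuracy: float = 0.01):
        # Bucket boundaries grow geometrically by a factor of gamma, which
        # bounds the relative error of any estimate by relative_accuracy.
        self.gamma = (1 + relative_accuracy) / (1 - relative_accuracy)
        self.log_gamma = math.log(self.gamma)
        self.buckets = Counter()
        self.count = 0

    def add(self, value: float) -> None:
        # Positive values only: bucket i covers (gamma**(i-1), gamma**i].
        index = math.ceil(math.log(value) / self.log_gamma)
        self.buckets[index] += 1
        self.count += 1

    def quantile(self, q: float):
        if self.count == 0:
            return None
        rank = q * (self.count - 1)
        seen = 0
        for index in sorted(self.buckets):
            seen += self.buckets[index]
            if seen > rank:
                # Representative value for the bucket that holds the rank.
                return 2 * self.gamma ** index / (self.gamma + 1)


sketch = MiniDDSketch(relative_accuracy=0.01)
for latency_ms in range(1, 1001):
    sketch.add(float(latency_ms))

# Assumption: hedge once a call is slower than the current p95.
hedge_threshold_ms = sketch.quantile(0.95)
```

The sketch stores counts per bucket rather than raw samples, so memory stays small while the estimate stays within the configured relative error of the true quantile.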

InfoQ details a mechanism combining DDSketch, rotating time windows, and a token-bucket budget to reduce tail latency without uncontrolled request amplification.

Quantiles, time windows, and budget decide when a backup request is sent. 📷 AI-generated image / TECH&SPACE

The second part of the design is windowed rotation. It prevents decisions from being made against stale samples. If the latency distribution shifts, the system needs to forget yesterday’s shape quickly and react to the current one. Otherwise hedging becomes sluggish: it either sends the backup too late to help p99, or sends it too early and piles up unnecessary work.
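One way to picture window rotation is a pair of windows: one being filled and one recently completed. The sketch below is an assumption about how such a rotation could look, not the article's design. The class `RotatingWindow` and the 10-second window are made up, and plain lists stand in for the sketches to keep the example self-contained. Time is passed in explicitly so the behavior is easy to follow.

```python
class RotatingWindow:
    """Two-window rotation for latency samples (illustrative only)."""

    def __init__(self, window_seconds: float = 10.0, now: float = 0.0):
        self.window_seconds = window_seconds
        self.window_start = now
        self.current = []   # samples from the window being filled
        self.previous = []  # samples from the last completed window

    def _rotate_if_due(self, now: float) -> None:
        elapsed = now - self.window_start
        if elapsed >= 2 * self.window_seconds:
            # Long idle gap: both windows are stale, so forget everything.
            self.previous, self.current = [], []
            self.window_start = now
        elif elapsed >= self.window_seconds:
            # Normal rotation: the current window becomes the previous one.
            self.previous, self.current = self.current, []
            self.window_start += self.window_seconds

    def add(self, latency: float, now: float) -> None:
        self._rotate_if_due(now)
        self.current.append(latency)

    def quantile(self, q: float, now: float):
        self._rotate_if_due(now)
        samples = sorted(self.previous + self.current)
        if not samples:
            return None
        return samples[min(len(samples) - 1, int(q * len(samples)))]
```

Reading from both windows avoids an empty estimate right after a rotation, while anything older than two windows no longer influences the threshold.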

The third guardrail is a token-bucket budget. This is the damage-control layer: hedging gets a limited number of tokens, meaning it can issue extra requests only while spending stays inside the budget. The same token-bucket logic is well known from rate limiting and traffic shaping, including RFC 2697. Here its purpose is blunt and practical: a tail-latency optimization must not become a hidden denial-of-service attack against the system’s own services.
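The budget can be sketched as a classic token bucket. This is a generic illustration rather than the article's code; the class name `HedgeBudget` and the refill rate and capacity in the usage lines are arbitrary values chosen for the example.

```python
class HedgeBudget:
    """Token bucket that caps how many backup requests may be sent."""

    def __init__(self, rate_per_second: float, capacity: float, now: float = 0.0):
        self.rate = rate_per_second   # tokens added per second
        self.capacity = capacity      # maximum burst of hedges
        self.tokens = capacity
        self.last_refill = now

    def try_spend(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never above capacity.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last_refill = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True   # a backup request is allowed
        return False      # budget exhausted: wait for the primary


budget = HedgeBudget(rate_per_second=1.0, capacity=2, now=0.0)
allowed = [budget.try_spend(now=0.0) for _ in range(3)]  # third attempt is refused
```

When the bucket is empty, the caller simply keeps waiting for the original request, so a system-wide slowdown cannot turn into a flood of duplicates.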

The article stands out because it cites a 74 percent reduction in p99 latency. That is not a cosmetic metric. P99 is where the worst user experiences accumulate, where transactions threaten SLAs, and where downstream timeout strategies can start cascading. Google’s classic paper “The Tail at Scale” remains a useful reference for the same lesson: as systems grow, the tail of the distribution becomes an operational problem, not a statistical footnote.

The important part is not just the backup request. It is the combination of three constraints: measure the live tail, discard stale distributions, and spend from a strict budget. That turns adaptive hedging into an operational technique rather than a hopeful trick. For large distributed systems, the takeaway is direct: a slow request is not always a failure, but passively waiting for every straggler is how p99 burns.
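The way the three constraints interact can be written as one small decision function. This is a hedged sketch of one plausible ordering, not the article's logic: `should_hedge` and its parameters are hypothetical, `threshold` stands for the live quantile estimate from the current windows, and `spend_token` stands for a budget check that returns True when a hedge is allowed.

```python
def should_hedge(elapsed: float, threshold, spend_token) -> bool:
    """Decide whether to send a backup request (illustrative only).

    elapsed     -- seconds the primary request has been outstanding
    threshold   -- current tail-latency estimate in seconds, or None if
                   there are no fresh samples yet
    spend_token -- hypothetical callable that takes one token from the
                   hedging budget and returns True on success
    """
    if threshold is None:
        return False  # no fresh data: do not guess
    if elapsed < threshold:
        return False  # still inside the normal latency range
    # Only touch the budget once the request is a real straggler, so
    # tokens are never spent on calls that did not need a hedge.
    return spend_token()
```

Checking the threshold before the budget keeps tokens in reserve for calls that have actually crossed into the tail.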

TECH&SPACE editorial infographic: Adaptive hedging flow from tail measurement to first successful response. 📷 AI-generated image / TECH&SPACE

Tags: P99, Prathamesh Bhope, Latency, Stragglers, Google, Adaptive Hedging, Hedged Requests

