Thoughtworks’ lesson: a research agent only matters if it can show its sources
[Image: A production research agent as a supervised system, not a standalone chatbot. AI-generated image / TECH&SPACE]
- Kulkarni frames deep research agents as systems for complex work, not as ordinary chatbots with longer answers.
- The production challenge sits in multi-agent orchestration, verifiable retrieval, and reports that a human can inspect.
- The signal matters for AI teams, but the report does not provide hard metrics, benchmarks, or proof of a sharp technical leap.
Deep research agents are not just another label for a chatbot that writes a longer answer. According to InfoQ, Sarang Kulkarni of Thoughtworks spoke at the Arc of AI Conference 2026 about production lessons from building agentic systems that investigate complex topics across multiple steps, retrieve information from connected sources, and produce structured analytical reports.
That distinction matters. A conventional LLM flow often ends with one prompt and one answer. A deep research agent has to break down the task, decide what it still does not know, retrieve additional evidence, compare it, discard weaker signals, and turn the result into something an organization can actually inspect and use. In that architecture, the model is not the whole product. It is one component inside a system that needs planning, search, memory, validation, constraints, and a reliable output format.
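The loop described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Kulkarni's or Thoughtworks' implementation: the names `ResearchState`, `run_research`, `toy_plan`, and `toy_retrieve` are hypothetical, the planner and retriever are stubs standing in for a model and a search backend, and the score threshold is an arbitrary placeholder for whatever validation a real system would apply.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Evidence:
    question: str   # sub-question this evidence answers
    source: str     # identifier of the document it came from
    claim: str      # what the source says
    score: float    # assumed quality estimate in [0, 1]


@dataclass
class ResearchState:
    task: str
    open_questions: List[str]
    evidence: List[Evidence] = field(default_factory=list)
    trace: List[str] = field(default_factory=list)  # human-readable audit log


def run_research(
    task: str,
    plan: Callable[[str], List[str]],
    retrieve: Callable[[str], List[Evidence]],
    min_score: float = 0.5,
    max_steps: int = 10,
) -> ResearchState:
    """Plan sub-questions, retrieve evidence, drop weak signals, log each step."""
    state = ResearchState(task=task, open_questions=plan(task))
    state.trace.append(f"plan: {len(state.open_questions)} sub-questions")
    steps = 0
    # The step budget is a hard constraint so the loop cannot run unbounded.
    while state.open_questions and steps < max_steps:
        question = state.open_questions.pop(0)
        found = retrieve(question)
        kept = [e for e in found if e.score >= min_score]
        state.evidence.extend(kept)
        state.trace.append(
            f"retrieve: {question!r} kept {len(kept)} of {len(found)}"
        )
        steps += 1
    return state


# Stub planner and retriever, standing in for a model and a search backend.
def toy_plan(task: str) -> List[str]:
    return [f"What is known about {task}?", f"What is disputed about {task}?"]


def toy_retrieve(question: str) -> List[Evidence]:
    return [
        Evidence(question, "doc-a", "strong claim", 0.9),
        Evidence(question, "doc-b", "weak claim", 0.2),
    ]


state = run_research("agent evaluation", toy_plan, toy_retrieve)
for entry in state.trace:
    print(entry)
```

The point of the sketch is the shape rather than the details: the model calls would live inside `plan` and `retrieve`, while the loop, the step budget, and the trace belong to the surrounding system.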
Thoughtworks' Sarang Kulkarni used Arc of AI 2026 to unpack lessons from building multi-step agentic systems for research, retrieval, and analytical reporting.
[Image: Multi-step retrieval needs an audit trail a team can inspect. AI-generated image / TECH&SPACE]
That is why Kulkarni's topic is more useful than the usual agent demo. If an agent is expected to prepare a research summary for a business, technical, or regulatory decision, fluent prose is not enough. The system has to show where its conclusions came from, where uncertainty remains, and how it handled conflicting information. That is the line between a useful research assistant and a machine for confident synthesis.
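One way to picture that requirement is a report format in which every finding carries its sources, a confidence label, and any conflicting evidence, and in which the renderer refuses to emit a finding that has no source. The sketch below is an assumption-laden illustration, not something described in the InfoQ report: `Finding`, `unsupported`, and `render_report` are hypothetical names, and the confidence labels are placeholders.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Finding:
    statement: str
    sources: List[str]                  # identifiers of supporting documents
    confidence: str = "medium"          # assumed labels: "low", "medium", "high"
    conflicts: List[str] = field(default_factory=list)  # notes on disagreeing evidence


def unsupported(findings: List[Finding]) -> List[Finding]:
    """Return the findings that cite no source at all."""
    return [f for f in findings if not f.sources]


def render_report(findings: List[Finding]) -> str:
    """Render findings as plain text; fail loudly if any finding lacks a source."""
    missing = unsupported(findings)
    if missing:
        raise ValueError(f"{len(missing)} finding(s) have no source")
    lines = []
    for number, finding in enumerate(findings, start=1):
        lines.append(f"{number}. {finding.statement} [confidence: {finding.confidence}]")
        lines.append(f"   sources: {', '.join(finding.sources)}")
        for note in finding.conflicts:
            lines.append(f"   conflicting evidence: {note}")
    return "\n".join(lines)


report = render_report([
    Finding(
        statement="Vendor A supports the required export format.",
        sources=["doc-a", "doc-b"],
        confidence="high",
        conflicts=["doc-c reports the feature as deprecated"],
    )
])
print(report)
```

Raising an error on an unsourced finding is a deliberate choice in this sketch: it makes a missing citation a pipeline failure rather than something a reviewer has to notice in fluent prose.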
The wider context is already visible. OpenAI has described deep research as an agentic mode for multi-step search and synthesis, while Anthropic's note on building effective agents argues that robustness often comes from simple, supervised orchestration rather than maximal autonomy. InfoQ's account of Kulkarni's talk fits that pattern: the real subject is not a magic agent, but a controlled production loop.
It is also important to state what the report does not provide. It does not give public hard metrics, comparative benchmarks, error rates, or evidence of a new jump in model capability. This is an implementation lesson, not a product launch or a research result. For teams deploying AI research systems, that is still useful: less mythology about autonomy, more attention to traces, supervision, sources, and accountability in the final output.
The shortest read is this: deep research agents become serious only when they stop performing as all-knowing conversational partners and start behaving like inspectable research machinery. Production does not merely need a smart answer. It needs a reason to trust it.

