SIEVE Wants Models to Learn From Three Examples, but the Trick Is Cutting Context
- SIEVE targets parametric learning from natural language with as few as three real queries.
- SIEVE-GEN decomposes context into units and pairs synthetic queries only with relevant parts.
- Evaluation includes custom domains, RuleArena and Machine Translation from One Book tasks.
SIEVE starts from a familiar crack in large-language-model work. In-context learning is fast: put rules, examples or documentation into the prompt and the model behaves better, but only for as long as that context stays in the prompt. Parametric learning is more durable because it changes model weights, but it usually needs lots of data, high-quality traces or verifiers.
Authors Parth Asawa, Alexandros G. Dimakis and Matei Zaharia propose a middle path. SIEVE needs as few as three real query examples, but it does not rely only on them. Its SIEVE-GEN pipeline decomposes natural-language context into smaller units and generates synthetic queries so that each one is paired only with the relevant context, not the whole document. Context distillation then tries to internalize that knowledge into the model.
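The decompose-then-pair idea can be sketched in a few lines. Everything below is illustrative: the function names, the paragraph-based decomposition and the templated query generator are stand-ins, not the paper's actual implementation, which would use an LLM for both steps.

```python
def decompose_context(context: str) -> list[str]:
    """Split a natural-language context into smaller units, e.g. one rule
    or documentation section per unit. Here: a naive paragraph split."""
    return [p.strip() for p in context.split("\n\n") if p.strip()]


def generate_queries(unit: str, n: int = 3) -> list[str]:
    """Stand-in for an LLM call that writes synthetic queries the given
    unit actually answers. Stubbed with a template for this sketch."""
    return [f"Synthetic query {i} about: {unit}" for i in range(n)]


def build_training_pairs(context: str) -> list[tuple[str, str]]:
    """Pair each synthetic query only with its relevant unit, not the
    whole document -- the filtering idea the article describes."""
    pairs = []
    for unit in decompose_context(context):
        for query in generate_queries(unit):
            pairs.append((query, unit))  # (query, targeted context)
    return pairs


rules = "Rule A: refunds within 30 days.\n\nRule B: ship only to the EU."
pairs = build_training_pairs(rules)
print(len(pairs))  # 2 units x 3 queries = 6 pairs
```

The resulting pairs would then feed a distillation step that updates the weights; the point of the sketch is only that no training example ever sees the full document.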
The Berkeley paper is not selling data-free fine-tuning magic; it proposes a cleaner way to synthesize examples from only the relevant parts of context.
That distinction matters. Bad synthetic data often appears when a model is handed too much context and generates plausible but diluted examples. If the context can be decomposed, training picks up less noise: each rule is paired with the queries it actually applies to. The paper tests the method in reasoning settings where context is necessary, including custom domains, RuleArena and Machine Translation from One Book.
The caution is the same as with most arXiv methods. Three examples in a controlled task are not three examples in production, where users send contradictory requests and synthetic rollouts can amplify errors. Still, SIEVE targets a real pain point: many teams have good rules and documentation, but not thousands of labeled examples. If context can be cut precisely, data scarcity becomes a smaller problem and decomposition quality becomes a bigger one.

