Copilot saw different countries in the same data. That makes defaults a workplace risk
Same numbers, different country label, wrong AI conclusion. 📷 AI-generated image / TECH&SPACE
- Adam Kucharski gave Copilot identical data with different country labels and received invented differences.
- The case shows how default AI models can turn social stereotypes into false analytics.
- For serious analysis, users need to choose stronger reasoning models deliberately and verify the output.
Office AI tools are sold as a shortcut to analysis. In practice, the default model can behave less like an analyst and more like a confident narrator. According to The Decoder, mathematician Adam Kucharski gave Microsoft Copilot identical datasets carrying different country labels. Instead of recognizing that the numbers did not differ, Copilot produced detailed explanations about supposed national differences.
That is a small test with a sharp lesson. If the only changed element is the column label, the correct analytical answer is that there is no real difference. A model that turns that label into a story about a country, culture or behavior is not doing statistics. It is filling a gap with patterns learned from language. Worse, the output looks like analysis: structured, fluent and assured. That is exactly what makes it risky in business reports, school work, internal presentations and decision memos.
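The check that the assistant skipped is easy to run outside the chat window. What follows is a minimal sketch, not a reconstruction of Kucharski's test: the labels "Country A" and "Country B", the numbers and the helper names are all hypothetical, chosen only to show how two datasets can be compared with their labels set aside.

```python
def values_without_label(dataset):
    """Return only the numbers, sorted, so neither label nor row order matters."""
    return sorted(dataset["values"])


def differs_only_by_label(first, second):
    """True when two datasets hold the same numbers under different labels."""
    same_numbers = values_without_label(first) == values_without_label(second)
    return same_numbers and first["label"] != second["label"]


# Hypothetical illustration data: identical numbers, two made-up labels.
country_a = {"label": "Country A", "values": [12.0, 15.5, 9.8, 14.2, 11.1]}
country_b = {"label": "Country B", "values": [12.0, 15.5, 9.8, 14.2, 11.1]}

if differs_only_by_label(country_a, country_b):
    print("Same numbers, different labels: there is no difference to explain.")
```

If this check passes, any narrative about why the two groups differ can only have come from the label, not from the data.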
The problem is not limited to Copilot. The same pattern applies across a class of tools where users often cannot clearly see which model is doing the work, why it was selected or where its limits lie. Google Gemini and similar assistants increasingly offer multiple modes, but interfaces still push users toward a default choice because it is faster, cheaper or good enough for routine text. Trouble starts when that same choice is used for data comparison, risk assessment or conclusions about people.
Copilot invented country differences from identical datasets, showing why model choice in everyday AI tools is now a quality-control decision, not a cosmetic setting.
Model choice becomes part of verification, not just a UI setting. 📷 AI-generated image / TECH&SPACE
The Decoder highlights the important distinction: stronger reasoning models are more likely to catch the trap. That does not make them infallible. It means model selection has become part of the method. An analyst would not use the same procedure for cleaning a spreadsheet, testing a hypothesis and drafting a summary. AI tools should not automatically receive those jobs under the same default setting without scrutiny.
For users, this changes the routine. When an AI tool analyzes data, the first question is no longer only "what does the AI say?" It is which model said it, on what basis and whether the result can be reproduced. Users should ask for a null-hypothesis comparison, confirmation that the datasets actually differ and a brief explanation of the method. If a model starts discussing differences before establishing that differences exist, that is a reason to stop, not to paste the answer into a report.
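One way to make the null-hypothesis comparison concrete is a permutation test, which asks how often a gap between group means at least as large as the observed one would appear if the group labels were assigned at random. The sketch below is an illustration under stated assumptions, not a method taken from The Decoder's report or from Kucharski: the function name, the choice of the mean as the statistic, the number of shuffles and the sample numbers are all assumptions.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate how often randomly assigned labels produce a gap in means
    at least as large as the one observed between the two groups."""
    rng = random.Random(seed)  # fixed seed so the result can be reproduced
    size_a = len(group_a)
    observed = abs(sum(group_a) / size_a - sum(group_b) / len(group_b))

    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign the labels at random
        shuffled_a = pooled[:size_a]
        shuffled_b = pooled[size_a:]
        gap = abs(sum(shuffled_a) / size_a - sum(shuffled_b) / len(shuffled_b))
        # Small tolerance guards against floating-point rounding in the sums.
        if gap >= observed - 1e-12:
            hits += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical numbers: the same values under two different labels.
identical_a = [12, 15, 9, 14, 11]
identical_b = [12, 15, 9, 14, 11]
print(permutation_p_value(identical_a, identical_b))
```

For identical inputs the observed gap is zero, every shuffle matches or exceeds it and the estimate is 1.0, meaning the data offer no evidence of a difference. A fluent explanation of national differences layered on top of that result is a signal to stop.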
For companies, the lesson is less comfortable. Rolling out an AI assistant does not end with buying a license. Organizations need rules for higher-risk tasks, clear guidance on when to use a stronger model and a requirement that analytical outputs be checked outside the chat window. Microsoft 365 Copilot documentation and public Gemini materials can explain product features, but each organization must decide where an automated answer stops being assistance and starts being evidence.
Kucharski's test is therefore not a social-media trick. It is a compact demonstration of how a stereotype can enter a spreadsheet through the back door, disguised as productivity. The default model may be a convenient starting point for drafting an email. For data, people and comparisons, the default model should be treated as the first thing to question.

