DeepMind wants robots to read gauges, but the factory floor is the harder test
Manual Codex image generation📷 AI-generated / Tech&Space, manual prompt only
- ★DeepMind describes Gemini Robotics-ER 1.6 as a high-level robotics reasoning model, with improvements in spatial understanding, task planning, success detection and instrument reading.
- ★The DeepMind instrument-reading benchmark shows ER 1.6 at 86% success, or 93% with agentic vision; ER 1.5 is shown at 23% and Gemini 3.0 Flash at 67%.
- ★The model is available through the Gemini API and Google AI Studio, but the documentation labels it as a preview and explicitly flags latency, hallucination, cost and prompt-quality risks.
WHAT ER 1.6 ACTUALLY DOES
Gemini Robotics-ER 1.6 is not a robot hand, a walking machine or a finished autonomous worker. It is a high-level embodied reasoning model: a layer that takes images, video, audio and text, then tries to infer where objects are, what they mean for the task and what should happen next. In robotics, that is less theatrical than jumping over boxes, but often more useful. A machine that cannot tell whether a gauge is out of range is just an expensive camera with legs.
DeepMind says ER 1.6 improves substantially over Gemini Robotics-ER 1.5 and Gemini 3.0 Flash on tasks that usually make robots brittle: pointing to precise locations, counting, spatial reasoning and detecting whether a task has succeeded. The important distinction is that the model is not merely a scene captioner. It can return structured points and bounding boxes, plan steps, and call tools, vision-language-action (VLA) models or user-defined functions. The physical robot still has to execute whatever those calls produce.
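To make "structured points" concrete, here is a minimal Python sketch of how a client might consume such output. It assumes the model replies with a JSON list whose entries carry a `point` in [y, x] order on a 0-1000 normalized scale plus a `label`. That reply format is an assumption for illustration, not something the launch material quoted here confirms, and the helper name `to_pixels` is hypothetical.

```python
import json


def to_pixels(reply_text, width, height):
    """Convert normalized [y, x] points (assumed 0-1000 scale) to pixel (x, y)."""
    items = json.loads(reply_text)
    result = []
    for item in items:
        y_norm, x_norm = item["point"]
        # Reject anything outside the assumed range instead of trusting the model.
        if not (0 <= y_norm <= 1000 and 0 <= x_norm <= 1000):
            raise ValueError(f"point out of range: {item['point']}")
        x_px = round(x_norm / 1000 * width)
        y_px = round(y_norm / 1000 * height)
        result.append((item.get("label", ""), x_px, y_px))
    return result


# Hypothetical reply text, written by hand for this example.
sample = '[{"point": [500, 250], "label": "pressure gauge"}]'
print(to_pixels(sample, width=1280, height=720))
# [('pressure gauge', 320, 360)]
```

The range check is the part that matters for robotics: a point that falls outside the image is a sign the reply should be discarded, not clamped and passed on to a motion planner.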
The strongest evidence in the launch is not the broad phrase "physical agents." It is instrument reading. Industrial sites are full of analog pressure gauges, thermometers, sight glasses and digital displays that do not look as tidy as benchmark images. DeepMind says ER 1.6 can interpret circular pressure gauges, vertical level indicators and digital readouts, and it highlights its collaboration with Boston Dynamics as an important driver of that work. In the instrument-reading benchmark shown by DeepMind, ER 1.5 lands at 23%, Gemini 3.0 Flash at 67%, ER 1.6 at 86%, and ER 1.6 with agentic vision at 93%.
That is a meaningful jump, but the measurement needs to be read precisely. Agentic vision is not magic eyesight. It is a process in which the model can zoom, crop part of an image, use code for calculation and inspect the result again. For a sight glass, for example, the model needs to find the top and bottom of the window, locate the liquid level and calculate the fill percentage. That is closer to inspection work than ordinary object recognition.
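The sight-glass arithmetic is simple once the three positions are known. The sketch below assumes the window top, window bottom and liquid surface have already been located as pixel rows in the image. The function name `fill_percentage` and its inputs are hypothetical, and this is an illustration of the calculation described above, not DeepMind's implementation.

```python
def fill_percentage(top_y, bottom_y, level_y):
    """Fill level of a vertical sight glass from three pixel rows.

    Image rows grow downward, so the window top has the smaller y value.
    """
    if bottom_y <= top_y:
        raise ValueError("bottom_y must be below top_y in image coordinates")
    if not (top_y <= level_y <= bottom_y):
        raise ValueError("liquid level lies outside the window")
    # Height of the liquid column divided by the height of the window.
    return 100.0 * (bottom_y - level_y) / (bottom_y - top_y)


print(fill_percentage(top_y=100, bottom_y=500, level_y=200))  # 75.0
```

The hard part is not this division but the three inputs: a mislocated window edge shifts the percentage directly, which is why zooming and re-inspecting the crop matters.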
A 93% instrument-reading benchmark is a serious signal, but preview status, latency and real robot integration decide how far the demo is from a shift on the floor.
WHERE THE DEMO BECOMES AN OPERATIONS PROBLEM
Boston Dynamics gives the story a more practical frame. The company says Gemini Robotics-ER 1.6 is integrated into Orbit AIVI-Learning, a system that helps Spot and Orbit perform visual inspections. The examples are boring in the useful way: 5S checks, dangerous debris or standing liquid, conveyor belt damage, pallets, sight-glass levels and analog gauges. This is robotics after the demo ends: not applause, but a routine path through a facility and a report somebody has to trust.
ER 1.6 should still not be pushed beyond the evidence. Google's Gemini Robotics-ER 1.6 documentation identifies the model as gemini-robotics-er-1.6-preview, which accepts text, image, video and audio inputs but returns text output. The documentation explicitly says that APIs and capabilities may change, that complex queries and larger thinking budgets can increase latency, that the model can hallucinate, and that results depend heavily on prompt clarity. That is not fine print. On a factory floor, latency is the difference between a timely reaction and a late report.
The safety story also needs a cold reading. DeepMind calls this its safest robotics model to date and points to better compliance with physical constraints, including instructions such as not handling liquids or not picking up objects heavier than 20 kilograms. The models were also tested with the ASIMOV benchmark, which asks whether AI systems in the physical world can perceive risk and intervene. Model-level safeguards, however, are not the same as a safety case for a robot on shift. Force limits, supervision, emergency stops, sensor validation and responsibility for a wrong reading still matter.
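As one illustration of what "sensor validation" can mean outside the model, the sketch below checks a model-reported gauge value against the gauge's printed scale and, optionally, against a second independent reading before anything acts on it. Every name, threshold and status string here (`GaugeSpec`, `review_reading`, the 5% tolerance) is a hypothetical choice for this example and does not come from DeepMind or Boston Dynamics.

```python
from dataclasses import dataclass


@dataclass
class GaugeSpec:
    scale_min: float   # lowest value printed on the dial
    scale_max: float   # highest value printed on the dial
    alarm_low: float   # below this, operations should be alerted
    alarm_high: float  # above this, operations should be alerted


def review_reading(value, spec, second_value=None, tolerance=0.05):
    """Return 'ok', 'alarm' or 'needs_human' for a model-reported reading."""
    # A value off the printed scale cannot be a real reading.
    if not (spec.scale_min <= value <= spec.scale_max):
        return "needs_human"
    # Optional cross-check against a second, independent reading.
    if second_value is not None:
        span = spec.scale_max - spec.scale_min
        if abs(value - second_value) > tolerance * span:
            return "needs_human"
    if value < spec.alarm_low or value > spec.alarm_high:
        return "alarm"
    return "ok"


spec = GaugeSpec(scale_min=0.0, scale_max=10.0, alarm_low=2.0, alarm_high=8.0)
print(review_reading(5.0, spec))                    # ok
print(review_reading(9.0, spec))                    # alarm
print(review_reading(12.0, spec))                   # needs_human
print(review_reading(5.0, spec, second_value=6.0))  # needs_human
```

A check like this does not make the reading correct. It only ensures that an implausible or contested value is routed to a person instead of into a report.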
The useful interpretation of ER 1.6 is therefore narrow, but important. It moves robotics AI from "what is in the image" toward "what does this image mean for the task." If a robot can read a gauge more reliably, determine whether the blue pen really reached the holder or identify which object should move to make room, automation becomes less brittle. The deployment barrier is still familiar: dirty lenses, bad lighting, vibration, networking, API cost, audit logs and the human who has to trust the report. DeepMind has shown a better reasoning layer. The factory will show how well it handles Monday morning.

