A Robotics Startup Bets That Touch, Not Sight, Is the Missing Sense
At ICRA 2026 in Vienna, Xense Robotics makes the case that machines will only master the physical world once they can feel it.
For the past several years, the story of robotics has mostly been a story about vision. Cameras got cheaper, perception models got smarter, and robots got dramatically better at recognizing objects and planning their way around a room. But seeing a coffee cup and reliably picking it up without crushing it or letting it slip are two very different problems — and a Shanghai-based startup is wagering that the second one is where the real bottleneck lies.
Xense Robotics brought that argument to ICRA 2026, the IEEE International Conference on Robotics and Automation, held June 2–4 at the Messe Wien Exhibition & Congress Center in Vienna. From its booth in Hall B-123, the company showed off a lineup built entirely around one idea, captured in its tagline: "Touch Ignites Physical Intelligence." The displays spanned tactile sensing hardware, a new data-collection device, and the AI models the company trains on the touch data it gathers.
The case for touch
The pitch rests on a distinction that sounds obvious once stated but has been easy to overlook amid the excitement over computer vision and large multimodal models. Vision tells a robot what is in front of it. It does not reliably tell the robot whether an object is slipping, how much a soft material is deforming under its grip, whether a contact is stable, or how much force is actually being applied.
Those variables are exactly the ones that decide whether everyday manipulation tasks succeed or fail — grasping, precision assembly, folding flexible materials, aligning parts, pressing components into place. A camera can watch a robot fold a cardboard box, but it can't feel the box buckle. Xense's framing is that vision lets a robot see the world while touch is what lets it interact with the world, and that the leap from lab demos to factory-floor deployment runs through that second capability.
This is a company describing its own technology, so the framing naturally favors what Xense sells. But the underlying point is widely shared in robotics research: tactile sensing has lagged vision, and closing that gap is broadly seen as one of the harder, more consequential problems in embodied AI.
The headline demo: folding a box, no script
The centerpiece at the booth was a two-armed robot assembling a cardboard carton from a flat blank — grasping the cardboard, unfolding it, aligning the edges, folding and shaping it, then pressing it into its final form. It looks mundane. That is rather the point.
Box-forming is deceptively hard for a robot because it combines, all at once, several of the things automation handles worst: a floppy, easily deformed material; a long sequence of dependent steps; heavy physical contact; and conditions that shift moment to moment as the cardboard bends. Traditional industrial robots manage tasks like this with carefully pre-programmed trajectories, which work right up until the material doesn't cooperate.
According to Xense, its system used no preset trajectory. Instead it ran on continuous "touch plus vision" feedback, sensing contact position, force, deformation, and stability in real time and adjusting the coordinated motion of both arms on the fly. The company presents this as a shift in kind: from robots that passively execute commands to robots that perceive, adapt, and correct themselves through a long, multi-step task. If it generalizes beyond a demo, that is the sort of capability that matters for handling soft and irregular materials in unstructured settings.
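Xense has not published how its controller works, so the following is only a minimal sketch of what closing the loop on touch can mean in general. It assumes a hypothetical tactile sample carrying normal force, shear force, and deformation, and uses a simple Coulomb friction check to decide when to squeeze harder and a deformation limit to decide when to ease off. Every name, threshold, and unit here is an illustrative assumption, not the company's method.

```python
from dataclasses import dataclass


@dataclass
class TactileReading:
    """One hypothetical tactile sample; field names are illustrative only."""
    normal_force: float   # newtons pressing into the sensor
    shear_force: float    # newtons tangential to the sensor surface
    deformation: float    # millimeters the gripped material has compressed


def adjust_grip(reading: TactileReading, target_force: float,
                friction_coeff: float = 0.5, max_deformation: float = 2.0,
                step: float = 0.2) -> float:
    """Return an updated grip-force target (newtons) from one tactile sample."""
    # Ease off first if the material is being crushed.
    if reading.deformation > max_deformation:
        return max(0.0, target_force - step)
    # Coulomb friction: slip is likely once shear exceeds mu * normal force.
    if reading.shear_force > friction_coeff * reading.normal_force:
        return target_force + step
    # Contact looks stable, so hold the current target.
    return target_force
```

Called once per sensor frame, a rule like this replaces a fixed squeeze with one that tracks what the fingers report. A real system coordinating two arms on deforming cardboard would need far more than a single threshold rule, but the structure is the same: sense, compare, correct.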
A device to capture what touch actually feels like
Alongside the demo, Xense debuted a product aimed at a quieter but arguably more fundamental problem: where the training data comes from.
The device, called TacCap-Gripper, is a wearable two-finger tool for collecting physical-interaction data. It combines a high-resolution visuo-tactile sensor with an inertial measurement unit and an encoder, so it can record motion, vision, and touch together and keep them synchronized. The aim is to capture the fleeting, hard-to-measure moments after contact — when pressure builds, a material deforms, or an object begins to slip — and turn them into clean, labeled data that AI models can train on.
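Keeping several sensor streams synchronized is the unglamorous part of that job. The sketch below shows one common approach, nearest-timestamp matching, under the assumption that each stream arrives as a time-sorted list of (timestamp, value) pairs. The function names, the 5-millisecond tolerance, and the data layout are hypothetical; Xense has not described how TacCap-Gripper aligns its data.

```python
from bisect import bisect_left
from typing import Sequence


def nearest_index(timestamps: Sequence[float], t: float) -> int:
    """Index of the entry in sorted, non-empty `timestamps` closest to `t`."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbor is closer in time.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align_streams(tactile, imu, encoder, max_skew=0.005):
    """Pair each tactile frame with the nearest IMU and encoder samples.

    Each stream is a list of (timestamp_seconds, value) tuples sorted by time.
    A tactile frame is dropped when either nearest sample is more than
    `max_skew` seconds away.
    """
    if not imu or not encoder:
        return []
    imu_t = [t for t, _ in imu]
    enc_t = [t for t, _ in encoder]
    aligned = []
    for t, frame in tactile:
        i = nearest_index(imu_t, t)
        j = nearest_index(enc_t, t)
        if abs(imu_t[i] - t) <= max_skew and abs(enc_t[j] - t) <= max_skew:
            aligned.append((t, frame, imu[i][1], encoder[j][1]))
    return aligned
```

Dropping frames that lack a close match, rather than pairing them with stale readings, is a deliberate choice in this sketch: for the split-second events after contact, a misaligned sample can be worse than a missing one.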
This addresses a real chokepoint. Robotics has enormous quantities of visual data to learn from, but comparatively little high-quality tactile interaction data, partly because it has been so awkward to capture. A practical tool for recording it, if it works as described, helps fill a gap that has constrained the whole field, not just one company's products.
A full stack, and the China angle
Rounding out the booth was Xense's broader sensor lineup — tactile sensors in fingertip, gripper, flat, and curved forms, meant to attach to dexterous hands, industrial arms, and humanoid robots. There was also a deliberately lightweight bit of showmanship: an interactive game controlled entirely by touching and pressing one of the company's flat tactile sensors, with no buttons or joystick, meant to make the sensitivity and millisecond response of the hardware tangible to passersby.
Taken together, the company is presenting not a single product but a vertically integrated stack: sensing hardware at the bottom, data-collection tools in the middle, and its own "tactile world models" and VTLA (vision-tactile-language-action) models on top. The thesis is that owning the whole loop (capturing real touch data, modeling it, and feeding it back into manipulation) compounds over time, as each physical contact becomes training data for the next task.
Xense Robotics is young. It was founded in May 2024 by Daolin Ma, an associate professor at Shanghai Jiao Tong University and a past ICRA Best Paper Award winner. The company describes itself as among the first globally to articulate a theory of "tactile spatial intelligence." Its stated ambition is to make tactile sensing a standard, foundational capability across humanoids, industrial robots, and other embodied systems and, in its own framing, to push Chinese hard-tech tactile innovation onto the world stage.
Why it matters
Strip away the promotional gloss and the throughline is a credible one. The field has spent years teaching robots to see. The next phase — moving embodied intelligence out of controlled demos and into messy, real-world deployment — increasingly looks like it depends on teaching them to feel: to read force, deformation, and contact state, and to act on that information in the moment.
Whether Xense specifically becomes the "global leader in tactile intelligence" it aspires to be is an open question, and a single trade-show demo doesn't settle it. But the company is pointing at the right problem. As robots are asked to do more than recognize the world and are expected to manipulate it reliably, touch stops being a nice-to-have and starts looking like the sense they can't do without.