RoboScience Unveils Visics General-Purpose Embodied Foundation Model with VLOA Dual-Engine Architecture

Chinese robotics startup RoboScience (机器科学) publicly disclosed its Visics embodied foundation model and VLOA (Vision-Language-Object-Action) architecture at a June 24 event in Shenzhen. The model introduces Object Trajectory, a continuous 3D point-cloud representation of objects, as an intermediate interface that decouples high-level semantic reasoning from low-level motor control. Visics integrates two engines: a world model trained on internet video for physical trajectory prediction, and a manipulation model trained on hundreds of billions of simulated manipulation trajectories generated with the company's proprietary RoboMirage physics engine. The company demonstrated furniture assembly, cross-embodiment dexterous grasping, and millimeter-precision force control tasks. RoboScience reports that it has cut per-trajectory data cost to between 1/20 and 1/200 of that of traditional real-robot collection, and that it is scaling toward trillions of manipulation trajectories.
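The decoupling described above can be pictured with a toy sketch. Everything in it is a hypothetical illustration, not RoboScience's code or API: `ObjectTrajectory` is reduced to one tracked 3D point per time step instead of a full point cloud, `ToyWorldModel` uses linear interpolation in place of learned trajectory prediction, and `ToyController` emits text labels in place of motor commands.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class ObjectTrajectory:
    # Hypothetical simplification: one tracked 3D point per time step,
    # rather than a full point cloud.
    waypoints: Tuple[Point, ...]


class ToyWorldModel:
    """Stands in for the reasoning engine (assumption, not the real model)."""

    def predict(self, start: Point, goal: Point, steps: int) -> ObjectTrajectory:
        if steps < 1:
            raise ValueError("steps must be >= 1")
        # Linear interpolation is a placeholder for learned trajectory prediction.
        pts = tuple(
            tuple(s + (g - s) * i / steps for s, g in zip(start, goal))
            for i in range(steps + 1)
        )
        return ObjectTrajectory(waypoints=pts)


class ToyController:
    """Stands in for an embodiment-specific manipulation policy (assumption)."""

    def __init__(self, embodiment: str) -> None:
        self.embodiment = embodiment

    def follow(self, trajectory: ObjectTrajectory) -> List[str]:
        # A real policy would output motor commands; here we emit labels only.
        return [
            f"{self.embodiment}: move object to ({x:.2f}, {y:.2f}, {z:.2f})"
            for x, y, z in trajectory.waypoints
        ]


def run_demo() -> List[List[str]]:
    traj = ToyWorldModel().predict((0.0, 0.0, 0.0), (1.0, 0.0, 0.5), steps=2)
    # The same trajectory drives two different embodiments unchanged.
    return [ToyController(name).follow(traj) for name in ("gripper", "hand")]
```

The point of the sketch is only the shape of the interface: the controller never sees the instruction or the video-trained model, and the world model never sees the hardware, so either side could in principle be swapped without retraining the other.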

Why it matters: RoboScience is pursuing the same 'sim-to-real data flywheel' strategy that has become a recurring pattern in Segment 10 (Robotics/Physical AI), following the playbook pioneered by companies like Covariant with its RFM architecture and Physical Intelligence with π0. However, Visics' explicit decoupling of world-model reasoning (video-pretrained) from motor control (simulation-pretrained) via the Object Trajectory interface is architecturally novel: it directly attacks the long-standing 'Sim-to-Real Gap' that has historically limited simulation-trained models. The company's focus on object-level generalization (grasping any object with any gripper) rather than task-level automation distinguishes it from incumbents like Boston Dynamics or Agility, which optimize for specific locomotion or manipulation tasks. The key test will be whether the trillion-scale simulated dataset yields reliable real-world performance under distribution shift, an open problem on which earlier startups like DRL and Vicarious foundered. RoboScience lists prominent Chinese strategic investors including JD.com, SenseTime, and CMB Venture Capital, indicating strong domestic capital support for what it frames as the 'App Store for robotics', a platform play reminiscent of the hyperscaler-distribution pattern seen in Segment 03.

Expert take: RoboScience's VLOA architecture represents a genuine attempt to solve the 'embodied perception-action loop' by inserting Object Trajectory as an invariant intermediate representation, a design that inherently supports cross-embodiment transfer. The sim-data cost metric (1/20 to 1/200 of real-robot collection) is compelling if validated independently. However, no independently verified benchmarks on furniture assembly success rates or failure modes have been published. The industry has seen similar claims from Skild AI and others; the substrate's skepticism memory (Segment 10 §6) records multiple simulation-only approaches that failed under real-world friction, deformation, and lighting variation. The credible signal here is the combined strategic corporate venture capital (CVC) backing from JD.com (retail logistics) and SenseTime (computer vision), which suggests RoboScience's go-to-market thesis (e-commerce pick-and-place with high-SKU variability) has early commercial buy-in. Its path to enterprise deployment will depend on whether the VLOA abstraction layer truly enables 'universal control' across third-party hardware, or whether integration costs erode the platform economics.

#RoboScience #EmbodiedAI #FoundationModel #SimToReal #Robotics #VLOA #ChinaAI
