ROBOTICS • Monday, 18 May 2026

Your DoorDash Driver Is Training Tomorrow's Robots

By AI Daily Editorial • Monday, 18 May 2026

On March 19, 2026, DoorDash launched a standalone app called Tasks. It pays the company's 8 million US delivery couriers to strap on body cameras and film themselves washing dishes, folding clothes, and making beds. The job is not about improving food delivery. The footage goes to train humanoid robots. Many couriers doing the filming do not know which robot companies will ultimately use their footage, and the app has not published data retention or consent policies.

This arrangement sits at the heart of one of the least understood constraints in modern AI. Language models were trained on hundreds of billions of web pages that already existed. Image generators drew on hundreds of millions of photographs already online. Robots have no comparable archive to draw on. A robot learning to wipe a counter needs multidimensional sensor traces: vision, force, joint position, and motor commands, all captured in tight synchronisation during a real physical interaction. Each useful movement trajectory has to be recorded from scratch, on actual hardware. The industry calls this the data drought.
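To make "tight synchronisation" concrete, here is a minimal sketch of what one usable timestep might look like. The stream names, the `SensorSample` type, and the 10-millisecond tolerance are all illustrative assumptions, not any robot maker's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical stream names; real datasets define their own schemas.
REQUIRED_STREAMS = ("vision", "force", "joint_position", "motor_command")


@dataclass
class SensorSample:
    timestamp_s: float   # capture time in seconds
    values: List[float]  # flattened reading (pixels, newtons, radians, ...)


def is_usable_step(step: Dict[str, SensorSample], tolerance_s: float = 0.01) -> bool:
    """True if every required stream is present and all timestamps
    fall within tolerance_s of one another (assumed tolerance)."""
    if any(name not in step for name in REQUIRED_STREAMS):
        return False
    stamps = [step[name].timestamp_s for name in REQUIRED_STREAMS]
    return max(stamps) - min(stamps) <= tolerance_s


# A step recorded on hardware, with all four streams closely aligned.
robot_step = {
    "vision": SensorSample(1.000, [0.1, 0.2]),
    "force": SensorSample(1.002, [3.5]),
    "joint_position": SensorSample(1.001, [0.4, 1.1]),
    "motor_command": SensorSample(1.003, [0.05, -0.02]),
}

# A video frame on its own: pixels, but no force, joints, or commands.
video_only = {"vision": SensorSample(2.000, [0.1, 0.2])}

print(is_usable_step(robot_step), is_usable_step(video_only))
```

Under these assumptions a lone video frame fails the check: it has pixels but none of the physical signals a robot policy needs.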

The scale of the shortfall is documented. Google's robotics team ran 13 robots for 17 months in an office kitchen to gather 130,000 movement trajectories for its RT-1 model in 2022. The largest cross-institution open dataset ever assembled, Open X-Embodiment, pooled 60 separate datasets from 21 institutions to reach roughly 1 million trajectories. By comparison, language model training corpora routinely contain 1.5 to 4.5 billion examples, more than a thousand times the size of Open X-Embodiment. More than $6 billion went into humanoid robots in 2025 alone, and the fundamental problem remained unchanged. Capital can buy more hardware; it cannot conjure training data that does not exist.
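The arithmetic behind that gap can be sketched in a few lines, using only the figures quoted above. The function names are illustrative. Treating one robot trajectory as comparable to one language-model training example is a simplifying assumption made purely for scale, as is the idea that RT-1's collection rate could be held constant.

```python
# Figures quoted in the article.
RT1_TRAJECTORIES = 130_000
RT1_ROBOTS = 13
RT1_MONTHS = 17
OPEN_X_TRAJECTORIES = 1_000_000
LM_EXAMPLES_LOW = 1_500_000_000
LM_EXAMPLES_HIGH = 4_500_000_000


def trajectories_per_robot_month() -> float:
    """Average RT-1 collection rate per robot per month."""
    return RT1_TRAJECTORIES / (RT1_ROBOTS * RT1_MONTHS)


def shortfall_factor(target: float, available: float) -> float:
    """How many times larger the target corpus is than the data available."""
    return target / available


def robot_months_needed(target: float) -> float:
    """Robot-months to reach `target` trajectories at the RT-1 rate
    (assumes the rate stays constant, which is a simplification)."""
    return target / trajectories_per_robot_month()


print(f"RT-1 rate: {trajectories_per_robot_month():.0f} trajectories per robot-month")
print(f"Gap vs Open X-Embodiment: {shortfall_factor(LM_EXAMPLES_LOW, OPEN_X_TRAJECTORIES):.0f}x "
      f"to {shortfall_factor(LM_EXAMPLES_HIGH, OPEN_X_TRAJECTORIES):.0f}x")
print(f"Robot-months for the low end: {robot_months_needed(LM_EXAMPLES_LOW):,.0f}")
```

On these numbers, each RT-1 robot produced under 600 trajectories a month. Matching even the low end of a language corpus at that pace would take millions of robot-months.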

The industry has settled on four parallel approaches, each with real costs. Teleoperation produces the highest-quality data but costs $118 per hour even after a 65 percent price drop since early 2024. Simulation is cheap and scalable but suffers from a "sim-to-real gap": physics engines approximate the world rather than reproduce it, and policies trained in simulation often fail on real robots because of friction, soft materials, and the dynamics of half-filled containers. Motion capture generates spectacular movement demonstrations but breaks down the moment a task requires the continuous tactile feedback a human hand provides without thinking. Internet video is the most abundant source but carries no force values or joint angles: only pixels.
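The teleoperation figures imply an earlier price that the article does not state. The sketch below backs it out, assuming the 65 percent drop applies directly to the hourly rate. The helper name is illustrative and the result is an inference, not a reported number.

```python
def price_before_drop(current_price: float, drop_fraction: float) -> float:
    """Recover the original price given the current price and the
    fractional drop (0.65 means a 65 percent reduction)."""
    if not 0 <= drop_fraction < 1:
        raise ValueError("drop_fraction must be in [0, 1)")
    return current_price / (1 - drop_fraction)


# $118 per hour today, after a 65 percent drop since early 2024.
implied_early_2024 = price_before_drop(118.0, 0.65)
print(f"Implied early-2024 rate: ${implied_early_2024:.2f} per hour")
```

Under that assumption, the early-2024 rate would have been roughly $337 per hour.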

Corporate strategies have adapted in different ways. Tesla abandoned its motion-capture suits in June 2025 and switched to helmet-mounted cameras worn by factory workers during ordinary tasks. Figure AI secured a partnership with Brookfield Asset Management, which controls over 100,000 residential units, to record first-person task video at scale across its properties. In November 2025, Figure reported that its Helix model had learned to navigate cluttered home environments from natural language instructions using only this egocentric human footage, without a single robot demonstration. China has industrialised the approach: as of January 2026, the government had funded 40 dedicated robot training centres where workers repeat the same household motions hundreds of times daily alongside the robots built to learn from watching them.

The transparency failures running through these programmes are significant. DoorDash's Tasks app has excluded California, New York City, Seattle, and Colorado from its rollout: all jurisdictions with stricter data privacy laws. That geographic exclusion is itself a signal about how the programme is structured. Workers in participating states bring cameras into their homes, capturing their routines, voices, and the interiors of their residences. The footage's stated end use is training AI models for unnamed third-party partners in retail, insurance, hospitality, and technology. A standard courier agreement implies none of this.

The deeper question the industry has not yet answered is whether scaling robot training data will produce the same emergent generalisation that language models showed. The hope is a data flywheel: deployed robots generating failure data that automatically feeds the next training cycle. That flywheel has not been demonstrated to work at scale. Until it is, the data drought sets the speed limit for the entire robotics sector, and the people most directly involved in closing that gap are delivery couriers and apartment residents who, in many cases, have no clear picture of where their footage ends up.

Sources

The Data Drought: Why Embodied AI Can't Just Read the Internet — TechTimes