Designing inference chips for robots is extremely challenging
In data centers, chips get heavy cooling and constant monitoring; if a chip fails it can be hot-swapped with a spare unit
GPUs in data centers have relatively high fault rates: the industry-wide annual fault rate for H100-class units is around 9%. Even in ideal conditions it only drops to about 2%, and rarely falls below the single-digit percent range
In robots, chips operate in harsher conditions and must recover quickly on their own. Fault-tolerance needs are far higher; many robotics teams struggle to keep a chip running more than a few hours without rebooting
That situation favors chip vendors, who often recommend buying extra modules for hot swapping
For robotics companies, that's not a scalable solution and results in endless vendor support tickets
Great thread… reliability in the wild is a huge bar for robotics.