Generative artificial intelligence is moving from digital applications such as ChatGPT into the physical world, including helping to power self-driving vehicles, a sector that has been plagued by setbacks despite billions of dollars of investment.
Waabi Innovation Inc., an autonomous-vehicle company in Toronto, has adapted concepts from generative AI and large language models for its autonomous driving system. Other companies, meanwhile, are leveraging LLMs to communicate with industrial robots, allowing human operators to give instructions through plain language, images and video.
But generative AI's shortcomings, including problems with accuracy, mean there are still hurdles to overcome.
Waabi published a paper in March detailing Copilot4D, an AI model that the company says can help self-driving cars and trucks predict the behaviour of surrounding vehicles and pedestrians more accurately than existing approaches. The model borrows strategies from the generative AI applications that produce text and images, customized by Waabi for driving.
At a high level, the LLMs that power chatbots break language into what are called tokens, which can be a whole word or part of one. As far as the AI model is concerned, each token is represented as a string of numbers. By analyzing lots of data, the model can then predict the likeliest token, or word, to come next, mimicking human writing patterns.
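For readers inclined to code, a toy sketch in Python, using simple word counts rather than any real neural network, illustrates what "predicting the likeliest next token" means in practice:

```python
# Toy next-token prediction: count which word tends to follow which,
# then always guess the most frequent follower. Real LLMs learn these
# probabilities with a neural network trained on vast amounts of text,
# but the "guess what comes next" loop is the same basic shape.
from collections import Counter, defaultdict

corpus = "the car stops at the red light and the car waits".split()

# Tally, for each word, the words observed to follow it.
followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1

def predict_next(token: str) -> str:
    """Return the most frequently observed follower of `token`."""
    return followers[token].most_common(1)[0][0]

print(predict_next("the"))  # prints "car", the likeliest continuation here
```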
Waabi’s Copilot4D works in a similar way, according to founder and chief executive Raquel Urtasun. Autonomous vehicles are equipped with light detection and ranging sensors, or LIDAR, to view the world. Waabi’s AI model converts the visual data from these sensors into tokens.
From there, the model uses a generative AI image concept called diffusion to take that data and predict new tokens, which are then rendered back into LIDAR data – scenarios for how other vehicles and pedestrians are likely to behave a few seconds into the future. It’s effectively an AI model that seeks to understand the three-dimensional world, with the added component of time.
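Translated into code, that pipeline would look roughly like the skeleton below. The function names are hypothetical placeholders invented for illustration, not Waabi's actual code or API, and the bodies are deliberately left empty; the point is only the shape of the flow: tokenize the LIDAR view, predict future tokens, then decode them back into scenarios.

```python
# Conceptual skeleton only. encode_lidar_to_tokens, predict_future_tokens and
# decode_tokens_to_lidar are hypothetical placeholders, not Waabi's API; their
# bodies are unimplemented. The structure mirrors the pipeline described above.
import numpy as np

def encode_lidar_to_tokens(point_cloud: np.ndarray) -> np.ndarray:
    """Hypothetical tokenizer: map a LIDAR point cloud to discrete tokens."""
    ...

def predict_future_tokens(tokens: np.ndarray, horizon_s: float) -> np.ndarray:
    """Hypothetical learned predictor (a diffusion-style model, in Waabi's
    description) that forecasts the tokens a few seconds ahead."""
    ...

def decode_tokens_to_lidar(tokens: np.ndarray) -> np.ndarray:
    """Hypothetical decoder: render predicted tokens back into LIDAR data."""
    ...

def forecast_scene(point_cloud: np.ndarray, horizon_s: float = 3.0) -> np.ndarray:
    """Observe the world, tokenize it, predict ahead, decode the prediction.
    The 3.0-second horizon is an illustrative default, per "a few seconds"."""
    tokens = encode_lidar_to_tokens(point_cloud)
    future_tokens = predict_future_tokens(tokens, horizon_s)
    return decode_tokens_to_lidar(future_tokens)
```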
“If you look at the foundation models developed to date, they are all for a 2-D or 3-D world, for text, images and video,” Ms. Urtasun said. “But they don’t have an understanding of 3-D, plus time.”
Waabi is already using a more advanced version of the Copilot4D model on self-driving trucks delivering commercial loads with Uber Freight in Texas, a partnership announced last year. But the company only publicly detailed its methods for the academic community in March.
“You don’t want to enable your competitors to be on par quickly,” she said. A human safety driver is still on board for Waabi’s deliveries.
The approach is in keeping with Waabi’s philosophy toward autonomous driving. Other companies are teaching AI systems to learn by piloting vehicles on the road, which has proved to be expensive and dangerous in some cases. Waabi, founded in 2021, has instead built a simulator to first teach AI models about driving.
Gary Marcus, an emeritus professor of neural science at New York University and AI entrepreneur, said Waabi’s approach is interesting, but he remained skeptical. “The road to self-driving is filled with broken promises,” he said. “We are still a long way from autonomous systems that are as reliable as humans when coping with unusual circumstances.”
The past few years have seen numerous flameouts and failures. Cruise, the autonomous driving unit of General Motors, suspended operations in the U.S. in October after regulators found that the company’s driverless taxis posed a risk to public safety. That same month, a Cruise robotaxi ran over a pedestrian who had already been hit by a human driver. The pedestrian was pinned under the wheel of the Cruise taxi and dragged for about six metres.
Apple recently abandoned its efforts to build an electric vehicle with autonomous capabilities after spending billions on the project, and is shifting its focus to generative AI.
Waabi shows that self-driving and generative-AI concepts are not mutually exclusive, but the combination does raise questions. LLMs and chatbots suffer from reliability problems, and are known to produce false information. A chatbot inventing legal citations is arguably not that harmful. But a self-driving car that erroneously forecasts the behaviour of a pedestrian puts physical safety at risk.
Waabi, in its explanation of Copilot4D, also talks about anticipating the behaviour of other “vehicles.” But human beings, with all of their quirks and errors in judgment, are still the ones operating those other vehicles. How can an AI model so accurately forecast what a person is going to do?
Ms. Urtasun said Waabi’s system learns much like we do – by predicting how a scene could evolve based on experience.
“As kids, the way we learn is to interact with the environment. We poke, and then we see the answer,” she said. “That’s what the system can do as well, and we do this on the simulator to learn very naturally.”
Missy Cummings, an engineering and computer science professor at George Mason University, said that simulation is only helpful in the early stages of autonomous-vehicle development.
“I appreciate that Waabi’s approach is likely helpful for some development work,” she said, “but ultimately to be safe for operations on public roads, companies have to transition to primarily driving on actual roads with other human-driven cars to determine whether their systems are truly safe.”
While Waabi is focused on self-driving trucks today, Ms. Urtasun said its Copilot4D model has applications for robotics, too. “Any robotics system that utilizes LIDAR as the main modality can use this,” she said.
Other companies are already combining generative AI with industrial robots. Covariant, a company based in California and backed in part by Toronto’s Radical Ventures, has developed a platform that allows people to type plain-language instructions to a robotic arm, such as “pick up a banana.”
Recent media demonstrations show the system isn’t perfect. When asked to retrieve a banana, the robot first picked up a sponge, an apple and a few other items, according to MIT Technology Review.
Figure AI, which recently raised US$675-million to develop humanoid robots, announced a partnership with ChatGPT-creator OpenAI in February to improve the ability of its technology to process and reason through language.
Ms. Urtasun hasn’t been deterred by the setbacks faced by other self-driving companies. To her, they are proof that Waabi’s approach is the right one. When she founded Waabi, some observers thought it was too late to get into the space.
“I said this is exactly the time. I didn’t build a company before because I didn’t think the technology was ready,” she said.