
What used to resemble nightmarish acid trips has evolved into clips that, at their highest quality, depict how our world looks

In the span of a few months, videos made with artificial intelligence applications have become viral memes, commercials, short films and music videos, marking an astounding – and unnerving – leap in quality.

Last year, AI-made videos looked like nightmarish acid trips with little attachment to reality. Today, the highest-quality content is much closer to depicting the way our world actually looks and functions. The improvements in text-to-video AI applications have been fuelled by far more data and computing power. Some developers, notably OpenAI and Runway AI, are determined to make the quality even better.

The prospect of realistic AI-generated videos that can be made by anyone is only heightening concerns that nefarious actors will use the technology to deceive, defraud and mislead the public.

To see just how far AI-made videos have come – and how far they still have to go – test your AI-detection skills with the quiz below.

Is this video AI-generated?

Kris Mina, a graphic designer in Burlington, Ont., isn’t sure what inspired him to type “Will Smith Eating Spaghetti” into a text-to-video generator called ModelScope in March, 2023. He’d been experimenting with AI image and video models for a while and discovered that spaghetti was one of the easier concepts for these applications to mimic. As for why he opted for Mr. Smith, he might have been influenced by the memes of the actor slapping Chris Rock at the Academy Awards ceremony in 2022.

In any event, the divinely inspired combination of Mr. Smith and pasta yielded something glorious and unhinged. The video he created and later posted to Reddit depicted an alien Will Smith, his face a roiling claymation mask with weirdly spaced eyes and pinballing pupils, shovelling noodles into the stretchy, jelly-like maw of a masticating bloodworm.

“I was so surprised that I immediately shared the first 10 generated videos I got without cherry-picking,” Mr. Mina said. “I did not expect it to depict a bizarro Will Smith eating spaghetti with his hands.”

AI-generated video of Will Smith eating pasta that went viral in 2023. KRIS MINA/REDDIT

The mix of comedy and horror in that video, paired with the novelty of AI, caused it to quickly spread across social media and eventually reach Mr. Smith himself. Earlier this year, the actor posted a video to Instagram comparing Mr. Mina’s work to a (presumably) real video of himself gorging on spaghetti. It was captioned, “This is getting out of hand!”

The spaghetti video is not a bad benchmark for progress. Compare Mr. Mina’s 2023 video with the one below, made in late June with Kling, a text-to-video application developed by a Chinese company called Kuaishou.

An AI-generated video of Will Smith eating pasta posted to Reddit on June 28, 2024. REDDIT

Text-to-video generation has partly grown out of research into computer vision and robotics. One approach to video generation involves teaching a model to predict what the next frame in a sequence will look like. That same ability can help a robot arm understand how to grab and move objects, or help autonomous vehicles anticipate how cars and pedestrians will move a few seconds into the future.
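For readers who code, the idea of next-frame prediction can be sketched in a few lines of Python. This is a toy illustration with a four-pixel "video" and a simple least-squares fit, not any lab's actual model: the point is only that a model shown many frame-to-frame transitions can learn to anticipate motion.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "video": each frame is a row of 4 pixels, and the scene
# shifts one pixel to the right every step (with wraparound).
def next_frame(frame):
    return np.roll(frame, 1)

# Training data: random starting frames paired with their successors.
frames = rng.random((500, 4))
targets = np.stack([next_frame(f) for f in frames])

# Fit a linear model W so that W @ frame approximates the next frame,
# via ordinary least squares -- the simplest possible
# "next-frame predictor."
X, *_ = np.linalg.lstsq(frames, targets, rcond=None)
W = X.T

# The trained model now anticipates motion in a frame it has never seen:
# a bright dot in pixel 0 is predicted to move to pixel 1.
test = np.array([1.0, 0.0, 0.0, 0.0])
prediction = W @ test
print(np.round(prediction, 2))
```

Real video models replace the linear fit with deep neural networks and the four pixels with millions, but the training signal is the same: predict what comes next.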

Earlier AI models had no text input at all. One, called VideoGPT and developed by researchers at the University of California, Berkeley, in 2021, produced either a completely random video or a continuation of a single frame.

But progress with large language models, such as the one behind ChatGPT, which can produce and analyze text, marked a leap forward for video, too. A deeper understanding of language now allows an AI model to better interpret instructions and associate words with their visual representations. Researchers at Tsinghua University in Beijing were among the first to seize on this potential and released what they said was “probably” the first large, open-source text-to-video model in 2022, which they called CogVideo.

The results resemble impressionist paintings come to life.

AI-generated video clips of a man with a bicycle in the snow. COGVIDEO

So how did AI-generated videos go from surrealist fever dreams to flawed but convincing simulations?

Beyond the technical ingenuity that goes into building AI models, there are two big factors. The first is data. Every AI model requires large volumes of information to decipher connections and patterns. To render a squirrel, for example, a video generation model needs to see lots and lots of squirrels performing various actions from different angles. The internet is drowning in video content, but to be most useful, the data has to be annotated with text descriptions so that an AI model can link words to images. CogVideo, for example, was trained on 5.4 million video clips annotated with text descriptions.

Such datasets have not always been easy to come by, and they can be expensive and time-consuming to compile. Academic institutions and AI labs have nevertheless assembled large publicly available video datasets for research purposes, particularly in the robotics and computer vision fields. Some have proven useful for text-to-video generation, too.

Is this video AI-generated?

Google DeepMind released multiple versions of a dataset called Kinetics that consists of links to hundreds of thousands of YouTube videos of humans performing various tasks. In a way, it’s an attempt to capture a portion of the infinite range of the human experience, sliced into minute categories such as “headbutting,” “entering church,” and “eating nachos.”

In 2019, researchers at universities in France and the Czech Republic assembled HowTo100M, a dataset of 136 million YouTube video clips. The researchers found that instructional videos – how to unclog a toilet, how to make a risotto, and so on – were bountiful on YouTube. Better yet, these videos contained narration, describing the actions in the videos in real time. That proved to be a ripe source of AI training data.
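The appeal of narrated how-to videos is that the timed captions already pair words with the moments they describe. A minimal sketch of that conversion, using a made-up transcript and a hypothetical clip-reference format (the real pipelines are far more elaborate), might look like this:

```python
# Hypothetical narration transcript, in the style of timed captions:
# (start_seconds, end_seconds, spoken text).
narration = [
    (0.0, 4.2, "first, whisk the eggs"),
    (4.2, 9.8, "now pour the mixture into the pan"),
    (9.8, 15.0, "flip it when the edges turn golden"),
]

def to_training_pairs(video_id, transcript):
    """Turn timed narration into (clip reference, caption) pairs --
    the raw material for teaching a model to link words to moving images."""
    return [
        {"clip": f"{video_id}#t={start:.1f},{end:.1f}", "caption": text}
        for start, end, text in transcript
    ]

pairs = to_training_pairs("howto_omelette_001", narration)
for p in pairs:
    print(p["clip"], "->", p["caption"])
```

Multiply this by millions of videos and the result is exactly the kind of annotated dataset that text-to-video models need.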

Is this video AI-generated?

Because data is so crucial to building AI models, a fight has been brewing over access to it. AI companies have typically scraped data from the web, without payment or obtaining consent from the creators. Some artists, authors and news companies are suing AI developers, alleging copyright infringement. Laws are unclear at best on the use of publicly available data to train AI models, and the Canadian government, for one, is undertaking a review of copyright law as a result.

Content providers have made legal threats even against non-commercial developers. When Max Bain was a PhD student at the University of Oxford studying video-captioning in 2021, he scraped the photography site Shutterstock to assemble a dataset called WebVid. There were no videos in the dataset, only the URLs to 10 million short clips, along with text descriptions. He later put it online for other researchers to use. “This turned out to be very useful for text-to-video generation,” he said.

Earlier this year, Shutterstock sent him a cease and desist letter claiming copyright infringement. Mr. Bain, not wanting to cause trouble, took it down. He can’t be certain who downloaded the dataset, or whether it was used by commercial developers. “The reality is in commercial labs, you just scrape anything, as much as possible, as long as it’s good,” he said.

Is this video AI-generated?

Shutterstock spokesperson Martine Smith said the company wants to ensure that its photographers and other creators are compensated for their work. “By simply scraping the Shutterstock site to create a dataset, WebVid cuts our contributors off from both training royalties and the autonomy to opt-out of having their IP used,” she wrote in an email.

Even though Mr. Bain took down WebVid, other versions still exist. “Loads of people cloned it,” he said.

Indeed, WebVid is just one dataset used to train Open Sora, an open-source text-to-video model released this year. (The tool has no relationship to OpenAI’s Sora video model.) The developers sourced a handful of other large datasets that were pulled from free stock photo and video sites, as well as from YouTube.

Is this video AI-generated?

Generative AI companies, meanwhile, typically no longer disclose how and where they obtain data to train their models, as both competition and legal uncertainty have mounted. When OpenAI chief technology officer Mira Murati was asked in an interview earlier this year whether the company used YouTube content to train its video generator, she replied that she wasn’t sure, an answer that strains credulity.

Even developers who want to understand where data comes from could hit a dead end. “It’s very confusing right now across the board,” said Shayne Longpre, a PhD candidate at the Massachusetts Institute of Technology. “The licensing attached to a lot of this is so ambiguous to researchers.” Creators of large, publicly available datasets sometimes fail to fully document their sources, or include incorrect licensing information.

Mr. Longpre is a contributor to the Data Provenance Initiative, a volunteer group dedicated to auditing the datasets used to train generative AI models. The group is finishing a study looking at popular video datasets that have primarily been compiled by academics and AI research labs. According to Kushagra Tiwary, another MIT PhD candidate who helped lead the study, YouTube was by far the biggest source for video data. The site accounted for nearly 1 million hours of content across more than 131 datasets, though that figure could include duplicates.

Is this video AI-generated?

More high-quality data is just one reason why the quality of text-to-video generation has improved so much. Anastasis Germanidis, co-founder and chief technology officer at Runway, said another factor is “putting more compute into place when training the model.”

Compute is industry jargon for the expensive, powerful and sophisticated computer chips, usually graphics processing units (GPUs), that are used to build AI models. More processing power achieves a few things. First, it allows developers to construct bigger models with more parameters, meaning that the AI system can capture deeper connections and patterns within data. In video generation, that means an ability to more accurately depict movement, reflections and textures, like hair blowing in the wind. Firing up more GPUs allows models to more efficiently handle huge amounts of data, too.

Compare this video made with Runway’s latest model, released in June, to one from last year.

A forest scene generated by Runway in 2023 and in 2024. RUNWAY/THE GLOBE AND MAIL

However, there are some signs that the approach – more data plus more compute – is showing diminishing returns, and many of the benchmarks used to gauge the performance of some AI models are starting to flatline.

“More compute will lead to better performance,” Mr. Germanidis said. “It just might be that the difference in quality improvement at a certain point might not be as meaningful as it is right now.”

As for data, Mr. Germanidis said Runway’s sources are proprietary, but added the company is signing deals to access the high-quality material it needs to train its models. Runway signed a partnership with Getty Images in December, for example, though the financial terms were not disclosed. The company is looking beyond video data, too. “It’s also image data, it’s text, it’s audio, all these modalities,” he said. “These models can get a broad understanding of the world.”

Is this video AI-generated?

Further development could be constrained simply because of costs and resources. Deploying a fleet of GPUs to power a large AI model can be enormously expensive, and these computer chips devour electricity and water.

“Maybe you need to have 100 times the amount of compute, and 100 times the amount of data,” Mr. Bain said of what it might take to significantly improve text-to-video generation from where it is today. “Suddenly that becomes a substantial fraction of energy consumption.”

Is this video AI-generated?

There are still obvious signs that a clip is AI-generated. A lot of synthetic video has an overly glossy aesthetic, for one, and objects do not always follow the laws of terrestrial physics.

“We’ll be able to close those gaps,” said Derek Nowrouzezahrai, a McGill University professor and director of the Centre for Intelligent Machines. “These models are only going to get better.” An AI-made video doesn’t have to be perfect to fool someone into thinking it’s authentic, he pointed out. It just has to be convincing.

The risks of misuse come not so much from companies like Runway, but from open-source developers who may not adhere to any guardrails. To combat the threat of deepfakes, a number of companies are offering products to detect AI-generated media, including video. These detectors are AI models that have been trained on both real and computer-generated material in order to deduce patterns in fakes. The problem is that detectors can get things wrong. “It’s difficult at best to evaluate the efficacy of these models, especially in challenging cases,” Prof. Nowrouzezahrai said.

Video detectors face additional hurdles because datasets made up of AI-generated videos for training purposes are scarce. As a result, some detectors may be better at identifying real content than sniffing out fakes.
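Why a detector can look accurate while still missing fakes comes down to arithmetic. The sketch below uses made-up scores rather than a real model, but it shows how a detector tested mostly on real videos can post an impressive accuracy figure while letting a large share of fakes slip through:

```python
# An imbalanced test set: 90 real videos, 10 AI-generated fakes.
labels = ["real"] * 90 + ["fake"] * 10

# Hypothetical "fakeness" scores from a detector: it is confident
# about real videos, but hesitates on 4 of the 10 fakes.
scores = [0.1] * 90 + [0.4] * 4 + [0.8] * 6
threshold = 0.5

predictions = ["fake" if s >= threshold else "real" for s in scores]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
fakes_caught = sum(
    p == "fake" for p, y in zip(predictions, labels) if y == "fake"
)
print(f"accuracy: {accuracy:.0%}, fakes caught: {fakes_caught}/10")
```

Here the detector scores 96 per cent accuracy overall, yet 4 of the 10 fakes go undetected, which is exactly the trap of judging a detector by headline accuracy alone.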

So, how good are you? Finish the last few questions to find out.

Is this video AI-generated?

Is this video AI-generated?

Is this video AI-generated?
