Sheema Khan is a patent agent and the author of Of Hockey and Hijab: Reflections of a Canadian Muslim Woman.
Scarlett Johansson recently heard herself in artificial intelligence. After turning down a request from OpenAI chief executive Sam Altman to voice ChatGPT, she said the company’s new voice assistant Sky sounded a lot like her. “When I heard the released demo, I was shocked, angered and in disbelief that Mr. Altman would pursue a voice that sounded so eerily similar to mine that my closest friends and news outlets could not tell the difference,” she said in a statement.
While OpenAI has denied using Ms. Johansson’s voice without her consent, saying that it hired a different actor, the incident highlights the tension between AI companies and the creators of the content that these companies need to improve their products. As generative AI advances by the day, a question has arisen: Could AI lead to the death of intellectual property?
IP refers to the legal right of owners to prevent others from copying, using, making or selling what they have created. There are different types: copyright protects original works of authorship, such as novels, computer code, songs and works of architecture; a trademark protects brand identity; a patent provides an exclusive right to an invention; and a design patent protects the look or shape of an article. An IP owner can decide the terms of its use, such as by licensing.
Let’s say that an engineer – tasked with creating a tool to take apart a wooden crate – comes up with a crowbar, and after testing different prototypes, sends the final specification to a design engineer. The marketing team then brands her tool as “The Crow.” There are three types of IP available: a patent (for the crowbar itself), a design patent (for the crowbar’s design) and a trademark (for “The Crow”).
These legal conventions now have to contend with AI, a technology that enables machines to mimic human intelligence and problem-solving capabilities. Machine learning is a subfield of AI in which algorithms learn patterns from data rather than following explicitly programmed rules; many of today’s systems rely on neural networks, which are loosely inspired by how information is transmitted through neurons in the human brain. Generative AI uses machine learning to generate new content based on different inputs, such as text, images and sounds; this output can be indistinguishable from human-generated content.
A traditional algorithm takes an input, manipulates it according to specific instructions, and produces an output, much like a recipe. A machine-learning algorithm, on the other hand, is provided with many examples of inputs and their corresponding outputs, and tries to work out the recipe itself. This requires huge amounts of data to “train” models; in general, the more data that are used for training, the better the model will be at predicting an output from a new input.
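To make the recipe analogy concrete, here is a minimal sketch in Python. It is an illustration only, not how any commercial AI system is built: the temperature-conversion task, the function names and the example values are all assumptions chosen for simplicity. The first function is a recipe written by a person; the second works out a straight-line recipe from example pairs using ordinary least squares.

```python
# Traditional algorithm: a person writes the recipe explicitly.
def celsius_to_fahrenheit(celsius: float) -> float:
    return celsius * 9 / 5 + 32


# Machine-learning style: estimate the recipe from example pairs.
# This hypothetical helper fits a straight line (ordinary least squares).
def fit_line(inputs, outputs):
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    variance = sum((x - mean_x) ** 2 for x in inputs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Illustrative training examples: inputs paired with their known outputs.
examples_in = [-40.0, 0.0, 10.0, 25.0, 37.0, 100.0]
examples_out = [celsius_to_fahrenheit(c) for c in examples_in]

# "Training": recover the slope and intercept from the examples alone.
slope, intercept = fit_line(examples_in, examples_out)


def learned_recipe(celsius: float) -> float:
    # Apply the learned recipe to an input that was not in the examples.
    return slope * celsius + intercept


print(slope, intercept, learned_recipe(50.0))
```

The learned recipe never sees the conversion formula, only the examples. Real models have billions of adjustable numbers rather than two, which is one reason they need so much more data.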
AI’s first challenge to IP is in the inputs. The large language models (LLMs) that power generative AI require trillions of portions of text (or “tokens”) that are gleaned from the web. Companies such as Google, Microsoft and Meta scrape the internet for digital English-language text, video and audio content they can use to train their LLMs to find statistical patterns among the tokens. So massive is this demand that some researchers predict that no high-quality human-generated data will be left to train LLMs by 2026.
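What “statistical patterns among the tokens” means can be shown with a toy sketch in Python. It is a deliberate simplification under stated assumptions: tokens here are whole words split on spaces, the one-sentence corpus is invented, and the helper names are hypothetical. Production LLMs use sub-word tokens and neural networks rather than simple counts, but the underlying idea of predicting what comes next from what has been seen is similar.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption for illustration: one token per whitespace-separated word.
    return text.lower().split()


def bigram_counts(tokens):
    # For each token, count which tokens follow it in the training text.
    following = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        following[current][nxt] += 1
    return following


def most_likely_next(following, token):
    # Return the most frequent follower, or None for an unseen token.
    if token not in following:
        return None
    return following[token].most_common(1)[0][0]


# Invented one-sentence "corpus" standing in for scraped web text.
corpus = "the crow pries the crate open and the crow wins"
tokens = tokenize(corpus)
model = bigram_counts(tokens)

print(most_likely_next(model, "the"))
```

Even this tiny model can only repeat patterns that appear in the text it was given, which is why the source and ownership of training text matter so much.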
Many of these companies appear to have ignored copyright laws in their pursuit of data. As a result, The New York Times is suing OpenAI and Microsoft for the unauthorized use of its content; Getty Images is taking Stability AI to court for unauthorized use of its library; and the Authors Guild has filed a class-action lawsuit against OpenAI for unauthorized use of its members’ works to train ChatGPT. Any of these lawsuits could halt the generative AI juggernaut in its tracks, since the resulting copyright fees could be onerous to pay. Perhaps the question will become: Will IP be the death of AI?
The second challenge relates to who owns AI-generated products. Jurisdictions across the world currently stipulate that IP ownership hinges on human creators; if content is generated exclusively by AI, the IP cannot be owned, since there is no human involved. To return to our previous example, if ChatGPT were to create a crowbar for the crate problem, there would be no human inventor, and thus no patent; if ChatGPT were used exclusively to generate and brand the crowbar’s final design, neither a design patent nor a trademark could be applied for. Yet IP rights are key to innovation, as they provide a limited monopoly to monetize investments in research and development. AI represents an existential threat in this regard.
Clearly, the law has not caught up. But sitting idly by is not an option, as there are too many important policy issues at play.