Artificial intelligence companies Cohere Inc. and Google have told the federal government that they favour a legal exemption allowing them to build commercial AI models using data without being compelled to compensate or obtain permission from rights holders. The companies warned that such a requirement would impede the development of the AI industry in Canada.
AI models, such as those powering chatbots like OpenAI’s ChatGPT, are trained on huge and diverse quantities of data in order to produce coherent text. Companies pay for some proprietary data but also use large volumes of material scraped from the internet, including works created by authors and media organizations.
Innovation, Science and Economic Development Canada, along with Canadian Heritage, launched a public consultation last fall to seek input on possible changes to the Copyright Act in response to the rapid development of generative AI systems, which produce text, images, audio and video.
One key question is whether AI companies should be required to license copyrighted material when training models for commercial purposes, or whether an explicit exemption should be made in law. A related exemption, known as fair dealing, already permits the use of copyright-protected material for purposes such as research and education.
The rapid growth in generative AI and the legal uncertainties around copyright have sparked a number of high-profile lawsuits from authors, artists and news outlets against tech companies such as OpenAI, Meta Platforms Inc. and Stability AI.
AI developers say they need access to data to build more powerful models, which they claim will help to improve work, life and society at large – and potentially generate massive profits. But creative workers feel cheated because their material has been used without permission or compensation, and they are concerned about the impact of generative AI on their industries.
Cohere, which builds the large language models that underlie chatbots and other applications, said in its submission that AI training does not infringe on copyright, making licensing unnecessary. “Remuneration would not be appropriate,” according to the submission, which was posted online recently.
The Toronto-based company argues that AI models learn concepts and facts by identifying patterns in large amounts of data. “As these concepts, facts and patterns are not protected by copyright, copyright law should not be interpreted to prevent AI training,” the company said.
Microsoft Corp., which has invested billions of dollars in OpenAI, argued there is a lack of clarity around how Canadian copyright laws apply to generative AI, and that a commercial exemption would spur more AI development and investment in the country. “It is not a copyright infringement to learn from copyright-protected works, and the use of AI to read and learn should not require compensation,” the company wrote in its submission.
Other groups, such as the Association of Canadian Publishers, are opposed to an exemption and instead support licensing deals to pay creators for the use of their material in AI training. An exemption “robs rights holders of a real and potential source of substantive income,” according to the association, which added that a licensing market is already developing for generative AI models.
But Cohere argued that rights holders would not benefit from new revenue streams if such a requirement were imposed, because, it said, AI development would simply take place outside of Canada. “It also could result in AI systems, including systems that are critical to advancing health care, addressing the climate crisis and closing Canada’s productivity gap, not being made available for use in Canada,” Cohere said.
Google similarly argued in its submission that requiring licensing or permission would be “essentially impossible given the large amount of data needed to train AI models and the lack of comprehensive data about copyright ownership.” “It would effectively block the development and use of large language models and other types of cutting-edge AI,” the company said. Google added that it has introduced tools to allow web publishers to opt out of having their content used to train future AI models.
All three AI companies pointed to Japan as an example for Canada to follow. The country has amended its copyright laws to permit AI training on copyrighted works for both commercial and non-commercial purposes.
Groups that represent artists and other creatives have a starkly different view. “Governments should not create new copyright or other IP exemptions that allow AI developers to exploit creations without permission or compensation,” Music Canada, which represents the Canadian divisions of Sony Music, Universal Music and Warner Music, said in its submission.
The trade group cautioned lawmakers to be wary of the language AI developers use to describe how their systems work, which Music Canada said is an attempt to frame generative AI as already being exempt from copyright law. “They may use words like ‘learning,’ ‘migration,’ ‘memorization,’ or ‘simulations,’ instead of words that describe what their systems do to learn: make ‘copies’ and ‘reproductions,’” Music Canada wrote.
The Canadian Civil Liberties Association likewise advocated for licensing to respect copyright holders. “Just because these models need lots of data does not mean that data should be mined with such little regard for the creatives whose work fuels it,” according to its submission.
Many AI developers, including Cohere, have offered to indemnify customers who are sued for copyright infringement. OpenAI, the target of a few copyright lawsuits, has been busy striking deals with a host of media organizations and publishers to use their content in training data, including News Corp., the Financial Times and the Associated Press.
These arrangements have come under criticism, too. AI companies are “pursuing business deals to absolve them of the theft,” wrote Jessica Lessin, founder of tech news website The Information, in an article in The Atlantic last month. “It’s simply too early to get into bed with the companies that trained their models on professional content without permission.”
Complicating matters for rights holders is the lack of transparency around what material is actually contained within massive training data sets, making it very difficult to find out if their creative works have been used.
In submissions to the Canadian government, Cohere, Google and Microsoft objected to being compelled to document and disclose copyrighted material in their training data, saying that it would not be feasible. “The copyright status of individual works contained in billions or trillions of datasets are effectively impossible to discern,” Cohere wrote.
A spokesperson for Innovation Minister François-Philippe Champagne declined to comment on whether he is in favour of amending copyright law to allow for a commercial exemption.
“As this market continues to evolve, we are committed to fostering a framework that supports creativity and innovation while upholding intellectual property rights,” spokesperson Audrey Milette said.