Cohere, a Toronto-based generative AI company, said it will indemnify customers who are sued for copyright violations, as the industry faces litigation over whether artificial-intelligence models that produce text and other media are unfairly profiting from the creative work of others.
The company develops large language models (LLMs), the technology that underlies chatbots such as OpenAI’s ChatGPT.
Cohere said in a blog post Thursday that it is updating its terms of use to “provide full indemnification for any third-party claims that our technology infringes on someone’s intellectual-property rights. This means we will safeguard our customers’ data and assume responsibility for any legal settlements or judgments that come from these claims.”
Other companies that make generative AI products, including OpenAI, Microsoft and Adobe, announced similar policies last year to defend customers against copyright-infringement lawsuits.
Cohere’s “copyright assurance” policy applies to paying enterprise customers that follow its terms of service and do not deliberately violate IP laws.
“Cohere is taking responsibility that we have pretrained our models in responsible ways, and that our customers can get on with the business of doing their business, rather than worrying about ours,” according to the post.
President and chief operating officer Martin Kon said in an interview that Cohere has had an indemnification policy in place for a while but had not widely publicized it until now.
“We just wanted to make sure that it’s clear now that our business is scaling quite rapidly,” he said. “We want to work with the most demanding, sophisticated, respected enterprises on the planet, and you can only do that if we have a foundation of unconditional trust.”
Generative models feed on data culled from the internet, including books, blogs and news websites, as well as material that AI companies pay to license. AI companies have argued that building models does not violate copyright laws and falls under fair dealing, a legal exception that permits the use of copyright-protected material in certain contexts that would otherwise constitute infringement.
Rights holders, including authors, artists and news organizations, have argued otherwise, and say that consent and compensation should be required for the use of their works.
The federal government is seeking input on how copyright laws apply to generative AI and launched a public consultation last fall.
Copyright is a major issue both for AI companies, which need massive amounts of data to build their models, and for creative industries that face disruption from the technology. Depending on how laws are interpreted, AI companies could be hit with hefty legal judgments or be compelled to strike licensing agreements, eroding profit margins.
Mr. Kon likened generative AI models to a student reading articles, synthesizing information, and then forming a point of view. “We think that is a legitimate use of these publicly available sources,” he said.
Pina D’Agostino, a law professor at York University who specializes in IP, said that indemnification agreements are problematic. “It actually exacerbates access to justice issues, because it allows the bigger players to be able to have deep pockets and defend lawsuits, and the smaller ones to be left astray,” she said.
“What needs to be paramount are the ethics. Is it ethical that an entire industry is being built on the backs of rights holders, without even their consent?”
While AI companies are offering to cover legal bills for customers, it is the AI companies themselves that have been hit with high-profile lawsuits. Getty Images has sued British company Stability AI over its text-to-image generator. Authors including Canada’s Mona Awad sued OpenAI last year for copyright infringement, while comedian Sarah Silverman and others launched a similar suit.
In December, the New York Times sued OpenAI and Microsoft, highlighting examples of ChatGPT reproducing newspaper articles word for word. OpenAI has denied the allegations and has said that regurgitating material is a “rare bug.”
OpenAI has also signed deals with publishers such as the Associated Press and Axel Springer to license content. In a submission to a British government committee studying generative AI last year, OpenAI argued that “it would be impossible to train today’s leading AI models without using copyrighted materials.”
Mr. Kon declined to say whether Cohere has signed similar licensing deals. “We’re always looking at the best data for our models, including proprietary data,” he said.
Canadian organizations contend they should be compensated, as well.
“While we are excited about these new technologies, generative-AI developers must compensate publishers for the copying and use of our professionally created protected work as they train their models and surface and synthesize our work,” said Paul Deegan, chief executive of News Media Canada, which represents news organizations including The Globe and Mail and the Toronto Star.
The federal consultation, launched last October, ended in January. One of the questions Ottawa sought to address was whether "clarification" is needed on how laws apply to the use of copyrighted material in training AI models.
The Association of Canadian Publishers argued in its submission that no fair dealing exception should be given to AI companies to use copyrighted works in training models, and advocated for licensing deals. Granting an exception “robs rights holders of a real and potential source of substantive income,” wrote the association, which represents around 115 publishers.
In Britain, the parliamentary committee studying generative AI came down on the side of rights holders when it published its report in February, and urged the government to clarify its position.
“We do not believe it is fair for tech firms to use rights-holder data for commercial purposes without permission or compensation, and to gain vast financial rewards in the process,” the report stated.