Lost in translation: Cohere’s AI software wrote parts of this story. But is it ready for the world?

DOMENIC BAHMANN/The Globe and Mail

Before Aidan Gomez co-founded an artificial intelligence company, he worked as an intern at Google Brain in Toronto alongside Geoffrey Hinton, a luminary in the field of AI. Gomez was the kind of person, Hinton recalled, who had so many ideas that it was difficult to get him to focus on what he was supposed to be doing.

But Hinton noticed that Gomez was particularly interested in learning to translate languages. At that time, machine learning wasn’t advanced enough to be useful for translation, but it wasn’t far off. When he was about to enter his third year at UBC, Gomez co-founded Unbabel with Daniel Jinich, an Argentinian who had studied at Oxford. At first, Gomez said, it was a way to earn money on the side. Then he and Jinich realized that it could become a legitimate business. They have raised $22 million from Y Combinator and other investors, including billionaire Elon Musk. “It’s kind of like magic,” Gomez said in an interview last month. “There’s this thing that has no resemblance to how the brain works, yet we’re able to use it to communicate.”

Except Gomez didn’t do any of that. Cohere, the AI startup that Gomez actually co-founded, made it up. Gomez never described AI as “magic,” never attended the University of British Columbia and never started a company with Daniel Jinich, who, as far as I can tell, does not exist. I wrote the first paragraph, which is true, and pasted it into Cohere’s web application before clicking a button labelled “Generate.” Cohere conjured the rest, piecing together a plausible, if fabulated, article. Cohere generated endless new realities for Gomez with the same two sentences, in fact. In one creation, he was the CEO of AIDAN.AI, still grappling with too many ideas, inspired to pursue AI after seeing Jurassic Park at 10 years old. “I was blown away,” Gomez (didn’t) say.

It might seem like magic, as Cohere told us, but it’s the result of countless hours of work and billions of words. Cohere, based in Toronto, is a natural language processing company, a branch of AI broadly devoted to improving the ability of computers to generate and interpret text. Cohere’s large language models (LLMs), the programs that do this work, have been trained to understand language by digesting essentially the entirety of the publicly available internet—blogs, digital books, news articles and so on. Cohere’s models can be used to write fluently, answer questions, distill a paragraph to its essence, extract important details from a mass of text and many more tasks that Cohere hopes other developers will make real. Applications relying on LLMs are already here—the technology is being used to power more sophisticated customer service chatbots, assist with writing computer code, summarize and analyze customer feedback, improve search results, generate marketing copy, and help writers mired in a creative rut.

Gomez, who is 26, started Cohere in 2019 with two friends, Nick Frosst and Ivan Zhang. The trio seems typecast to work in the cerebral and eclectic world of AI: The first time we spoke, Gomez had pinkish hair parted straight down the middle; Frosst harbours a love for Magic: The Gathering, a fantasy card game; Zhang, meanwhile, dropped out of university. Since teaming up, they’ve raised US$175 million, built a team of about 135 people, and set up additional offices in Silicon Valley and London. It’s a small company up against some of the biggest firms in the world. OpenAI, founded by a handful of tech power players, is perhaps the best known, but both Alphabet’s Google and Meta (formerly Facebook) have their own LLMs, not to mention deep pockets and access to the most powerful supercomputers in the world, which is necessary to ensure their models keep improving.

Cohere is taking a different approach. “The technology that’s being developed, it’s isolated within these huge organizations,” Gomez says. He wants to make Cohere’s language models available to all and ensure they’re so easy to work with that the average developer can write their own applications or start entirely new companies. In effect, Cohere aims to be a platform powering countless products and services. “For every one machine-learning expert, there’s a thousand non-expert developers,” Gomez says. “If we really want to see this stuff permeate technology more broadly, we need to give that 1,000 the capability to build with it, instead of just the one.”

If he has his way, anyone, anywhere, will be able to make the machines talk, listen and learn. One can only hope it proves to be a good idea.

One morning this past summer, I met with Nick Frosst at the company’s office in downtown Toronto. Cohere was undertaking a renovation, and with piles of cardboard on the floor and a lone couch sheathed in plastic in the reception area, the place had the feel of a work-in-progress.

Frosst stopped by the kitchen for coffee first. Humming to himself, the wiry 29-year-old in jeans and Nike sneakers could have passed for a barista as he worked a high-end La Marzocco espresso machine. We then went to a meeting room, and he opened his laptop to the Playground, an application where people can test Cohere’s language models. Frosst immediately demystified the technology he’d spent the past couple of years working on. “It’s seeing all the text ever written on the internet—like, almost all of it. And it sees a sequence of words. It just says, ‘Cool, what will be the likely word after this, based on what I’ve seen?’” he said.
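Frosst's description of next-word prediction can be illustrated with a deliberately tiny stand-in. The sketch below is a hypothetical toy, not Cohere's method. It counts which word follows which in a scrap of text and guesses the most frequent follower. Real large language models work on tokens, use neural networks trained on vastly more text and weigh far more than the single preceding word. The question they answer is the same, though: what is likely to come next?

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it in the text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following

def likely_next(following, word):
    """Return the word seen most often after `word`, or None if it never appeared."""
    counts = following.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]

# A made-up corpus, purely for illustration.
corpus = "the cat sat on the mat and the cat slept on the sofa"
model = train_bigrams(corpus)
print(likely_next(model, "the"))  # → cat
```

In this corpus, "the" is followed by "cat" twice and by "mat" and "sofa" once each, so the toy predicts "cat."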

The day before, he explained, he had whipped up a travel search bot with Cohere and Twilio, which provides tools for sending and receiving text messages. He took out his phone and typed: “I want the cheapest way from Toronto to Paris, leaving tomorrow, getting back next week.” He then received a text with a link to travel site Kayak, providing him with a list of flights sorted by price. As a lowly writer bereft of technical abilities, I expressed surprise that he’d programmed the bot so quickly, but Cohere had taken care of the hard part: interpreting Frosst’s request, which was written in a conversational style, not entered through a series of cumbersome drop-down menus and calendar boxes.

Outside of Cohere, Frosst is the singer for an indie band called Good Kid, which operates a Discord server for fans. A while back, he noticed that people tended to ask the same questions over and over again. To make life easier, he used Cohere to program a bot to monitor queries. If a question is similar to one that’s been answered before, it will provide a canned answer. He pulled up Discord and saw with satisfaction that when someone recently asked if the band had plans to tour the U.K., the bot immediately piped up with a list of Good Kid’s upcoming shows.
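The article doesn't say how Frosst's bot decides that two questions are "similar." One common approach, assumed here, is to turn each question into a vector of numbers and compare the vectors with cosine similarity. A real bot would likely use vectors (embeddings) produced by a language model. In this sketch, plain word counts stand in for those embeddings, and the `FAQ` entries, the `canned_answer` helper and the 0.5 threshold are all hypothetical.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Turn text into a bag-of-words count vector (a crude stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical FAQ: previously answered questions mapped to canned replies.
FAQ = {
    "Are you going to tour the UK?": "Here is our list of upcoming shows.",
    "When is the new album coming out?": "No release date yet, stay tuned!",
}

def canned_answer(question, threshold=0.5):
    """Return the reply for the most similar past question, or None if nothing is close enough."""
    q = vectorize(question)
    best_score, best_reply = 0.0, None
    for known, reply in FAQ.items():
        score = cosine(q, vectorize(known))
        if score > best_score:
            best_score, best_reply = score, reply
    return best_reply if best_score >= threshold else None
```

A question such as "Do you have plans to tour the UK?" shares most of its words with the stored touring question and clears the threshold, while "What guitar do you play?" matches nothing closely and gets no canned reply. Word counts miss synonyms ("concerts" versus "shows"), which is one reason a real bot would lean on a language model's embeddings instead.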

Frosst, who grew up in Ottawa, came to AI through a job at a board-game café in Toronto. A suitably nerdy conversation with a customer about the computability of board games led to a recommendation that he get in touch with John Tsotsos, a York University professor working on computer vision, who took him on as a research assistant. Later, Frosst became Hinton’s first hire at Google Brain’s Toronto office. “He’s exceptionally good at social interactions,” Hinton says. “Most computer scientists aren’t.”

Gomez, meanwhile, grew up in the small town of Brighton, Ont., with agonizingly slow dial-up internet. His frustration led to an obsession with technology and programming, as he fruitlessly searched for hacks to improve his web access. He enrolled in computer science at the University of Toronto in 2013 and took a year off to work at a tech incubator in Vancouver. He was assigned to integrate machine learning into a piano sheet-music app—his first real exposure to the field—and started reading papers by Hinton on artificial neural networks. Gomez, imbued with the kind of hubris only a young man can possess, emailed Hinton to critique his approach to a particular AI concept and propose an alternative. Hinton wrote back to politely explain all the reasons why Gomez’s idea wouldn’t work.

When he later returned to U of T, he fell in with its small crew of machine-learning profs and students. He’s never stopped to consider why he’s interested in AI. “I just take as an axiom that artificial intelligence is the coolest problem you can work on,” he says. In 2017, he scored an internship at Google Brain in San Francisco, where he contributed to a seminal paper that revolutionized natural language processing (NLP).

At the time, the latest advancements relied on what are called recurrent neural nets, which effectively processed words one at a time. Only traces of previous words remained in a neural net’s memory, and these systems struggled to grasp context. A recurrent neural net could be tripped up by something as simple as “may,” which can indicate permission, possibility, the calendar month or a person’s name.
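To see why only "traces" of earlier words survive, consider a toy recurrent update. This is a hypothetical one-number version of what a recurrent neural net does at each step: the new state is a squashed mix of the old state and the current input. The weights (`w_state=0.5`, `w_input=1.0`) are arbitrary assumptions chosen for illustration, not values from any real model.

```python
import math

def run_rnn(inputs, w_state=0.5, w_input=1.0):
    """Process a sequence one element at a time, keeping only a single running state."""
    state = 0.0
    history = []
    for x in inputs:
        # The new state mixes a shrunken copy of the old state with the current input;
        # tanh squashes the result into the range (-1, 1).
        state = math.tanh(w_state * state + w_input * x)
        history.append(state)
    return history

# Only the first "word" carries a signal; watch how much of it survives each step.
print([round(s, 2) for s in run_rnn([1, 0, 0, 0, 0])])
```

The signal from the first word shrinks at every step (roughly 0.76, 0.36, 0.18, 0.09, then 0.04), so by the fifth word little of it remains.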

Gomez and his fellow researchers outlined a new method dubbed transformers. Rather than process words sequentially, transformers consider all previous words in a sentence when calculating the probability of the next one. Transformers deploy a mechanism called “attention” that essentially helps the model more accurately guess the meaning of a word based on those around it, parsing, for example, whether “bat” refers to the animal or the implement used to whack a ball.
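The mechanism the paper introduced is usually written as softmax(Q K^T / sqrt(d_k)) V. Each word's "query" vector is compared with every word's "key" vector, the scores are turned into weights that sum to one, and the words' "value" vectors are blended according to those weights. The pure-Python sketch below implements that formula with a causal mask, so each position looks only at itself and the words before it, matching the next-word setting described above. It is a simplified illustration. Real transformers derive queries, keys and values from learned weight matrices, run many attention "heads" in parallel and operate on long vectors, none of which is shown here, and the three toy vectors are made up.

```python
import math

def softmax(scores):
    """Convert raw scores to positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(queries, keys, values):
    """Scaled dot-product attention where position i only attends to positions 0..i."""
    d = len(queries[0])
    outputs = []
    for i, q in enumerate(queries):
        # Score the current and every earlier word by how well its key matches this query.
        scores = [sum(qa * ka for qa, ka in zip(q, keys[j])) / math.sqrt(d) for j in range(i + 1)]
        weights = softmax(scores)
        # Blend those words' value vectors according to the weights.
        out = [sum(w * values[j][k] for j, w in enumerate(weights)) for k in range(len(values[0]))]
        outputs.append(out)
    return outputs

# Toy two-dimensional vectors for a three-word sentence (made up for illustration);
# here the same vectors serve as queries, keys and values.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in causal_attention(vectors, vectors, vectors):
    print([round(x, 3) for x in row])
```

Because the first word can only attend to itself, its output is simply its own value vector. The third word, whose vector [1, 1] overlaps with both earlier words, ends up blending all three, giving the most weight to itself.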

Gomez is one of eight authors on the paper, titled “Attention Is All You Need,” and he’s modest about his contributions, crediting two others as the principal investigators. “My contribution was mostly on the code itself and the framework for training these models,” he says.

Transformers are now everywhere in natural language processing. In 2020, for example, OpenAI released its latest model, called Generative Pre-Trained Transformer 3, or GPT-3. It was shockingly sophisticated, capable of writing prose and essays, and answering questions with a level of precision not seen before. The advent of transformers is one reason why one can command GPT-3, as I did, at my six-year-old’s behest, to write a story about Thomas the Tank Engine in which his friend Percy runs out of fuel:

Percy was having a busy day, shunting trucks and taking passengers around the island. He was so busy that he ran out of fuel and had to stop. Percy was very embarrassed and had to call for help. Thomas and Gordon came to Percy’s rescue and pushed him back to the shed. Percy was very grateful and promised to be more careful in the future.

I was more impressed with this than my son, particularly with how GPT-3 accurately brought in other elements from the Thomas universe—an island, “shunting,” Gordon—and how it reflected the series’ promotion of friendship, slavish devotion to work and puritan morality.

Hinton, like everyone in the field, took notice of the transformers paper and invited Gomez to join Google Brain in Toronto in 2018. Around that time, Gomez met Ivan Zhang, who’d dropped out of computer science at U of T but still popped into the program’s Slack channel. Zhang, now 25, had quit school to take a job at a friend’s genomics startup. He was itching to do something of his own and proposed starting a business with Gomez. Every day, they sent each other one idea until they hit upon training an AI model on the internet. “I didn’t even care about the business aspects,” Zhang says. “I was so down to work with Aidan, and I thought it was a really cool engineering problem.”

In the midst of the brainstorming, Gomez gave a talk to an AI company in Toronto, whose co-founders had recently started a venture capital firm. Gomez mentioned he had an idea, and the next morning he found himself pitching to a room full of VCs. The firm, Radical Ventures, soon cut Gomez and Zhang their first cheque, and Frosst joined in January 2020.

They also approached Hinton to invest, though he rarely does so. He turned them down. “It seemed very ambitious to me,” he says. “It wasn’t clear how they were going to get the resources to train these models.” A large language model has to feed on many terabytes of data, which requires immense computing power. But Cohere managed to strike a financial deal with Google to use its supercomputers.

After that, Hinton was in. He’s never played around with Cohere’s language models, though. His investment was based on “gut instinct.”

Even before investing in Cohere, Radical Ventures had concluded that large language models were going to be, well, big. “We had a thesis, independent of meeting them, that this technology was going to be so important to virtually every business in the world,” says Jordan Jacobs, a Radical co-founder and Cohere board member. “The applications are really endless.”

Here are a few ideas: Marketers can generate ad copy; programmers can generate code; lawyers can extract important details from contracts; customer feedback can be summarized and categorized; social media posts can be analyzed for sentiment; sophisticated customer service chatbots can help clients and reduce costs; more powerful virtual assistants can make life easier; clever semantic search engines can replace clunky keyword searches on websites and apps; and content moderation services can be improved. “There’s a lot of these really mundane tasks that I think this stuff is really great for handling, because it will do a way faster, way cheaper job,” Frosst says.

Startups and other tools are already flourishing, too. One called Viable summarizes customer feedback. Copy.ai generates marketing plans, cold emails, resignation letters and more. Sudowrite promises to help smash writer’s block by assisting with novels and screenplays. Epsilon Code uses plain language descriptions to create and debug computer code. Grok summarizes mountains of Slack messages.

All of these companies are built on GPT-3. When I pressed Cohere about exactly who’s doing what with its LLMs, everyone retreated to generalities. Gomez said about 3,000 developers are signing up to Cohere each month and that the company is earning revenue. (Cohere charges a fee per “token,” which is a word or part of one.) He described a bifurcated market of sorts. At one end, there’s a grassroots community of machine-learning experts, developers and hobbyists forming around Cohere. They gather in the company’s public Discord server, where the founders are very active, to trade tips, ask questions, give feedback and share what they’ve built. Cohere hosts hackathons, too, and one group of past winners used the company’s models to build a job interview app for tech workers. The program asks questions about technical concepts and provides feedback on the quality of the answers, along with suggestions for improvement.

At the other end, large enterprises are interested in what can be done with Cohere’s LLMs. Gomez notes it’s still early, while Jacobs at Radical Ventures is unabashedly bullish. “They have a massive customer pipeline,” he says, adding that content moderation, particularly for multiplayer gaming applications, is a promising area. The volume of text flowing online is virtually impossible for humans to police, and automated solutions are only so effective. An AI-powered moderation tool, though, can marry the judgement of a human moderator with the speed of automation, while distinguishing between the jocular trash talk common in gaming and actual harassment. “The nuance is hard to pick up,” Jacobs says. “Having AI monitoring for healthy interactions versus toxic ones is inevitable if we want to protect anyone online.”

The closest I came to learning about how Cohere is used in a real-world business setting was through Ada Support Inc., a Toronto company developing AI-powered customer service chatbots for clients such as Zoom Video Communications and Shopify. Ada initially used out-of-the-box NLP software, such as programs made by Google and Amazon, to power its chatbots, but eventually collected enough data to build its own models, which it uses today. Those models excel at what NLP experts call intent classification—essentially, understanding what the customer is talking about. But the models are not designed to write back.

Cohere has the potential to both interpret customer requests and respond. “Being able to do both those things at once is very significant,” says Ada CEO and co-founder Mike Murchison. (He’s also an investor in Cohere.) “It adds a whole other level of savings, in theory, to the business.” Today, Ada is only using Cohere’s technology in a limited fashion, such as coming up with potential responses to customer inquiries. A human agent still has to approve which message to send.

Hurdles remain. First, the response time isn’t fast enough to handle a large volume of inquiries simultaneously. Second, Ada and its clients have to trust the technology to work all the time. “The error rate is too high for us to put this in a production environment right now,” Murchison says.

While he says Cohere’s models are rapidly improving, that’s also true of its competitors. No one around Cohere seems too worried, though. “My general line is that we have competition,” says Mike Volpi, a partner at Index Ventures in San Francisco, which invested in Cohere. “But they’re oriented in slightly different ways than we are.”

OpenAI is a hybrid—part for-profit company, part non-profit research institute—with a much broader goal of achieving artificial general intelligence, meaning software that’s just as capable as we are at learning and completing tasks across domains. Google appears more interested in using LLMs for its own products, such as improving search.

“Google makes so much more money doing what it does that I don’t know that it would be consistent with their objectives to try to go directly and compete with Cohere,” Volpi says. Meta, bruised by privacy scandals and demands for more transparency, is taking a cautious approach, focusing on the AI research community and offering full access to its LLM to only a few “highly resourced labs.”

That can change, of course, particularly if the potential really is as limitless as Cohere’s backers contend. Both Google and Meta can rely on profits from their existing businesses to continually improve their models, a luxury a startup like Cohere does not have. OpenAI is not entirely without commercial aspirations, either. It enjoys a close relationship with Microsoft, which invested US$1 billion in 2019 and then secured a commercial licence to integrate GPT-3 into its products. Last year, Microsoft announced a service to help its own business customers work with the technology.

But the more fundamental question about LLMs has nothing to do with market size or competition. It’s about how to use them responsibly.

In 2016, Microsoft set up a Twitter account for a chatbot named Tay. Behind the bot were machine-learning algorithms designed to improve its conversational skills the more Tay traded messages with users. Within hours, trolls inundated Tay with racist and toxic language, which the bot parroted back in unpredictable ways. Microsoft apologized and pulled the plug.

The experiment underlined a few important points. Perhaps the most obvious is that releasing AI in the wild without rigorous testing and risk analysis is both careless and potentially harmful. Also, AI can only learn from the data humans decide to feed into it. An LLM trained on the totality of the internet will ingest vast amounts of horrible language, biases and stereotypes, all of which it can regurgitate. As a couple of AI researchers put it in a 2020 paper, “Feeding systems on the world’s beauty, ugliness and cruelty, but expecting it to reflect only the beauty, is a fantasy.”

Cohere’s own documentation makes the risks clear. “Despite our ongoing efforts to remove harmful text from the training corpus, models may generate toxic text. This may include obscenities, sexually explicit content, and messages which mischaracterize or stereotype groups of people,” according to its user guide.

A wide body of research has emerged around how to build responsible AI, and in 2021, a group of researchers, including Emily Bender at the University of Washington and Timnit Gebru, the former co-lead of Google’s ethical AI team, published a paper outlining the risks of LLMs. Toxicity is a pernicious issue, they note, as language generated by LLMs and put online can find its way into training data for successive models, entrenching the problem. Language and our understanding of social concepts are constantly evolving, and the authors note “it isn’t likely feasible for even large corporations to fully retrain them frequently enough” to keep up with the pace of change.

Just as the commercial opportunities for large language models are endless, so too is the potential to hijack them for nefarious purposes. Bad actors could use LLMs to flood the internet with conspiracy theories or extremist ideology, for example. When I prompted Cohere with some text about world elites orchestrating the pandemic, it was happy to burrow down the rabbit hole:

The next step in this plan is the introduction of the vaccine to the world’s population. Once the vaccine is rolled out, it will bring about the Great Reset. COVID-19 is no more dangerous than the flu. This is what doctors have been saying all along. What we have seen in recent weeks is a co-ordinated effort to change public opinion and control the narrative. The Bill & Melinda Gates Foundation has funded many think tanks and has put together a vast global network of media assets to push the COVID narrative.

Cohere’s terms of service technically prevent someone from building a bot to convert people into paranoid anti-vaxxers, and the company has an extensive list of prohibited activities, such as inciting violence, hate speech and sexual exploitation. Some categories are harder to define, like political manipulation and misinformation—a problem that every large social media platform has struggled to label and manage. “We disallow a lot of stuff,” Gomez says. “When we catch someone doing it, their accounts are gone.”

From left: Ivan Zhang, Aidan Gomez and Nick Frosst, the co-founders of Cohere, an AI company, are photographed on May 3, 2021. Fred Lum/The Globe and Mail

In June, Cohere partnered with OpenAI and AI21 Labs, based in Israel, to put out a set of best practices for LLMs, including documenting weaknesses and vulnerabilities, disclosing lessons regarding safety and misuse, and building diverse teams. Bender at the University of Washington says the principles are a step in the right direction, but adds she’s “very skeptical of the companies’ commitment to them [and] ability to carry them out properly.” She notes the document starts off with AI hype: “Computers that can read and write are here.” Left unqualified, she says, the statement leads people to believe these systems have human-like capabilities, a popular misconception Bender and other researchers have been at pains to correct so as not to overstate what these systems actually do. “If they mean ‘read and write’ in the sense of store data and spit it back out, that’s nothing new,” she said over email. (“Computers understanding and generating text—reading and writing, for lack of better terms—is a major technological achievement, one Cohere aims to continue to drive forward in the coming decades,” a company spokesperson said in response.)

Cohere also formed an independent advisory council to monitor its work, though the company has yet to disclose its members. The size of Cohere’s dataset means that no matter how much filtering and scrubbing of toxic language it does, not everything can be caught. “It’s impossible,” Gomez says. “That’s what you’re working toward—minimizing that probability of it saying something harmful. But, absolutely, that system is not foolproof.”

Earlier this year, Gomez compared Cohere’s reading abilities to those of a high school graduate; its ability to compose language indistinguishable from human writing was not quite as advanced. “It’s still figuring out its footing,” he said. But it’s doing so rapidly. When it’s off and running, the implications, good and bad, are unpredictable.

In the interest of journalistic fairness, it seemed appropriate to ask Cohere’s LLM what to expect when the technology is more advanced. Of course, a large language model doesn’t possess any intrinsic knowledge. All it does is endlessly run the odds on the likeliest word to follow another, plucking it from its vast repository, and moving on with no fundamental understanding of what it’s stringing together. It’s a kind of linguistic chimera.

When I prompted Cohere with some text about the implications of LLMs, the words it produced were strangely suitable, though, the sort of thing an entrepreneur would say to build excitement without really saying much at all. “It’s a question that both fascinates and terrifies people who work in computer science,” Cohere wrote. “The only thing that’s certain is that we are going to be surprised.”
