
Scott Stevenson, co-founder and CEO of Spellbook, in the company's office in St. John's, Newfoundland. Johnny C.Y. Lam/The Globe and Mail

Today, some companies are still figuring out what, if anything, to do with generative artificial intelligence. But AI developers are charging ahead with the next big thing: agents.

AI agents differ from chatbots, which can answer questions and perform simple tasks such as writing or summarizing e-mails. Agents promise to autonomously complete multistep jobs on behalf of a user, sometimes by tapping into other software applications to get things done, after receiving high-level instructions in plain English. While a chatbot can tell you how to do something, an agent will actually make a plan and do it.

They’ve been popping up everywhere over the past few months. “The opportunity for agents is gigantic,” Nvidia Corp. NVDA-Q founder Jensen Huang said at a recent event. In October, the chipmaker partnered with Accenture to train some 30,000 consultants to help their clients integrate agents. And developers big and small are building AI agents for legal, coding, compliance and customer-service applications.

Some companies building agents liken them to junior employees on whom boring but important work can be dumped – automatically updating batches of documents, tracking down the status of a delivery on behalf of a customer or tackling a software development project. Agents, the logic goes, will unlock productivity gains and liberate office workers from drudgery.

Of course, the enthusiasm behind agents is also an attempt to boost generative AI adoption when questions still remain about the cost, return on investment and utility of the technology, which can suffer from accuracy and reliability problems. The definition of an agent can be fuzzy, too. Another term – agentic – has arisen to describe tools that have agent-like qualities, but might not be considered full-fledged autonomous bots. The ambiguity could allow some companies to ride the AI hype by slapping the “agent” label on products that are janky at best.

It’s not hard to find acolytes for AI agents in the tech world, though. “Over the next two years, we are truly going to have AI colleagues that are working with us on many more advanced things,” said Scott Stevenson, who is co-founder and chief executive officer of Spellbook, which is based in St. John’s. The company, which raised US$20-million this year, already has an AI product for law firms to help draft and review contracts. The product has been a hit, according to Mr. Stevenson, garnering more than 2,500 paying customers.

In August, the company announced Spellbook Associate, which it describes as an AI agent for the legal industry. Mr. Stevenson said it can perform some of the monotonous tasks typically delegated to a junior associate, who might end up copying and pasting between documents – over and over again.

The application, which has a chatbot interface, can take a term sheet and update multiple related documents with new information. It can prepare a whack of documents as part of an employment package, or examine the contents of a data room – the term for a repository housing confidential information for a potential corporate acquisition – review the material for pitfalls, and rank the issues in order of severity. Associate leaves a trail of its activities, and its actions require approval.

The product has not yet been widely released. Chris Brown, a solo practitioner in Colorado who operates under the banner of Pixel Law, has been tinkering with Associate lately, and acts as a product adviser to the company. There are two main tasks he’s given it: updating a bunch of documents with new information, and replacing some calendar dates (but not all) in another batch. It’s the kind of tedious work that’s more complicated than using find-and-replace in Microsoft Word, and might ordinarily be done by a paralegal or associate. With Associate, he can simply explain the problem, and the agent will figure out the necessary steps.

“It kind of works perfectly sometimes,” Mr. Brown said. “And then other times, it just completely doesn’t do it at all.” The time savings can be significant when it works; Associate can finish the job in three minutes, whereas it might take Mr. Brown three-quarters of an hour. Still, he’s a bit puzzled by some of the tasks it did not complete. “I have no idea why it couldn’t figure it out,” he said. He’s provided feedback to the company, which has been quick to fix problems in the past.

The really tricky thing about building agents is the huge number of ways they can go about executing a task. “This creates an incredibly large surface area for testing and improvement,” according to Mr. Stevenson, who added that the company is collecting feedback from early users on how to improve Associate.

AI agents are not limited to the legal field. Cognition, a company in San Francisco, released a computer coding agent this year that, it claims, can complete some software projects rather than just suggest the next line of code, as GitHub Copilot already does. British startup Convergence, founded by former Shopify and Cohere employees, recently raised US$12-million for its AI agent that the company says will handle administrative tasks at work and at home, such as booking vacations and ordering groceries. A startup called Norm AI has built agents to review regulations and run compliance checks for companies.

Salesforce Inc.’s annual AI conference in September was a multiday agent love-in, with keynote after keynote touting the potential. The company has launched a platform for customers to build their own bots, such as for customer service, or agents that handle in-bound sales inquiries, answer questions and book follow-up meetings. “Everyone’s been waiting for the big transformation from AI, and there’s been so much hype,” said Clara Shih, the CEO of AI at Salesforce. “The exciting thing about agents is that they really take it to the next level.”

That doesn’t mean an AI agent is going to be accurate all of the time. Generative AI applications are known to hallucinate and make up information. Salesforce has built in a confidence check of sorts, meaning that if an agent is unsure about how to proceed, it can hand off a task to a human. “There’s always going to be edge cases where the agent is encountering a new question or a new problem,” Ms. Shih said.

Microsoft Corp. has encountered reliability problems when testing agents internally, Mustafa Suleyman, the company’s head of AI, told Wired recently. “The problem with this technology is that you can get it working 50, 60 per cent of the time, but getting it to 90-per-cent reliability is a lot of effort,” he said.

Reliability issues carry a couple of implications. Constant human oversight and checking could erode the hoped-for productivity gains, while too little oversight could open the door to embarrassing errors. There are already examples of chatbots offering incorrect information: an Air Canada customer, for example, pursued the airline through a B.C. civil resolution tribunal after its chatbot provided bad advice about a bereavement policy.

“Imagine the chatbot is connected to the reservation system, and it can automatically change the flight or provide a refund. We’re amplifying the problem when we let the agent act autonomously,” said Emmanuel Maggiori, an AI consultant and engineer in Britain. “There will be instances of companies going through what happened to Air Canada. We’ll see a lot of that from agents.”

Ada Support Inc. in Toronto has been building AI customer-service agents for some time. The bots chart their own path through an interaction and can take actions such as checking the status of an order, resetting passwords, initiating refunds, and changing and cancelling flights.

The limitations today are not so much about the technology as about integrating AI agents with a company's other internal systems, tools and websites, while managing privacy and security concerns. “That’s one of the biggest things that’s slowing us down,” said Ada chief product and technology officer Mike Gozzo. “If I look at my own product road map for the next six months, it’s all about trust.”

There are more than 100 customers using Ada’s “fully agentic” platform, according to Mr. Gozzo. Growing that number requires convincing companies that the technology won’t go awry. Hallucinations, he said, can be greatly mitigated for Ada’s customers by setting up firmer guardrails, even if that means the natural feel of a conversation is lost.

The proliferation of agents – assuming, of course, they work as advertised – raises questions about how the technology will change certain jobs. Will law firms need as many paralegals and junior associates if grunt work can be passed to an agent? “They’ll just be doing something different,” said Mr. Brown at Pixel Law. “They’ll be the ones managing the AI, double-checking that the AI output is correct.”

At Ada, Mr. Gozzo has a similar prediction for customer-service representatives. There will be situations in which the AI agent is unsure how to proceed with a customer problem and needs to ask a real person for guidance. The human doesn’t take over the customer interaction, but merely provides the AI agent with direction. “It’s really just coaching: ‘Give the guy a 20-per-cent discount and be done with it,’” he said.

But is that role, a passive adviser awaiting a ping from an AI agent, going to be higher-quality, better-paying and more satisfying than the traditional call-centre job today? And will there be as many people employed in that role?

Many years before he joined Ada, Mr. Gozzo worked in a call centre providing technical support. Those on the other end of the line were not always in good spirits. “It was a hard, abusive kind of job,” he said. Each week, he and his colleagues would meet with their manager to review what had gone well and what had not. To Mr. Gozzo, the managerial role was much more appealing than mollifying aggrieved customers.

With AI, today’s phone jockeys could be elevated into that very same coaching role, only for agents instead of fellow humans, he said. “We’re going to create folks that can be more thoughtful and use skills that are less repetitive, by being able to think deeply about how to improve a system.”

With reports from Sean Silcoff
