For Mohammad Norouzi, starting a text-to-image generation company is like a homecoming, in a way. He always had an interest in art growing up in Iran, and a teacher once suggested he should pursue drawing more seriously. But instead of the arts, he studied computer science and machine learning, moving to Canada at age 22 for grad school and later earning a PhD from the University of Toronto.
Fast-forward a few years – and skip past a few contributions to artificial intelligence research – and Mr. Norouzi is now the chief executive of Ideogram AI, a Toronto startup that recently emerged from stealth mode. The company has built and trained its own AI image generation model from scratch, a complex and costly endeavour. If Ideogram succeeds, it will be in part because Mr. Norouzi has more than a business interest in the project. “This is more of a personal passion,” he explained in his first interview about the company.
In late August, Ideogram announced it had raised US$16.5-million in seed funding from lead investors a16z and Index Ventures in Silicon Valley, along with Canadian funds Golden Ventures and Two Small Fish Ventures. Ideogram debuted its free image generator a week later, and as of mid-September, it had accrued more than 500,000 users who had generated more than 35 million pictures. Interest has been so great that its servers have occasionally been overwhelmed and service temporarily suspended.
While Ideogram is similar to other AI image generators such as Midjourney and OpenAI’s Dall-E, the company has an edge when it comes to text. Ask Midjourney to create a picture of a billboard depicting the words “Car for sale,” and it might churn out “Carr for ssle,” or something even more mangled. Ideogram’s text, while not always perfect, is generally more accurate.
But the company’s launch comes at a time when the initial interest in generative AI may be showing some signs of fatigue. Monthly visits to ChatGPT, the OpenAI chatbot that kicked off a wave of AI hype when it debuted last November, have declined for three straight months, while the amount of time people spend with the application has been falling since March, though numbers are ticking up again as people return to school. Visits to Midjourney also fell each month from May to July. And there are still questions about how some generative AI startups will turn a profit.
“We’ve got to think beyond the short-term,” Mr. Norouzi said. “This is the future of the creative industry. Somebody will figure it out, and we think we’ll be the one.”
His confidence stems from experience. Each of Ideogram’s founders, who also include William Chan, Chitwan Saharia and Jonathan Ho, spent time at Google Brain, the search giant’s AI research lab. All four of them were researchers on Imagen, the company’s text-to-image and video generator. While at the University of California, Berkeley, Mr. Ho was the lead author on a 2020 paper with the unwieldy title of Denoising Diffusion Probabilistic Models, which outlined an image generation method that helped provide the foundation for many models today.
“There just aren’t a lot of teams that know how to do this,” said Martin Casado, general partner at a16z. Other than OpenAI and Midjourney, Stability AI is the only company making headway in the image space, he said. Mr. Casado contends that the advent of generative AI is akin to the dawn of the internet and presents massive economic opportunities – even if no one can say for certain how the technology will play out. “You couldn’t quite predict Yahoo, you couldn’t predict Amazon, but they came about, and I feel like this is what we’re seeing today,” he said.
Investors were impressed by the team’s lead role in developing Imagen at Google, which the company first announced in May, 2022. But Google did not release the tool more widely until a year later. Google initially said that, as a result of the data used to train Imagen, some of which is scraped from the web, the tool could further embed harmful stereotypes. A preliminary assessment by the company found Imagen tended to generate images of people with lighter skin tones, and portrayed different professions in ways that perpetuated gender stereotypes. As a result, Google said more safeguards were needed.
Mr. Norouzi and his fellow founders left Google to start Ideogram last December, in part because they felt they could move faster outside of a large company. “We wanted to push it a little further,” he said. “We wanted to make it available to everybody.”
The team moved into a small downtown office in Toronto – where there is still no Ideogram nameplate on the directory – and the company has all of seven employees today, one of whom is an intern.
Producing an AI model that could render more accurate text within an image was part of the goal from the beginning, Mr. Norouzi said, though he declined to explain in detail how the team did so. “We did have a hunch that the way we’re building these models will help with text rendering and logos,” he said.
Users can type a prompt on Ideogram’s website, and the AI model will generate four different image variations within seconds. All of the images are public and displayed on a feed, and while users can follow one another, they’re not yet able to delete their creations. So far, many users are taking advantage of Ideogram’s text capabilities, rendering logos, memes and fake movie posters. Depicting public figures wearing T-shirts with text or holding up signs is another popular option, such as Vladimir Putin wielding a sign reading “My war was a bad idea,” or Justin Trudeau wearing a T-shirt emblazoned with a phrase that we will not repeat here.
Eva Lau, co-founder of Two Small Fish Ventures, has used Ideogram to generate ideas for the cover of a book she’s writing called Chocolate Umbrella, a fictionalized account of Wattpad, the self-publishing platform where she was a member of the founding team. Ideogram has something in common with Wattpad, in that both services help to enable creativity, she said. “We can see how creative people have great ideas, but have few tools or expertise to execute on them,” she said.
What Ideogram doesn’t have yet is revenue. Every image users generate costs Ideogram money, and the most recent funding doesn’t provide unlimited runway. A subscription option could be in the future, Mr. Norouzi said, but he doesn’t seem to be in a hurry. “We’re a bunch of engineers,” he said. “We’re focused on building the best product on the market, as opposed to making money too early.”
Mike Volpi, a partner at Index Ventures, said that the success of Midjourney proves there is a business model for image generators. Based in San Francisco, Midjourney charges between US$10 and US$120 a month for subscriptions, and is on track to earn US$200-million in revenue this year, according to The Information.
But it’s too early to say how exactly Ideogram will monetize, and who will be willing to pay for the service. “I would say the journey is much less premeditated, and much more like, let’s put it out there, see how people use it, and improve the product,” Mr. Volpi said.
Many artists, meanwhile, have been dismayed by the advent of AI image generation, fearing the technology could put people out of work and devalue the profession. The data sets on which AI models have been trained are often opaque and include works by artists who never consented to their creations being used in this manner. In February, stock photo company Getty Images filed a lawsuit against Stability AI in the U.S., alleging the company copied more than 12 million pictures from its database to train an AI model.
Mr. Norouzi declined to provide details about Ideogram’s data sets. “We’ve put a lot of thought into how to train models and how to ensure that what we’re doing is legal,” he said. Ideogram could potentially work directly with artists down the road, he added. “If you’re an artist and you have a lot of assets and if you want to help other people create in your style, I think that should be feasible, and you deserve to have a share of the profit,” he said.
As for bias and stereotypes, Mr. Norouzi said that Ideogram has removed certain types of imagery from its data set, but acknowledged that more research is needed.
The ability to spread misinformation with image generators is another huge area of concern. Dall-E, for example, does not allow users to render public figures, whereas Ideogram has no such restrictions. “The last thing that we want is our tool to be used to portray public figures in an inappropriate context,” Mr. Norouzi said. “Our strategy is to improve the safety of our product on an ongoing basis. We think the best approach is to have a consistent conversation with our users, and the press and everyone else.”