Molmo, the new multimodal AI model from the non-profit Allen Institute for AI, correctly points to the ketchup bottle in a refrigerator door. (GeekWire Photo / Todd Bishop, screenshot from Ai2 demo site.)

Turing Test? Whatever. Meet the Refrigerator Challenge.

A new multimodal artificial intelligence model from the Allen Institute for AI (Ai2) works with visual data in novel ways. It can analyze and describe images, like other AI models, but it goes further by pointing to different parts of the image — annotating them with glowing pink dots.

It’s called “Molmo,” and it’s actually four models, ranging in size from 1 billion to 72 billion parameters. Leaders of the Seattle-based AI nonprofit say Molmo shows the power of an open approach to AI, proves the value of high-quality training data, and unlocks new capabilities for AI agents, robots, and augmented and virtual reality.

But after getting access to the Molmo demo site in advance of its unveiling Wednesday morning, I decided to test the technology on another frontier — my family fridge — challenging the AI with a task known to stump certain humans.

In an impressive display of visual perception, Molmo pointed correctly to the ketchup in my refrigerator door, as shown in the image above, despite the plastic bottle being turned around. It also found the lettuce and grapes in the drawers, the yogurt on the first and second shelves, and the package of chicken.

For the record, Molmo wasn’t able to find the bottle of beer tucked into the back of the lowest shelf, despite the “Modelo” label being just barely visible in the image. Hey, I can empathize. We’ve all got room for improvement.

Household tech tests aside, there’s a lot going on behind the scenes.

Ai2 uses an open approach to artificial intelligence — releasing its training data, annotations, underlying code, model weights, and other resources for researchers and developers to understand and use themselves. This contrasts with the proprietary approach from companies such as OpenAI, Google, Anthropic, and others.
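For developers, that openness means the model itself can be downloaded and run locally. As a rough sketch of what that might look like, the snippet below loads a Molmo checkpoint with the Hugging Face transformers library and asks it a pointing question; the repository name and the custom processing and generation methods are assumptions based on Ai2's release pattern, not details confirmed in this article.

# A minimal sketch of loading an open Molmo checkpoint with Hugging Face
# transformers. The repository name ("allenai/Molmo-7B-D-0924") and the
# custom processing/generation helpers below are assumptions based on
# Ai2's published release pattern, not details from this article.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor, GenerationConfig

repo = "allenai/Molmo-7B-D-0924"  # assumed checkpoint name
processor = AutoProcessor.from_pretrained(
    repo, trust_remote_code=True, torch_dtype="auto", device_map="auto")
model = AutoModelForCausalLM.from_pretrained(
    repo, trust_remote_code=True, torch_dtype="auto", device_map="auto")

image = Image.open(
    requests.get("https://example.com/fridge.jpg", stream=True).raw)

# Molmo's remote code exposes custom helpers for multimodal prompts;
# the names `process` and `generate_from_batch` are assumed here.
inputs = processor.process(images=[image], text="Point to the ketchup bottle.")
inputs = {k: v.to(model.device).unsqueeze(0) for k, v in inputs.items()}

output = model.generate_from_batch(
    inputs,
    GenerationConfig(max_new_tokens=200, stop_strings="<|endoftext|>"),
    tokenizer=processor.tokenizer,
)
answer = processor.tokenizer.decode(
    output[0, inputs["input_ids"].size(1):], skip_special_tokens=True)
print(answer)  # e.g. a point annotation naming the ketchup's location

In practice, the pink-dot annotations shown in the demo come back as part of the model's text reply, which an application can then render on top of the original image.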

Ai2 CEO Ali Farhadi, speaking with reporters Tuesday at the nonprofit’s headquarters north of Lake Union in Seattle, said Molmo shows that open models can now rival proprietary alternatives on key performance benchmarks.

While cautioning that he’s not a fan of these benchmarks, due to what he described as scientific flaws, Farhadi acknowledged that they are widely used in the industry, and he showed them to make a larger point.

“Open and closed are getting very, very close together,” he said.

In addition, smaller models are performing on par with larger models. For example, a lightweight version of Molmo with 1 billion parameters performs as well as Pixtral 12B, the 12-billion-parameter model released last week by Mistral AI, the French AI startup in which Microsoft invested earlier this year.

The smaller size is “a key enabler, because now you could start having these things run on your phone, on your wearables, on your desktop, on your laptop, and that just expands the footprint of what these models can do,” Farhadi said.

Ai2 says its largest Molmo 72B model likewise compares favorably with OpenAI’s GPT-4V, Anthropic’s Claude 3.5, and Google’s Gemini 1.5.

The unveiling of Molmo comes in advance of the Meta Connect conference on Wednesday, where the Facebook parent company is expected to show the latest version of its open-source Llama large language model.

A key differentiator with Molmo, Farhadi explained, is Ai2’s focus on high-quality, curated data. Rather than relying on large, noisy, web-crawled datasets, Molmo was trained on a smaller but higher-quality dataset, using careful human annotations. This improves the model’s accuracy and reliability.

In demos at Ai2 this week, lead researcher Matt Deitke showed Molmo’s ability to identify seemingly every detail in a picture of the bustling entrance to Pike Place Market in Seattle, and to count the dogs in a photo. Molmo was even able to tally how many of the dogs had their tongues out.

Another notable (if symbolic) breakthrough: Molmo can tell time from a traditional clock face, something that other AI models have struggled to do.

Molmo’s visual recognition capabilities also include the ability to read web pages, which opens the door for developers to build new forms of autonomous AI agents with the model. An Ai2 highlight video (above) shows an AI agent browsing the Starbucks website and placing a coffee order, for example.
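To give a concrete, if hypothetical, sense of how that pointing output could feed an agent, here is a short sketch that converts a Molmo-style point annotation into a pixel location a browser-automation tool could click. The XML-like point tag and its 0-to-100 percentage coordinates are assumptions about the output format, not details taken from this article.

# Hypothetical sketch: turning a Molmo-style pointing reply into a screen
# click target for a browser agent. The <point x=".." y=".."> tag with
# 0-100 percentage coordinates is an assumed output convention.
import re

def point_to_pixels(reply: str, width: int, height: int):
    """Extract the first point tag and convert its percent coords to pixels."""
    match = re.search(r'<point\s+x="([\d.]+)"\s+y="([\d.]+)"', reply)
    if match is None:
        return None
    x_pct, y_pct = float(match.group(1)), float(match.group(2))
    return int(x_pct / 100 * width), int(y_pct / 100 * height)

reply = '<point x="62.5" y="41.0" alt="Order button">Order button</point>'
print(point_to_pixels(reply, width=1280, height=800))  # -> (800, 328)

From there, an agent framework would hand those pixel coordinates to whatever tool actually controls the browser or device.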

Ai2 CEO Ali Farhadi in his office earlier this year at the nonprofit’s Seattle HQ. (GeekWire Photo / Todd Bishop)

Ai2, founded by the late Microsoft co-founder Paul Allen, has been led for more than a year by Farhadi. He previously founded and led Ai2 spinout Xnor.ai as CEO, and sold it to Apple in 2020 in an estimated $200 million deal that represents one of the institute’s biggest commercial successes to date.

Farhadi rejoined Ai2 in July 2023, after leading Apple’s machine learning initiatives.

The institute released its Open Language Model, or OLMo, in February of this year, part of a larger effort to bring more transparency to the rise of generative AI models. OLMo won Innovation of the Year at the 2024 GeekWire Awards.

As a nonprofit AI research institute, Ai2 doesn’t focus on developing products of its own, but instead seeks AI breakthroughs that serve society, and offers its technology for others to use and learn from.

However, with the Molmo demo site, Ai2 is taking a more public approach this time, seeking to bring new attention to the technology, to help serve its mission.

“This is the first time we’re also putting a live demo out,” Farhadi said, acknowledging some angst. “We did the best we could to make sure that it’s safe, and it doesn’t do weird things. But with these kind of models, you never know what’s going to happen. This is an experiment for us to see and learn if this strategy works or not.”
