Cohere adds vision to its RAG search capabilities

Cohere has added multimodal embeddings to its search model, allowing users to include images in RAG-style enterprise search.

Embed 3, which was released last year, is an embedding model: it transforms data into numerical representations called embeddings. Embeddings have become crucial to retrieval-augmented generation (RAG) because enterprises can embed their documents and then compare those embeddings with an embedded prompt to retrieve the information the prompt asks for.
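To make that comparison step concrete, here is a minimal Python sketch of how a RAG system might rank documents against a prompt, assuming cosine similarity as the comparison metric (a common choice, not necessarily what any particular Cohere deployment uses). The three-dimensional vectors are made-up stand-ins for model output; real embedding models produce vectors with hundreds or thousands of dimensions. The document names and the `retrieve` helper are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k documents whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical document embeddings (invented values, for illustration only).
documents = {
    "q3_revenue_report": [0.9, 0.1, 0.0],
    "hr_leave_policy": [0.0, 0.2, 0.9],
    "q2_revenue_report": [0.8, 0.3, 0.1],
}

# Hypothetical embedding of a prompt such as "What was revenue last quarter?"
query = [0.85, 0.2, 0.05]

print(retrieve(query, documents))
```

With this toy data, the query sits closest to the two revenue reports, so `retrieve` returns `['q3_revenue_report', 'q2_revenue_report']` and skips the HR policy. In a RAG pipeline, those top matches are what would be handed to the generating model as context.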

The new multimodal version can generate embeddings from both images and text. Cohere claims Embed 3 is “now the most generally capable multimodal embedding model on the market.” Aidan Gomez, Cohere co-founder and CEO, posted a graph on X showing Embed 3’s performance improvements in image search.

“This advancement enables enterprises to unlock real value from their vast amount of data stored in images,” Cohere said in a blog post. “Businesses can now build systems that accurately and quickly search important multimodal assets such as complex reports, product catalogs and design files to boost workforce productivity.”

Cohere said a multimodal focus expands the volume of data enterprises can access through a RAG search. Many organizations limit RAG searches to structured and unstructured text despite having many file formats in their data libraries. Customers can now also bring in charts, graphs, product images and design templates.

Performance improvements

Cohere said the encoders in Embed 3 “share a unified latent space,” allowing users to store both images and text in a single database. Some image-embedding approaches require maintaining separate databases for images and text. The company said its approach leads to better mixed-modality searches.

According to the company, “Other models tend to cluster text and image data into separate areas, which leads to weak search results that are biased toward text-only data. Embed 3, on the other hand, prioritizes the meaning behind the data without biasing towards a specific modality.”
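The sketch below illustrates what a shared latent space makes possible, under clearly stated assumptions: the vectors are invented, the `Entry` and `search` names are hypothetical, and the premise that a model places a sales chart and a sales summary near each other is the property Cohere describes, not something this code proves. Because image and text vectors live in one space, a single index can hold both and one similarity ranking covers every modality. With separate indexes, a system would instead have to query each one and merge scores that may not be directly comparable.

```python
import math
from dataclasses import dataclass


@dataclass
class Entry:
    doc_id: str
    modality: str  # "text" or "image"; metadata only, not used for scoring
    vector: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search(index: list[Entry], query_vec: list[float], k: int = 3) -> list[tuple[str, str]]:
    """Rank every entry, image or text, on the same similarity scale."""
    ranked = sorted(index, key=lambda e: cosine_similarity(query_vec, e.vector), reverse=True)
    return [(e.doc_id, e.modality) for e in ranked[:k]]


# One mixed-modality index; all vectors are hypothetical stand-ins for model output.
index = [
    Entry("sales_chart.png", "image", [0.7, 0.7, 0.1]),
    Entry("sales_summary.txt", "text", [0.6, 0.8, 0.0]),
    Entry("office_photo.jpg", "image", [0.0, 0.1, 1.0]),
    Entry("travel_policy.txt", "text", [0.1, 0.0, 0.9]),
]

# Hypothetical embedding of the text query "How did sales trend this year?"
query = [0.65, 0.75, 0.05]

print(search(index, query, k=2))
```

With this toy data, the text query's top two results are the chart image and the text summary, ranked together on one scale; the unrelated photo and policy fall well behind.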

Embed 3 supports more than 100 languages.

Cohere said multimodal Embed 3 is now available on its platform and Amazon SageMaker. 

Playing catch up

Many consumers are quickly becoming familiar with multimodal search, thanks to the introduction of image-based search on platforms like Google and in chat interfaces like ChatGPT. As individual users get used to finding information from pictures, it makes sense that they would want the same experience in their working life.

Enterprises have begun seeing this benefit, too, and other companies that offer embedding models now provide some multimodal options. Model developers such as Google and OpenAI offer some type of multimodal embedding, and some open-source models can also generate embeddings for images and other modalities. The race is now on to build the multimodal embedding model that delivers the speed, accuracy and security enterprises demand.

Cohere, which was founded by some of the researchers responsible for the Transformer model (Gomez is one of the authors of the famous “Attention Is All You Need” paper), has struggled to be top of mind for many in the enterprise space. In September, it updated its APIs to let customers easily switch from competitor models to Cohere models. At the time, Cohere said the move aligned it with the industry standard, as customers often toggle between models.
