Major search engines like Google help people find a needle in a haystack by searching hundreds of billions of web pages and surfacing the 10 most relevant results for the user to inspect. Still, it would be very frustrating if Google only showed links and forced users to click each one and read the page to check whether it is actually relevant to their needs. Imagine if the world looked like the image below:
Terrible, right? That's why search engines include relevant snippets from a web page in the search results and highlight parts of each snippet to show the user how the result relates to the query. This helps users find what they’re looking for more easily and quickly, which makes for a better user experience.
Google and other search engines “understand” the text on webpages to make snippets and highlights possible.
Extending Highlights to Multimodal Search
In a multimodal world where users expect non-text data such as images, video, and audio to be searchable too, we need to extend this understanding to every form of content. Multimodal Highlights is our reinvention of snippets and highlighting so that they work seamlessly across all types of information, often several types at once.
Multimodal Highlights is now part of our search and discovery platform, and it’s a fresh opportunity to surprise and delight users with a results page that shows them they’ve been understood. Let’s look at a few concrete examples.
Using Multimodal Highlights for Product Variants
When a customer searches an online store for a product, that product often comes in several variants. For example, if a store sells groceries, a particular beverage might come in several flavors.
In this screenshot, we see a black cherry flavored soda that also comes in lemon and cola flavors among others.
But on this particular site, when we search for “lemon soda”, we see the black cherry variant appear in the search results rather than the lemon variant:
Here’s the problem: the search system knows this product has a lemon variant, so it returns the product, but it doesn’t show the matching variant image in the results. Instead, it shows the default image (black cherry). To discover that the product comes in lemon, the user would have to click on a result showing a black cherry soda. That is counterintuitive, and most users probably wouldn’t do it.
Indeed, the user would probably perceive this as an irrelevant result and completely miss what might have been exactly what they were looking for. The consequence? Lost sales and a damaged perception of the business. Without AI that can pick the most relevant variant, companies are forced to either settle for a poor user experience or list each variant as its own product. But listing every variant separately floods the results for a broad query like “soda” with near-duplicates, which brings us right back to a poor user experience.
With Multimodal Highlights, this problem is solved seamlessly and automatically. We turned this feature on with one of our customers, and here’s the before and after when searching for “lemon soda”.
Before Multimodal Highlights, products with a lemon variant were surfaced, but only their default images were shown, whether or not those images matched the query.
With Multimodal Highlights enabled, the right metadata (in this case, images) for the right variant is automatically selected and shown. The user sees what they actually want, easily and quickly.
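The post doesn’t describe how the right variant is chosen internally. As a minimal sketch, assuming each variant has a precomputed embedding in the same vector space as the query embedding, the core step could be picking the variant with the highest cosine similarity to the query and showing its image instead of the default. The `Variant` class, `pick_variant`, file names, and all vectors below are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    image_url: str
    embedding: list[float]  # hypothetical precomputed embedding of the variant's text and images


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_variant(query_embedding, variants):
    """Return the variant most similar to the query, to display instead of the default."""
    return max(variants, key=lambda v: cosine(query_embedding, v.embedding))


# Toy 3-d vectors for illustration only; a real system would use a learned multimodal encoder.
variants = [
    Variant("black cherry", "soda-black-cherry.jpg", [0.9, 0.1, 0.0]),
    Variant("lemon", "soda-lemon.jpg", [0.1, 0.9, 0.1]),
    Variant("cola", "soda-cola.jpg", [0.2, 0.1, 0.9]),
]
query = [0.0, 1.0, 0.2]  # hypothetical embedding of "lemon soda"
best = pick_variant(query, variants)
print(best.name, best.image_url)  # lemon soda-lemon.jpg
```

Keeping all variants under one product and choosing which one to display at query time avoids the trade-off described above: the results page stays free of near-duplicates, yet each result still shows the variant that matches the query.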
Multimodal Highlights works across all the ways a product can vary, including size, color, flavor, materials, scents, resolutions, storage space, and more. Here are a few other examples our system handles:
Automatic Thumbnail Selection
At its simplest, Multimodal Highlights selects the most relevant parts of any product description. Our system considers all of your metadata when returning search results, so it can highlight relevant information from text, images, video, or any combination of them.
Your users get more than relevant results: they also see the right metadata to understand instantly why those results are relevant.
For example, if a user searches for “dresses with cross backs”, not only will the right products be surfaced, but the most relevant product image will be shown (i.e., highlighted) so that the user immediately sees what they need to see to move toward a purchase.
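Thumbnail selection can be sketched the same way, one level down: instead of choosing among variants, rank a single product’s images against the query and use the top one as the thumbnail. This is a minimal sketch, assuming each image already has an embedding comparable to the query embedding; `rank_images`, the file names, and the vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_images(query_embedding, images):
    """Sort (url, embedding) pairs so the image most similar to the query comes first."""
    return sorted(images, key=lambda img: cosine(query_embedding, img[1]), reverse=True)


# Hypothetical per-image embeddings for one dress; the values are illustrative only.
gallery = [
    ("dress-front.jpg", [0.9, 0.1, 0.1]),
    ("dress-back.jpg", [0.2, 0.9, 0.1]),
    ("dress-detail.jpg", [0.3, 0.3, 0.8]),
]
query = [0.1, 1.0, 0.0]  # hypothetical embedding of "dresses with cross backs"
ranked = rank_images(query, gallery)
thumbnail = ranked[0][0]
print(thumbnail)  # dress-back.jpg
```

Returning the full ranking rather than only the winner lets a product page reorder its gallery as well, so the remaining images still follow in order of relevance.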
Highlighting can also take other forms, such as superimposed bounding boxes. Here, a search for “car with gullwing doors” returns relevant images with the key feature (the gullwing doors) outlined in a red box:
Multimodal Highlights ensures your users see what they need to see, so that they explore, engage, and ultimately buy.
Multimodal Highlights Help Users Understand Why They’re Seeing What They’re Seeing
Imagine a job board where users can search for designers to work on specific projects for them. In this example, a search for a designer with particular experience returns results whose text snippets signal why each candidate is worth a look:
Although most of the examples so far focus on images, highlighting also works for good ol’ text. And it’s semantic: it judges how relevant a piece of text is by its meaning, not just by keyword overlap. In this search for articles related to “Open AI people”, the title and other details of the top result may not look relevant at first, but bolding the relevant passages helps the user understand why the result was returned:
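Semantic text highlighting can be sketched as scoring each sentence of a result against the query and bolding the ones above a similarity threshold. This is a minimal sketch, not the production method: the sentence embeddings, the `highlight` function, its `threshold` of 0.5, and the example sentences are all hypothetical. The point it illustrates is that a sentence can be highlighted without sharing any keywords with the query, as long as its embedding is close in meaning.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def highlight(sentences, query_embedding, threshold=0.5):
    """Wrap sentences whose embedding is close to the query in **bold** markers."""
    out = []
    for text, emb in sentences:
        if cosine(query_embedding, emb) >= threshold:
            out.append(f"**{text}**")
        else:
            out.append(text)
    return " ".join(out)


# Hypothetical sentences and embeddings; a real system would compute these with a text encoder.
article = [
    ("The company named three new board members.", [0.8, 0.2, 0.1]),
    ("Quarterly revenue figures were not disclosed.", [0.1, 0.1, 0.9]),
    ("Its chief scientist will lead a new research team.", [0.7, 0.4, 0.0]),
]
query = [1.0, 0.3, 0.0]  # hypothetical embedding of "Open AI people"
result = highlight(article, query)
print(result)
```

Neither bolded sentence contains the words “Open AI” or “people”; with these toy vectors they are selected purely because their embeddings point in a similar direction to the query’s.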
We’re excited to roll out Multimodal Highlights to all our customers, and there’s a lot more in the pipeline. Interested in exploring how we can offer delightful, cutting-edge search experiences to your users? Get in touch.