AI Is Changing Search. Learn How to Adapt

March 21, 2023
Team Objective

Google Trained People to Search Using Natural Language
Google has transformed how people search by supporting longer, more conversational queries. By deploying technology that understands context, it lets users search in ways that feel natural and get direct answers to their questions.

Search queries on Google have become longer over time
Takeaway: People running searches increasingly expect to use more natural-sounding language and be understood.


ChatGPT Put That Trend Into Overdrive
Putting aside the breathless hype around ChatGPT being a "Google killer", the fact is ChatGPT is conditioning more people to search using natural language descriptions of what they want.

To take it a step further, Bing leveraged technology from OpenAI (the company behind ChatGPT) and integrated it with search-specific systems.

Given a query with multiple constraints (e.g., "using only cardboard boxes"), Bing recognizes all of them and returns results that address every requirement in the query.

Takeaway: The shift to natural language search is accelerating, and the query complexity is increasing.


Users Now Expect to Search Sites and Apps the Same Way
E-commerce stores will let users search using detailed product descriptions ("light-colored floral dress for a summer vacation").

Code editors will let users search based on what code actually does ("wherever we retrieve featured items from the database").

Sales support software will let users search for prospect actions and intents ("all the calls where the prospect asked for pricing information").

And the list goes on.  In all cases, users want to:

  1. Search the way they think and speak using natural language.
  2. Get results based on the query's meaning, not only on keyword matches (see the sketch below).
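
To make the second point concrete, here's a minimal sketch of meaning-based matching using an open-source sentence-embedding model (the model name is just a common default, not a recommendation): a query and a document can score as highly similar even though they share almost no keywords.

    # Semantic matching: the query and document overlap in meaning, not in keywords.
    from sentence_transformers import SentenceTransformer, util  # pip install sentence-transformers

    model = SentenceTransformer("all-MiniLM-L6-v2")

    query = "light-colored floral dress for a summer vacation"
    doc = "Pale yellow sundress with a flower print, perfect for the beach"

    # Cosine similarity of the two embeddings is high despite minimal keyword overlap.
    print(util.cos_sim(model.encode(query), model.encode(doc)))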


The Good News: You Have Options
We gathered a list of next-generation search solutions for you to check out. They're on a spectrum with turnkey solutions (i.e., "set it and forget it") on one end, and total do-it-yourself implementations on the other.


Create Your Own Stack (i.e. “Build”)
There's plenty to choose from if you want to assemble your own stack, and the upside is flexibility.

Notice how many components must be assembled in this example search stack to deal with content in multiple modalities.

The downside is the burden of making the right technology and integration choices from a vast array of options.

For ingestion, depending on what kind of data you're ingesting, where you're ingesting from, and how you're ingesting it (e.g. stream vs. batch), there's Apache NiFi, Apache Spark, Apache Kafka, AWS Kinesis, Google Cloud Pub/Sub, Logstash, and Azure Event Hubs, to name a few.
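
As one illustration, here's a minimal streaming-ingestion sketch using kafka-python, assuming documents arrive as JSON messages on a topic; the topic name, broker address, and payload shape are all placeholder assumptions.

    import json

    from kafka import KafkaConsumer  # pip install kafka-python

    # Placeholder topic and broker; your deployment will differ.
    consumer = KafkaConsumer(
        "raw-documents",
        bootstrap_servers=["localhost:9092"],
        value_deserializer=lambda m: json.loads(m.decode("utf-8")),
    )

    for message in consumer:
        doc = message.value  # e.g. {"id": "...", "text": "...", "image_url": "..."}
        # Hand each document off to the preprocessing step described below.
        print(doc["id"])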

Once the data's coming in, there are plenty of preprocessing options depending on the type of data. Keep in mind the amount of preprocessing needed depends on the type of models you intend to use and your goals.

For text, there's spaCy, Hugging Face, or even good old-fashioned Unix tools such as sed. For images, there's OpenCV, Pillow, and scikit-image. For video, there's FFmpeg and MoviePy. And for audio, librosa and PyDub. For more exotic data types (e.g. gene sequences), you'll need to roll your own solution.
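
For instance, a minimal preprocessing sketch for text (spaCy) and images (Pillow) might look like the following; the specific model and the 224x224 target size are assumptions tied to common embedding models, not requirements.

    import spacy           # pip install spacy && python -m spacy download en_core_web_sm
    from PIL import Image  # pip install Pillow

    nlp = spacy.load("en_core_web_sm")

    def preprocess_text(text: str) -> str:
        # Lowercase, lemmatize, and drop stopwords/punctuation.
        doc = nlp(text)
        return " ".join(t.lemma_.lower() for t in doc if not t.is_stop and not t.is_punct)

    def preprocess_image(path: str) -> Image.Image:
        # Normalize color mode and resolution before embedding.
        return Image.open(path).convert("RGB").resize((224, 224))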

It's best to store the preprocessed data (and maybe the raw data as well) before moving on. For that, there are relational stores such as Postgres, Aurora, and BigQuery; NoSQL databases such as DynamoDB, among many others; columnar formats such as Apache Parquet; and row-oriented serialization formats such as Apache Avro. You'll also need blob storage through something like AWS S3, Google Cloud Storage, or Azure Blob Storage.
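
A minimal sketch of that step, assuming Parquet files landed in S3 (the bucket and key names are placeholders):

    import boto3         # pip install boto3
    import pandas as pd  # pip install pandas pyarrow

    records = [{"id": "doc-1", "clean_text": "light floral summer dress", "source": "catalog"}]
    df = pd.DataFrame(records)
    df.to_parquet("preprocessed.parquet")  # columnar file, cheap to scan later

    # Keep a copy in blob storage alongside (or instead of) a database.
    boto3.client("s3").upload_file("preprocessed.parquet", "my-search-data", "preprocessed/part-0001.parquet")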

Once your data is ready, it's time to generate representations such as embeddings. Some models work on one mode of data, while others work on multiple (but not all) modes. For textual embeddings, there are APIs from companies like OpenAI and Cohere (though the quality differs and you'll need to do a quality vs. cost analysis), or you can use one of several open-source models from Hugging Face. For images, there are CLIP, ALIGN, and VLP, which all work on both images and text. For videos, there are models that extend image embedding models to capture spatio-temporal features, and for audio there are models such as BEATs. More models come out every week.
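
Here's a minimal embedding sketch using open-source models from the sentence-transformers / Hugging Face ecosystem: one text-only model plus a CLIP model that embeds images and text into a shared space. The model names are common defaults, not recommendations for your workload.

    from PIL import Image
    from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

    text_model = SentenceTransformer("all-MiniLM-L6-v2")  # text-only, 384-dim vectors
    clip_model = SentenceTransformer("clip-ViT-B-32")     # joint image/text embedding space

    text_vecs = text_model.encode(["light-colored floral dress for a summer vacation"])
    image_vecs = clip_model.encode([Image.open("dress.jpg").convert("RGB")])  # placeholder file
    query_vecs = clip_model.encode(["floral summer dress"])  # queries live in the same space as the images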

These representations will need to be stored for retrieval. For that, you'll probably want to use multiple stores (because pure vector search probably isn't enough for a professional-grade solution) and ensure they provide efficient querying and scalability. Your collection would include vector stores (Weaviate, Milvus, Pinecone, etc.), and beyond that, goal-specific solutions, which could include relational databases, graph databases, and document stores.
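
As one example, here's a sketch of writing and querying vectors with Pinecone, assuming the pinecone-client 2.x API that was current when this post was written (newer client versions differ), with placeholder credentials, index name, and metadata.

    import pinecone  # pip install pinecone-client
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")  # 384-dim, same model as above

    pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")  # placeholders
    if "products" not in pinecone.list_indexes():
        pinecone.create_index("products", dimension=384, metric="cosine")
    index = pinecone.Index("products")

    # Upsert a document vector along with metadata you may want to filter on later.
    doc_vec = model.encode("Pale yellow sundress with a flower print").tolist()
    index.upsert(vectors=[("doc-1", doc_vec, {"category": "dresses"})])

    # Query with an embedded natural-language query.
    query_vec = model.encode("light-colored floral dress for a summer vacation").tolist()
    print(index.query(vector=query_vec, top_k=5, include_metadata=True))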

After all that, it's time to start serving results. The infra will need to cache intelligently for operational efficiency and user experience, load balance to stay responsive, and be tuned and monitored for optimal performance. Over time, it'll need to collect usage feedback for fine-tuning the models.
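
To show the shape of just the caching concern, here's a minimal in-process sketch; in production you'd more likely reach for Redis or a CDN, and the embed/lookup helpers below are stand-ins for the earlier steps, not real APIs.

    from functools import lru_cache

    def embed(query: str) -> tuple[float, ...]:
        # Stand-in for the embedding model call from the earlier step.
        return (float(len(query)),)

    def vector_store_lookup(vec: tuple[float, ...], top_k: int) -> list[str]:
        # Stand-in for the vector-store query from the earlier step.
        return [f"doc-{i}" for i in range(top_k)]

    @lru_cache(maxsize=10_000)
    def cached_search(query: str, top_k: int = 10) -> tuple[str, ...]:
        # Repeated queries skip the embedding + retrieval path entirely.
        return tuple(vector_store_lookup(embed(query), top_k))

    print(cached_search("floral summer dress"))  # a second identical call is served from the cache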

If that sounds like a lot to think about, you've hit on the downsides of rolling your own:

  • It takes significant engineering resources and many months to get something production-worthy.
  • It requires ongoing maintenance, devising ways to improve the models over time, and monitoring the system to ensure SLAs (e.g., latency) are met.

Also, many details we haven't covered still need to be addressed. For example, are you parsing PDFs correctly? How do you search over multiple file types (images, videos, audio)? How do you evaluate search quality? What if you want a hybrid option that combines keyword and neural search? How do you improve search performance based on user feedback?
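
On the hybrid question specifically, one common approach is reciprocal rank fusion (RRF): run a keyword search (e.g. BM25) and a neural search separately, then merge the two ranked lists by rank rather than by incomparable scores. A generic sketch, not tied to any particular engine:

    from collections import defaultdict

    def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
        # Merge ranked result lists, favoring documents ranked highly in any list.
        scores: dict[str, float] = defaultdict(float)
        for results in ranked_lists:
            for rank, doc_id in enumerate(results, start=1):
                scores[doc_id] += 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)

    keyword_hits = ["doc-3", "doc-1", "doc-7"]  # e.g. from a BM25 index
    neural_hits = ["doc-1", "doc-9", "doc-3"]   # e.g. from the vector store
    print(reciprocal_rank_fusion([keyword_hits, neural_hits]))  # doc-1 and doc-3 rise to the top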


Go With a Turnkey Solution (i.e. “Get”): Objective
Objective is the only company which offers an end-to-end multimodal search solution. All you need to do is point us to your data, then search with our API.

We'll handle everything in between including ingestion, parsing, embedding, maintaining uptime and low latency, scaling, and improving the system over time.

At the core of our platform is our content understanding engine, which works on text, images, video, audio, and anything else encoded in bits. Check out our demos.

So whether you're a media company or a research firm, we can help you make anything searchable.


Search Needs to Be Intuitive and "Just Work"
Regardless of which approach you choose, if you want to provide a modern search experience that meets the expectations of today's users, it's time to update your search capabilities.

We'd love to share more about what we're learning about modern search and hear what you think of search. Get in touch, and let's share ideas.
