When Shazam came out in 2002, it was magic. Hold your phone up to whatever music was playing, and the app would identify the song. It was great for pinpointing specific tracks, but what if you want a particular type of music based on an idea, mood, scene, goal, or whatever else you have in mind?
Every creator looking for music has had thoughts like “I need something inspiring with a beat and some strings for this training montage” or “I want some light electronica with a bright happy vibe for this travel segment”. But actually describing the kind of music you want in natural language, and getting useful suggestions back, is still a challenge.
Searching for music today is time-consuming and tedious. Music is tagged or categorized into large buckets (“happy”, “rock”, “corporate”, etc.), which lands you in a pile of hundreds or thousands of tracks. From there, you have to work through the catalog looking for the right song. This stretches production time for the user and demands laborious manual effort from the music vendor.
Tagging music with keywords and metadata is not only hard and expensive; it also can’t capture all the nuances of a particular track.
With the explosion of short-form video, the faster pace of social media, and higher production standards, finding attention-grabbing music and effects right when you need it is more important than ever.
Semantic Multimodal Search Helps Users Find Music the Way They Want To
Let’s break it down. Semantic means the system understands the meaning behind your existing tags and text descriptions. Rather than relying only on exact text matches, semantic search can handle synonyms, context, and misspellings.
For example, maybe you have a track tagged with “dark”, “industrial”, and “electronic”:
With only exact text matching, someone searching for music using the term “cyberpunk” would get a zero-results message.
But with semantic search, the system understands that cyberpunk is associated with dark, industrial, and electronic, and returns relevant results:
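Under the hood, semantic matching is typically done with text embeddings: the query and each tag are mapped to vectors, and similarity is measured geometrically rather than by string comparison. Here is a minimal sketch of the idea; the vectors below are made-up toy values standing in for a real embedding model, which would produce much higher-dimensional output.

```python
import math

# Hypothetical embedding vectors for illustration only. In practice these
# would come from a trained text-embedding model, not be hand-written.
EMBEDDINGS = {
    "cyberpunk":  [0.9, 0.8, 0.7, 0.1],
    "dark":       [0.8, 0.3, 0.2, 0.0],
    "industrial": [0.7, 0.9, 0.3, 0.1],
    "electronic": [0.6, 0.7, 0.9, 0.2],
    "acoustic":   [0.0, 0.1, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def semantic_score(query, track_tags):
    """Score a track by the best similarity between the query and any tag."""
    q = EMBEDDINGS[query]
    return max(cosine(q, EMBEDDINGS[tag]) for tag in track_tags)

# A "cyberpunk" query scores highly against the dark/industrial/electronic
# track and poorly against an acoustic one, even though no tag literally
# contains the word "cyberpunk".
print(semantic_score("cyberpunk", ["dark", "industrial", "electronic"]))
print(semantic_score("cyberpunk", ["acoustic"]))
```

The key design point: because similarity is computed in the embedding space, a query never seen in the tag vocabulary can still rank tracks sensibly.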
This enables different users, from musicians to video producers, to search for music the way they already think and talk about it.
With semantic search, your users gain greater access to your catalog and a more natural search experience using your existing metadata. Our Search and Discovery platform already supports semantic search out of the box (get in touch to learn more).
Multimodal takes things a step further.
With multimodal search, the system understands content natively whether it’s text, images, video, or audio. Given a piece of music, the system recognizes its genre (classical, rock), mood (upbeat, dark, suspenseful), and even scenarios where the piece would be suitable (tropical beach vacation). It does this by listening to the audio of the track, not by utilizing any text or tags.
If you already have tags you want to reuse, or need to add information that can’t be extracted from the audio, that works too. Our multimodal system can leverage information from multiple modalities (e.g., text, tags, or other metadata) and integrate them with music understanding. And it can even learn from feedback over time to give better recommendations.
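One simple way to picture combining modalities is late fusion: embed the audio and the text metadata separately, then blend the normalized vectors into a single search vector. This is only an illustrative sketch under that assumption; the vectors and the 50/50 weighting are invented, and a production system would learn how to combine modalities rather than use a fixed blend.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def fuse(audio_emb, text_emb, audio_weight=0.5):
    """Blend an audio embedding with a text/tag embedding (late fusion)."""
    a, t = normalize(audio_emb), normalize(text_emb)
    return [audio_weight * x + (1 - audio_weight) * y for x, y in zip(a, t)]

# Hypothetical embeddings for one track: what an audio model heard,
# plus what the human-written tags describe.
audio_emb = [0.9, 0.1, 0.4]   # e.g. the audio sounds suspenseful, orchestral
text_emb  = [0.7, 0.6, 0.1]   # e.g. tags say "trailer", "strings"
search_vector = fuse(audio_emb, text_emb)
```

With both signals folded into one vector, a track tagged sparsely can still be found by what it sounds like, and audio that is hard to characterize can still be found by its tags.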
With semantic multimodal search, your user is free to express what they want based on their personal way of thinking and talking about music. So when they search for “some light electronica with a bright happy vibe for this travel segment”, they’ll get what they’re looking for, when they need it.
And since the system understands music natively, it can also help your user find music that’s similar to another song. Imagine hearing a copyrighted song and being able to find a similar, copyright-free track on a stock music site just by using the original as a reference.
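In embedding terms, finding music similar to a reference track is a nearest-neighbor search: embed the reference audio, then return the catalog track whose embedding is closest. A hypothetical sketch (the catalog names and all vectors are invented for illustration):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical audio embeddings for a small stock-music catalog.
CATALOG = {
    "upbeat_ukulele_01":  [0.1, 0.9, 0.2],
    "dark_synthwave_07":  [0.9, 0.1, 0.8],
    "corporate_piano_03": [0.3, 0.5, 0.1],
}

def most_similar(reference_emb, catalog):
    """Return the catalog track whose embedding is closest to the reference."""
    return max(catalog, key=lambda name: cosine(reference_emb, catalog[name]))

# Embedding of the (copyrighted) reference song the user heard.
reference = [0.8, 0.2, 0.9]
print(most_similar(reference, CATALOG))  # → dark_synthwave_07
```

At catalog scale, the linear scan above would be replaced by an approximate nearest-neighbor index, but the user-facing behavior is the same: hand over a reference track, get back the closest-sounding licensable alternative.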
We’ve made this possible for the visual medium, and we’re now building a similarly delightful search experience for audio. Want to make exploring your music catalog feel like magic? Get in touch.