Cameron MacLeod wrote a nice piece on how Shazam actually works. I've been curious about it, since I use Shazam to gather timestamps of the songs I hear at HiFi shows. Sometimes it recognizes a song within a second, which is impressive. Check out MacLeod's analysis here.
Here's a bird's-eye view of how it works.
At first glance, it may not be obvious why identifying a song is a hard problem. To see why it is, picture a plot of a song's audio waveform. Each song is essentially a collection of sound waves, and when visualized, those waves look dense and irregular.
For instance, take a brief snippet of a song's waveform. A brute-force way to check whether that snippet comes from a particular song is to slide it along the entire song and compare it against the audio at every position. That is slow even for one song, and it becomes impractical across a large music library.
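To make the cost concrete, here is a minimal sketch of that brute-force search in Python. It treats audio as a short list of integer samples, which is a big simplification (real audio is sampled tens of thousands of times per second), and the function name `brute_force_search` is hypothetical. It counts comparisons so you can see how the work grows with the length of both the song and the snippet.

```python
def brute_force_search(snippet, song):
    """Slide `snippet` across `song`, comparing sample by sample.

    Returns (offset, comparisons): offset is the first position where
    every sample matches exactly (or None), and comparisons counts the
    sample comparisons made, which grows roughly as len(song) * len(snippet).
    """
    comparisons = 0
    for offset in range(len(song) - len(snippet) + 1):
        for i, sample in enumerate(snippet):
            comparisons += 1
            if song[offset + i] != sample:
                break  # mismatch: try the next offset
        else:
            # The inner loop finished without a mismatch, so this offset matches.
            return offset, comparisons
    return None, comparisons


song = [0, 3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]
print(brute_force_search([9, 2, 6], song))  # (6, 9)
print(brute_force_search([9, 2, 7], song))  # (None, 12)
```

Changing a single sample of the snippet is enough to make the exact search fail, which hints at why this approach falls apart on noisy real-world recordings.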
Real-world recordings make it harder still. Background noise, changes in volume, frequency variations, and other distortions mean a recorded snippet almost never matches the original audio sample for sample, so the simple sliding comparison breaks down.
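Shazam takes a more sophisticated approach. It converts the audio into a spectrogram (how much energy each frequency carries over time), keeps only the most prominent peaks, combines pairs of nearby peaks into compact fingerprints, and looks those fingerprints up in a database of known songs.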
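The first step, turning audio into a spectrogram, can be sketched in plain Python as below. This is an illustration under simplifying assumptions, not Shazam's code: it applies a naive discrete Fourier transform to short Hann-windowed frames, and the `frame_size` and `hop` values are arbitrary toy choices. Real implementations use an FFT (for example `numpy.fft.rfft`) on much larger frames.

```python
import cmath
import math


def spectrogram(samples, frame_size=8, hop=4):
    """Magnitude spectrogram from a naive DFT over Hann-windowed frames.

    Returns a list of frames; each frame lists the magnitudes of
    frequency bins 0 .. frame_size // 2.
    """
    # Hann window tapers each frame's edges to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_size)]
        bins = []
        for k in range(frame_size // 2 + 1):
            total = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n in range(frame_size))
            bins.append(abs(total))
        frames.append(bins)
    return frames


# A pure tone that completes two cycles per 8-sample frame lands in bin 2.
tone = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = spectrogram(tone)
print([frame.index(max(frame)) for frame in spec])  # [2, 2, 2, 2, 2, 2, 2]
```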
Using spectrogram peaks as the foundation of Shazam's fingerprinting technique is a deliberate choice. The loudest points in time and frequency tend to survive background noise and other audio distortions, and a sparse set of peaks is a far more compact representation than the raw audio, which reduces both computation and storage.
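Picking peaks out of a spectrogram (represented here as a list of frames, each a list of bin magnitudes) could look like the sketch below. The neighborhood rule, where a cell must be at least as loud as everything within `radius` frames and `radius` bins of it, and the `min_magnitude` threshold are assumptions for illustration; a production system would tune these and may also cap how many peaks survive in each region.

```python
def find_peaks(spec, radius=1, min_magnitude=0.0):
    """Return (time, freq) cells that are local maxima of a spectrogram.

    A cell is a peak if it is louder than `min_magnitude` and at least
    as large as every cell within `radius` frames and `radius` bins.
    """
    peaks = []
    n_frames = len(spec)
    for t in range(n_frames):
        for f, value in enumerate(spec[t]):
            if value <= min_magnitude:
                continue  # too quiet to be a reliable landmark
            neighborhood = [
                spec[tt][ff]
                for tt in range(max(0, t - radius), min(n_frames, t + radius + 1))
                for ff in range(max(0, f - radius), min(len(spec[tt]), f + radius + 1))
            ]
            if value >= max(neighborhood):
                peaks.append((t, f))
    return peaks


spec = [
    [0, 1, 0, 0],
    [0, 5, 0, 0],
    [0, 1, 0, 7],
    [0, 0, 0, 1],
]
print(find_peaks(spec, min_magnitude=2))  # [(1, 1), (2, 3)]
```

Small bumps next to a loud cell are discarded, so moderate noise that adds a little energy everywhere tends not to move the surviving peaks.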
The final step matches the sample's fingerprints against those in the database. Shazam groups the matching fingerprints by song and then checks whether they line up in time: if the sample really comes from a given song, the gap between each fingerprint's time in the song and its time in the sample should be the same for all of them. A song's score is the number of matches that agree on a single offset, and the highest-scoring song is most likely the correct identification.
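Fingerprinting and matching can be sketched in the same spirit. The version below follows the widely described peak-pairing scheme: each anchor peak is paired with a few later peaks, each pair becomes a hash of (anchor frequency, target frequency, time gap), and matches are scored by the most common time offset per song. The function names and the `fan_out` and `max_dt` parameters are hypothetical, peaks are assumed to be (time, freq) tuples, and a real system would likely pack each hash into a compact integer rather than keep a tuple.

```python
from collections import Counter, defaultdict


def fingerprints(peaks, fan_out=3, max_dt=10):
    """Pair each anchor peak with up to `fan_out` later peaks.

    Yields ((f_anchor, f_target, dt), t_anchor) for each pair.
    """
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt == 0:
                continue  # skip peaks in the same frame as the anchor
            if dt > max_dt or paired == fan_out:
                break  # outside the target zone, or enough pairs already
            yield (f1, f2, dt), t1
            paired += 1


def build_index(songs):
    """Map each hash to every (song_id, anchor_time) where it occurs."""
    index = defaultdict(list)
    for song_id, peaks in songs.items():
        for h, t in fingerprints(peaks):
            index[h].append((song_id, t))
    return index


def best_match(index, sample_peaks):
    """Return (song_id, offset, score) for the best time-aligned song.

    Each matching hash votes for offset = t_song - t_sample. Hashes from
    the true song pile up on one offset; chance matches scatter.
    """
    offsets = defaultdict(Counter)
    for h, t_sample in fingerprints(sample_peaks):
        for song_id, t_song in index.get(h, []):
            offsets[song_id][t_song - t_sample] += 1
    if not offsets:
        return None, None, 0
    # Score each song by the size of its most popular offset bin.
    best = {sid: c.most_common(1)[0] for sid, c in offsets.items()}
    song_id = max(best, key=lambda sid: best[sid][1])
    offset, score = best[song_id]
    return song_id, offset, score


songs = {
    "song_a": [(0, 5), (2, 9), (3, 4), (6, 7), (8, 2)],
    "song_b": [(0, 1), (1, 3), (4, 8), (5, 6), (9, 2)],
}
# A clip of song_a starting two frames in, with times re-based to zero.
sample = [(0, 9), (1, 4), (4, 7), (6, 2)]

index = build_index(songs)
print(best_match(index, sample))  # ('song_a', 2, 6)
```

The winning offset of 2 also says where in `song_a` the clip begins. Adding a spurious peak to the sample, such as `(3, 11)`, knocks out one hash, but `song_a` still wins with five aligned matches.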
In essence, Shazam works like a musical detective: it listens to a clip, extracts distinctive audio features, and searches a vast music library for the song whose features line up with them. The result is a seamless experience that makes song recognition feel like magic, even though it rests on a few well-chosen engineering ideas.
For more in-depth information, check out MacLeod’s awesome article.