How Does Shazam Work?

Cameron MacLeod wrote a nice piece on how Shazam actually works. I’ve been curious since I’ve used Shazam to gather timestamps when I listen to music at the HiFi shows. Sometimes Shazam’s able to recognize a song within a second. It’s impressive. Check out MacLeod’s analysis here.

Here’s a bird’s eye view of how it works:

The Challenge of Song Recognition

At first glance, identifying a song might not look like a hard problem. To see why it is, picture a song’s audio waveform: a song is essentially a collection of sound waves, and when plotted, those waves look intricate and irregular.

For instance, take a brief section of a song’s waveform. To determine if this audio snippet matches a particular song, a brute-force approach would involve sliding this section along the entire song, checking for a match at every point. This method would be computationally intensive and time-consuming, particularly when dealing with vast music libraries.
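To get a feel for why this is expensive, here is a minimal sketch of the brute-force idea in Python. It is only an illustration of the sliding comparison described above, not anything Shazam does; the function name `sliding_match` and the tiny sample lists are made up for the example.

```python
def sliding_match(song, snippet):
    """Slide the snippet across the song and return (best_offset, best_error)."""
    best_offset, best_error = -1, float("inf")
    for offset in range(len(song) - len(snippet) + 1):
        # Sum of squared differences between the snippet and this window.
        error = sum((song[offset + i] - s) ** 2 for i, s in enumerate(snippet))
        if error < best_error:
            best_offset, best_error = offset, error
    return best_offset, best_error


# Toy "waveforms": a handful of amplitude samples (hypothetical data).
song = [0.0, 0.2, 0.9, -0.4, -0.8, 0.1, 0.5, 0.3]
snippet = song[3:6]

offset, error = sliding_match(song, snippet)
print(offset, error)
```

Every offset costs one pass over the snippet, so the work grows with song length times snippet length, and that is for a single song. A clean copy of the snippet matches with zero error here, but a noisy or quieter recording would not, which is the second problem described next.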

Furthermore, the challenge intensifies when dealing with real-world audio recordings, which are often affected by background noise, changes in amplitude, frequency variations, and other distortions. The simplistic sliding approach becomes inadequate under these conditions.

The Shazam Solution: Spectrograms and Fingerprinting

Shazam employs a more sophisticated approach to tackle these challenges. Here’s an overview of how it works:

Calculating a Spectrogram: Shazam starts by converting the audio signal into a spectrogram. A spectrogram is a graphical representation that displays how the frequencies in the audio signal change over time. This provides a detailed snapshot of the song’s audio characteristics.
Fingerprinting Peaks: Rather than analyzing the entire spectrogram, Shazam focuses on identifying significant peaks in the spectrogram. These peaks represent the most pronounced frequencies at specific moments in the song. Peaks are valuable because they are less susceptible to noise and distortions.
Hashing Peaks: To create a unique fingerprint for a song, Shazam pairs these peaks together and hashes them into a compact representation. This hashing process combines the frequency and timing information of each peak, resulting in a robust fingerprint for the song.
Database Matching: When a user requests a song identification, Shazam records a short snippet of the audio and repeats the same process to fingerprint it. It then searches its extensive database of precomputed fingerprints, and the song whose stored fingerprints best match the snippet’s is returned as the identification.
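The peak-pairing step can be sketched roughly like this. This is a simplified illustration under my own assumptions, not Shazam's actual code: peaks are given as `(time_index, freq_bin)` tuples, the "hash" is just the tuple `(f1, f2, dt)` (a real system would probably pack these into a compact integer), and the parameters `fan_out` and `max_dt` are hypothetical knobs.

```python
def hash_peaks(peaks, fan_out=3, max_dt=50):
    """Pair each peak with a few later peaks.

    peaks: list of (time_index, freq_bin) tuples.
    Returns a list of (hash, anchor_time), where hash = (f1, f2, dt).
    """
    peaks = sorted(peaks)
    fingerprints = []
    for i, (t1, f1) in enumerate(peaks):
        # Only look at the next few peaks so the number of pairs stays small.
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            dt = t2 - t1
            if 0 < dt <= max_dt:
                fingerprints.append(((f1, f2, dt), t1))
    return fingerprints


# Hypothetical peaks from a song, and the same peaks heard 100 frames later.
peaks = [(0, 10), (2, 20), (5, 15)]
shifted = [(t + 100, f) for t, f in peaks]

print(hash_peaks(peaks))
print(hash_peaks(shifted))
```

Because each hash stores the time difference between two peaks rather than their absolute times, the snippet produces the same hashes no matter where in the song it was recorded. The anchor time is kept alongside each hash so it can be used later when scoring.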

Why Spectrogram Peaks?

The choice of using spectrogram peaks as the foundation of Shazam’s fingerprinting technique is deliberate. Spectrogram peaks are less susceptible to noise and can withstand various audio distortions. Moreover, they provide a more concise representation of the audio, reducing the computational load and storage requirements.
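Here is a small, standard-library-only sketch of that idea: compute a crude spectrogram, then keep only the loudest frequency bin in each time frame. It is an illustration under stated assumptions rather than Shazam's method. The naive DFT, the `frame_size` and `hop` values, and the "one peak per frame" rule are all simplifications I picked for the example; a real implementation would use an FFT and a smarter peak picker.

```python
import cmath
import math


def spectrogram(samples, frame_size=64, hop=32):
    """Magnitude spectrogram via a naive DFT; each row is one time frame."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        mags = []
        for k in range(frame_size // 2):
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n, x in enumerate(frame))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def strongest_peaks(spec, per_frame=1):
    """Keep only the loudest bin(s) in each frame as (time_index, freq_bin)."""
    peaks = []
    for t, mags in enumerate(spec):
        top = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)
        peaks.extend((t, k) for k in sorted(top[:per_frame]))
    return peaks


# A pure tone that lands exactly on frequency bin 8 (hypothetical test signal).
rate = 64
tone = [math.sin(2 * math.pi * 8 * n / rate) for n in range(256)]
quiet = [0.3 * x for x in tone]  # the same tone, recorded more quietly

peaks = strongest_peaks(spectrogram(tone))
quiet_peaks = strongest_peaks(spectrogram(quiet))
print(peaks)
print(peaks == quiet_peaks)
```

The 256 samples collapse to a handful of `(time, frequency)` pairs, and turning the volume down leaves those pairs unchanged, since the loudest bin in each frame is still the loudest.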

Matching and Scoring

The final step involves matching the audio sample’s fingerprint with those in the database. Shazam groups matching fingerprints by song and calculates a score for each potential match. The score takes the time alignment of the peaks into account: for the right song, the matching fingerprints should all sit at roughly the same time offset between the sample and the song, while chance matches land at scattered offsets. The song with the highest score is the most likely identification.
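A rough sketch of this kind of scoring, assuming fingerprints are stored as `(hash, time)` pairs: for every hash the sample shares with a song, record the difference between the song time and the sample time, then score the song by its most common difference. The function names, the string hashes like `"h1"`, and the two toy songs are all made up for illustration.

```python
from collections import Counter, defaultdict


def build_index(songs):
    """songs: {song_id: [(hash, time), ...]} -> {hash: [(song_id, time), ...]}"""
    index = defaultdict(list)
    for song_id, fingerprints in songs.items():
        for h, t in fingerprints:
            index[h].append((song_id, t))
    return index


def score_matches(index, query):
    """Score each song by its largest group of matches sharing one time offset."""
    offsets = defaultdict(Counter)
    for h, t_query in query:
        for song_id, t_song in index.get(h, []):
            offsets[song_id][t_song - t_query] += 1
    return {song_id: max(c.values()) for song_id, c in offsets.items()}


# Hypothetical database and sample.
songs = {
    "song_a": [("h1", 10), ("h2", 12), ("h3", 15), ("h4", 20)],
    "song_b": [("h1", 3), ("h9", 8), ("h3", 40)],
}
query = [("h1", 0), ("h2", 2), ("h3", 5)]

scores = score_matches(build_index(songs), query)
best = max(scores, key=scores.get)
print(scores, best)
```

Both songs share hashes with the sample, but only in `song_a` do the matches line up at a single offset (10), so it wins even though `song_b` also contains `h1` and `h3`.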

In essence, Shazam’s technology is akin to a musical detective. It listens to a song, extracts unique audio features, and then hunts for the song’s identity within a vast music library. The result is a seamless user experience that transforms the magic of song recognition into a technological reality.

For more in-depth information, check out MacLeod’s awesome article.

About The Author

Jay Luong