How Does Shazam Work?

Cameron MacLeod wrote a nice piece on how Shazam actually works. I’ve been curious since I’ve used Shazam to gather timestamps when I listen to music at the HiFi shows. Sometimes Shazam’s able to recognize a song within a second. It’s impressive. Check out MacLeod’s analysis here.

Here’s a bird’s eye view of how it works:

The Challenge of Song Recognition

At first glance, it may not be obvious why identifying a song is a hard problem. To see why, picture a plot of a song’s audio waveform. Each song is essentially a collection of sound waves, and when visualized, these waves look intricate and irregular.

For instance, take a brief section of a song’s waveform. To determine if this audio snippet matches a particular song, a brute-force approach would involve sliding this section along the entire song, checking for a match at every point. This method would be computationally intensive and time-consuming, particularly when dealing with vast music libraries.
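
To make the cost of that brute-force idea concrete, here is a minimal sketch in Python. The function name `brute_force_find`, the `tolerance` parameter, and the toy sample values are all made up for illustration. It slides the snippet along the song and compares sample by sample, which is roughly len(song) × len(snippet) comparisons for every song in the library.

```python
def brute_force_find(song, snippet, tolerance=1e-9):
    """Return the first offset where snippet matches song sample for sample, or -1."""
    for offset in range(len(song) - len(snippet) + 1):
        # Compare every sample of the snippet against the song at this offset.
        if all(abs(song[offset + i] - s) <= tolerance for i, s in enumerate(snippet)):
            return offset
    return -1


# Toy "song" and a snippet cut straight out of it (hypothetical sample values).
song = [0.0, 0.2, 0.5, 0.1, -0.3, -0.6, -0.2, 0.4, 0.7, 0.3]
snippet = song[4:7]
print(brute_force_find(song, snippet))  # 4

# The same snippet played twice as loud no longer matches anywhere.
louder = [2 * s for s in snippet]
print(brute_force_find(song, louder))  # -1
```

The second call hints at the bigger problem: even a simple change in volume breaks an exact waveform comparison.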

Furthermore, the challenge intensifies when dealing with real-world audio recordings, which are often affected by background noise, changes in amplitude, frequency variations, and other distortions. The simplistic sliding approach becomes inadequate under these conditions.

The Shazam Solution: Spectrograms and Fingerprinting

Shazam employs a more sophisticated approach to tackle these challenges. Here’s an overview of how it works:

  1. Calculating a Spectrogram: Shazam starts by converting the audio signal into a spectrogram. A spectrogram is a graphical representation that displays how the frequencies in the audio signal change over time. This provides a detailed snapshot of the song’s audio characteristics.
  2. Fingerprinting Peaks: Rather than analyzing the entire spectrogram, Shazam focuses on identifying significant peaks in the spectrogram. These peaks represent the most pronounced frequencies at specific moments in the song. Peaks are valuable because they are less susceptible to noise and distortions.
  3. Hashing Peaks: To create a unique fingerprint for a song, Shazam pairs these peaks together and hashes them into a compact representation. This hashing process combines the frequency and timing information of each peak, resulting in a robust fingerprint for the song.
  4. Database Matching: When a user requests a song identification, Shazam records a short snippet of the audio and repeats the process to create a fingerprint. It then searches its extensive database of precomputed fingerprints for a match. The song whose stored fingerprints best match the snippet’s is returned as the result.
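
Steps 1 to 3 can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Shazam’s actual code: the function names, the window and hop sizes, the `fan_out` value, and the test signal are all invented here. It uses only the standard library with a slow, naive DFT, whereas a real implementation would use an FFT library. It also keeps just one peak per time frame and stores each "hash" as a plain tuple, where a real system would keep many peaks and pack the values into a compact integer.

```python
import cmath
import math


def spectrogram(samples, window=64, hop=32):
    """Return one list of frequency-bin magnitudes per time frame."""
    frames = []
    for start in range(0, len(samples) - window + 1, hop):
        # Taper the frame with a Hann window to reduce spectral leakage.
        frame = [
            s * 0.5 * (1 - math.cos(2 * math.pi * i / window))
            for i, s in enumerate(samples[start:start + window])
        ]
        # Naive DFT, keeping only the non-negative frequency bins.
        frames.append([
            abs(sum(s * cmath.exp(-2j * math.pi * k * i / window)
                    for i, s in enumerate(frame)))
            for k in range(window // 2)
        ])
    return frames


def find_peaks(spec):
    """Simplification: keep only the strongest bin of each frame as (frame, bin)."""
    return [(t, max(range(len(mags)), key=mags.__getitem__))
            for t, mags in enumerate(spec)]


def hash_peaks(peaks, fan_out=3):
    """Pair each peak with the next few peaks: ((f1, f2, dt), anchor_time)."""
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            hashes.append(((f1, f2, t2 - t1), t1))
    return hashes


def tone(bin_index, length, window=64):
    """Sine wave whose frequency sits exactly on one spectrogram bin."""
    return [math.sin(2 * math.pi * bin_index * i / window) for i in range(length)]


# Toy signal: 128 samples of a tone in bin 4, then 128 samples of a tone in bin 10.
signal = tone(4, 128) + tone(10, 128)
peaks = find_peaks(spectrogram(signal))
fingerprint = hash_peaks(peaks)
print(peaks[0], peaks[-1])  # (0, 4) (6, 10)
print(fingerprint[0])       # ((4, 4, 1), 0)
```

Each entry pairs two peak frequencies with the time gap between them, and remembers when the first peak occurred. That stored time is what makes the later alignment check possible.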

Why Spectrogram Peaks?

The choice to build the fingerprint on spectrogram peaks is deliberate. The strongest frequencies at a given moment tend to survive background noise and changes in volume, even when the raw waveform changes considerably. Peaks are also a far more compact representation than the full spectrogram, which reduces both the computation and the storage required.
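
A small experiment makes the robustness claim easier to believe. The sketch below is only an illustration with made-up numbers (a 64-sample tone, half volume, and a fixed noise range). It compares a clean tone with a quieter, noisy copy: the waveforms differ a lot sample by sample, yet the strongest frequency bin is the same in both.

```python
import cmath
import math
import random


def strongest_bin(samples):
    """Index of the loudest frequency bin, found with a naive DFT."""
    n = len(samples)
    mags = [
        abs(sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples)))
        for k in range(n // 2)
    ]
    return max(range(len(mags)), key=mags.__getitem__)


n = 64
clean = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]  # tone in bin 5

# Simulate a phone recording: half the volume plus some random noise.
rng = random.Random(0)
noisy = [0.5 * s + rng.uniform(-0.1, 0.1) for s in clean]

worst_sample_gap = max(abs(a - b) for a, b in zip(clean, noisy))
print(worst_sample_gap > 0.3)                      # True: the waveforms disagree
print(strongest_bin(clean), strongest_bin(noisy))  # 5 5
```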

Matching and Scoring

The final step involves matching the audio sample’s fingerprint with those in the database. Shazam groups matching fingerprints by song and calculates a score for each potential match. The song with the highest score is likely the correct identification. The scoring takes the time alignment of peaks into account: for a genuine match, the matching fingerprints should appear in the same order and with the same spacing in the sample as they do in the song, which helps rule out coincidental matches.
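
Here is one way the grouping and scoring could look in Python. Treat it as a hedged sketch rather than Shazam’s real scoring: the function names, the song IDs, and the hash values are hypothetical. The assumption is that each fingerprint entry is a (hash, time) pair, and that a song’s score is the size of its largest group of matching hashes that all agree on the same time offset between song and sample.

```python
from collections import Counter, defaultdict


def build_index(songs):
    """Map each hash to every (song_id, time) where it occurs."""
    index = defaultdict(list)
    for song_id, hashes in songs.items():
        for h, t in hashes:
            index[h].append((song_id, t))
    return index


def best_match(index, sample_hashes):
    """Score each song by its largest group of hashes sharing one time offset."""
    offsets = defaultdict(Counter)
    for h, t_sample in sample_hashes:
        for song_id, t_song in index.get(h, []):
            offsets[song_id][t_song - t_sample] += 1
    scores = {song_id: c.most_common(1)[0][1] for song_id, c in offsets.items()}
    if not scores:
        return None, scores
    return max(scores, key=scores.get), scores


# Hypothetical fingerprints: ((freq1, freq2, time_gap), anchor_time).
songs = {
    "song_a": [((4, 10, 1), 0), ((10, 7, 2), 1), ((7, 12, 1), 3), ((12, 4, 1), 4)],
    "song_b": [((4, 10, 1), 5), ((3, 9, 2), 6), ((7, 12, 1), 2)],
}
# A sample recorded from song_a, starting one time step into the song.
sample = [((10, 7, 2), 0), ((7, 12, 1), 2), ((12, 4, 1), 3)]

index = build_index(songs)
best, scores = best_match(index, sample)
print(best, scores)  # song_a {'song_a': 3, 'song_b': 1}
```

Note that song_b shares one hash with the sample, but it is a lone hit. All three song_a hits agree on an offset of 1, so song_a wins comfortably.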

In essence, Shazam’s technology is akin to a musical detective. It listens to a song, extracts unique audio features, and then hunts for the song’s identity within a vast music library. The result is a seamless user experience that transforms the magic of song recognition into a technological reality.

For more in-depth information, check out MacLeod’s awesome article.

Jay Luong

Mr. Audio Bacon himself. An open-minded electrical engineer and software developer by trade. I have an obsession with the enjoyment of all things media - specifically in the realm of music and film. So much heart and soul (and money) go into the creation of this artistry. My aim is to find out which products get me closer to what the musicians and directors intended.
