This post continues Behind the Tap, a series exploring the hidden mechanics of everyday tech, from Uber to Spotify to search engines. I'll dive under the hood to demystify the systems shaping your digital world.
My first relationship with music listening began at 6, rotating through the albums in the living room's Onkyo 6-disc player. Cat Stevens, Groove Armada, Sade. There was always one song I kept rewinding to, though I didn't know its name. Ten years on, fragments of the song returned to memory. I searched through forums, 'old saxophone melody', 'vintage song about sand dunes', looking for years with no success. Then, one day at university, I was in my friend Pegler's dorm room when he played it:
That long search taught me how important it is to be able to find the music you love.
Before streaming and smart assistants, music discovery relied on memory, luck, or a friend with good music taste. That one catchy chorus could be lost to the ether.
Then came a music-lover's miracle.
A few seconds of sound. A button press. And a name on your screen.
Shazam made music recognisable.
The Origin: 2580
Shazam launched in 2002, long before apps were a thing. Back then it worked like this:
You'd dial 2580 on your mobile (UK only).
Hold your phone up to the speaker.
…Wait in silence…
And receive an SMS telling you the name of the song.
It felt like magic. The founding team, Chris Barton, Philip Inghelbrecht, Avery Wang, and Dhiraj Mukherjee, spent years building that illusion.
To build its first database, Shazam hired 30 young workers to run 18-hour shifts, manually loading 100,000 CDs into computers using custom software. Because CDs don't contain metadata, they had to type in song names by hand, referring to the CD sleeves, eventually creating the company's first million audio fingerprints: a painstaking process that took months.
In an era before smartphones or apps, when Nokias and BlackBerrys couldn't handle the processing or memory demands, Shazam had to stay alive long enough for the technology to catch up to its idea. This was a lesson in market timing.
This post is about what happens in the moment between the tap and the title: the signal processing, hashing, indexing, and pattern matching that lets Shazam hear what you can't quite name.
The Algorithm: Audio Fingerprinting
In 2003, Shazam co-founder Avery Wang published the blueprint for an algorithm that still powers the app today. The paper's central idea: if humans can understand music by superimposing layers of sound, a machine could do it too.
Let's walk through how Shazam breaks sound down into something a machine can recognise instantly.
1. Capturing an Audio Sample
It begins with a tap.
When you hit the Shazam button, the app records a 5–10 second snippet of the audio around you. That's long enough to identify most songs, though we've all waited minutes holding our phones in the air (or hiding them in our pockets) for the ID.
But Shazam doesn't store that recording. Instead, it reduces it to something far smaller and smarter: a fingerprint.
2. Generating the Spectrogram
Before Shazam can recognise a song, it needs to know what frequencies are in the sound and when they occur. To do that, it uses a mathematical tool called the Fast Fourier Transform (FFT).
The FFT breaks an audio signal into its component frequencies, revealing which notes or tones make up the sound at any moment.
Why it matters: waveforms are fragile, sensitive to noise, pitch changes, and device compression. But frequency relationships over time stay stable. That's the gold.
If you studied mathematics at uni, you'll remember the struggle of learning the Discrete Fourier Transform. The Fast Fourier Transform (FFT) is a more efficient way to compute it, letting us decompose a complex signal into its frequency components, like hearing all the notes in a chord.
Music isn't static. Notes and harmonics change over time. So Shazam doesn't just run the FFT once; it runs it repeatedly over small, overlapping windows of the signal. This process is called the Short-Time Fourier Transform (STFT) and forms the basis of the spectrogram.
The resulting spectrogram is a transformation of sound from the amplitude-time domain (the waveform) into the frequency-time domain.
Think of it as turning a messy audio waveform into a musical heatmap.
Instead of showing how loud the sound is, a spectrogram shows which frequencies are present at which times.

A spectrogram moves analysis from the amplitude-time domain to the frequency-time domain. It displays time on the horizontal axis, frequency on the vertical axis, and uses brightness to indicate the amplitude (or volume) of each frequency at each moment. This lets you see not just which frequencies are present, but also how their intensity evolves, making it possible to spot patterns, transient events, or changes in the signal that aren't visible in a standard time-domain waveform.
Spectrograms are widely used in fields such as audio analysis, speech processing, seismology, and music, providing a powerful tool for understanding the temporal and spectral characteristics of signals.
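To make the STFT concrete, here's a minimal sketch in NumPy. The window size, hop length, and Hann window are common illustrative choices, not Shazam's actual parameters:

```python
import numpy as np

def spectrogram(signal, window_size=1024, hop=512, sample_rate=44100):
    """Run an FFT over small, overlapping windows (the STFT) and return
    magnitudes of shape (num_windows, window_size // 2), plus the
    frequency in Hz that each bin represents."""
    window = np.hanning(window_size)  # taper each slice to reduce spectral leakage
    frames = []
    for start in range(0, len(signal) - window_size + 1, hop):
        frame = signal[start:start + window_size] * window
        spectrum = np.fft.rfft(frame)                       # FFT of a real-valued frame
        frames.append(np.abs(spectrum)[:window_size // 2])  # keep magnitudes only
    freqs = np.fft.rfftfreq(window_size, d=1.0 / sample_rate)[:window_size // 2]
    return np.array(frames), freqs

# A pure 440 Hz tone should light up the bin nearest 440 Hz.
sr = 44100
t = np.arange(sr) / sr                  # one second of "audio"
tone = np.sin(2 * np.pi * 440.0 * t)
mags, freqs = spectrogram(tone, sample_rate=sr)
peak_bin = mags.mean(axis=0).argmax()
print(freqs[peak_bin])                  # the bin closest to 440 Hz at this resolution
```

Each row of `mags` is one vertical slice of the heatmap described above; stacking them over time gives the full spectrogram.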
3. From Spectrogram to Constellation Map
Spectrograms are dense and contain far too much data to match across millions of songs. Shazam filters out low-intensity frequencies, leaving just the loudest peaks.
This creates a constellation map: a visual scatterplot of standout frequencies over time, similar to sheet music, though it reminds me of a mechanical music box.
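A toy version of that peak-picking step; the neighborhood size and loudness threshold here are made-up parameters, not the real ones:

```python
import numpy as np

def constellation_map(spec, neighborhood=10, threshold=0.5):
    """Keep only local maxima: time-frequency peaks that dominate their neighborhood.

    spec: 2-D magnitude array of shape (time_frames, freq_bins).
    Returns a list of (time_index, freq_index) peak coordinates."""
    peaks = []
    t_max, f_max = spec.shape
    floor = spec.mean() + threshold * spec.std()  # drop quiet bins entirely
    for t in range(t_max):
        for f in range(f_max):
            value = spec[t, f]
            if value < floor:
                continue
            t0, t1 = max(0, t - neighborhood), min(t_max, t + neighborhood + 1)
            f0, f1 = max(0, f - neighborhood), min(f_max, f + neighborhood + 1)
            if value >= spec[t0:t1, f0:f1].max():  # loudest in its local patch
                peaks.append((t, f))
    return peaks

spec = np.zeros((30, 30))
spec[10, 20] = 5.0                 # one dominant peak in an otherwise quiet spectrogram
print(constellation_map(spec))     # [(10, 20)]
```

Everything below the floor vanishes; only the "stars" of the constellation survive.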

4. Creating the Audio Fingerprint
Now comes the magic: turning points into a signature.
Shazam takes each anchor point (a dominant peak) and pairs it with target peaks in a small time window ahead, forming a connection that encodes both the frequency pair and the timing difference.
Each of these becomes a hash tuple:
(anchor_frequency, target_frequency, time_delta)
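A rough sketch of that pairing step; the fan-out and window width are illustrative values, not Shazam's real ones:

```python
def fingerprint(peaks, fan_out=5, max_dt=64):
    """Pair each anchor peak with up to fan_out later peaks inside a small
    time window, producing (anchor_freq, target_freq, time_delta) tuples.
    The anchor's own time is kept alongside each hash; the matcher needs it.

    peaks: list of (time, freq) tuples sorted by time."""
    hashes = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > max_dt:
                break              # peaks are time-sorted: no later peak qualifies
            hashes.append(((f1, f2, dt), t1))
            paired += 1
            if paired == fan_out:
                break
    return hashes

peaks = [(0, 100), (1, 200), (2, 150)]
print(fingerprint(peaks))
# [((100, 200, 1), 0), ((100, 150, 2), 0), ((200, 150, 1), 1)]
```

Pairing peaks rather than storing them individually is what makes the fingerprint distinctive: a single peak is common, but a specific pair at a specific spacing is rare.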

What’s a Hash?
A hash is the output of a mathematical function, called a hash function, that transforms input data into a fixed-length string of numbers and/or characters. It's a way of turning complex data into a short, compact identifier.
Hashing is widely used in computer science and cryptography, especially for tasks like data lookup, verification, and indexing.
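A quick demonstration with Python's built-in hashlib. SHA-256 is just a familiar example of a hash function; Shazam's fingerprint hashes are a custom packed format, as shown next, not a cryptographic hash:

```python
import hashlib

# The same input always yields the same digest, and every digest has the
# same fixed length no matter how long the input is.
for text in ["Shazam", "a much longer piece of input data than the first one"]:
    digest = hashlib.sha256(text.encode()).hexdigest()
    print(len(digest))   # 64 hex characters, both times
```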

For Shazam, a typical hash is 32 bits long, and it might be structured like this:
- 10 bits for the anchor frequency
- 10 bits for the target frequency
- 12 bits for the time delta between them

This tiny fingerprint captures the relationship between two sound peaks and how far apart they are in time. It's robust enough to identify the song and small enough to transmit quickly, even on low-bandwidth connections.
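Here's what that bit layout might look like in code, assuming the 10 + 10 + 12 split above (real systems pick widths to match their FFT bin count and window spacing):

```python
def pack_hash(anchor_freq, target_freq, dt):
    """Pack two frequency bins and a time delta into one 32-bit integer
    using a 10 + 10 + 12 bit split."""
    assert 0 <= anchor_freq < 1024 and 0 <= target_freq < 1024 and 0 <= dt < 4096
    return (anchor_freq << 22) | (target_freq << 12) | dt

def unpack_hash(h):
    """Recover the three fields by shifting and masking."""
    return (h >> 22) & 0x3FF, (h >> 12) & 0x3FF, h & 0xFFF

h = pack_hash(300, 512, 45)
print(h < 2**32)          # True: it fits in 32 bits
print(unpack_hash(h))     # (300, 512, 45)
```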
5. Matching Against the Database
Once Shazam creates a fingerprint from your snippet, it needs to quickly find a match in a database containing millions of songs.
Shazam has no idea where in the song your clip came from, and it doesn't matter: intro, verse, chorus, or bridge, it looks at the relative timing between hash pairs. This makes the system robust to time offsets in the input audio.

Shazam compares your recording's hashes against its database and identifies the song with the highest number of matches: the fingerprint that best lines up with your sample, even when it's not an exact match because of background noise.
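A simplified matcher along those lines: count, per song, how many sample hashes agree on a single consistent time offset. The data layout (hash strings, a plain dict database) is an assumption for illustration:

```python
from collections import Counter, defaultdict

def best_match(sample_hashes, database):
    """Score each song by how many sample hashes agree on one consistent
    time offset, which is what separates a real match from random noise.

    sample_hashes: list of (hash, time_in_sample) pairs.
    database: dict mapping hash -> list of (song_id, time_in_song) pairs."""
    offsets = defaultdict(Counter)                 # song_id -> histogram of offsets
    for h, t_sample in sample_hashes:
        for song_id, t_song in database.get(h, []):
            offsets[song_id][t_song - t_sample] += 1
    scores = {song: hist.most_common(1)[0][1] for song, hist in offsets.items()}
    return max(scores, key=scores.get) if scores else None

db = {
    "h1": [("songA", 50)],
    "h2": [("songA", 51), ("songB", 7)],
    "h3": [("songA", 53), ("songB", 99)],
}
sample = [("h1", 0), ("h2", 1), ("h3", 3)]   # all three offsets line up at 50 for songA
print(best_match(sample, db))                # songA
```

Note how "songB" gets two hash hits but never the same offset twice, so it scores 1; "songA" lines up at offset 50 three times and wins. That offset histogram is what makes matching robust to your clip starting anywhere in the song.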
How It Searches So Fast
To make this lightning-fast, Shazam uses a hashmap, a data structure that allows near-instant lookup.
A hashmap can find a match in O(1) time, meaning the lookup time stays constant even when there are millions of entries.
In contrast, a sorted index (like a B-tree on disk) takes O(log n) time, which grows slowly as the database grows.
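A toy comparison using Python's dict (a hashmap) and the bisect module (binary search over a sorted index):

```python
import bisect

n = 100_000

# Hashmap: one hashing step per lookup, independent of n.
index = {h: "songA" for h in range(n)}
print(index[42_317])                   # O(1) average case

# Sorted index: binary search halves the remaining range each step.
sorted_keys = list(range(n))
pos = bisect.bisect_left(sorted_keys, 42_317)
print(sorted_keys[pos])                # O(log n): ~17 steps for 100,000 entries
```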
This way of describing time and space complexity is known as Big O notation, a concept I'm not prepared (or bothered) to teach here. Please consult a computer scientist.
6. Scaling the System
To maintain this speed at global scale, Shazam does more than just use fast data structures; it optimises how and where the data lives:
- Shards the database, dividing it by time range, hash prefix, or geography
- Keeps hot shards in memory (RAM) for quick access
- Offloads colder data to disk, which is slower but cheaper to store
- Distributes the system by region (e.g., US East, Europe, Asia) so recognition is fast no matter where you are
This design supports 23,000+ recognitions per minute, even at global scale.
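Prefix sharding can be sketched in a few lines; the shard count and bit width here are illustrative assumptions, not Shazam's real topology:

```python
def shard_for(hash_value, prefix_bits=6):
    """Route a 32-bit fingerprint hash to a shard by its top bits (its prefix).
    Fingerprint hashes are roughly uniform, so this spreads load evenly."""
    return hash_value >> (32 - prefix_bits)   # top 6 bits -> 64 shards

print(shard_for(0xFFFFFFFF))   # 63: the highest prefix goes to the last shard
print(shard_for(0x00000001))   # 0: low hashes land in shard 0
```

Because each lookup touches exactly one shard, shards can live on separate machines and scale independently.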
Impact & Future Applications
The obvious application is music discovery on your phone, but there's another major use of Shazam's process.
Shazam powers market insights. Every time a user tags a song, Shazam collects anonymised, geo-temporal metadata (where, when, and how often a song is being ID'd).
Labels, artists, and promoters use this to:
- Spot breakout tracks before they hit the charts.
- Identify regional trends (a remix gaining traction in Tokyo before LA).
- Guide marketing spend based on organic appeal.
Unlike Spotify, which uses listening behaviour to refine recommendations, Shazam provides real-time data on songs people actively identify, offering the music industry early insight into emerging trends and popular tracks.
What Spotify Hears Before You Do: The Data Science of Music Recommendation (medium.com)
In December 2017, Apple agreed to acquire Shazam for a reported $400 million. Apple reportedly uses Shazam's data to augment Apple Music's recommendation engine, and record labels now monitor Shazam trends the way they used to monitor radio spins.

Looking ahead, expect evolution in areas like:
- Visual Shazam: Already piloted; point your camera at an object or artwork to identify it, useful for an augmented-reality future.
- Concert Mode: Identify songs live during gigs and sync to a real-time setlist.
- Hyper-local trends: Surface what's trending 'on this street' or 'in this venue', expanding community-shared music taste.
- Generative AI integration: Pair audio snippets with lyric generation, remix suggestions, or visual accompaniment.
Outro: The Algorithm That Endures
In a world of ever-shifting tech stacks, it's rare for an algorithm to stay relevant for over 20 years.
But Shazam's fingerprinting method hasn't just endured; it has scaled, evolved, and become a blueprint for audio recognition systems across industries.
The magic isn't just that Shazam can name a song. It's how it does it: turning messy sound into elegant math, and doing it reliably, instantly, and globally.
So next time you're in a loud, trashy bar holding your phone up to the speaker playing Lola Young's 'Messy', just remember: behind that tap is a beautiful stack of signal processing, hashing, and search, designed so well it barely had to change.