Your earbuds perform a small miracle every time you take a call from a noisy street: they pluck your voice out of a wall of traffic, wind, and chatter, and deliver it cleanly to the person on the other end, using microphones smaller than a grain of rice, mounted on a device wobbling in your ear. It's arguably the hardest engineering problem in the whole product, and it's the spec people complain about most. Here's how earbuds actually capture your voice, why calls have lagged behind music quality for so long, and what's finally changing.
The microphone in your earbud is a tiny machine
Earbuds use MEMS (Micro-Electro-Mechanical Systems) microphones. Each one is a silicon chip with a flexible diaphragm just micrometers thick suspended above a fixed backplate. Sound waves vibrate the diaphragm; that movement changes the electrical capacitance between it and the backplate; a tiny onboard amplifier converts those changes into a signal. The sensing die is often about a millimeter on a side, and the complete packaged part is only a few millimeters across.
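To get a feel for the scale involved, the diaphragm and backplate can be approximated as a parallel-plate capacitor, C = ε0 · A / d. The sketch below uses that textbook formula only; the plate size, gap, and diaphragm movement are hypothetical round numbers chosen for illustration, not measurements of any real microphone.

```python
EPSILON_0 = 8.854e-12  # permittivity of free space, in farads per meter

def plate_capacitance(area_m2, gap_m):
    """Parallel-plate approximation: C = epsilon_0 * A / d (air gap assumed)."""
    return EPSILON_0 * area_m2 / gap_m

# Hypothetical geometry: a 0.5 mm x 0.5 mm plate with a 4 micrometer gap.
AREA_M2 = 0.5e-3 * 0.5e-3
REST_GAP_M = 4e-6

c_rest = plate_capacitance(AREA_M2, REST_GAP_M)
# Assume a sound wave pushes the diaphragm 10 nanometers closer to the backplate.
c_pressed = plate_capacitance(AREA_M2, REST_GAP_M - 10e-9)

print(f"at rest: {c_rest * 1e12:.3f} pF")
print(f"change:  {(c_pressed - c_rest) * 1e15:.2f} fF")
```

Under these assumed numbers, the change is a small fraction of an already tiny capacitance, which helps explain why the amplifier sits right on the chip next to the diaphragm.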
MEMS mics are cheap, consistent, and resistant to vibration, which makes them ideal for mass production. But each one is fundamentally omnidirectional: it hears everything from every direction equally. On its own, a single mic on your ear can't tell the difference between your voice and a bus engine. The intelligence isn't in the microphone. It's in using several of them together.
Why one mic is never enough
Modern earbuds carry several microphones per side, often three or more. Some face outward to sample the environment (these double as the feedforward mics for noise cancellation); some sit near the bottom, closer to your mouth; and some face inward. Having multiple mics in known positions is what unlocks the real trick: beamforming.
Beamforming: listening in a direction
Sound from your mouth reaches each microphone at a fractionally different time, because each mic is a slightly different distance away. Sound from a car to your left arrives with a different set of time differences. By precisely delaying and combining the signals from multiple mics (a technique called delay-and-sum), the processor can make signals arriving from your mouth's direction reinforce each other while signals from other directions partially cancel out. The result is a virtual "beam" of sensitivity pointed at your mouth.
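Delay-and-sum is simple enough to show in a few lines. The sketch below is a toy two-mic version in plain Python, not any manufacturer's implementation: the sample rate, the 1 kHz test tone, and the arrival delays are all assumptions, and the noise delay is deliberately exaggerated so the cancellation is total rather than partial.

```python
import math

SAMPLE_RATE = 16_000  # Hz, assumed
TONE_HZ = 1_000       # test tone standing in for both voice and noise, assumed
NUM_SAMPLES = 320

def tone(delay_samples):
    """The test tone as heard by a mic that receives it `delay_samples` late."""
    return [
        math.sin(2 * math.pi * TONE_HZ * (n - delay_samples) / SAMPLE_RATE)
        for n in range(NUM_SAMPLES)
    ]

def delay_and_sum(channels, steering_delays):
    """Advance each channel by its steering delay (in samples), then average."""
    usable = min(len(ch) - d for ch, d in zip(channels, steering_delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, steering_delays)) / len(channels)
        for n in range(usable)
    ]

def rms(signal):
    return math.sqrt(sum(x * x for x in signal) / len(signal))

# Assumed geometry: the voice reaches mic B 2 samples after mic A,
# while noise from another direction reaches mic B 10 samples after mic A.
steering = [0, 2]  # delays that line up the voice across both mics
voice_out = delay_and_sum([tone(0), tone(2)], steering)
noise_out = delay_and_sum([tone(0), tone(10)], steering)

print(f"voice RMS after beamforming: {rms(voice_out):.3f}")
print(f"noise RMS after beamforming: {rms(noise_out):.3f}")
```

With the steering delays matched to the voice, its two copies add in phase and keep their full level. The noise copies end up half a cycle apart and cancel. Real earbuds have mics only millimeters apart, so the delays are far smaller and the cancellation only partial, as described above.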
Beamforming is the core of what manufacturers market as Environmental Noise Cancellation (ENC), and it's important not to confuse it with the Active Noise Cancellation (ANC) you use while listening. ANC cancels noise in your ears so music sounds cleaner to you. ENC cleans up your outgoing voice so the person on the call hears you clearly. They're solving opposite problems with overlapping hardware, which is why a pair can have superb ANC but mediocre call quality, or vice versa.
Bone conduction: feeling your voice, not hearing it
The cleverest earbuds add a sensor that doesn't listen to the air at all. A voice pickup unit (usually an accelerometer pressed against your ear) detects the tiny vibrations that travel through your skull and jawbone when you speak. Crucially, those bone vibrations only happen when you talk; a passing truck doesn't vibrate your jaw. By fusing this bone-conducted signal with the air microphones, earbuds get a noise-immune reference for exactly when you're speaking and what the low frequencies of your voice are doing. This is why some earbuds remain intelligible in howling wind where mic-only designs fail completely.
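One simple way to use that reference is as a gate: let the microphone signal through at full level only while the accelerometer shows speech vibration. The sketch below illustrates that idea under stated assumptions. The function names, the energy threshold, and the floor gain are all hypothetical, and real products fuse the two signals in more sophisticated ways.

```python
def frame_energy(frame):
    """Mean squared amplitude of one short block of samples."""
    return sum(x * x for x in frame) / len(frame)

def gate_with_bone_sensor(mic_frames, accel_frames, threshold=1e-4, floor_gain=0.1):
    """Attenuate mic frames in which the accelerometer shows no speech vibration.

    `threshold` and `floor_gain` are illustrative values, not tuned ones.
    """
    gated = []
    for mic, accel in zip(mic_frames, accel_frames):
        gain = 1.0 if frame_energy(accel) > threshold else floor_gain
        gated.append([gain * x for x in mic])
    return gated

# Frame 0: the wearer is talking (accelerometer active).
# Frame 1: only outside noise (accelerometer nearly still).
mic_frames = [[0.5, -0.5], [0.4, -0.4]]
accel_frames = [[0.05, -0.05], [0.0, 0.001]]

gated_frames = gate_with_bone_sensor(mic_frames, accel_frames)
print(gated_frames)
```

The first frame passes unchanged, while the second is turned down to a tenth of its level because nothing vibrated the jaw, no matter how loud the air microphones were at that moment.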
The wind problem
Wind is uniquely brutal for earbud mics. It isn't sound arriving from a direction; it's turbulent air physically buffeting the microphone port, producing a low-frequency roar that beamforming can't simply steer away from. Manufacturers fight it with acoustic mesh over the ports, dedicated wind-detection algorithms that switch to the most sheltered microphone, and the bone-conduction trick above. It's why "great call quality" claims should always be tested outdoors, not in a quiet room.
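The "switch to the most sheltered microphone" idea can be sketched as: estimate how much low-frequency rumble each mic is picking up, then choose the quietest one. The version below is a rough illustration, not a production wind detector. It assumes a one-pole low-pass filter is a good enough rumble proxy, and the filter coefficient and test signals are made up.

```python
import math

def low_freq_energy(signal, alpha=0.05):
    """Energy remaining after a one-pole low-pass filter (a rough rumble proxy)."""
    state = 0.0
    total = 0.0
    for x in signal:
        state += alpha * (x - state)  # smooths away fast changes, keeps slow ones
        total += state * state
    return total / len(signal)

def pick_sheltered_mic(mic_signals):
    """Return the index of the mic with the least low-frequency energy."""
    energies = [low_freq_energy(s) for s in mic_signals]
    return min(range(len(energies)), key=energies.__getitem__)

N = 400
voice = [0.3 * math.sin(2 * math.pi * n / 8) for n in range(N)]     # fast, voice-like
rumble = [1.0 * math.sin(2 * math.pi * n / 100) for n in range(N)]  # slow, wind-like

exposed = [v + r for v, r in zip(voice, rumble)]  # mic 0: buffeted by wind
sheltered = voice                                  # mic 1: out of the airflow

chosen = pick_sheltered_mic([exposed, sheltered])
print(f"use mic {chosen}")
```

Both mics hear the same voice here, so the only thing separating them is the slow rumble, and the selector picks the mic without it.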
The bandwidth bottleneck nobody mentions
Even with perfect mics and flawless beamforming, calls have historically sounded worse than music for a reason that has nothing to do with capturing your voice: the Bluetooth link itself. As we explain in our guide to Bluetooth profiles, phone calls run over a two-way profile that must carry audio in both directions at once, which forced call audio into a narrow slice of frequencies.
This is why even flagship earbuds can sound muffled on a call: the microphones captured your voice beautifully, but the codec then threw away everything above roughly 3.4 kHz to fit the two-way constraint. Wideband speech ("HD Voice," using the mSBC codec) roughly doubles that ceiling, and Bluetooth LE Audio's LC3 codec pushes toward super-wideband while letting you keep stereo music playing. It's the first real fix for the decades-old call-quality penalty.
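Those ceilings follow from sampling rate: digital audio can only represent frequencies up to half its sampling rate. The short calculation below works that out for each tier. The sampling rates are the standard ones for each bandwidth class and are supplied here as an assumption, since the article itself only quotes the resulting ceilings.

```python
# Standard sampling rates per speech bandwidth class (assumed, not stated above).
SAMPLE_RATE_HZ = {
    "narrowband": 8_000,
    "wideband (mSBC)": 16_000,
    "super-wideband (LC3)": 32_000,
}

def nyquist_ceiling_hz(sample_rate_hz):
    """Highest frequency a given sampling rate can represent: half the rate."""
    return sample_rate_hz / 2

ceilings = {name: nyquist_ceiling_hz(rate) for name, rate in SAMPLE_RATE_HZ.items()}
for name, hz in ceilings.items():
    print(f"{name}: up to {hz / 1000:g} kHz")
```

Narrowband's theoretical limit is 4 kHz, but telephone-style filtering keeps the usable band to roughly 3.4 kHz, which is the figure quoted above. Each step up doubles the theoretical ceiling.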
The new layer: machine learning
The latest earbuds add neural noise suppression: small machine-learning models, trained on thousands of hours of speech-in-noise, that run on the earbud's DSP and separate voice from background in real time. Unlike beamforming, which relies purely on direction, these models recognize the statistical texture of human speech versus noise, so they can strip out steady hums and even some chatter that's coming from your mouth's direction. It's the same conceptual leap that made phone-camera night mode possible: smarter processing extracting signal that the raw hardware alone couldn't.
How to judge (and improve) call quality
- Test outdoors and in wind. A quiet-room test tells you almost nothing; the hard cases are wind and background chatter.
- Record yourself. Call your own voicemail or use a voice-memo callback to hear what the other side actually gets.
- Confirm a good seal and fit. Earbuds that shift in your ear move the mics relative to your mouth, degrading the beam.
- Look for wideband / LE Audio support if calls are central to your use; it addresses the bottleneck the mics can't.
- Jabra and the bone-conduction-equipped pairs consistently lead budget call quality; see our best-for-calls pick.
Sidetone: why you shout on calls without realizing it
Wear sealing earbuds in a noisy environment and you'll find yourself talking far too loudly, because the seal blocks your own voice from reaching your ears, so your brain assumes you're being quiet and compensates. Old landline phones solved this decades ago with sidetone: deliberately feeding a little of your own microphone signal back into the earpiece so you hear yourself at a natural level.
Good earbuds reintroduce sidetone during calls, blending a measured amount of your captured voice back into your ears so you self-regulate your volume and the call feels natural. Earbuds that skip it leave you shouting in coffee shops and straining in quiet rooms. It's a small feature with an outsized effect on how "right" a call feels, and it's almost never mentioned on a spec sheet, so it's something to listen for when you test a pair.
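At its simplest, sidetone is a mix: the caller's audio plus a scaled-down copy of your own microphone. The sketch below shows only that blend. The function name and the default gain are hypothetical, and a real implementation would also have to keep the added delay very short.

```python
def mix_sidetone(far_end, own_mic, sidetone_gain=0.15):
    """Blend a fraction of the wearer's own voice into what they hear.

    Samples are floats in [-1.0, 1.0]; the sum is clipped to stay in range.
    `sidetone_gain` is an illustrative value, not a recommended setting.
    """
    return [
        max(-1.0, min(1.0, far + sidetone_gain * own))
        for far, own in zip(far_end, own_mic)
    ]

far_end = [0.2, 0.9]  # what the other person is saying
own_mic = [1.0, 1.0]  # your own voice, as captured by the earbud

heard = mix_sidetone(far_end, own_mic, sidetone_gain=0.5)
print(heard)
```

The gain is the whole design question: too little and you start shouting again, too much and your own voice drowns out the person you're talking to.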
How many microphones is "enough"?
Marketing loves a big microphone count ("six microphones for crystal-clear calls!"), but the number alone is close to meaningless. What matters is mic placement, the quality of the beamforming algorithm, and whether there's a bone-conduction sensor in the mix. Two well-placed microphones driven by a smart DSP routinely outperform six microphones with crude processing. A second consideration: more microphones cost battery and money, and beyond a point add little. Treat mic count the way you'd treat camera megapixels: a weak proxy for a result that depends far more on the processing behind it. Judge call quality by listening, not by counting, a theme we return to in our buying guide's "specs to ignore" section.
Why call latency feels different from music latency
On a call, audio delay is a two-way problem: a lag that's barely noticeable while watching a video becomes maddening in conversation, because both people are waiting on each other and start talking over the gap. Calls run over the lower-quality two-way profile partly to keep that round-trip delay manageable, which is another reason call audio has historically been compressed. Bluetooth LE Audio improves both quality and latency simultaneously, which is why it's such a meaningful upgrade for people who live on calls. We dig into the full latency picture in our dedicated explainer on why sound lags behind video.
How we evaluate call quality
Because a quiet-room test reveals almost nothing, judging real call performance means stacking the deck against the earbuds. The conditions that actually separate good from bad are wind across the microphone ports, steady background noise like a busy cafe or traffic, and your own movement shifting the earbud relative to your mouth. The most honest test you can run yourself costs nothing: call your voicemail or use a callback service from a windy street corner and listen back to the recording. That played-back clip is exactly what the person on the other end hears, and it's frequently a humbling reality check versus how clear you sounded to yourself through sidetone. It's the method behind every call-quality verdict in our rankings.
The full signal chain, start to finish
It helps to see the whole pipeline your voice travels, because a weakness anywhere in it shows up as a bad call, and people tend to blame the wrong link. When you speak, several MEMS microphones capture the sound while a bone-conduction sensor independently confirms that the vibration is coming from your own jaw. A beamforming stage combines the mic signals with precise timing to favor your mouth's direction and suppress everything else. A noise-suppression stage, increasingly a machine-learning model, then scrubs residual background out of what's left. Only then is the cleaned voice handed to the Bluetooth codec, which compresses it to fit the two-way call profile and sends it to the phone, which relays it across the cellular or VoIP network to the other person.
That last stretch matters: even flawless earbud processing can be undone by a weak cellular signal or a congested Wi-Fi call. So when a call sounds rough, the earbuds are only one suspect among several. The way to isolate them is to compare the same call on the earbuds versus the phone's own microphone in the same spot. If both are bad, the network is the culprit, not your earbuds; if only the earbud call sounds rough, the problem lies with the earbuds or their Bluetooth link.
What good and bad call hardware looks like in practice
In real use, the gap between excellent and poor call earbuds is widest exactly where it's hardest to fake. Strong performers keep your voice intelligible in wind, hold their quality as you walk and the buds shift, and include sidetone so you don't shout. Weak performers sound fine in a silent room and fall apart the moment any of those real-world stresses appear, which is precisely why manufacturers love to demo call quality indoors. The components that separate the two aren't exotic: thoughtful microphone placement, a bone-conduction reference, competent wind handling, and a modern wideband or LE Audio codec. None of it shows up as a single headline number, which is why call quality remains one of the few earbud attributes you genuinely cannot judge from a spec sheet, only from listening, ideally to a recording of your own voice from somewhere noisy.
The bottom line
Clear calls are a chain, and the chain is only as strong as its weakest link: tiny omnidirectional mics made directional through beamforming, a bone-conduction sensor confirming when you're really speaking, wind algorithms protecting the ports, machine learning scrubbing the rest, and then, finally, a Bluetooth codec that historically undid much of that good work by squeezing your voice into telephone bandwidth. As LE Audio spreads, the last weak link is finally being repaired. Until then, if call quality is your priority, buy for the microphones and the codec, not the noise cancellation; they're different jobs.
Need earbuds that nail phone calls?
Our rankings call out the best pairs for call clarity under $100, including multi-mic and wind-resistant designs.
See the Best for Calls →