Stereo Mic Techniques

A guide to stereo microphone techniques and how they relate to Ambisonic decoding

Background

Every stereo microphone technique makes a tradeoff between three spatial cues: level differences (which ear hears it louder), time differences (which ear hears it first), and spectral differences (how the sound's timbre changes with direction). Coincident techniques like XY and Blumlein rely entirely on level differences. Near-coincident techniques like ORTF add a small time difference. Spaced pairs rely almost entirely on time differences.

The usual way to compare stereo techniques is to set up multiple mic pairs in front of the same source and record simultaneously. This works but introduces variables that have nothing to do with the technique itself: small differences in mic placement, gain matching, room position, and the fact that no two performances are identical. It becomes difficult to hear what the technique is doing versus what the setup is doing.

This tool takes a different approach. It starts from a single Ambisonic recording and decodes it into each stereo technique mathematically. Because every decode starts from the same captured sound field, the technique is the only variable. You can switch from XY to Blumlein to Mid-Side on the same passage of music and hear exactly what each one keeps, emphasizes, or discards, without any of the confounding variables of a physical comparison.

Why Ambisonics Makes This Possible

First-order Ambisonics captures the complete 360-degree sound field at a single point using four channels: an omnidirectional pressure signal (W) and three figure-8 directional signals pointing front-back (X), left-right (Y), and up-down (Z). From these four channels, we can mathematically extract what any coincident pair of first-order microphones (omni, cardioid, hypercardioid, figure-8, or anything in between) would have captured, because all coincident techniques encode the same spatial information (level differences from directional patterns) that the Ambisonic channels already contain. [1]

Technique | Type | Spatial Cues | Ambisonic Decode
XY Pair | Coincident | Level only | Exact
ORTF | Near-coincident | Level + time | Approximated (spacing simulated with delay)
Mid-Side | Coincident | Level only | Exact (direct channel mapping)
Blumlein | Coincident | Level only | Exact (native figure-8 mapping)
Spaced Pair (AB) | Spaced | Time only | Not possible from first-order Ambisonics
Binaural | HRTF convolution | Level + time + spectral | Rendered via HRTF (not yet in tool)
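
To make the four channels concrete, here is a minimal Python sketch of how a single sound arriving from one direction would appear in W, X, Y, and Z. The function name and the angle conventions (azimuth counter-clockwise from the front, so positive is left; W at unity gain) are assumptions made for illustration, not a description of the tool's code.

```python
import math

def encode_plane_wave(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one mono sample into first-order B-format.

    Assumptions for this sketch: azimuth 0 = front, positive = left;
    elevation positive = up; W carries the signal at unity gain.
    Returns the channels by name as (W, X, Y, Z), not in file order.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                   # omnidirectional pressure
    x = sample * math.cos(az) * math.cos(el)     # front-back figure-8
    y = sample * math.sin(az) * math.cos(el)     # left-right figure-8
    z = sample * math.sin(el)                    # up-down figure-8
    return w, x, y, z

front = encode_plane_wave(1.0, 0)    # all directional energy lands in X
left = encode_plane_wave(1.0, 90)    # all directional energy lands in Y
```

A source straight ahead shows up only in W and X, and a source hard left only in W and Y. Every direction in between is a mix, which is the level information the coincident decodes read back out.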

How to Use the Tool

Audio Source

The tool includes sample recordings from the Ambisonic Toolkit and Freesound. These recordings were captured with various Ambisonic microphones and cover a range of material: chamber music, Brazilian percussion, and spoken dialogue. They provide good material for exploring how different stereo techniques reveal (or obscure) spatial characteristics in a recording.

You can also upload your own 4-channel WAV or FLAC file in B-format (first-order Ambisonics). The tool supports both the AmbiX (ACN/SN3D) and FuMa conventions, which differ in channel order and in the gain of the W channel. Toggle between them with the Channel Format buttons.

  • AmbiX: W, Y, Z, X channel order. The modern standard used by most current Ambisonic software and microphones (Zoom H3-VR, Sennheiser AMBEO, etc.).
  • FuMa: W, X, Y, Z channel order, with W stored 3 dB lower than in AmbiX. The legacy format from older Ambisonic tools.
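
As a rough illustration of what switching formats involves, the sketch below converts one first-order FuMa frame to AmbiX. The function name is hypothetical and this is not taken from the tool's source. It relies on two facts about first-order material: the channels are reordered, and FuMa stores W scaled by 1/√2 (about 3 dB down) while X, Y, and Z carry the same gain in both conventions.

```python
import math

def fuma_to_ambix(frame):
    """Convert one first-order FuMa frame (W, X, Y, Z) to AmbiX (W, Y, Z, X).

    Hypothetical helper for illustration. FuMa's W is attenuated by
    1/sqrt(2), so it is scaled back up to match SN3D normalization.
    """
    w, x, y, z = frame
    return (w * math.sqrt(2.0), y, z, x)

# A FuMa frame for a unit-amplitude source directly in front:
ambix_frame = fuma_to_ambix((1.0 / math.sqrt(2.0), 1.0, 0.0, 0.0))
```

If a file is decoded with the wrong setting, the directional channels end up swapped, so a quick check is whether a source you know to be on the left actually appears on the left.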

The gain slider adjusts output level from -70 dB to +6 dB. Start at a comfortable listening level.
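
For reference, a decibel setting maps to a linear amplitude multiplier as 10^(dB/20). The short sketch below uses a hypothetical function name to show the two ends of the slider range.

```python
def db_to_linear(gain_db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10.0 ** (gain_db / 20.0)

quietest = db_to_linear(-70.0)   # roughly 0.0003 of full amplitude
loudest = db_to_linear(6.0)      # roughly 2x amplitude
unity = db_to_linear(0.0)        # exactly 1.0, signal unchanged
```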

Technique Selection

Select a technique to hear the Ambisonic recording decoded through that virtual microphone configuration. Each technique button shows its name and a brief description. The polar pattern visualization updates to show the pickup pattern of the selected technique.

Parameters

Some techniques have adjustable parameters:

  • XY: Stereo angle (60-180 degrees) and microphone pattern (cardioid, hypercardioid, or figure-8).
  • Mid-Side: Mid microphone pattern (cardioid, omni, or hypercardioid) and width (0 = mono, 1 = standard, 2 = exaggerated).

ORTF and Blumlein use fixed configurations with no adjustable parameters.

Keyboard Shortcuts

  • Space: Play/pause audio

XY Pair

[Diagram: XY pair. Left (L) and right (R) capsules, 90° angle, coincident]

The XY technique places two directional microphones at the same point (coincident), angled apart symmetrically. The classic configuration uses two cardioids at 90 degrees, though the angle and pattern can vary. Because the capsules occupy the same point in space, there are no time-of-arrival differences between channels: the stereo image comes entirely from the level differences created by each mic's directional pattern.

XY produces excellent mono compatibility (the signals sum cleanly because there are no phase differences) and a stable, well-defined stereo image. The tradeoff is a narrower sense of spaciousness compared to techniques that include time differences. Wider angles increase the stereo spread but can create a "hole in the middle" where sources between the two mic axes receive less combined pickup.

The Ambisonic decode is fully accurate for XY. The virtual mic extraction formula directly maps to the B-format channels without any approximation.
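
The level differences described above can be worked out directly from the polar patterns. The following sketch is illustrative only: the function names are made up, and it assumes an ideal first-order pattern of the form p + (1 - p) * cos(angle off-axis), with p = 0.5 for a cardioid.

```python
import math

def xy_gains(source_az_deg, stereo_angle_deg=90.0, p=0.5):
    """Gain of each XY capsule for a source at the given azimuth.

    Assumes azimuth 0 = front, positive = left, and ideal first-order
    patterns. The left capsule points to +half-angle, the right to
    -half-angle.
    """
    half = math.radians(stereo_angle_deg / 2.0)
    az = math.radians(source_az_deg)
    g_left = p + (1.0 - p) * math.cos(az - half)
    g_right = p + (1.0 - p) * math.cos(az + half)
    return g_left, g_right

def level_difference_db(g_left, g_right):
    """Inter-channel level difference in dB (positive = louder on the left)."""
    return 20.0 * math.log10(abs(g_left) / abs(g_right))

# Source on the left capsule's axis, cardioids at 90 degrees:
g_l, g_r = xy_gains(45.0)                 # 1.0 and 0.5
ild = level_difference_db(g_l, g_r)       # about 6 dB

# Pickup of a centered source as the pair is opened up:
center_90, _ = xy_gains(0.0, stereo_angle_deg=90.0)
center_180, _ = xy_gains(0.0, stereo_angle_deg=180.0)
```

With cardioids, the gain on a centered source falls from about 0.85 at a 90-degree stereo angle to 0.5 at 180 degrees, which is the "hole in the middle" effect in numbers.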

ORTF

[Diagram: ORTF. Left (L) and right (R) capsules, 110° angle, 17 cm spacing]

Developed by the French broadcasting organization (Office de Radiodiffusion-Télévision Française), ORTF uses two cardioid microphones angled at 110 degrees with their capsules spaced 17 cm apart: roughly the distance between human ears. This spacing introduces small inter-channel time differences (up to ~0.5 ms) that complement the level differences from the cardioid patterns.

The combination of level and time cues gives ORTF a wider, more natural-sounding stereo image than pure XY, while maintaining good mono compatibility. It is one of the most popular techniques for orchestral and ensemble recording.

The Ambisonic decode accurately reproduces the 110-degree angle and cardioid pattern. The 17 cm spacing is only approximated, using a fixed inter-channel delay: a real ORTF pair produces delays that vary with the direction of each source, and a single-point Ambisonic recording does not capture that information.
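
The size of the ORTF time difference, and the way it depends on direction, follows from simple geometry. The sketch below assumes a distant source (far-field), a speed of sound of 343 m/s, and a hypothetical function name. It is not how the tool derives its delay.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 °C

def arrival_time_difference(source_az_deg, spacing_m=0.17):
    """Far-field time-of-arrival difference between two spaced capsules.

    Azimuth 0 = front (no difference), 90 = fully to one side (maximum).
    """
    return spacing_m * math.sin(math.radians(source_az_deg)) / SPEED_OF_SOUND

max_delay_s = arrival_time_difference(90.0)      # about 0.0005 s
max_delay_samples = max_delay_s * 48000          # about 24 samples at 48 kHz
center_delay_s = arrival_time_difference(0.0)    # 0.0 for a centered source
```

The delay runs from zero for a centered source to about half a millisecond at the extreme sides, which is why any single fixed delay can only approximate the real array.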

Mid-Side (MS)

[Diagram: Mid-Side. Mid (M) capsule facing forward, Side (S) figure-8 facing sideways]

Mid-Side recording uses two coincident microphones with different polar patterns: a forward-facing "mid" mic (typically cardioid, though any pattern works) and a sideways-facing figure-8 "side" mic. The stereo signal is decoded as: Left = Mid + Side, Right = Mid - Side.

The key advantage of MS is continuously variable stereo width after recording. Increasing the side signal's level widens the image; decreasing it narrows toward mono. At zero side level, the output is pure mono from the mid mic. This makes MS popular in broadcast and film, where mono compatibility must be guaranteed.

MS maps directly to the Ambisonic B-format channels. The mid signal is extracted from W and X (the omni and front-back channels), and the side signal is simply the Y channel (left-right figure-8). This is one of the most natural Ambisonic decodes.
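
A minimal sketch of this mapping, per sample, might look like the following. The function and parameter names are illustrative assumptions, and W is assumed to be at unity gain (as in AmbiX).

```python
def ms_decode(w, x, y, mid_pattern=0.5, width=1.0):
    """Decode one B-format sample to Mid-Side stereo.

    mid_pattern: 1.0 = omni, 0.5 = cardioid, 0.25 = hypercardioid.
    width: 0 = mono, 1 = standard, 2 = exaggerated.
    """
    mid = mid_pattern * w + (1.0 - mid_pattern) * x   # forward-facing mic
    side = width * y                                  # left-right figure-8
    return mid + side, mid - side                     # (left, right)

# A source hard left (W = 1, X = 0, Y = 1):
wide = ms_decode(1.0, 0.0, 1.0, width=1.0)   # left and right differ
mono = ms_decode(1.0, 0.0, 1.0, width=0.0)   # left and right identical
```

Setting width to 0 removes Y entirely, so both outputs equal the mid signal. This is the guaranteed mono case described above.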

Blumlein Pair

[Diagram: Blumlein pair. Left (L) and right (R) figure-8s crossed at 90°]

Named after Alan Blumlein, who pioneered stereo recording in the early 1930s, this technique uses two figure-8 (bidirectional) microphones crossed at 90 degrees, each aimed 45 degrees off-center. Each mic picks up sound equally from front and rear, but with opposite polarity for the rear lobe.

Blumlein produces a remarkably natural and spacious stereo image with precise localization. The figure-8 patterns capture the full ambient field, including reflections from behind the microphones, which gives recordings a strong sense of the acoustic space. The tradeoff is high sensitivity to room acoustics and a narrower "sweet spot" for listener positioning.

Because Ambisonics uses figure-8 patterns natively (the X and Y channels are figure-8s), the Blumlein decode is among the most accurate: the left and right signals are essentially rotated combinations of X and Y, with no approximation.
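
Written out, the rotation is short. The sketch below assumes the two figure-8s point 45 degrees left and right of front and that positive Y means left. The function name is illustrative.

```python
import math

def blumlein_decode(x, y):
    """Decode B-format X and Y to a Blumlein pair (figure-8s at +/-45 degrees)."""
    k = math.sqrt(0.5)               # cos(45°) = sin(45°)
    return k * (x + y), k * (x - y)  # (left, right)

# Source on the left mic's axis (45 degrees left): X = Y = cos(45°)
on_axis = blumlein_decode(math.sqrt(0.5), math.sqrt(0.5))

# Source directly behind (X = -1, Y = 0): picked up by both rear lobes
behind = blumlein_decode(-1.0, 0.0)
```

The on-axis source lands entirely in the left channel, because it sits in the null of the right mic. The rear source appears equally in both channels with negative sign, which is the inverted rear-lobe polarity mentioned above.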

Spaced Pair (AB)

[Diagram: spaced pair. Left (L) and right (R) omnidirectional microphones, 30 cm to 3 m apart]

Spaced pair (also called AB) recording places two microphones (typically omnidirectional) at a distance from each other, usually between 30 cm and several meters apart. Unlike coincident techniques, the stereo image comes almost entirely from time-of-arrival differences: a sound source to the left reaches the left microphone before the right. The brain interprets these tiny time differences as spatial position.

Spaced pair is valued for its wide, enveloping stereo image and natural reproduction of room ambience. The tradeoff is poor mono compatibility (the time differences create phase cancellation when summed) and less precise localization than coincident techniques. The wider the spacing, the more spacious but less focused the image becomes.
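
The mono-compatibility problem can be put in numbers. When two equal-level copies of a signal are summed with a delay between them, the result is a comb filter whose first cancellation falls at 1 / (2 × delay). The sketch below uses hypothetical function names and assumes equal levels in both channels and a speed of sound of 343 m/s.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def first_null_hz(delay_s):
    """Lowest frequency that cancels when a signal is summed with a delayed copy."""
    return 1.0 / (2.0 * delay_s)

def mono_sum_gain(freq_hz, delay_s):
    """Magnitude of (signal + delayed copy) relative to one channel alone."""
    return 2.0 * abs(math.cos(math.pi * freq_hz * delay_s))

# Worst case for a 1 m spaced pair: source fully to one side
delay = 1.0 / SPEED_OF_SOUND          # about 2.9 ms
null_1m = first_null_hz(delay)        # 171.5 Hz
gain_at_null = mono_sum_gain(null_1m, delay)   # effectively zero
```

For a 1 m pair the first cancellation sits low in the musical range, and further nulls repeat at odd multiples of it. A coincident pair has zero delay, so no such nulls occur.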

Why Spaced Pair Is Not in This Tool

First-order Ambisonics captures the sound field at a single point in space. The four B-format channels (W, X, Y, Z) encode how sound arrives from different directions at that one point. From this information, we can reconstruct what a microphone with any first-order polar pattern (any blend of omni and figure-8), pointed in any direction, would pick up at that point, because polar pattern differences produce level differences, which the B-format channels faithfully preserve.

Spaced pair recording, however, depends on what the sound field looks like at two different points in space. The time-of-arrival differences that create the AB stereo image require knowing how the wavefront changes between two positions that could be a meter or more apart. A single-point Ambisonic recording simply does not contain this information. No amount of mathematical processing can recover spatial differences that were never captured.

Could higher-order Ambisonics help?

Higher-order Ambisonics (HOA) captures more spatial detail, but it still describes the sound field around a single point, only with finer angular resolution. The region around that point in which the field can be reconstructed accurately grows with the order and shrinks as frequency rises, so at practical orders it does not reach a position a meter away. To simulate a spaced pair from Ambisonics, you would need either a very high-order capture (high enough that the accurate region extends out to both microphone positions) or multiple spatially separated Ambisonic captures.
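
To get a feel for what "very high-order" means, a commonly quoted rule of thumb is that an order-N representation is accurate out to a radius r where N is roughly k × r, with k = 2πf / c the wavenumber. The sketch below applies that rule. Treat it as an order-of-magnitude estimate under that assumption, with hypothetical function names, rather than a precise limit.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def required_order(freq_hz, radius_m):
    """Rule-of-thumb Ambisonic order for accuracy out to radius_m (N ~ k * r)."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    return math.ceil(k * radius_m)

def accurate_radius_m(order, freq_hz):
    """Rule-of-thumb radius of accurate reconstruction for a given order."""
    k = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    return order / k

order_for_1m_pair = required_order(1000.0, 0.5)    # mics 0.5 m either side
first_order_reach = accurate_radius_m(1, 1000.0)   # a few centimeters
```

Under this estimate, covering a 1 m spaced pair at just 1 kHz already calls for roughly tenth order, while first order reaches only about 5 cm at that frequency.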

Binaural Decoding

[Diagram: binaural. Left (L) and right (R) ears, HRTF simulation (headphones)]

Binaural decoding is not yet available in the tool but is planned for a future update.

Binaural audio simulates how sound reaches your eardrums by applying Head-Related Transfer Functions (HRTFs): the frequency-dependent filtering caused by your head, ears, and torso. When heard over headphones, binaural audio creates a convincing illusion of sounds arriving from specific directions in 3D space.

For Ambisonics, binaural decoding convolves the B-format channels with HRTF filters that have been decomposed into spherical harmonics. This is computationally efficient and produces a smooth, natural spatial impression. First-order Ambisonics (4 channels) provides moderate spatial resolution: you can clearly distinguish front from back and left from right, but precise localization of individual sources is limited compared to higher-order recordings.

A head rotation control would let you "turn your head" within the recording. This is a key advantage of Ambisonics over fixed stereo recordings: because the full sound field is captured, the listener can explore different perspectives of the same recording.
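
Head rotation is inexpensive in the Ambisonic domain because turning the head left or right only mixes X and Y. The sketch below shows one way such a control could work. It is an assumption for illustration, since the feature is not yet in the tool, and it uses the convention that positive yaw means turning the head to the left.

```python
import math

def rotate_yaw(w, x, y, z, head_yaw_deg):
    """Rotate a B-format sample to follow a head turn about the vertical axis.

    Positive head_yaw_deg = head turns left, so sources move to the
    listener's right. W and Z are unaffected by a horizontal turn.
    """
    phi = math.radians(head_yaw_deg)
    x_rot = x * math.cos(phi) + y * math.sin(phi)
    y_rot = -x * math.sin(phi) + y * math.cos(phi)
    return w, x_rot, y_rot, z

# A source straight ahead, after the listener turns 90 degrees to the left:
turned = rotate_yaw(1.0, 1.0, 0.0, 0.0, 90.0)   # source is now hard right
```

The rotated channels can then be passed to any decode, binaural or stereo, without touching the original recording.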

The Ambisonic Decode Formula

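For coincident virtual mic techniques, the left and right output channels are derived from the B-format signals using the pattern parameter p and the half-angle a (half the stereo angle between the two microphones, so a = 45 degrees for a 90-degree pair). The formula assumes W is at the same gain as X and Y, as in AmbiX: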

  • L = p * W + (1 - p) * [X * cos(a) + Y * sin(a)]
  • R = p * W + (1 - p) * [X * cos(a) - Y * sin(a)]

The pattern parameter p controls the microphone's polar pattern: 1.0 = omnidirectional, 0.5 = cardioid, 0.25 = hypercardioid, 0 = figure-8. The W channel provides the omnidirectional component, while X and Y provide the directional components. Z (the vertical axis) is not used for horizontal stereo techniques.
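
The formula translates almost line for line into code. The following is a minimal per-sample sketch with illustrative names, assuming W at unity gain and positive Y meaning left. It is not the tool's actual implementation.

```python
import math

# Pattern parameter p for each named polar pattern (values from the text above)
PATTERNS = {"omni": 1.0, "cardioid": 0.5, "hypercardioid": 0.25, "figure8": 0.0}

def decode_coincident(w, x, y, pattern="cardioid", stereo_angle_deg=90.0):
    """Decode one B-format sample to a coincident stereo pair (left, right)."""
    p = PATTERNS[pattern]
    a = math.radians(stereo_angle_deg / 2.0)   # half-angle
    front = x * math.cos(a)
    side = y * math.sin(a)
    left = p * w + (1.0 - p) * (front + side)
    right = p * w + (1.0 - p) * (front - side)
    return left, right

# A unit source 45 degrees to the left: W = 1, X = Y = cos(45°)
c = math.sqrt(0.5)
xy_cardioid = decode_coincident(1.0, c, c, "cardioid", 90.0)   # about (1.0, 0.5)
blumlein = decode_coincident(1.0, c, c, "figure8", 90.0)       # about (1.0, 0.0)
```

The cardioid result matches what two physical cardioids would give: full level in the capsule aimed at the source and half level in the one 90 degrees off-axis. Choosing figure8 at 90 degrees reproduces the Blumlein pair, where the same source falls in the null of the right microphone.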

For Mid-Side, the formula uses a different structure: the mid signal combines W and X according to the mid pattern, while the side signal is the Y channel scaled by the width parameter.

Further Reading

  • Bartlett, B., & Bartlett, J. (2013). Practical Recording Techniques (6th ed.). Focal Press. Chapters on stereo miking techniques.
  • Gerzon, M. A. (1975). "The Design of Precisely Coincident Microphone Arrays for Stereo and Surround Sound." Audio Engineering Society Convention 50.
  • Zotter, F., & Frank, M. (2019). Ambisonics: A Practical 3D Audio Theory for Recording, Studio Production, Sound Reinforcement, and Virtual Reality. Springer.
  • Wiggins, B. (2004). "An Investigation into the Real-time Manipulation and Control of Three-dimensional Sound Fields." PhD Thesis, University of Derby.

[1] As the table shows, ORTF is a near-coincident technique that relies on both level and time differences, so it cannot be exactly represented from a single-point Ambisonic recording. This tool approximates ORTF by applying a fixed inter-channel delay to simulate the 17 cm capsule spacing. The three coincident techniques (XY, Mid-Side, and Blumlein) decode exactly.