Jordan Santell immersive engineer

A Primer in Web Audio

This article originally appeared in net magazine issue 254 (online), published September 29th 2014. Source code and the completed demo are available.

Original article printed in net magazine

"A Primer in Web Audio" in net magazine issue 254.


Every AudioContext instance has a destination property: a special audio node representing a computer's audio out, and all our audio nodes have a connect method enabling us to pipe audio data from one to another. Let's connect our audio element node to the destination node, like a guitar cable to an amp.

The Web Audio API is powerful and can be used for real-time audio manipulation and analysis, but this can make it tricky to work with. Using a routing graph to pipe audio from node to node, it's different from other web APIs – and is a tad daunting the first time one approaches the specification. In this piece we'll walk through a bare-bones example of manipulating audio in real time, then visualise the audio data.

Plug and play

Imagine you're a guitarist for your favourite band: sound comes from your guitar, via effects pedals (you want to sound heavy, right?) and into your amplifier so it can be heard. The Web Audio API works similarly: data travels from a source node, via transformation nodes, and ultimately out of a computer's sound card.

Everything in web audio starts with instantiating an AudioContext instance. The AudioContext can be thought of as our audio environment, and from the context we create all of our audio nodes. Because of this, all audio nodes live inside an AudioContext, which becomes a network of connected nodes.

First we'll need a source node from where the audio originates. There are several types of source nodes, such as MediaElementAudioSourceNode for using an <audio> element, a MediaStreamAudioSourceNode for using live microphone input with WebRTC, an OscillatorNode for square and triangle waves, or an audio file with an AudioBufferSourceNode. In this case, we'll use a MediaElementAudioSourceNode to leverage the native controls the <audio> element gives us.

Here we'll be prepared for prefixed object names, and support different audio codecs. You may need to serve this page from a server so appropriate headers for the audio files are set. Let's create an <audio> element, instantiate an AudioContext and hook a MediaElementAudioSourceNode to our <audio> element:

<body>
  <audio id="audio-element" controls="controls">
    <source src="audio/augment.mp3" type="audio/mpeg;">
    <source src="audio/augment.ogg" type="audio/ogg; codecs=vorbis;">
  </audio>
  <script>
    var ctx = new (window.AudioContext || window.webkitAudioContext)();
    var audioEl = document.getElementById("audio-element");
    var elSource = ctx.createMediaElementSource(audioEl);
  </script>
</body>

The <audio> element's audio is being pumped into our web audio context. So why can't we hear the track when we click play? As we're re-routing the audio from the element into our context, we need to connect our source to the context's destination. It's as if we're playing an electric guitar that's not plugged into an amp.

Audio controls and frequency slider

Our simple <audio> tag and slider controlling our audio nodes.

Every AudioContext instance has a destination property: a special audio node representing a computer's audio out, and all our audio nodes have a connect method enabling us to pipe audio data from one to another. Let's connect our audio element node to the destination node, like a guitar cable to an amp.

elSource.connect(ctx.destination);

Now when we hit the playback, we have the <audio> element node being pumped into the destination node, and can hear the song. But that's no different to using the <audio> tag on its own. Let's add a filter node to play with the audio track's frequencies. First, we'll control the filter with a slider and a display so we can see what the value is:

<input id="slider" type="range" min="100" max="5000" value="100" />
<div id="freq-display">100</div>

Now we must create a BiquadFilterNode: our filter and functions, like an equalizer in web audio. As with all nodes we make, we instantiate our filter from the AudioContext. Previously, we connected our MediaElementAudioSourceNode directly to our context's destination. Now we want our source node to connect first to our filter, and our filter to the destination, so our filter node can manipulate the audio being passed through it. We should have something like this now:

var ctx = new (window.AudioContext || window.webkitAudioContext)();
var audioEl = document.getElementById("audio-element");
var elSource = ctx.createMediaElementSource(audioEl);
var filter = ctx.createBiquadFilter();
filter.type = "lowpass";
filter.frequency.value = 100;
elSource.connect(filter);
filter.connect(ctx.destination);

Filter nodes have several kinds of frequency filtering, but we'll use the default "lowpass" – only frequencies lower than the frequency can pass through. Give it a listen: notice how it sounds kind of like there's a party next door. Only frequencies below 100hz are going to the destination, so we're only hearing the bass. Let's hook up our slider so we can manipulate this filter.

var slider = document.getElementById("slider");
var freqDisplay = document.getElementById("freq-display");
// Cross-browser event handler
if (slider.addEventListener) {
  slider.addEventListener("change", onChange);
} else {
  slider.attachEvent("onchange", onChange);
}

function onChange () {
  // Update the filter node's frequency value with the slider value
  filter.frequency.value = slider.value;
  freqDisplay.innerHTML = slider.value;
}

With our slider able to control our filter node's frequency, we can manipulate our audio playback in real time. To get a better look at the data being passed around, we can visualise the frequency data of the signal. While our filter node actually modified the sound coming out of our speakers, we'll use two new nodes to analyse the signal rather than affecting it. Let's create an AnalyserNode and a ScriptProcessorNode:

var analyser = ctx.createAnalyser();
var proc = ctx.createScriptProcessor(1024, 1, 1);

The createScriptProcessor (formerly createJavaScriptNode) method takes three arguments. The first is the buffer size, which must be a power of 2, and the number of inputs and outputs are the remaining arguments accordingly. With a processor node, we can schedule an event to be fired when enough audio is processed. If our buffer size is 1024, that means every time 1024 samples are processed, our event will fire.

Our AudioContext sample rate is 44100Hz (44100 samples to process per second) and our event will fire every 1024 samples – every .023 seconds, or 43 times a second. Keeping this simple, our processor node enables us to hook into a callback when new audio data is processed. We'll use this to draw to a canvas shortly.

Using our script processor as a callback hook, we need our AnalyserNode to get the data. But our audio signal has to pass through both analyser and processor: let's alter our routing and send our filtered signal's output into the analyser, then the analyser connect to our audio destination and script processor:

elSource.connect(filter);
filter.connect(analyser);
analyser.connect(proc);
filter.connect(ctx.destination);
proc.connect(ctx.destination);

A WebKit bug means we must connect the processor's output back into the destination to receive audio processing events, which we will use in a moment.

Directed graph of audio nodes

An illustration of our current audio graph. The signal travels from source to destination, passing through all AudioNodes.

Now our audio routing is set up, let's hook into that processing event I mentioned earlier. We can assign a function to a ScriptProcessorNode's onaudioprocess property to be called when our buffer is full of processed samples, and use our AnalyserNode's ability to get the signal's raw frequency data.

Frequency control

The analyser's getByteFrequencyData populates an Uint8Array with the current buffer's audio data, in this case, as values between 0 and 255. We reuse the same array on every call so we don't need to keep creating new arrays.

// Make the Uint8Array have the same size as the analyser's bin count
var data = new Uint8Array(analyser.frequencyBinCount);
proc.onaudioprocess = onProcess;
function onProcess () {
  analyser.getByteFrequencyData(data);
  console.log(data[10]);
}

Presently we're just printing out the tenth element of the audio data on every process (not very interesting) but we can see our processor event being fired with our analyser interpreting the data being passed through it. This would be way more exciting to draw to a canvas. Add a <canvas> element with an id of "canvas", width of 1024 and height of 256 – let's look at the drawing code we'll add to our onProcess function.

var canvas = document.getElementById('canvas');
var canvasCtx = canvas.getContext('2d');

function onProcess () {
  analyser.getByteFrequencyData(data);
  canvasCtx.clearRect(0, 0, canvas.width, canvas.height);
  for (var i = 0, l = data.length; i < l; i++) {
    canvasCtx.fillRect(i, -(canvas.height/255) * data[i], 1, canvas.height);
  }
}

Now an event is called whenever an audio buffer is processed, thanks to our processor node, as our analyser node populates an Uint8Array with audio data, and we clear the canvas and render the new data to it in the form of frequency bins, each representing a frequency range. As we scrub our filter controls back and forth, we can see in the canvas that we're removing high frequencies as we move our slider left.

We can see that once we connect audio nodes, we can manipulate sounds we hear and visualise the audio data. This example uses a simple biquad filter node, but there are convolvors, gains, delays and other ways to alter audio data for musicians, game developers or synthesizer enthusiasts. Get creative and play with the Web Audio API, as the lines blur between audio engineers and web developers.

FFT visualization

Our filter controlling what we hear and see. We can add some color to the canvas too!

Sidebar: Cross-Browser Compatibility

As is the way for a lot of browser APIs, there are some caveats that need to be noted to ensure using the web audio API works consistently across all browsers that support it. First landed in Chrome 14, other Webkit based browsers eventually inherited the support for the web audio API: Opera 15, Safari 6 and iOS Safari 6. Firefox landed a Gecko-implementation of the web audio API in version 25, and as of this time, Internet Explorer has no plans to support the API.

Several browser icons

Even though most major browsers support this API, the last few years have seen a few API changes, as the specification approaches being finalized. The main object in the web audio API, the AudioContext, is prefixed in all webkit implementations (webkitAudioContext), whereas Firefox's implementation is unprefixed. As of writing, there's a patch going through Chromium to unprefix AudioContext, which will help in the future, but for now, some good old feature detection is necessary.

var context = new (window.AudioContext || window.webkitAudioContext)();

In addition to the prefixing, there has been some renaming of APIs for consistency and simplicity. Some methods on the AudioContext have been changed from a more verbose createDelayNode into createDelay, so that all the creation methods no longer need a mention of "node".

Some nodes have changed names (JavaScriptNode to ScriptProcessorNode) and some methods have changed (source nodes now have start and stop, rather than noteOn and noteOff), and current webkit implementations feature both modern and deprecated methods. To play it safe, you can use feature detection the same way as the prefixed AudioContext, or use a work around, like Google's Chris Wilson's Audio-Context Monkey Patch.

Similar to other audio and video usages in browsers, there is no one single format that all browsers support. Make sure you check which codecs browsers support, but having Ogg Vorbis and MP3 will cover all of your bases at this time.

Sidebar: Web Audio Resources

There have been many awesome demos and resources since the web audio API became available, and just looking at documentation or tutorials doesn't really illustrate all the cool things you can with web audio.