VO Studio Tech: It’s All About The Bits

A new wrinkle in microphone design - The Rode NT1 Generation 5 has both a USB and XLR connection option. In USB direct connected mode, it records in 32 bit floating point.

When talking about audio recording, a few different numbers begin flying around. Lately, there’s been a buzz about “32 bit” recording tools. Is this important to us as voice actors? Before we can get to that, it’s helpful to understand what all these numbers mean.

Loudness and Peak are measured in dB

I’ve written before about the idea of average Loudness in audio recordings, which is different from the Peak value of a single waveform. In both cases we measure on a negative decibel scale, where 0 dB is the loudest level the system can represent. Audio with more acoustic energy sits closer to 0 dB. In other words, a sound that is -18 dB RMS is perceived as louder than one that is -48 dB RMS.
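To make the difference between Peak and average (RMS) level concrete, here is a minimal sketch that measures both for a test tone. The function names and the 220 Hz tone are my own illustration, and it assumes samples are scaled so that 1.0 is full scale (0 dB).

```python
import math

def peak_db(samples):
    """Peak level in dB relative to full scale (1.0). Assumes the audio is not pure silence."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak)

def rms_db(samples):
    """Average (RMS) level in dB relative to full scale (1.0). Assumes the audio is not pure silence."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)

# One second of a 220 Hz sine wave at half of full scale, sampled at 44,100 Hz.
sample_rate = 44_100
tone = [0.5 * math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(sample_rate)]

print(f"Peak: {peak_db(tone):.1f} dB")
print(f"RMS:  {rms_db(tone):.1f} dB")
```

The same tone reports two different numbers: the Peak describes its single tallest point, while the RMS value describes its energy over time and sits about 3 dB lower for a pure sine wave.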

Sample Rate Defined

Another number used in digital recording is “Sample Rate”. Sample Rate is a value for the number of times per second an audio interface converts incoming analog signal into digital information our computers can deal with. These days Sample Rates can be 44,100 times per second or higher. That means each second of audio has 44,100 “samples” which correspond to that conversion process, with the unit of measure being “Hertz” (cycles per second), abbreviated as “Hz.” 

Sample rate determines something called the Nyquist Frequency, the highest pitch that can be recorded. Theoretically, the Nyquist Frequency is half the Sample Rate. If you have a sample rate of 44,100 Hz, then you can capture frequencies up to 22,050 Hz. Since most of us don’t produce sound at those frequencies, and even fewer of us can hear sounds anywhere near them, a 44,100 Hz Sample Rate works just fine for recording spoken word performances. For comparison, the fundamental frequency range for the human voice is roughly between 60 Hz and 300 Hz.
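The arithmetic is simple enough to check for yourself. This small sketch (the function name is mine, and the list of rates is just a few common settings) prints the theoretical upper limit for each Sample Rate.

```python
def nyquist_frequency(sample_rate_hz):
    """Theoretical highest frequency that a given sample rate can capture."""
    return sample_rate_hz / 2

for rate in (44_100, 48_000, 96_000):
    limit = nyquist_frequency(rate)
    print(f"{rate} Hz sample rate -> frequencies up to {limit:.0f} Hz")
```

Even the lowest of these leaves a great deal of room above the 60 Hz to 300 Hz fundamental range of the voice.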

What does “Bit Depth” mean when recording audio?

The final piece of the puzzle is something called “Bit Depth.” You can think of this as resolution, or the amount of precision with which you can measure a value. For example, if you had a ruler with marks only at one inch increments, any measurement that fell between two marks would have to be rounded to the nearest one. The lower the Bit Depth, the more each value is approximated when measured. This approximation occurs in any digital conversion process, with photography being one most of us are familiar with. Higher Bit Depth in a digital camera provides greater image detail and color accuracy.
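The ruler analogy can be sketched in a few lines. This is a simplified model rather than how any particular audio interface works: the `quantize` function and the sample value 0.3337 are hypothetical, and it assumes sample values run from -1.0 to 1.0.

```python
def quantize(value, bits):
    """Round a value between -1.0 and 1.0 to the nearest mark available at this bit depth."""
    steps = 2 ** (bits - 1)  # number of marks between zero and full scale
    return round(value * steps) / steps

original = 0.3337  # hypothetical sample value
for bits in (4, 8, 16, 24):
    stored = quantize(original, bits)
    error = abs(stored - original)
    print(f"{bits:>2} bit: stored {stored:.7f}, rounding error {error:.7f}")
```

Each added bit doubles the number of marks on the ruler, so the rounding error shrinks quickly as Bit Depth goes up.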

Using low Bit Depth in audio introduces noise, which sounds like static or “hiss.” Interactive Voice Response (IVR) audio, found on most phone systems, infamously uses low Bit Depth (typically 8 bit), so the audio contains audible noise that was not in our original recordings. An 8 bit recording has an inherent noise floor 48 dB below the loudest signal it can capture. That’s audible to most people.

Increasing audio recording resolution to 16 bit drops the digital noise floor to 96 dB below the maximum signal level, while using 24 bits drops it to 144 dB below. In both cases, that’s going to be lower than anything that matters to us. Most voice actors are trying to get their actual noise floor below -60 dB when recording. The digital part of the equation is not the limiting factor in recording: 16 or 24 bit systems should not be adding audible noise to your recordings.
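Those 48, 96 and 144 dB figures all come from the same rule of thumb: each bit doubles the number of available values, which adds about 6 dB of range. A quick sketch of that calculation (the function name is mine):

```python
import math

def dynamic_range_db(bits):
    """Approximate distance in dB between full scale and the digital noise floor."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(f"{bits:>2} bit: noise floor about {dynamic_range_db(bits):.0f} dB below maximum")
```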

Capturing dynamic range in performance is key

But, it’s not all about the noise. There’s another variable we need to capture in digital audio recording: the dynamic range of our performance. Dynamic range is the difference between the loudest and quietest parts of our performance. That’s why I recommend using 24 bit recording for a raw performance. That allows us to capture a wider dynamic range within those 24 bits of resolution. With 24 bit, we can record with conservative input levels and then make those recordings much louder later (i.e. “Competitively Loud”) without running into the inherent noise of the recording system. 
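Here is a rough worked example of that headroom. The levels are hypothetical: I’m assuming a raw take that peaks at -18 dB and gets raised by 15 dB afterward, and the sketch only accounts for the digital noise floor, not the noise of your room or preamp.

```python
import math

def digital_floor_after_gain(bits, gain_db):
    """Where the digital noise floor lands (in dB) after the recording is made louder."""
    digital_floor_db = -20 * math.log10(2 ** bits)
    return digital_floor_db + gain_db

raw_peak_db = -18  # hypothetical conservative recording level
gain_db = 15       # hypothetical boost applied later, putting peaks at -3 dB

for bits in (16, 24):
    floor = digital_floor_after_gain(bits, gain_db)
    print(f"{bits} bit: peaks at {raw_peak_db + gain_db} dB, digital floor at {floor:.0f} dB")
```

Raising the level raises the digital noise floor by the same amount, and 24 bit leaves far more margin before that floor gets anywhere near the -60 dB range we care about.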

The benefit of using 32 bit floating point when recording

An interesting thing happens when we further increase the bit depth. 32 bit (technically 32 bit floating point) could allow us to capture a ridiculously high dynamic range of 1528 dB. Given that value is well in excess of the loudest sound on earth, there’s not a lot of benefit. Instead, we can use that increased ability to store data to represent the shape of the audio wave separately from the volume of the wave. 

In other words, 32 bit float can allow you to control the input gain of a recording after you have recorded it. That is very interesting. To put it into even simpler terms: as long as the hardware ahead of the file can handle the level, you will not clip your recordings, no matter how loudly you perform them. The original input gain setting simply does not matter.
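A small sketch shows why the storage format makes this possible. It models only how sample values are stored, not the microphone or converter hardware, and the helper names and sample values are hypothetical. Here 1.0 stands for full scale (0 dB).

```python
import struct

def store_int16(sample):
    """Store a sample as a 16 bit integer; anything beyond full scale is clipped."""
    scaled = round(sample * 32768)
    clipped = max(-32768, min(32767, scaled))
    return clipped / 32768

def store_float32(sample):
    """Round-trip a sample through 32 bit floating point storage."""
    return struct.unpack("<f", struct.pack("<f", sample))[0]

too_hot = [0.5, 1.5, -2.0, 0.75]  # hypothetical take with two samples past full scale
gain = 0.25                       # turn the take down by about 12 dB after the fact

fixed_int = [store_int16(s) * gain for s in too_hot]
fixed_float = [store_float32(s) * gain for s in too_hot]

print("16 bit integer:", fixed_int)
print("32 bit float:  ", fixed_float)
```

In the integer version, the two overloaded samples were flattened to full scale when they were stored, so turning the take down afterward cannot bring their shape back. The float version kept the original values, so the quieter copy still has the true peaks.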

Does this matter for voice actors? Maybe. Anyone who has ever given a full volume performance at the mic knows that sinking feeling of exiting the booth to find their recordings blew well past the 0 dB level. It means resetting the input gain and going again. However, if you can set those levels after the fact…? That sounds like sorcery!

Of course, how a manufacturer makes that happen matters.

Next week, I’ll talk about one approach that looks very interesting…


