Dedicated to design and performance of databases and audio systems.

Digital Audio--My Take

The other night we had a great conversation going about digital music and how it works. Terms like SACD and high resolution were flying about, along with numbers such as 16-bit, 24-bit, 44K, 96K, etc. What does it all mean? And, if digital is so good then why is vinyl the only physical medium that is growing?

Great topics.

First, I won't start any religious war regarding vinyl versus digital. I love them both. A great recording on either medium can sound anywhere from awful to OMG fantastic! It has to do with the recording, mastering, transfer, medium quality (e.g. 180g or 200g vinyl), and playback system components.

Arguments are made that vinyl sounds smooooother. Ah yes. Digital is susceptible to jitter--a distortion introduced with playback timing errors. Vinyl is a physical medium and has limitations with dynamic range relative to digital. One problem is that CDs are encoded at relatively higher volumes than in years past. Ever play an older manufactured CD and notice that you have to turn the volume up somewhat? Ya. That's what the record companies are now doing. The effect? Well now they have limited the CD's dynamic range by starting at an elevated volume.

Vinyl is an analog medium. We humans hear analog waveforms. Digital requires a transformation to analog for us to hear the content. But before you think vinyl requires no translation at all, it does--the venerable phono amp. It must boost and cut specific frequencies of the audio spectrum to undo the equalization applied when the record was cut. Check out the RIAA equalization wiki page.

One qualification about digital music--I am talking about CD quality and high-resolution. Lossy compressed digital audio, like MP3s and iTunes' AACs, need not apply. Blah. Yuck. These files are created to save storage space at the expense of sound quality. I have many friends at work who tell me they would not be able to tell the difference between an MP3 and a CD track. Wrong. Wrong. Wrong. Perhaps it would be more challenging through cheap earbuds, but on anything else most anyone could hear more depth and detail on a CD versus an MP3/AAC audio file.

MP3s and AACs are lossy compressed files because they are the result of a computer algorithm that throws away sound data that theoretically won't be missed by the listener. For example, the drummer hits the snare drum and bass drum at the same time. Perhaps the bass drum need not be fully rendered soundwise. Once the data is gone it's gone.

These compressed files have different resolutions typically measured in kilobits per second (kbps)--some number of thousands of bits per second. Common bit rates are 128kbps and 256kbps. The latter has twice the information of the former but is still a far cry from a CD's 1,411kbps. The result is that a four-minute CD song consumes roughly 40MB (megabytes) of disk space while the 128kbps MP3 is approximately one-tenth the size at 4MB.
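For anyone who wants to check those sizes, here is a rough sketch in Python. The bit rates are the ones quoted above; the helper name and the choice of 1MB = 1,000,000 bytes are my own assumptions for the illustration.

```python
# Estimate file size from bit rate and playing time.
# size_in_megabytes is a made-up helper name for this sketch.

def size_in_megabytes(kilobits_per_second, seconds):
    """Approximate size in MB, assuming 1 MB = 1,000,000 bytes."""
    bits = kilobits_per_second * 1000 * seconds
    return bits / 8 / 1_000_000  # 8 bits per byte

FOUR_MINUTES = 4 * 60  # seconds

cd_mb = size_in_megabytes(1411.2, FOUR_MINUTES)
mp3_128_mb = size_in_megabytes(128, FOUR_MINUTES)
mp3_256_mb = size_in_megabytes(256, FOUR_MINUTES)

print(f"CD track:     {cd_mb:.1f} MB")       # about 42.3 MB
print(f"128kbps file: {mp3_128_mb:.2f} MB")  # 3.84 MB
print(f"256kbps file: {mp3_256_mb:.2f} MB")  # 7.68 MB
```

So "roughly 40MB" and "4MB" are fair round numbers; the 128kbps file works out to about one-eleventh the size of the CD track.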

To play these files, or burn them to an audio CD, they must be restructured back to a CD size while playing. Again, computer algorithms infer what the data would be at CD resolution, but most of the data was thrown away during the creation of the MP3 file. So, at best it's an approximation. Interpretation can differ between different players.

Now we know, lossy compressed music files are bad. However, there are lossless compression schemes for digital music. FLAC files are one example. Apple's ALAC is another. The latter will compress a CD song down to roughly one-half its original size and can uncompress it back to full resolution while playing it. Nirvana? Well, it depends. Some will say that the computer has to work harder to uncompress the music instead of just playing it. That can cause small timing issues which can affect overall sound quality. I have never A/B tested this to see if I could tell the difference. Instead, I store my digital music uncompressed; otherwise, how could I sleep at night?
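To show what "lossless" means in practice, here is a small Python sketch. Big caveat: it uses zlib, a general-purpose compressor from Python's standard library, as a stand-in. It is not FLAC or ALAC, and the test tone is synthetic, so the compression ratio it prints says nothing about real music. The only point is that every bit comes back.

```python
import math
import struct
import zlib

# One second of a 440 Hz tone as 16-bit samples (synthetic test data).
SAMPLE_RATE = 44_100
samples = [round(20_000 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
           for n in range(SAMPLE_RATE)]

# Pack the samples as little-endian signed 16-bit values (2 bytes each).
original = struct.pack(f"<{len(samples)}h", *samples)

# zlib stands in for a lossless audio codec here; it is NOT FLAC or ALAC.
compressed = zlib.compress(original, 9)
restored = zlib.decompress(compressed)

print(len(original))         # 88200 bytes before compression
print(len(compressed))       # smaller; the exact figure is not meaningful
print(restored == original)  # True: nothing was thrown away
```

Contrast that with an MP3, where no amount of decoding gets you back to the original bytes.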

Digital music is a computer file, plain and simple. The data are made up of bits. Bits are switches on the disk. They are either on or off. Two states--on or off. Zeroes and ones. Therefore, data values are based on powers of two.

A CD uses the Red Book standard which specifies that words are defined as 16 bits. As a side note, bits are usually grouped in sets of 8--a byte. 8 bits = 1 byte. Here, 16 bits is 2 bytes. With 16 bits we can create 65,536 combinations of sound. That is 2 to the power of 16. One word on a CD is therefore 16 bits.
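The word arithmetic is quick to verify in Python. The variable names are mine, and the signed range at the end is an addition of mine (CD audio stores each word as a signed value).

```python
BITS_PER_BYTE = 8
word_bits = 16  # Red Book word length

word_bytes = word_bits // BITS_PER_BYTE
combinations = 2 ** word_bits

# As a signed value, a 16-bit word runs from -32768 to 32767.
lowest = -(2 ** (word_bits - 1))
highest = 2 ** (word_bits - 1) - 1

print(word_bytes)       # 2
print(combinations)     # 65536
print(lowest, highest)  # -32768 32767
```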

The standard also calls for there to be 44,100 (a.k.a. 44.1KHz, kilohertz) snapshots of sound captured per second. Just as a movie is projected at some number of frames per second to create the illusion of movement, digital audio plays its snapshots in rapid sequence. Play enough of them in order and sounds appear, echo, decay, dynamically rise, etc.
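Here is a hedged sketch of what taking snapshots looks like, in Python. The 1KHz test tone and the function name are assumptions for illustration; a real recording captures whatever the microphone hears, not a tidy sine wave.

```python
import math

SAMPLE_RATE = 44_100      # snapshots per second (Red Book)
TONE_HZ = 1_000           # hypothetical test tone
MAX_16_BIT = 2 ** 15 - 1  # 32767, the largest positive 16-bit value

def snapshot(n):
    """Return the 16-bit value of the n-th snapshot of the test tone."""
    t = n / SAMPLE_RATE  # time of this snapshot in seconds
    return round(MAX_16_BIT * math.sin(2 * math.pi * TONE_HZ * t))

one_second = [snapshot(n) for n in range(SAMPLE_RATE)]

print(len(one_second))  # 44100 snapshots for one second of one channel
print(one_second[:5])   # the first few snapshots as the wave starts to rise
```

Each number in that list is one 16-bit word. String 44,100 of them together every second, per channel, and you have CD audio.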

Previously I said that a CD's resolution was 1,411kbps. Let's do some quick math. 16-bit words * 44,100 snapshots * 2 channels (i.e. left speaker and right) = 1,411,200. That means 1.4 million bits of data are processed for every second a CD plays. Divide that number by 1,000, to get our kilo, and the result is 1,411kbps.
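The same math as a few lines of Python, in case you want to plug in other formats. The function name is made up for this sketch.

```python
def pcm_bit_rate(bits_per_sample, samples_per_second, channels):
    """Bits processed per second for uncompressed audio."""
    return bits_per_sample * samples_per_second * channels

cd_bits_per_second = pcm_bit_rate(16, 44_100, 2)
cd_kbps = cd_bits_per_second / 1000  # divide by 1,000 to get kilobits

print(cd_bits_per_second)  # 1411200
print(cd_kbps)             # 1411.2
```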

We know in order to hear the sound we need a process to translate those digits, those bits, into analog waveforms for our ears. This is where a digital-to-analog converter (DAC) comes in. If you have a CD player, it has a built in DAC. DVD player? Yes. A television? Yes. Computer? Yes. Cell phone? Yes. iPod? You guessed it, yes. All of these things process digital signals into analog sounds for us.

For DVDs we often output the digital signal into a surround sound amp. The DAC in the amp is typically of much better quality than the one in the DVD player. Some CD players allow for the digital signal to be output to an external DAC.

DAC quality is based on a number of factors. Many DACs also use algorithms to over-sample or up-sample the data. Up-sampling is where an algorithm takes the data and infers what it believes it would have looked like if it were a higher resolution. For example, the DAC may take the 16-bit / 44.1KHz CD track and project it as a 24-bit / 96KHz track and then create an analog waveform from that. Think of it like Photoshop blowing up a picture to poster size and using an algorithm to smooth out the graininess.
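To make the idea concrete, here is a deliberately crude Python sketch that doubles the sample rate by placing the average of each neighboring pair between them (linear interpolation). This is my simplification, not how any particular DAC does it; real up-sampling uses far more sophisticated digital filters. The function name and sample values are made up.

```python
def upsample_2x_linear(samples):
    """Double the sample rate by inserting the midpoint between neighbors."""
    if not samples:
        return []
    result = []
    for current, following in zip(samples, samples[1:]):
        result.append(current)
        result.append((current + following) / 2)  # inferred in-between value
    result.append(samples[-1])  # the last sample has no neighbor to its right
    return result

original = [0, 100, 200, 100, 0]
print(upsample_2x_linear(original))
# [0, 50.0, 100, 150.0, 200, 150.0, 100, 50.0, 0]
```

Notice that no new information appears. The in-between values are educated guesses based on what was already there, just like the Photoshop enlargement.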

When the digital word length changes from 16 bits to 24 (remember, bytes are sets of 8 bits), our palette of 65,536 combinations of sound now totals 16,777,216. Kind of like moving from the Crayola Crayon box of 8 to 64 with the sharpener (thanks Steve). And, as Dave points out, listeners more readily notice the impact of the longer word length than a sampling increase from 44.1KHz to 88.2KHz or 96KHz.
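A quick Python check of the two palettes, with variable names of my own choosing:

```python
levels_16 = 2 ** 16
levels_24 = 2 ** 24

print(levels_16)               # 65536
print(levels_24)               # 16777216
print(levels_24 // levels_16)  # 256
```

Put another way, every single step in the 16-bit palette is divided into 256 finer steps at 24 bits.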

Rumors have it that Apple purchased the Dr. Dre Beats headphone line as part of a higher resolution music strategy. Apple initially sold songs in AAC format stating that their sound quality was superior to MP3 when both were at 128kbps. A couple of years ago they doubled the resolution to 256kbps. An increase in sound quality to be sure, but still far short of CD's 1,411kbps.

The Beats headphones are rumored to be moving to a new connector similar to the iPhone5/6's in support of higher resolution. I have read that the iTunes store will ultimately sell 24-bit / 48KHz songs. Supporting Dave's statement, the key factor here is the larger word length. Notice that the sampling rate is only marginally larger than 44.1KHz.

Web sites are now appearing that offer high-resolution downloads. HDtracks and Pro Studio Masters are two of the larger ones. They are becoming more popular all the time. A lot of classic rock and even new material is available. The bit depths are mostly 24-bit, but the sample rates can vary greatly--44.1KHz, 48KHz, 88.2KHz, 96KHz, 192KHz, etc. Most new equipment can handle sample rates up to 96KHz. Playing rates higher than that requires equipment specifically designed to resolve those sampling frequencies.

If we have a 24-bit / 96KHz audio track we are now processing 4,608kbps (24-bit * 96,000 samples * 2 channels), or roughly 3.27 times the information in a CD track. And, 18 times the information in an iTunes 256kbps song. Needless to say, storage of 24-bit / 96KHz tracks requires far more disk space.
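Here is the comparison worked through in Python. The helper name is made up, and the four-minute track length is just an example I picked to put a number on the disk space.

```python
def pcm_kbps(bits_per_sample, samples_per_second, channels=2):
    """Kilobits per second for uncompressed audio."""
    return bits_per_sample * samples_per_second * channels / 1000

hi_res_kbps = pcm_kbps(24, 96_000)
cd_kbps = pcm_kbps(16, 44_100)
itunes_kbps = 256

print(hi_res_kbps)                      # 4608.0
print(round(hi_res_kbps / cd_kbps, 2))  # 3.27
print(hi_res_kbps / itunes_kbps)        # 18.0

# Disk space for a hypothetical four-minute track, 1 MB = 1,000,000 bytes.
seconds = 4 * 60
hi_res_mb = hi_res_kbps * 1000 * seconds / 8 / 1_000_000
print(round(hi_res_mb, 2))              # 138.24
```

That is close to 140MB for a single four-minute song, versus a little over 40MB for the same song at CD resolution.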

So what does this all mean to sound quality? Typically, you hear more detail, more clarity in the midrange, and more space around the instruments. Is it worth it? That would be up to your tastes.

Recently I had a conversation with Brian Zolner. He is the owner of Bricasti, maker of a very expensive DAC. Expensive, but incredible sounding. I was babbling on about how much I was liking high-resolution audio. The midrange, the openness, the detail, the clarity. Babble, babble, babble. Brian was gracious. He listened to me and then asked, "Are you sure the difference you hear is not just due to the filter?" Zoinks! That moment. Everything I assumed I knew for sure was not necessarily fully true.

Here's the scenario. Humans can hear audio frequencies from 20Hz (very low bass) to 20KHz (very high treble). Most people cannot hear near 20KHz, but many can. There is a phenomenon with sound, called aliasing, that mirrors one in our vision. When we watch video of a car rolling down the road, at some speeds the wheels appear to be rotating backwards. We know that isn't truly the case. It is the effect of the speed of the wheel and the number of filmed frames per second. With audio, what matters is the sound frequency relative to the halfway point of the sampling rate.

Let's illustrate: a CD samples (takes snapshots) at 44.1KHz. We can hear an audio high of 20KHz. Half of the sampling rate would be 22.05KHz (i.e. 44.1KHz / 2). Therefore, the high end of our hearing (20KHz of audio wave frequency) is very close to half the sampling frequency (22.05KHz). What happens if a sound goes past that halfway point? The audio equivalent of the car wheel appearing to go backwards on the screen. In the audio world it would sound like something was not correct. And, we don't like that!
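For the curious, here is a small Python sketch of where a tone ends up once it has been sampled. It uses the standard folding rule for a pure tone; the function name and the test frequencies are my own choices for illustration.

```python
def alias_frequency(tone_hz, sample_rate_hz):
    """Frequency at which a sampled pure tone appears after folding."""
    folded = tone_hz % sample_rate_hz
    # Anything above half the sample rate folds back down below it.
    return min(folded, sample_rate_hz - folded)

CD_RATE = 44_100

print(alias_frequency(20_000, CD_RATE))  # 20000, below 22.05KHz so unchanged
print(alias_frequency(25_000, CD_RATE))  # 19100, folded into the audible range
print(alias_frequency(30_000, CD_RATE))  # 14100, even lower
```

A 25KHz tone, which we could never hear on its own, shows up as a 19.1KHz tone that some of us can. That is the wheel spinning backwards.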

The challenge, therefore, is to create an audio filter in the DAC that lets it produce audio frequencies up to 20KHz, but cuts off sharply by 22.05KHz. But, if we can't hear above 20KHz, why do we worry about 22.05KHz? Some higher frequencies can affect lower ones and what we hear. Plus, we don't want to roll off the audible high end of our tunes just to be sure no 22.05KHz signals get through. The quality of the filter is a key component.

If we have a higher sampled music track--say, 96KHz--then our filter does not need to finish rolling off the sound until 48KHz of audio frequency, way beyond what we humans can hear. That lets the filter slope be very gradual from 20KHz audio frequency to 48KHz. So when Brian suggested that maybe it was the filter, it made sense.
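To put numbers on how much room the filter has to work with, here is a Python sketch comparing the two cases. The 20KHz hearing limit is the figure used above; the function name and the octave measure are my additions.

```python
import math

HEARING_LIMIT_HZ = 20_000

def transition_band(sample_rate_hz):
    """Room between the top of hearing and half the sample rate."""
    half_rate = sample_rate_hz / 2
    width_hz = half_rate - HEARING_LIMIT_HZ
    width_octaves = math.log2(half_rate / HEARING_LIMIT_HZ)
    return width_hz, width_octaves

for rate in (44_100, 96_000):
    width_hz, width_octaves = transition_band(rate)
    print(f"{rate} Hz: {width_hz:.0f} Hz wide, {width_octaves:.2f} octaves")
# 44100 Hz: 2050 Hz wide, 0.14 octaves
# 96000 Hz: 28000 Hz wide, 1.26 octaves
```

At CD rates the filter has about one-seventh of an octave to go from passing everything to passing nothing. At 96KHz it has well over a full octave, so it can be far gentler.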

I was still a little skeptical, but hey, he's the genius, this is his life's work, and I have heard the result through his DAC. Amazing!

Overall, am I completely satisfied? No. Why not? Because I hear some of the flaws of certain tracks--sometimes the vocals are not forward enough even while the instrumentation is beautifully separated. Is it the track? The playback software? The DAC? Should I try an optical cable instead of USB? So many more tests to go. It's a journey.

No. It's a sickness and I've got it bad. LOL

Till next time.