
Digital Audio Theory: Analog-to-Digital Conversion & Advanced Audio Coding

Outline
1. Quantization: what is it
2. The sampling theorem and the Nyquist frequency
3. Antialias filters
4. Advanced ADC
5. AAC scheme

Audio signals (speech, music, echoes, noise, etc.) are usually continuous. Analog-to-digital conversion (ADC) is the process that allows digital electronic systems to interact with these continuous (i.e. analog) signals. Digital audio differs from its continuous counterpart in two important respects: it is sampled, and it is quantized. Both restrict how much information a digital audio signal can contain. Hence it is necessary to understand what information must be retained and what can be lost. In practice this means properly selecting the sampling frequency, the number of bits, and the type of analog filtering used when converting between analog and digital audio signals.

1. Quantization: what is it

Let us study the concept of quantization through the following example. Fig. 1 shows the electronic waveforms of a typical ADC. Here, figure 'a' is the analog audio signal to be digitized. The block diagram has two sections, namely the sample-and-hold (S/H) and the analog-to-digital converter (ADC). The sample-and-hold keeps the voltage entering the ADC constant while the conversion takes place. Moreover, breaking the digitization process into these two stages is an important theoretical model for understanding digitization (Smith 1999).

As shown by the difference between 'a' and 'b', the output of the sample-and-hold is allowed to change only at periodic intervals, at which time it is made identical to the instantaneous value of the input audio signal. Changes in the input signal that occur between these sampling times are completely ignored. That is, sampling converts the independent variable (i.e. time) from continuous to discrete.

Fig. 1 - Process of digitizing audio signal (Smith 1999)

Then, as shown by the difference between 'b' and 'c', the ADC produces an integer value for each of the flat regions in 'b'. So, quantization converts the dependent variable (i.e. voltage) from continuous to discrete. This introduces an error, since slightly different signals will be converted into the same digital number. It is important to note that sampling and quantization degrade the original audio signal in different ways and are controlled by different parameters in the electronics.

Let us consider the effects of quantization. Any one sample in the digitized signal can have a maximum error of ±½ LSB (least significant bit). The quantization error 'd' can be found by subtracting 'b' from 'c'. In other words, the digital audio output 'c' is equivalent to the continuous input 'b' plus a quantization error 'd'. In most cases this error resembles random noise, so quantization amounts to adding a specific amount of random noise to the signal. This additive noise is uniformly distributed between ±½ LSB, has a mean of zero, and a standard deviation of 1/√12 LSB (about 0.29 LSB).

Example. Passing an analog audio signal through an 8-bit digitizer adds an rms noise of 0.29/256, or about 1/900 of the full-scale value. A 12-bit conversion adds a noise of 0.29/4096, or about 1/14,000. A 16-bit conversion adds 0.29/65,536, or about 1/227,000.

The number of bits therefore determines the precision of the digitized audio data. Once the quantization noise is well below the noise already present in the analog signal, digitizing the same signal with more bits produces virtually no increase in the total noise, and almost nothing is lost to quantization. How many bits are needed? This depends on how much noise is already present in the analog signal, and on how much noise can be tolerated in the digital signal (Smith 1999).

Small-amplitude or slowly varying audio signals are a special case: the analog value may stay within a single quantization level for many consecutive samples, so the error no longer behaves like random noise and detail smaller than one LSB is lost. This problem can be solved by the so-called dithering technique.
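The rms noise figures quoted in the quantization example can be checked numerically. The following is a minimal sketch, assuming the error is uniformly distributed over ±½ LSB (standard deviation 1/√12 LSB) and that full scale spans 2^bits levels; the function name is hypothetical.

```python
import math

def quantization_noise_rms(bits):
    """RMS quantization noise as a fraction of full scale.

    Assumes the error is uniform over +/- 1/2 LSB, so its standard
    deviation is 1/sqrt(12) LSB, and that full scale is 2**bits LSB.
    """
    lsb = 1.0 / (2 ** bits)
    return lsb / math.sqrt(12)

for bits in (8, 12, 16):
    rms = quantization_noise_rms(bits)
    print(f"{bits:2d} bits: rms noise = {rms:.3e} of full scale "
          f"(about 1 part in {1 / rms:,.0f})")
```

Each additional bit halves the noise, which is why moving from 8 to 16 bits lowers it by a factor of 256.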
In dithering, a small amount of random noise is added to the analog audio signal before digitization. A dithering circuit can use a computer to generate random numbers and pass them through a DAC to produce the added noise. After digitization, the computer subtracts the random numbers from the digital signal using floating point arithmetic; this is so-called subtractive dithering (Katz 2002).

2. The sampling theorem and the Nyquist frequency

A continuous signal is sampled properly if the samples contain all the information needed to recreate the original waveform. Otherwise the so-called aliasing effect appears, in which certain frequency components of the signal change their frequencies during sampling. Since aliasing corrupts the information, the original signal cannot be reconstructed from the samples. In other words, aliasing is caused by improper sampling.

The sampling theorem, also known as the Shannon sampling theorem or the Nyquist sampling theorem, states that a continuous signal can be properly sampled only if it does not contain frequency components above one-half of the sampling rate.

Example. A sampling rate of 20000 samples per second requires the analog audio signal to be composed of frequencies below 10 kHz. If frequencies above this limit are present in the signal, they will be aliased to frequencies between 0 and 10 kHz, where they combine with the correct information about the audio signal and distort it.

The Nyquist frequency, here also called the Nyquist rate, is one-half the sampling rate.

Example. Consider an analog audio signal composed of frequencies between 0 and 30 kHz. To digitize this signal properly, it must be sampled at 60000 samples/sec or higher. We can choose to sample at 80000 samples/sec, allowing frequencies between 0 and 40 kHz to be properly represented. Here, the Nyquist frequency is one-half the sampling rate, or 40 kHz. It is essential to remember that a digital audio signal cannot contain frequencies above one-half the sampling rate, i.e. the Nyquist frequency.
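The folding of out-of-band frequencies into the range between 0 and the Nyquist frequency can be sketched as follows. This is a minimal illustration for a single sinusoid, assuming ideal sampling; the function name and the test frequencies are hypothetical, while the 20000 samples per second rate comes from the example above.

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency (Hz) at which a tone of f_hz appears after sampling at fs_hz."""
    folded = f_hz % fs_hz                 # the sampled spectrum repeats every fs
    return min(folded, fs_hz - folded)    # fold into the range 0 .. fs/2

fs = 20000  # samples per second
for f in (4000, 9000, 13000, 25000):
    print(f"{f} Hz sampled at {fs} Hz appears at {alias_frequency(f, fs)} Hz")
```

Tones below 10 kHz are returned unchanged, while tones above it land somewhere between 0 and 10 kHz, where they can no longer be told apart from legitimate signal content.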
When the analog signal's frequency is above the Nyquist frequency, aliasing changes the frequency into something that can be represented in the sampled data. Aliasing can also change the phase during sampling (Smith 1999).

Fig. 2 - The sampling theorem in the time and frequency domains (Smith 1999)

Let us consider the essence of audio sampling and aliasing (see Fig. 2). Here, figures 'a' and 'b' show an analog audio signal and its frequency spectrum, which is composed only of frequency components between 0 and a limit below one-half of the sampling frequency. The analog signal is then sampled by converting it to an impulse train 'c'. The impulse train is an abstract signal consisting of a series of narrow impulses that match the original signal at the sampling instants. In the frequency domain, such sampling results in the spectrum 'd'. Since the original frequencies in 'b' exist undistorted in 'd', proper sampling has taken place. Indeed, an analog low-pass filter can eliminate all frequencies above one-half of the sampling rate, and therefore the impulse train 'c' can be converted back into the original analog signal 'a'.

Figure 'e' illustrates improper sampling. Because the sampling rate is too low, aliasing appears and the sidebands in 'f' overlap. Overlapping frequencies add together and can no longer be separated, so information is lost and the original audio signal cannot be reconstructed. This overlap occurs when the analog signal contains frequencies greater than the Nyquist frequency.

The amount of information carried in a digital audio signal is limited by two parameters (Watkinson 1995, Zolzer 1999). The number of bits per sample limits the resolution of the signal value, whereas the sampling rate limits the resolution in time. Small changes in the signal's amplitude may be lost in the quantization noise, whereas closely spaced peaks in the analog signal may be lost between the samples.
On the other hand, analog signals are also limited by noise and bandwidth. The sampling theorem guarantees that an analog signal composed of frequencies between 0 and f kHz and a digital signal sampled at 2f kHz have exactly the same resolution (Smith 1999).

3. Antialias filters

In accordance with the sampling theorem (see above), before reaching the ADC the input audio signal must be processed with an electronic low-pass filter to remove all frequencies above the Nyquist frequency (see Fig. 3). The aim is to prevent aliasing during sampling.

Fig. 3 - Where and how to use antialias filter

So, antialias filters are used to remove frequency components above one-half of the sampling rate that would otherwise alias during sampling. The characteristics of the digitized audio signal depend on what type of antialias filter was used before the ADC. There are three main types of analog filters, namely Butterworth, Chebyshev, and Bessel filters. The complexity and quality of each filter depend on the number of so-called poles and zeros. Let us consider the key performance parameters of these filters.

The first performance parameter is cut-off frequency sharpness. A low-pass filter is designed to block all frequencies above the cut-off frequency (the stopband), while passing all frequencies below it (the passband). Each type of antialias filter has its own frequency response showing its performance. On this criterion, the Chebyshev filter is better than the Butterworth filter, which in turn is better than the Bessel filter. As a rule, the frequency band between about 0.4 and 0.5 of the sampling frequency is unusable due to filter roll-off and aliased signals. This is a direct result of the limitations of analog filters (Roccesso 2003).

Example. Consider a 12-bit system sampling at 10000 samples per second. In accordance with the sampling theorem, any frequency above 5 kHz will be aliased.
So, it is necessary to remove such frequencies with an antialias filter. Let us suppose that all frequencies above 5 kHz must be reduced in amplitude by a factor of 100. Detailed analysis shows that an 8-pole Chebyshev filter with a cut-off frequency of 1 Hz does not reach an attenuation of 100 until about 1.35 Hz. Scaling this result, the filter's cut-off frequency must be set to 5 kHz / 1.35, i.e. about 3.7 kHz, so that all frequencies above 5 kHz have the required attenuation. Hence, the frequency band between 3.7 kHz and 5 kHz is wasted on the inadequate roll-off of the analog filter (Smith 1999, Dunn 2004).

The second performance parameter is passband flatness: the frequency response of the low-pass filter should be flat across the entire passband. Usually some passband ripple exists, i.e. there are slight variations in the amplitude of the passed frequencies. For instance, the Chebyshev filter obtains its excellent roll-off by allowing passband ripple; the more passband ripple is allowed, the faster the roll-off that can be achieved. In comparison, the Butterworth filter is optimized to provide the sharpest roll-off possible without allowing ripple in the passband. It is commonly called the maximally flat filter, and is identical to a Chebyshev filter designed for zero passband ripple. The Bessel filter has no ripple in the passband, but its roll-off is far worse than that of the Butterworth filter (Dunn 2004, Zolzer 1999).

The last parameter is the step response, which indicates how the antialias filter responds when the input rapidly changes from one value to another. On this criterion, the Butterworth and Chebyshev filters overshoot and show ringing, i.e. slowly decaying oscillations. In comparison, the Bessel filter has neither of these disadvantages.

Hence, each antialias filter is designed to optimize a certain performance parameter. The Chebyshev filter optimizes the roll-off, the Butterworth filter optimizes the passband flatness, and the Bessel filter optimizes the step response.
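The cut-off calculation in the 12-bit example earlier in this section can be sketched as follows. The Chebyshev ratio of 1.35 is taken from that example. The Butterworth comparison is an added illustration, not part of the cited analysis: it uses the standard Butterworth magnitude response |H(f)| = 1 / sqrt(1 + (f/fc)^(2n)). All function names are hypothetical.

```python
import math

def required_cutoff(stopband_edge_hz, stop_ratio):
    """Cut-off frequency needed so the target attenuation is reached at
    stopband_edge_hz, given the ratio (attenuation frequency / cut-off)."""
    return stopband_edge_hz / stop_ratio

def butterworth_stop_ratio(poles, attenuation):
    """Ratio f/fc at which an n-pole Butterworth low-pass filter reduces
    the amplitude by the given factor (from |H| = 1/sqrt(1 + (f/fc)**(2n)))."""
    return (attenuation ** 2 - 1) ** (1 / (2 * poles))

nyquist_hz = 5000      # half of the 10000 samples per second rate
attenuation = 100      # required amplitude reduction above 5 kHz

chebyshev_cutoff = required_cutoff(nyquist_hz, 1.35)  # ratio from the example
butterworth_cutoff = required_cutoff(
    nyquist_hz, butterworth_stop_ratio(8, attenuation))

print(f"8-pole Chebyshev cut-off:   {chebyshev_cutoff:.0f} Hz")
print(f"8-pole Butterworth cut-off: {butterworth_cutoff:.0f} Hz")
```

Under these assumptions the slower Butterworth roll-off forces a lower cut-off frequency, so a wider band below 5 kHz is wasted than with the Chebyshev filter.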
So, how should the most suitable antialias filter be selected? Audio signals are combinations of harmonics, so they are best regarded as encoded in the frequency domain rather than the time domain. How can this feature be used? It is well known that the perceived sound depends on the frequencies present, and not on the particular shape of the waveform. In fact, audio signals with completely different waveforms can sound almost identical. Since aliasing misplaces and overlaps frequency components, it directly destroys information encoded in the frequency domain. Hence, digitization of audio signals usually involves an antialias filter with a sharp cut-off, such as a Chebyshev or Butterworth filter (Smith 1999, Katz 2002, Roccesso 2003). In this case the step response matters little, since it is not reflected in sound perception.

4. Advanced ADC

Some audio digitization systems use so-called multirate techniques, i.e. they use more than one sampling rate per system.

Example. Suppose a voice signal passes through a simple low-pass filter and is sampled at 64 kHz. The digital data contain the desired voice band between 100 and 3000 Hz, but also an unusable band between 3 kHz and 32 kHz. We remove these unusable frequencies with a digital low-pass filter at 3 kHz. Then we resample the digital signal from 64 kHz to 8 kHz by simply discarding seven out of every eight samples (so-called decimation). The resulting digital data is equivalent to that produced by aggressive analog filtering and direct 8 kHz sampling (Smith 1999).

Multirate data conversion is valuable for two reasons. Firstly, it replaces analog components with software, which is an economic advantage. Secondly, it can achieve higher levels of performance in critical applications.

Example. CD audio systems use multirate techniques to achieve the best possible sound quality.
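The voice example earlier in this section (digital low-pass filtering at 3 kHz, then keeping one sample in eight) can be sketched in plain Python. This is only an illustration under stated assumptions: the filter is a Hamming-windowed sinc FIR with 129 taps, which is one common design and not necessarily the one used in the cited source, and all function names and test tones are hypothetical.

```python
import math

def lowpass_kernel(cutoff_hz, fs_hz, num_taps=129):
    """Hamming-windowed sinc low-pass FIR kernel, normalized to unity DC gain.
    num_taps is assumed to be odd."""
    fc = cutoff_hz / fs_hz                      # cut-off as a fraction of fs
    mid = (num_taps - 1) / 2
    kernel = []
    for i in range(num_taps):
        x = i - mid
        if x == 0:
            ideal = 2 * fc                      # limit of the sinc at its centre
        else:
            ideal = math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        kernel.append(ideal * window)
    total = sum(kernel)
    return [k / total for k in kernel]

def fir_filter(signal, kernel):
    """Direct-form convolution; samples before the start are taken as zero."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(kernel):
            if n - k >= 0:
                acc += h * signal[n - k]
        out.append(acc)
    return out

def decimate(signal, factor):
    """Keep one sample out of every `factor`, discarding the rest."""
    return signal[::factor]

def tone(freq_hz, fs_hz, count):
    """Unit-amplitude sine wave of the given frequency."""
    return [math.sin(2 * math.pi * freq_hz * i / fs_hz) for i in range(count)]

fs = 64000
kernel = lowpass_kernel(3000, fs)
voice = decimate(fir_filter(tone(1000, fs, 2048), kernel), 8)     # in band
unwanted = decimate(fir_filter(tone(10000, fs, 2048), kernel), 8)  # out of band

# Skip the first samples, where the filter is still filling up.
print("1 kHz tone peak after decimation: ", max(abs(v) for v in voice[32:]))
print("10 kHz tone peak after decimation:", max(abs(v) for v in unwanted[32:]))
```

Filtering before discarding samples is the essential design choice: decimating the raw 64 kHz data would alias everything between 4 kHz and 32 kHz into the 8 kHz output.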
The increased performance of multirate systems results from replacing analog components (1% precision) with digital algorithms (0.0001% precision, limited by round-off error) (Manquen 1991).

The so-called single-bit ADC is a digitization technique used in high-fidelity music reproduction. It is a multirate technique in which a higher sampling rate is traded for a lower number of bits. In the extreme, only a single bit is needed for each sample (Manquen 1991, Kahrs & Brandenburg 2002). Most single-bit ADC systems are based on so-called delta modulation.

Let us consider the characteristics of the delta modulated output signal (see Fig. 4). If the analog input is increasing in value, the output signal will consist of more ones than zeros. Likewise, if the analog input is decreasing in value, the output will consist of more zeros than ones. If the analog input is constant, the digital output will alternate between zero and one with an equal number of each. Put in more general terms, the relative number of ones versus zeros is directly proportional to the slope (derivative) of the analog input.

Fig. 4 - Generation of delta modulated output signal (Smith 1999)

This method transforms an analog audio signal into a serial stream of ones and zeros for transmission or digital storage. Notably, all the bits have the same meaning, unlike in conventional serial formats. The method is limited by the unavoidable trade-offs between maximum slew rate, quantization size, and data rate. For instance, if the maximum slew rate and quantization size are adjusted to acceptable values for voice communication, the data rate ends up in the MHz range. For comparison, conventional sampling of a voice signal requires only about 64 kbit/s.

An alternative is so-called delta-sigma modulation.
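Before turning to delta-sigma modulation, the behaviour of the delta modulator described above can be sketched in a few lines. This is a simplified model, assuming an internal estimate that moves up or down by a fixed step on every clock cycle; the function name, step size, and test inputs are hypothetical.

```python
def delta_modulate(samples, step):
    """One-bit delta modulation: emit 1 and raise the internal estimate by
    `step` when the input is above the estimate, otherwise emit 0 and lower it."""
    estimate = 0.0
    bits = []
    for x in samples:
        bit = 1 if x > estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
    return bits

rising = [0.05 * i for i in range(100)]     # steadily increasing input
falling = [-0.05 * i for i in range(100)]   # steadily decreasing input
constant = [0.0] * 100                      # constant input

for name, signal in (("rising", rising), ("falling", falling),
                     ("constant", constant)):
    bits = delta_modulate(signal, step=0.1)
    print(f"{name:8s}: {sum(bits)} ones, {len(bits) - sum(bits)} zeros")
```

If the input changes by more than one step per clock cycle, the estimate can no longer keep up, which is the slew-rate limit mentioned above.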
In delta-sigma modulation, if the input voltage is positive, the digital output will be composed of more ones than zeros; if the input voltage is negative, the digital output will be composed of more zeros than ones; if the input signal is equal to zero volts, an equal number of ones and zeros will be generated in the output. So, the relative number of ones and zeros in the output is related to the level of the input voltage, not to its slope as in delta modulation.

Example. It is easy to use delta-sigma modulation for digitizing audio signals. Let us form a 12-bit ADC by feeding the digital output into a counter and counting the number of ones over 4096 clock cycles. In this case, a digital number of 4095 would correspond to the maximum positive input voltage, a digital number of 0 would correspond to the maximum negative input voltage, and 2048 would correspond to an input voltage of zero (Smith 1999).

Unfortunately, delta-sigma ADCs have several disadvantages, e.g. it is difficult to multiplex their inputs. Moreover, each acquired sample is a composite of the one-bit information taken over a segment of the input signal. However, this is not a problem for audio signals encoded in the frequency domain (Smith 1999, Katz 2002, Dunn 2004).

5. AAC scheme

Advanced Audio Coding (AAC) is a lossy compression and encoding scheme for digital audio. It is based on a wideband encoding algorithm that exploits two key approaches to reduce the amount of data needed to represent high-quality digital audio. Firstly, signal components that are perceptually irrelevant are discarded. Secondly, redundancies in the coded audio signal are eliminated (Noll 1997).

In general, AAC offers sampling frequencies between 8 kHz and 96 kHz and any number of channels between 1 and 48. Additionally, numerous features of AAC make it flexible and powerful. For instance, the audio signal is processed by a modified discrete cosine transform whose block length follows the complexity of the signal, i.e. AAC encoders can switch dynamically between a single encoding block of 1024 points and 8 blocks of 128 points. By default, the longer 1024-point block is used; this gives higher frequency resolution and hence better coding efficiency. Also, internal error correction codes are used. Moreover, AAC supports arbitrary bitrates.

AAC takes a modular approach to encoding. Depending on the complexity of the bitstream to be encoded, the desired performance, and the acceptable output, it is possible to create profiles that define which of a specific set of features can be used. The standard offers four default profiles, namely Low Complexity (LC), Main Profile (MAIN), Scalable Sampling Rate (SSR), and Long Term Prediction (LTP) (Noll 1997). This allows AAC encoders to be tuned for particular applications.

Bibliography

Dunn, J 2004, Measurement techniques for digital audio, Audio Precision, New York.
Kahrs, M & Brandenburg, K (eds) 2002, Applications of digital signal processing to audio and acoustics, Kluwer Academic Publishers, New York.
Katz, B 2002, Mastering audio: the art and the science, Focal Press, Oxford.
Manquen, D 1991, 'Digital recording and playback', in Ballou, G (ed.), Handbook for sound engineers, Howard W. Sams & Co, Indianapolis, pp. 1081-1100.
Noll, P 1997, 'MPEG digital audio coding', IEEE Signal Processing Magazine, September 1997, pp. 59-81.
Roccesso, D 2003, Introduction to sound processing, Phasar SRL, Firenze.
Smith, SW 1999, 'ADC and DAC', in The scientist and engineer's guide to digital signal processing, California Technical Publishing, San Diego, pp. 35-66.
Watkinson, J 1995, An introduction to digital audio, Focal Press, Oxford.
Zolzer, U 1999, Digital audio signal processing, John Wiley & Sons, New York.
