Digital Signal Processing
Reshaping Your Perceptual Environment One Digital Sample at a Time
I fondly recall watching the sci-fi TV show The Outer Limits back in the '60s. It competed for attention with Rod Serling's The Twilight Zone. In TTZ, strange things just happened to people. TOL set the stage with a "control voice" stating in part: "There is nothing wrong with your television set. Do not attempt to adjust the picture. We are controlling transmission. If we wish to make it louder, we will bring up the volume. If we wish to make it softer, we will tune it to a whisper. We will control the horizontal. We will control the vertical… You are about to experience the awe and mystery which reaches from the inner mind to... The Outer Limits."
Both series broadcast in black and white, which in retrospect really magnified their warp from reality. If TOL had been in color, perhaps the control voice could have corrected the color and tint for me too. I'm certain that all of you survivors of NTSC's early days can relate. As the control voice for this article, I hope to maintain the awe and dispel some of the mystery about digital signal processing.
Controlling, filtering, re-blending, enhancing, compressing, and replaying the everyday analog events of our lives is the new reality that we lump into the phrase "digital signal processing", or DSP in daily tech-speak. DSP is implemented in various forms and means different things to different people. Although the acronym is most commonly associated with audio, any manipulation of digital bits representing picture or sound information is a form of digital signal processing. The association with audio exists because audio was processed digitally before video: the audio frequency range is much lower than the basic video frequency range, so audio signals initially required less digital processing, which could be implemented with lower-cost processors.
What is DSP, Really?
Underneath it all, DSP is all about creating focus. Focusing on something is akin to honoring its message or its intrinsic beauty. For example, DSP can remove unwanted noise so we can focus on the unique quality of a sound. DSP can filter out a range of frequencies, much like the adaptiveness of the human ear, to allow focus on a specific message. DSP can remove the mundane and the clutter much like a photographer crops an image to focus attention on one visual concept.
We've been filtering signals for years in the analog domain. Why is digital signal filtering so important? The difference is analogous to the accuracy of cutting something with an axe compared to cutting with a focused laser beam. Analog filtering is coarser with commensurate difficulties controlling the filter response. Digital signal processing manipulates numbers that lie within specific ranges which support very precisely designed digital filters having stable, predictable response. DSP can "surgically remove or modify" information with relative ease compared to analog methods.
The strength and the foundation of DSP is the manipulation of the numbers, or digital data, which represent discrete samples of a time-varying, real-world analog event. Given sufficient discrete samples, those samples may be ingested by a processor, manipulated mathematically, and reconstructed as a new version of that analog signal with such accuracy that it is indistinguishable from the original, or is an enhanced version of it. DSP also unintentionally directs new focus on how users interface with and use products, more than on technical functionality. The typical DSP product feature list is often complicated by so many choices that the user interface must not only be graphical, but must also guide the user's setup choices and decision-making to a greater degree.
DSP State of the Art
One question that invariably comes up is: "So, what is the state of the art in DSP?" It's a valid question, but tends to focus on the hardware and not on what the hardware is doing. DSP is about processing numbers rapidly to perform calculations required by algorithms designed to address a specific problem. Just as in traditional computer programming, the DSP engineer writes an algorithm, or process, for solving a problem. The problem solution is usually modeled into one or more equations that can be systematically solved by a microcomputer through iteration. The faster that the microcomputer can perform the required calculations, the closer the solution approaches what we consider to be real time. For audio signals, it must be fast enough that we do not perceive the result as abnormal.
How does a digital signal processor compare to a regular microcomputer? Are they really the same thing? DSPs are specialized, highly streamlined implementations of microcomputers with unique features. All microcomputers can be programmed to perform digital signal processing functions in the general sense. The microcomputer limitations become evident with the type of algorithm typically needed for the DSP solution, such as with audio filtering. The difference boils down simply to efficiency and speed of execution. The internal architecture of a regular microcomputer, while very capable at performing looping calculations and multiplications characteristic of filter algorithms, does not make efficient use of its memory or management of program steps for DSP-type calculations. A microcomputer would have to run at several times the speed of a typical DSP for even the simplest filtering operations. When addressing multiple algorithms, the microcomputer solution becomes too slow or non-functional very quickly. Microcomputers are optimized for control applications where sequential timing of events is necessary without significant mathematical operations.
DSP Algorithms Dictate DSP Engine Design
So, the design of a DSP is dictated by the types of algorithms, or calculation sequences, it most often executes. DSP architecture is optimized to handle multiple calculation sequences in parallel while minimizing clock cycles. At the heart of a DSP is one or more MACs, or "multiply and accumulate" units. Multiplication and accumulation of the result is the prime process in a DSP. A DSP can typically perform one MAC in one clock cycle while a standard microcomputer may require four or more clock cycles.
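To make the MAC idea concrete, here is a minimal Python sketch of one output sample of a finite impulse response (FIR) filter. The function name `fir_output` and the coefficient values are assumptions chosen for illustration only. Each pass through the loop is one multiply-and-accumulate, which is the operation a DSP aims to complete in a single clock cycle.

```python
def fir_output(coeffs, recent_samples):
    """Return one filter output as a sum of products (one MAC per tap)."""
    acc = 0.0
    for c, x in zip(coeffs, recent_samples):
        acc += c * x  # multiply, then accumulate into the running total
    return acc


# Illustrative 4-tap moving average; the values are assumptions.
coeffs = [0.25, 0.25, 0.25, 0.25]
recent = [1.0, 2.0, 3.0, 4.0]  # the four most recent input samples
y = fir_output(coeffs, recent)
```

A general-purpose processor spends extra cycles fetching each coefficient and sample before it can multiply; the DSP's dedicated MAC hardware and buses are what collapse each loop pass toward one cycle.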
Figure 1 shows the basic architectural difference between a DSP and a microprocessor. The shared bus in the typical microprocessor necessitates incremental fetching of program steps, then data. Further, the data must be moved into registers and the appropriate operations executed. The result is obtained at the cost of several steps and considerable additional time. In the DSP architecture, program and data are usually separate with dedicated bus pathways. Both program steps and data can be fetched and moved simultaneously, thus effecting much faster processing.
There are many facets that contribute to the fast architecture of a DSP. Among them are: high memory bandwidth, multiple MAC units, local instruction cache memory, data address generation units, and circular addressing.
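Circular addressing is easiest to picture in software. The sketch below, in Python, uses a hypothetical `CircularBuffer` class to show the wrap-around indexing that a DSP's address generation unit performs in hardware at no extra cycle cost. It is an illustration of the idea, not a model of any particular processor.

```python
class CircularBuffer:
    """Fixed-size sample history with wrap-around (circular) addressing."""

    def __init__(self, size):
        self.data = [0.0] * size
        self.index = 0  # position of the next write

    def push(self, sample):
        self.data[self.index] = sample
        # The modulo makes the write position wrap back to the start.
        self.index = (self.index + 1) % len(self.data)

    def newest_first(self):
        n = len(self.data)
        return [self.data[(self.index - 1 - k) % n] for k in range(n)]


buf = CircularBuffer(4)
for s in [1.0, 2.0, 3.0, 4.0, 5.0]:
    buf.push(s)  # the fifth push overwrites the oldest sample
history = buf.newest_first()
```

Because the oldest sample is simply overwritten, no data is shuffled in memory when a new sample arrives, which matters when a filter needs its history refreshed on every sample period.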
Processing Power… Approaching 'The Outer Limits'
One of the most common design considerations with a DSP is the data format used for calculations… that is, fixed-point versus floating-point processing. A fixed-point processor has its binary point, which corresponds to the decimal point in our base 10 number system, fixed at one position within the data word. Floating-point allows movement of the binary point by utilizing a mantissa and an exponent; whereby the binary point can "float" based on the exponent value.
Fixed-point format limits the processor's number range for a given data word. The fixed point is maintained by using a single scaling factor. For example, the value of π, approximately 3.14, can be represented in fixed-point notation by the integer value 314 divided by the scale factor 100. Numbers larger than one data word require concatenation of an additional data word, or words, for the range required. Concatenation requires additional calculation time, which affects processing speed accordingly. This situation is one factor relating directly to processing latency in DSP. The primary concern with fixed-point designs is the maintenance of "numeric fidelity", or managing number overflow in the processor. Fixed-point DSPs typically cost less than their floating-point counterparts. Tradeoffs like management of numeric fidelity and programming complexity often redirect designers toward use of floating-point processors.
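The scale-factor idea can be sketched in a few lines of Python. The example below first repeats the 314/100 illustration, then uses the Q15 convention (a 16-bit word with a scale factor of 2^15), which is one common fixed-point layout and is my assumption here, not something the discussion above prescribes. The helper names are hypothetical. Saturation is shown as one simple way of managing overflow.

```python
Q15_SCALE = 1 << 15       # scale factor 32768
Q15_MAX = Q15_SCALE - 1   # largest 16-bit signed value, 32767
Q15_MIN = -Q15_SCALE      # smallest 16-bit signed value, -32768


def saturate(raw):
    """Clamp to the 16-bit range instead of letting the value wrap."""
    return max(Q15_MIN, min(Q15_MAX, raw))


def to_q15(value):
    return saturate(int(round(value * Q15_SCALE)))


def from_q15(raw):
    return raw / Q15_SCALE


def q15_mul(a, b):
    # The product of two scaled integers carries the scale factor twice,
    # so shift right by 15 bits to return to a single scale factor.
    return saturate((a * b) >> 15)


pi_scaled = round(3.14 * 100)    # the decimal example: 314 with scale 100
half = to_q15(0.5)
quarter = q15_mul(half, half)    # 0.5 * 0.5 in fixed point
clipped = to_q15(1.5)            # out of range, so it saturates
```

Note how 1.5 cannot be represented at all in this format and is pinned to the maximum value; that loss is the "numeric fidelity" problem the designer has to manage.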
Floating-point format is similar to scientific notation in the decimal system, where a large number may be represented by a short value (the mantissa) multiplied by some power of ten. The exponent on that power of ten sets the final magnitude by shifting the decimal point, or radix point, by the number of places equal to the value of the exponent. Floating-point representation supports a much wider range of number values.
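Python's standard `math.frexp` and `math.ldexp` functions expose the binary version of this mantissa-and-exponent split, so they make a handy illustration. The wrapper name `decompose` is hypothetical; the underlying calls are part of the standard library.

```python
import math


def decompose(x):
    """Split x into (mantissa, exponent) such that x == mantissa * 2**exponent."""
    return math.frexp(x)


m, e = decompose(6.0)        # 6.0 is 0.75 * 2**3
rebuilt = math.ldexp(m, e)   # put the two parts back together

# A very small and a very large value differ mainly in the exponent.
_, e_small = decompose(6.0 / 1024)
_, e_large = decompose(6.0 * 1024)
```

The mantissa carries the precision while the exponent slides the binary point, which is why a modest number of exponent bits buys such a wide numeric range.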
Arguments debating fixed-point versus floating-point processing can be found among DSP manufacturers, users, and specifiers. In the final analysis, the DSP equipment designer must use great skill such that the user need not be so concerned with the 'how' in DSP-equipped products so much as whether the product appropriately addresses the problem/solution scenario. Another common user concern with DSP-supported products centers on processing bit-depth and processing latency. Bit depth is another way of expressing concern for range and accuracy of the digital representation of the real-world analog signal.
Bit Depth and Latency
The topic always comes up, so let's talk about it. Of course, under the hood, it's important that a DSP system uses lots of bits per word. More bits is better, right? No system designer is going to look 'cool' if he designs an 8-bit DSP solution. Moreover, are there any out there? In a fixed-point DSP, if there are more bits needed to maintain numeric fidelity, and I love that term, we can concatenate more data words to make up any level of processing precision needed. So what's to give up? Speed, of course. But, processing power is getting cheaper by the day. This makes the bit depth question a bit difficult to answer. If the DSP is processing your needs in real time, do you care?
We do care about the front-end and back-end conversion. Professional audio essentially dictates a minimum of 24-bit processing with 48 kHz sampling. The analog-to-digital and digital-to-analog conversion must be designed quite well with the dynamic range bounded by the 24-bit architecture. What happens to 24-bit words in the processing chain is the domain of the DSP architect.
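As a rough guide to what "bounded by the 24-bit architecture" means, the theoretical dynamic range of an ideal linear converter grows by about 6.02 dB per bit. The short Python sketch below computes that figure; the function name is hypothetical, and real converters fall short of the ideal because of analog noise.

```python
import math


def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in decibels."""
    return 20.0 * math.log10(2 ** bits)


dr16 = dynamic_range_db(16)   # roughly 96 dB
dr24 = dynamic_range_db(24)   # roughly 144 dB
```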
A fixed-point DSP can operate in double precision mode, which is 48 bits, and provide very adequate control over dynamic range as well as have sufficient processing power for its MACs. Double precision may require additional time, but this may be mitigated by processor speed. A 32-bit floating-point processor handling 24-bit data words provides us an 8-bit exponent for exceptional dynamic range. However, professional audio processing using 32-bit floating point, or 48-bit fixed-point for that matter, can result in noticeable distortion. Double precision 64-bit operations easily mitigate distortion issues, which is why you'll see 32/64-bit floating point functionality referenced in Extron DSP products.
Latency simply relates to the time delay incurred while performing operations. For one-way audio processing and broadcasting, latency may not be important. In telephony, latency is very important because it affects the quality of the conversation between two people. In conversation, the listener issues unconscious utterances subtly notifying the talker that both are engaged in the conversation. Video conferencing and live broadcasting likely demand the most control over latency since action and discussion must remain in sync to realize a 'normal' experience. Live audio systems should limit latency to no more than about 10 milliseconds.
Latency in image processing becomes problematic when it compromises lip sync. In fact, having adjustable latency in the audio processing path is often desirable for matching lip sync timing with video processing delays. What is an acceptable amount of latency? For telephony, latency of less than 150 milliseconds is perceived as acceptable. Image processing typically demands no more than about 30 milliseconds, about one video frame, before lip sync difficulty is noticeable.
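A quick way to relate these budgets to a design is to convert buffer sizes into time. The Python sketch below does that arithmetic; the function names and the 256-sample block size are assumptions for illustration, while the 10 millisecond figure is the live audio guideline quoted above.

```python
def buffer_latency_ms(frames, sample_rate_hz):
    """Time represented by a block of samples, in milliseconds."""
    return 1000.0 * frames / sample_rate_hz


def within_budget(total_ms, budget_ms):
    return total_ms <= budget_ms


LIVE_AUDIO_BUDGET_MS = 10.0   # guideline quoted in the text

one_block = buffer_latency_ms(256, 48000)   # about 5.3 ms
one_stage_ok = within_budget(one_block, LIVE_AUDIO_BUDGET_MS)
three_stages_ok = within_budget(3 * one_block, LIVE_AUDIO_BUDGET_MS)
```

Chaining even a few block-based stages can use up a live audio budget quickly, which is why block size is one of the first things to check when latency is a concern.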
Controlling the Horizontal and the Vertical -- Common DSP Functions
Whether an audio product design utilizes fixed-point or floating-point DSP, the following is a common stable of functions created in DSP to provide the intricate signal processing functionality for which audio DSPs are famous:
- Feedback suppression
- Ducking and Auto-mixing
- Dynamics and Delays
- Automatic Echo Cancellation, AEC
Feedback suppression counteracts the common problem of acoustic ringing due to acoustic waves cycling between the microphone and loudspeakers. Feedback is a positive reinforcement effect where room acoustics facilitate coupling between speaker emission and microphone reception of a specific frequency, or group of frequencies. The coupling effect creates a gain loop that builds until the audio system oscillates out of control at a dominant frequency. Feedback suppression utilizes multiple dynamic filters precisely controlled by DSP. The DSP detects the sensitive frequencies and adjusts the dynamic filter's resonant point to coincide with the offending acoustic oscillation. Most DSP systems employ a group of dynamic and fixed filters that may be adjusted manually or automatically.
Since the acoustic performance of each room situation is different, one of the prime DSP features of feedback suppression is the ability to detect and converge upon resonant frequencies rapidly. Once filters are tailored to manage room resonances, the next challenge is to decide when a particular set of filters should be released and a new set converged as the room dynamics change. Why would they change? Room acoustics can change as people move around in the room, as the temperature changes, or when the microphone is moved.
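The detection step can be sketched with the Goertzel algorithm, a standard way of measuring signal power at one chosen frequency. This is my choice for illustration; the discussion above does not say which detection method any product uses. The function names, the candidate frequency list, and the test tone in this Python sketch are all hypothetical.

```python
import math


def goertzel_power(samples, sample_rate_hz, freq_hz):
    """Signal power at a single frequency, via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate_hz)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def dominant_frequency(samples, sample_rate_hz, candidates_hz):
    """Return the candidate frequency carrying the most power."""
    return max(candidates_hz,
               key=lambda f: goertzel_power(samples, sample_rate_hz, f))


FS = 8000
# Simulated 1 kHz ringing standing in for a feedback tone.
tone = [math.sin(2.0 * math.pi * 1000 * n / FS) for n in range(800)]
ringing = dominant_frequency(tone, FS, [250, 500, 1000, 2000])
```

Once the offending frequency is known, a narrow notch filter can be centered on it; deciding when to release that filter is the harder part described above.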
Ducking and Auto-mixing are features that automatically attenuate the level of microphones or program sources in favor of a primary source whenever it has priority. This feature enhances intelligibility by providing a significant differential between the two sources. Dropping level on a continuous source, such as background music, improves our ability to focus on an important voice-over message.
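In its simplest form, ducking is a level decision. The Python sketch below assumes a hypothetical threshold of -40 dB and a 12 dB reduction, and it omits the attack and release smoothing that a practical ducker would apply to avoid abrupt level jumps.

```python
def duck_gain_db(primary_level_db, threshold_db=-40.0, duck_db=-12.0):
    """Gain to apply to the background source, in dB."""
    # Attenuate the background only while the primary source is active.
    return duck_db if primary_level_db > threshold_db else 0.0


def db_to_linear(db):
    """Convert a gain in dB to a linear multiplier."""
    return 10.0 ** (db / 20.0)


talking = duck_gain_db(-20.0)   # announcer active: background is ducked
silent = duck_gain_db(-60.0)    # announcer quiet: background untouched
multiplier = db_to_linear(talking)
```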
Dynamics describes a set of DSP features that includes automatic gain control, compression, limiting, and noise gating. The DSP realm opens many possibilities here in terms of functionality and the number of processing steps available. For example, compressors traditionally affect the dynamic range of the entire audio bandwidth being processed within the chain. Digital processing affords the ability to perform parallel compression and/or multiband compression.
DSP also offers the opportunity, with little incremental design effort, to include peak and RMS sensing of level changes so that the compressor's action can be more widely tailored to the type of compensation needed based upon the source material.
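The difference between peak and RMS sensing, and the basic action of a compressor, can be shown in a few lines of Python. This is a sketch of a static, hard-knee gain curve only; the threshold, ratio, and function names are assumptions, and attack and release timing are left out.

```python
import math


def peak_level(block):
    """Largest instantaneous magnitude in the block."""
    return max(abs(x) for x in block)


def rms_level(block):
    """Root-mean-square level, closer to perceived loudness."""
    return math.sqrt(sum(x * x for x in block) / len(block))


def compressor_gain_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Gain change in dB for a given detected level (hard knee)."""
    if level_db <= threshold_db:
        return 0.0
    # Above threshold, the output rises 1 dB for every `ratio` dB of input.
    return (threshold_db - level_db) * (1.0 - 1.0 / ratio)


block = [0.5, -1.0, 0.5, -1.0]
peak = peak_level(block)
rms = rms_level(block)
gain = compressor_gain_db(-8.0)   # 12 dB over threshold at 4:1
```

Feeding the gain curve from the peak detector catches brief transients, while feeding it from the RMS detector responds to sustained level, so the same compressor can behave quite differently on speech versus percussive material.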
Noise gating sets a threshold where signal level below a set point is attenuated and/or not passed through the signal chain. This can enhance the perception of signal-to-noise performance in noisy environments or where noise exists in equipment within the signal chain. When the noise gate is 'open' during louder passages, however, noise is still allowed to pass through.
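A bare-bones gate looks like the Python sketch below. It makes the decision sample by sample purely for clarity; a practical gate would follow the signal envelope and add attack, hold, and release times. The threshold and function name are assumptions.

```python
def noise_gate(samples, threshold=0.05, closed_gain=0.0):
    """Pass samples at or above the threshold; attenuate the rest."""
    return [x if abs(x) >= threshold else x * closed_gain for x in samples]


# Low-level noise surrounds two louder samples.
gated = noise_gate([0.01, -0.02, 0.5, -0.3, 0.04])
softened = noise_gate([0.01, 0.5], closed_gain=0.5)
```

Setting `closed_gain` above zero attenuates rather than mutes, which often sounds less abrupt than a hard cut.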
Delays are important. Delay is needed to control the time at which the signal emerges from one processing task with respect to other signals' processing tasks. Within the DSP system, parallel channels will require differing processing times depending on the type of feature. For example, echo cancellation may introduce more delay than a simple bandpass filter. So, the ability to set a delay time for one channel versus another is necessary in order to keep signal chains in relative sync. Another very important need for variable delay is re-establishment of lip sync where the video channel and audio channels are processed separately.
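Keeping channels in relative sync amounts to padding every path up to the slowest one. The Python sketch below shows that bookkeeping; the path names and latency figures are invented for illustration and do not describe any real product.

```python
def compensating_delays_ms(path_latency_ms):
    """Extra delay each path needs so that all paths line up in time."""
    slowest = max(path_latency_ms.values())
    return {name: slowest - latency
            for name, latency in path_latency_ms.items()}


def delay_in_samples(delay_ms, sample_rate_hz):
    """Convert a delay time to a whole number of samples."""
    return round(delay_ms * sample_rate_hz / 1000.0)


# Hypothetical processing latencies, in milliseconds.
paths = {"bandpass": 1.0, "echo_cancel": 12.0, "video": 33.0}
padding = compensating_delays_ms(paths)
bandpass_samples = delay_in_samples(padding["bandpass"], 48000)
```

Here the video path sets the pace, so both audio paths are delayed to meet it, which is the lip sync case mentioned above.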
Filtering is one of the most common and least complex DSP functions. Filters are usually performing parametric equalization, low pass, high pass, or bass/treble shelving functions. DSP commonly provides a collection of customizable filters for the input section and a separate group for the output section. Filter functions in DSP represent quick, recursive routines that perform the task at a much higher degree of accuracy and efficiency than any analog counterpart. Figure 2 shows how the designer can interface with DSP control software to implement a specific filter function.
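To show what a "quick, recursive routine" looks like, here is a Python sketch of a second-order (biquad) low-pass filter. The coefficient formulas follow the widely used Audio EQ Cookbook by Robert Bristow-Johnson, which is my choice for illustration and not necessarily what any given product implements. The function names and the 1 kHz cutoff are assumptions.

```python
import math


def lowpass_coeffs(cutoff_hz, sample_rate_hz, q=0.7071):
    """Normalized biquad low-pass coefficients (Audio EQ Cookbook form)."""
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = [(1.0 - cos_w0) / 2.0 / a0,
         (1.0 - cos_w0) / a0,
         (1.0 - cos_w0) / 2.0 / a0]
    a = [-2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return b, a


def biquad(samples, b, a):
    """Direct form I: each output reuses the two previous outputs."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


b, a = lowpass_coeffs(1000.0, 48000.0)
steady = biquad([1.0] * 2000, b, a)                         # constant input
nyquist = biquad([(-1.0) ** n for n in range(2000)], b, a)  # fastest input
```

The constant input settles at its original level while the fastest possible alternation is removed, and the whole filter costs only five multiply-accumulates per sample. Parametric, shelving, and high-pass filters use the same loop with different coefficients.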
Automatic Echo Cancellation, or AEC, is the most active and intensive of the routines in DSP. Echoes occur in teleconferencing when the local microphone picks up not only the speaker's voice, but retransmits the voice of the far-end speaker as their voice emanates from the local loudspeaker. This effect creates positive reinforcement which builds into sustained acoustic feedback. If there is sufficient delay in the link, sustained feedback may not occur, but intelligibility is greatly hampered. Similar to local acoustic feedback, the automatic echo canceller attempts to 'learn' the actual transfer function between the microphone and speaker on the conferencing system. Once learned, AEC cancels the interfering feedback.
However, this feature is much more complex than acoustic feedback because the AEC must determine groups of frequencies within voice patterns so that the voice from the far end can be subtracted from the microphone signal to cancel the feedback while not affecting the reproduction of the near-end speaker's voice. In other words, all the audio coming out of the local speaker must be removed from the microphone signal. With changing room environments or teleconference attendees talking from different places within a room, echo cancellation becomes very challenging. Therefore, AEC requires significant processing power due to the dynamics of the situation. An echo decays over a period of time called the "tail length". Processing the tail length must extend for a period of 100 milliseconds or longer.
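The 'learning' behavior can be sketched with a normalized least mean squares (NLMS) adaptive filter, a textbook approach to echo cancellation. It is my choice for illustration; the discussion above does not name the algorithm used in any product. In this Python sketch the echo path, signal, tap count, and step size are all invented, and the near-end talker is assumed silent so that the filter can converge cleanly.

```python
import random


def tail_taps(tail_ms, sample_rate_hz):
    """Filter length needed to cover a given echo tail."""
    return round(tail_ms * sample_rate_hz / 1000.0)


def nlms_echo_cancel(far_end, mic, taps=8, step=0.5, eps=1e-8):
    """Adaptively subtract an estimate of the far-end echo from the mic."""
    weights = [0.0] * taps
    history = [0.0] * taps   # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate   # what remains after cancellation
        energy = sum(h * h for h in history) + eps
        weights = [w + step * error * h / energy
                   for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.0, 0.6, 0.0, 0.3]   # hypothetical loudspeaker-to-mic response
mic = [sum(echo_path[k] * far[n - k]
           for k in range(len(echo_path)) if n - k >= 0)
       for n in range(len(far))]

residual, weights = nlms_echo_cancel(far, mic)
taps_for_100ms = tail_taps(100, 48000)   # 4800 taps at 48 kHz
```

The toy example needs only 8 taps, but a 100 millisecond tail at 48 kHz sampling calls for thousands of taps updated on every sample, which gives a sense of why AEC demands so much processing power.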
Figure 3 provides a relative sense of the DSP hierarchy, or level of complexity and processing required, for the typical DSP functions.
Controlling the Transmission…
A unique aspect of a DSP is its control interface. In fact, a DSP system may be more notable or desirable for its software-based control interface as this is the day-to-day interactive means for functional setup. The typical software control interface is highly graphic and should facilitate ease of understanding and implementation. A really good interface paints a picture of the audio processing chain in the mind of the system designer or installer that mentally 'pops up' each time a DSP system is implemented.
Some interfaces support a fully open control architecture, which may require significant training for the system installer. Another approach that is often more intuitive utilizes a fixed signal flow where features are compartmentalized into a typical topology that has flexibility with respect to the setup of the DSP functions within the topology. This control interface aids the user's ability to set up DSP applications quickly. Figure 4 illustrates the flexible, simple design support provided by a compartmentalized approach.
Reaching The Outer Limits
The old episodes of TOL ultimately took the viewer to the point where the outcome was often incredible, or was it? Approaching the brink of unbelievability is like standing on the edge of a technological cliff. This thing I want to do may not be possible, but yet it might be. Continuously advancing DSP technology requires that we always question the challenge and remain open to what is now possible. Such is the realm of DSP. Do you hear a control voice in your head each time the need for DSP arises? "There is nothing wrong with your audio channel. Do not attempt to adjust the master gain. We are controlling acoustic feedback…"