airon-extv
Posted February 20, 2014

Note: This request was also posted in the tracker for the Mumble project.

A proposal to improve the sound quality of the incoming signal with the help of simple, readily available digital signal processing (DSP), in a way that is easy to set up for beginning and veteran users alike.

All of this is optional. The automatic functions of the Speex library currently in use are a decent alternative as well, and one might even combine them with these DSP processors to achieve even more favourable results.

Why

Sound
No two headsets sound the same, and a lot of them sound downright bad. Most of what these headsets and other microphones produce can be salvaged with simple signal processing that is computationally cheap and easy to set up for the specific task of handling speech.

Alternate Approaches
Some people prefer to use virtual audio devices to route audio through a VST host. This requires a lot of knowledge that has to be acquired and put into action, and is thus used by very few people.

This is easier for users
Integrating the DSP into Mumble means nobody has to use any third-party software. The user only needs to set up the equalizer and dynamics once for their particular microphone and voice.

The most crucial aspect of this is to make it intuitive to use, and there's a section of this proposal dedicated to that.

What is this DSP?

Filters
Filters make up the components of equalizers, which are similar to what you'd find on an iPod or stereo amplifier. Controls for bass and treble come to mind. These articles, http://en.wikipedia.org/wiki/Equalization and http://en.wikipedia.org/wiki/Equalization_%28audio%29 , have some good basic information.

A good resource is this page: http://www.wickiemedia.net/audio-tutorials/season-2-spectral-processing.html . Please note that we do NOT NEED any graphical output in Mumble, just sliders.
People will adjust any controls we hand them by ear alone.

I propose that as much as possible is pretuned, and that the user is exposed to as few controls as possible. Here I'll list what filters we'll need and what controls to expose to the user.

The filters we'd require would be:

- one Lowcut Filter, 12 dB
  Dropdown menu control, defaults to 80 Hz; selectable frequencies are 40, 60, 80, 120 and 200 Hz.
  Use: Cuts rumble and boomy sound. Pops (low-frequency spikes induced by air flow) have less of an impact, but in general this cuts junk nobody needs to hear. Female users will likely use the 120 and 200 Hz settings.

- one Low-Shelving Filter
  Selectable frequency and gain. Defaults to 200 Hz. This is a simple bass control to shape the lowest frequency character of a voice. Only the gain is exposed to the user.

- four Parametric Peak Filters (controllable gain, frequency and bandwidth)
  Only the gain controls are exposed to the user by default. Bandwidth is expressed as a Q factor from here on. Frequencies are 220 Hz, 700 Hz, 1.5 kHz and 3.5 kHz by default. Bandwidth is kept wide (low Q factors) to give a smooth response and an overall shaping character rather than a surgical one. We're not targeting engineers after all. A Q factor of 1 is a good starting point.
  - 220 Hz is near the fundamentals and first-order harmonics of voices, and is often responsible for nerve-racking mud in the voice.
  - 700 Hz is a regular problem frequency for headsets and lavalier mics (the little clip-on mics you see on talk show hosts).
  - 1.5 kHz is a good target for roomy sound. Some voices get nasty here too, and a small cut can smooth that out nicely.
  - 3.5 kHz is the speech centre. When this gets loud, it gets nasty, which is why it often gets cut. Sounds icky? Try cutting this.

- one High-Shelving Filter
  Selectable frequency and gain, with only the gain exposed by default. The default frequency is 12 kHz. This is a simple treble control to shape the highest frequencies in a voice.
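As a rough illustration of the four peak filters listed above, here is a minimal Python sketch. It is not the API of either library recommended in this proposal: the coefficient formulas are the widely used peaking-EQ biquad from Robert Bristow-Johnson's Audio EQ Cookbook, the 48 kHz sample rate is an assumption, and every function name is made up for this example. Python is used only for brevity; a real backend would be C++.

```python
import cmath
import math

SAMPLE_RATE = 48000.0  # assumed sample rate in Hz; the proposal does not specify one
DEFAULT_BANDS_HZ = (220.0, 700.0, 1500.0, 3500.0)  # peak-filter defaults from the list above


def peaking_coeffs(freq_hz, gain_db, q=1.0, fs=SAMPLE_RATE):
    """Return normalized biquad coefficients (b, a) for one peak filter."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * freq_hz / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    b = (1.0 + alpha * amp, -2.0 * cos_w0, 1.0 - alpha * amp)
    a = (1.0 + alpha / amp, -2.0 * cos_w0, 1.0 - alpha / amp)
    # Divide everything by a[0] so the recursion below can assume a[0] == 1.
    return tuple(x / a[0] for x in b), tuple(x / a[0] for x in a)


def response_db(b, a, freq_hz, fs=SAMPLE_RATE):
    """Magnitude response of the biquad at freq_hz, in dB."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / fs)
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))


def process(b, a, samples):
    """Run samples through one biquad (direct form I)."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def equalize(samples, gains_db, q=1.0):
    """Apply the four default peak bands in series, one gain (dB) per band."""
    out = list(samples)
    for freq_hz, gain_db in zip(DEFAULT_BANDS_HZ, gains_db):
        b, a = peaking_coeffs(freq_hz, gain_db, q)
        out = process(b, a, out)
    return out
```

A call such as `equalize(samples, [-3.0, 0.0, 0.0, -4.0])` would correspond to a user pulling the 220 Hz and 3.5 kHz sliders down a little and leaving the others flat. With only the gain exposed, each slider maps to a single `gain_db` argument, which is the whole point of keeping frequency and Q pretuned.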
Dynamics

A nice, simple introduction to compressors with great follow-up videos is here: [embedded video]. These folks present the subject with great ease. More is available on this page on their site: http://www.wickiemedia.net/audio-tutorials/season-1-dynamic-processing.html

What I'm proposing is basically a compressor and, if possible, a noise gate.

The noise gate always comes first. It will have basic threshold, range (or ratio), attack, hold and release parameters. We can pretune all of them except for the threshold and perhaps the range (or ratio): attack, hold and release can be tuned to react well to speech, but the threshold, and possibly the range, has to be set by the user.

The compressor can expose all of its controls, which are Threshold, Ratio, Attack, Release, Knee and Makeup gain. We can, however, tune the compressor's attack and release parameters to react well to speech and expose only the three essential parameters to the user, which are Threshold, Ratio and Makeup gain. Integrated into the compressor would be a limiter which cannot be changed in any way. This prevents bad clipping on the final output.

If possible, the other parameters could be exposed with a "Detail" or "Show All" button for more experienced users.

Backend

This consists of readily available C++ code for filters and dynamics. Two libraries come highly recommended for ease of use in one's own projects.

Vinnie Falco's DSP Filter classes (MIT license)
Quote: Classes are designed as independent re-usable building blocks. Use some or all of the provided features, or extend the functionality by writing your own objects that plug into the robust framework. Only the code that you need will get linked into your application.

Chunkware's Simplecompressor (includes a gate and limiter as well) (BSD-like license)
Simple compressor, limiter, gate etc. by chunkware.com, originally posted on musicdsp.org, but lost until recently. This is the version of the source that ben-benvesco-com published.
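To make the gate and compressor parameters from the Dynamics section more concrete, here is a minimal Python sketch of how Threshold, Ratio, Makeup gain and Range turn into a gain change. It is not the Chunkware code: it uses a hard knee, leaves out the gate's hold stage, drives both processors from one shared level detector, and all default values (including the 5 ms attack, 100 ms release and 48 kHz sample rate) are placeholders chosen for illustration, not tuned settings from this proposal.

```python
import math


def compressor_gain_db(level_db, threshold_db=-18.0, ratio=3.0, makeup_db=0.0):
    """Hard-knee compressor curve: gain change in dB for a given input level."""
    over = level_db - threshold_db
    if over <= 0.0:
        return makeup_db  # below threshold: only makeup gain applies
    # Above threshold, every `ratio` dB of input yields 1 dB of output.
    return makeup_db - over * (1.0 - 1.0 / ratio)


def gate_gain_db(level_db, threshold_db=-45.0, range_db=-20.0):
    """Simple gate curve: attenuate by range_db whenever the level is below threshold."""
    return 0.0 if level_db >= threshold_db else range_db


def smoothing_coeff(time_ms, fs):
    """One-pole smoothing coefficient for an attack or release time in milliseconds."""
    return math.exp(-1.0 / (0.001 * time_ms * fs))


def dynamics(samples, fs=48000.0, attack_ms=5.0, release_ms=100.0,
             gate_threshold_db=-45.0, gate_range_db=-20.0,
             threshold_db=-18.0, ratio=3.0, makeup_db=0.0):
    """Apply gate and compressor gains driven by one smoothed level detector."""
    att = smoothing_coeff(attack_ms, fs)
    rel = smoothing_coeff(release_ms, fs)
    env_db = -120.0  # detector starts at silence
    out = []
    for x in samples:
        level_db = 20.0 * math.log10(max(abs(x), 1e-6))
        coeff = att if level_db > env_db else rel
        env_db = coeff * env_db + (1.0 - coeff) * level_db
        gain_db = (gate_gain_db(env_db, gate_threshold_db, gate_range_db)
                   + compressor_gain_db(env_db, threshold_db, ratio, makeup_db))
        out.append(x * 10.0 ** (gain_db / 20.0))
    return out
```

The pretuned parameters (attack, release) sit in the function signature with defaults, while the three user-facing compressor controls and the two gate controls are the ones a slider would actually change.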
Frontend

This is going to be part of the setup wizard, as well as configurable in the regular configuration. It is imperative that the user never clips their input. This DSP chain is one of the reasons clipping will never be necessary to achieve a certain loudness either.

The dynamics and EQ can be placed on the last wizard page, where the Push-To-Talk and Signal-To-Noise visualization is. Let's review the sections and necessary controls in order of appearance and processing order.

Each section has an ON/OFF checkbox, making everything optional.

Low Cut Filter
Cleans up the low end. This is a dropdown menu control and can be placed on the very left. FREQUENCIES of 40, 60, 80, 120 and 200 Hz are selectable.

Noise Gate/Expander
Lowers the background noise if possible. This is not meant to replace push-to-talk but to augment it, and to give people in loud places like LAN events a chance to get a slightly cleaner sound.
Exposed controls by default are THRESHOLD and RANGE/RATIO. THRESHOLD could be a slider along the level meter display for easier visualization.

Equalizer
Some prefer to equalize after compression, but since we're dealing with a lot of low-end gear, the user should first get their sound right before dealing with the dynamic range.
Exposed controls are the GAIN for the low-shelf, the four peak filters and the high-shelf. Each can be a slider, if possible vertical (up/down). Complex controls would expose FREQUENCY for both the shelving filters and the four peak filters, as well as BANDWIDTH (Q) for the peak filters. Perhaps knobs could be used, if available.

Compressor
Exposed controls are THRESHOLD, RATIO and MAKEUP GAIN. Additional controls that might be made available are ATTACK, RELEASE and KNEE.
Each of these can be represented with a slider.

A limiter will sit at the end of all of this, with a threshold of -3 dB, perhaps -6 dB, to prevent any transcoding artifacts from jumping into clipping range further down the chain.
This will not prevent all clipping, especially if the user tries really hard to be loud, but it will prevent the overload of most live streams and thus spare the listeners from horrible clicks or distortion.

Optional processors

De-Esser
A compressor that reacts to specific frequencies. If we can locate an open library that produces decent results, this is a good addition to have, because sharp "S" or "SH" sounds can be quite irritating to listeners.

Future Improvements

A simple preset system that lets users save and load their setup to/from a file, which they can carry to wherever they need it.
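For the fixed limiter at the end of the chain described in the Frontend section, a minimal Python sketch could look like the following. This is not how the Chunkware limiter is implemented. It is a plain peak limiter with instant attack and a smoothed release; the -3 dB ceiling comes from the proposal, while the 50 ms release, the 48 kHz sample rate and the function name are assumptions made for this example.

```python
import math


def limit(samples, ceiling_db=-3.0, release_ms=50.0, fs=48000.0):
    """Keep every output sample at or below the ceiling (instant attack, smooth release)."""
    ceiling = 10.0 ** (ceiling_db / 20.0)  # ceiling as a linear amplitude
    rel = math.exp(-1.0 / (0.001 * release_ms * fs))
    gain = 1.0
    out = []
    for x in samples:
        # Largest gain that still keeps this sample under the ceiling.
        target = min(1.0, ceiling / abs(x)) if x != 0.0 else 1.0
        if target < gain:
            gain = target  # clamp down immediately on a peak
        else:
            gain = rel * gain + (1.0 - rel) * target  # recover gradually afterwards
        out.append(x * gain)
    return out
```

Because the gain never exceeds the per-sample target, the output stays under the ceiling inside this stage. As noted above, that still does not rule out clipping introduced further down the chain.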