This is a read-only archive of the Mumble forums.

This website archives and makes accessible historical state. It receives no updates or corrections. It is provided only to keep the information accessible as-is, under their old address.

For up-to-date information please refer to the Mumble website and its linked documentation and other resources. For support please refer to one of our other community/support channels.


Implementing HRTF for positional audio. Help needed


rawnar

After my post about adding some realistic sound effects to the positional sound, I had a short discussion with slicer on the IRC channel. From this discussion it was clear that implementing HRTF in Mumble would be the way to go. So I started investigating the realm called "binaural" hearing, or hearing with two ears. One could read this document to get the general idea about positional audio inside an interactive environment (games).


HRTF stands for Head-Related Transfer Function, and is the term often used to indicate binaural filtering of an audio stream. The HRTF is the Fourier transform of the HRIR (Head-Related Impulse Response), where the latter can be measured using a speaker and a microphone positioned in the ear of a subject. (See the Listen HRTF database for examples.) All the audio cues that give us an idea of where a sound is coming from are present in these HRIRs and HRTFs. These cues consist of three major components: IID, ITD and the pinna effect.


IID (Interaural Intensity Difference) refers to the effect that sound is louder for the ear that is closer to the source, and also to the effect that sound at different frequencies is deflected differently by the head. ITD (Interaural Time Difference) is due to the fact that sound has a finite speed and will reach each ear at a different time. These two effects are dominant for getting the azimuthal direction of the sound. The shape of the outer ear (pinna) helps us find the right elevation angle of the sound, by filtering different frequencies differently at different elevations. To put it short, IID and ITD are for in-plane localization and the pinna effect is for out-of-plane localization.
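
To get a feeling for the size of the ITD, here is a minimal sketch using Woodworth's spherical-head approximation. This formula is not from the discussion above; it is a common textbook model swapped in for illustration. The head radius, the speed of sound and the 48 kHz sample rate are assumed values, and the function names are made up for this example.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)
HEAD_RADIUS = 0.0875    # m, assumed average head radius


def woodworth_itd(azimuth_rad):
    """ITD in seconds for a far source at the given azimuth.

    0 means straight ahead, pi/2 means directly to one side.
    Spherical-head approximation, valid for 0 <= azimuth <= pi/2.
    """
    return HEAD_RADIUS / SPEED_OF_SOUND * (azimuth_rad + math.sin(azimuth_rad))


def itd_in_samples(azimuth_rad, sample_rate=48000):
    """The same delay expressed in (fractional) samples; the rate is an assumption."""
    return woodworth_itd(azimuth_rad) * sample_rate
```

Under these assumptions a source directly to one side gives an ITD of roughly two thirds of a millisecond, so the delays involved are only a few tens of samples.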


Now for the implementation of these HRTFs. If one has the HRIRs for a given direction, then convolving the audio stream with the left and right HRIR gives the binaural audio stream. But doing a convolution in time is not very fast, as it is an O(N^2) operation for a signal and filter of comparable length N. Fast convolution algorithms use the property that a convolution in time is a point-wise multiplication in the frequency domain. This means one performs a Fourier transform on the signal and on the filter, does a point-wise complex multiplication, and transforms the result back to time with an inverse Fourier transform. Implementing this will be addressed later; let us first go back to getting the HRIR or HRTF for a given direction.
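
The equivalence between the two approaches can be checked with a small sketch. This is only an illustration, not how Mumble does it: the radix-2 FFT below is a plain textbook version written for readability, and all function names are made up for this example.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def direct_convolve(signal, kernel):
    """Time-domain convolution: one multiply-add per (signal, kernel) pair."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(kernel):
            out[i + j] += s * h
    return out


def fft_convolve(signal, kernel):
    """Same result via zero-padding, FFT, point-wise product and inverse FFT."""
    out_len = len(signal) + len(kernel) - 1
    n = 1
    while n < out_len:  # pad up to the next power of two
        n *= 2
    a = fft(list(signal) + [0.0] * (n - len(signal)))
    b = fft(list(kernel) + [0.0] * (n - len(kernel)))
    y = fft([p * q for p, q in zip(a, b)], inverse=True)
    return [v.real / n for v in y[:out_len]]  # the inverse needs the 1/n scale
```

Both functions return the same samples up to rounding error. For the binaural case one would call the convolution twice, once with the left and once with the right HRIR.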


The HRTFs or HRIRs are provided at fixed directions, which will generally not coincide with the wanted direction. For that an interpolation scheme needs to be applied. I think this can be achieved with little overhead by using a triangular interpolation. In this article the first interpolation method is the one I want to apply. The interpolation needs to be done on HRTFs from which the time delays have been removed, to reduce the phase offset between the three HRTFs. The interpolated time delay is then added back to the audio streams after the filtering. Finding the time delays can easily be done as a pre-processing step. But finding the three HRTFs located around the needed direction is a bit more tricky. I am thinking about using a 3D Delaunay triangulation to get all the possible triangles and making a linked list from them, so I can cycle through this list to find the wanted triangle. For the dataset I am working with, this means I have to check at most 370 triangles. Maybe someone has a better method of finding the correct triangle?
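
One possible way to do both the triangle test and the interpolation is sketched below. It is an assumption about how the triangular interpolation could be done, not the method from the article: the wanted direction is written as a combination of the three vertex directions, and if all three weights are non-negative the direction lies inside that triangle, so the normalized weights can be reused for the interpolation. All names are hypothetical, the spectra are assumed to have their time delays already removed, and the scan is the simple linear search over the triangle list.

```python
def det3(a, b, c):
    """Determinant of the 3x3 matrix whose columns are a, b and c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - b[0] * (a[1] * c[2] - a[2] * c[1])
            + c[0] * (a[1] * b[2] - a[2] * b[1]))


def triangle_weights(direction, v1, v2, v3, eps=1e-9):
    """Solve direction = w1*v1 + w2*v2 + w3*v3 with Cramer's rule.

    Returns weights normalized to sum to 1, or None when the direction
    falls outside the triangle (some weight negative) or the triangle
    is degenerate.
    """
    denom = det3(v1, v2, v3)
    if abs(denom) < eps:
        return None
    w = (det3(direction, v2, v3) / denom,
         det3(v1, direction, v3) / denom,
         det3(v1, v2, direction) / denom)
    if min(w) < -eps:
        return None
    total = sum(w)
    if total <= eps:
        return None
    return tuple(x / total for x in w)


def find_triangle(direction, vertices, triangles):
    """Linear scan over index triples; returns (triangle, weights) or None."""
    for tri in triangles:
        w = triangle_weights(direction, *(vertices[i] for i in tri))
        if w is not None:
            return tri, w
    return None


def interpolate_hrtf(weights, spectra):
    """Weighted sum of three delay-compensated HRTF spectra, bin by bin."""
    return [sum(w * s[k] for w, s in zip(weights, spectra))
            for k in range(len(spectra[0]))]
```

With a few hundred triangles the linear scan costs three small determinants per triangle. Remembering the triangle found for the previous audio package and testing it first would be an easy shortcut, since a source rarely jumps far between packages.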


Let us get back to the problem of fast convolution algorithms. The convolution will be done in the frequency domain, so the HRTF will be stored and not the HRIR, skipping one FFT step. When one does a convolution in time, the resulting signal has the length of the audio stream plus the length of the filter (minus one sample). To take this length difference into account, an overlap-add method can be used. This means you append enough zeros to the time signals that their length is at least that of the predicted result, so audio plus filter length. After this, perform an FFT on both, do the point-wise complex multiplication and IFFT the product back into time. Add the tail saved from the previous audio package to the start of the result, output the part with the same length as the input audio signal, and save the new tail for the next audio package.
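
The overlap-add bookkeeping described above could look roughly like the sketch below. It is a toy version under stated assumptions: the class name and its parameters are made up, the FFT is a simple textbook radix-2 routine rather than an optimized one, and only one ear is filtered.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


class OverlapAddFilter:
    """Filters a stream of fixed-size packages with one HRIR."""

    def __init__(self, hrir, block_len, fft_len):
        # The padded length must hold the full linear convolution result.
        assert fft_len >= block_len + len(hrir) - 1
        self.block_len = block_len
        self.fft_len = fft_len
        # Store the HRTF (spectrum of the zero-padded HRIR) once.
        self.hrtf = fft(list(hrir) + [0.0] * (fft_len - len(hrir)))
        self.tail = [0.0] * (fft_len - block_len)

    def process(self, block):
        assert len(block) == self.block_len
        padded = list(block) + [0.0] * (self.fft_len - self.block_len)
        spectrum = fft(padded)
        y = fft([s * h for s, h in zip(spectrum, self.hrtf)], inverse=True)
        y = [v.real / self.fft_len for v in y]
        # Add what the previous package left over to the start of this result.
        for i, t in enumerate(self.tail):
            y[i] += t
        self.tail = y[self.block_len:]  # saved for the next package
        return y[:self.block_len]
```

Feeding the packages one after another through `process` gives the same samples as convolving the whole stream at once; only the very last tail stays behind in the filter object.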


For the FFT to work fast, a length which is a power of two is needed. In the case of Mumble there are 480 samples per audio package, so the next power of two is 512, but we also need the extra length introduced by the filter, which will not fit in the remaining 32 samples. So we end up with a length of 1024 samples, which is quite substantial. To speed up the FFT we can take into consideration that it is a real-to-complex transform and not a complex-to-complex transform like the standard FFT. Also, the fact that a large portion of the input samples are zero can help us speed up the computation. To optimize these FFT algorithms I need some help, as I am not a specialist in this field. As a matter of fact I am not a specialist in any of the sciences presented above. :D
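
The length arithmetic can be written down as a small sketch. The 480-sample package comes from the text above; the function names and the example filter lengths are assumptions for illustration only.

```python
def fft_length(block_len, filter_len):
    """Smallest power of two that holds block_len + filter_len - 1 samples."""
    needed = block_len + filter_len - 1
    n = 1
    while n < needed:
        n *= 2
    return n


def max_filter_len(block_len, fft_len):
    """Longest filter whose convolution with one block still fits in fft_len."""
    return fft_len - block_len + 1


# With 480-sample packages a 512-point FFT leaves room for at most 33 taps;
# anything longer pushes the transform size up to 1024.
print(fft_length(480, 33), fft_length(480, 34))                # 512 1024
print(max_filter_len(480, 512), max_filter_len(480, 1024))     # 33 545
```

So a 1024-point transform leaves room for an HRIR of up to 545 samples per 480-sample package.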

Computer specs: AMD FX-8320, 8GB DDR3-SDRAM, AMD Radeon HD 7950, Asus Xonar D1, Windows 7 Ultimate 64bit/Debian Jessie AMD64.


Okay, the developers advised me to start simple and then work up to a more complex interpolation scheme. But just because it is nice to think about complex problems, here are some nice pictures of the Delaunay triangulation.

http://www.plaatjesupload.nl/bekijk/2010/10/07/1286427165-660.png

http://www.plaatjesupload.nl/bekijk/2010/10/07/1286426938-540.png

http://www.plaatjesupload.nl/bekijk/2010/10/07/1286427136-550.png



  • 2 weeks later...

More important than starting simple is not optimizing too early. Finish the program first, then worry about making it fast enough.


Also I would point you to FFTW. Some select features:

# Speed.

# Arbitrary-size transforms. (Sizes with small prime factors are best, but FFTW uses O(N log N) algorithms even for prime sizes.)

# Fast transforms of purely real input or output data.


Thanks for the advice. There is also a real-valued FFT already inside the Mumble source tree, so I will start with that one.

