Mumble forums

Positional Audio revisiting


rawnar

While slicer is working on a new implementation for positional audio I would like to share my ideas with the rest.


One of the main problems with how positional audio is implemented at the moment is that when your speaker configuration is mainly positioned in front of you, sound from the front will also be louder than sound from the back. This topic will give information about the problem and the terms used here. One of my ideas was to shift the listener to the centre of mass of the speakers. The effect of this is that the sum of the dot products between the speaker vectors and the source direction vector will be zero. This is what we want, as it makes the total volume over all the speakers independent of the direction of the source. It can also be implemented without putting extra load on the CPU during use, as the shift of the speakers only has to be performed once. We still have to be careful that the subwoofer does not mess up the system; we just need to leave it out of the centre of mass calculation and the corresponding shift. One problem with this system is that the dot product can become larger than 1.0.
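A minimal sketch of the centre of mass idea, to make the zero-sum property concrete. The speaker coordinates and the function names are made up for illustration and are not taken from the Mumble source. The shifted speaker vectors are deliberately not renormalised, which is what makes the sum vanish (and also why a single dot product can exceed 1.0).

```python
import math

# Hypothetical 2D layout (x, z) in metres with the listener at the origin.
# Every speaker is in front of or beside the listener; no subwoofer included.
speakers = [(-1.0, 2.0), (0.0, 2.5), (1.0, 2.0), (-1.5, 0.0), (1.5, 0.0)]


def centre_of_mass(points):
    """Average position of all speakers."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def shift_to_centre(points):
    """Speaker vectors as seen from the centre of mass (done once, at setup)."""
    cx, cz = centre_of_mass(points)
    return [(x - cx, z - cz) for x, z in points]


def dot_sum(points, direction):
    """Sum over all speakers of dot(speaker vector, source direction)."""
    return sum(x * direction[0] + z * direction[1] for x, z in points)


shifted = shift_to_centre(speakers)

for degrees in (0, 90, 180, 270):
    angle = math.radians(degrees)
    direction = (math.sin(angle), math.cos(angle))  # 0 degrees = straight ahead
    print(degrees, dot_sum(speakers, direction), dot_sum(shifted, direction))
```

Before the shift the sum depends strongly on the source direction; after the shift it is zero (up to rounding) for every direction.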


The previous idea also showed me that the overall volume is reduced when switching to positional audio. Without positional audio all the speakers have a volume of 1.0, so the total volume equals the number of speakers. With positional audio, even when one has a nice even distribution of speakers, the total volume is the number of speakers divided by 2. In the gain calculation 1 is first added to the dot product, after which the result is divided by 2, so if the sum of the dot products is zero then the sum of the gains is the number of speakers divided by 2. An easy fix would be to divide the volume of the speakers by 2 when not in positional audio. But this could make Mumble sound quiet compared to other sound programs.
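The halving can be checked numerically. The sketch below assumes an idealised ring of evenly spaced unit-vector speakers (my assumption, chosen so that the dot products sum to zero) and uses the gain formula described above; the function names are illustrative only.

```python
import math


def speaker_gain(dot):
    """Gain as described above: add 1 to the dot product, then divide by 2."""
    return (1.0 + dot) / 2.0


def ring_of_speakers(count):
    """Evenly spaced unit vectors around the listener (idealised layout)."""
    return [
        (math.cos(2.0 * math.pi * i / count), math.sin(2.0 * math.pi * i / count))
        for i in range(count)
    ]


def total_gain(speakers, direction):
    """Sum of the per-speaker gains for one source direction."""
    return sum(
        speaker_gain(sx * direction[0] + sy * direction[1]) for sx, sy in speakers
    )


speakers = ring_of_speakers(5)
for degrees in (0, 45, 200):
    angle = math.radians(degrees)
    print(degrees, total_gain(speakers, (math.cos(angle), math.sin(angle))))
```

For five speakers the total stays at 2.5 for every direction, against 5.0 when every speaker plays at full volume.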


Let's take a look at the implementation slicer is suggesting. His idea is to find the two speakers between which the sound source lies, and to use the ratio between the angles from the source vector to these speakers' vectors to distribute the sound volume. This will give a nice distribution of the sound over any array of speakers. The only thing I am worried about is that the total volume will now be 1, so with a large number of speakers the difference between positional audio volume and normal volume will be quite large.
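A rough sketch of how such pairwise panning could look in 2D. This is my own reading of the description, not slicer's code: speakers are given as azimuth angles in degrees, and the gain is a linear crossfade on the angle between the two neighbouring speakers. The function name and the example layout are hypothetical.

```python
def pan_between_speakers(speaker_angles, source_angle):
    """Return one gain per speaker; the gains sum to 1.0. Angles in degrees."""
    if len(speaker_angles) < 2:
        return [1.0] * len(speaker_angles)

    # Visit the speakers in order of increasing azimuth around the listener.
    order = sorted(range(len(speaker_angles)), key=lambda i: speaker_angles[i] % 360.0)
    gains = [0.0] * len(speaker_angles)

    for k, lo in enumerate(order):
        hi = order[(k + 1) % len(order)]  # next speaker, wrapping around
        span = (speaker_angles[hi] - speaker_angles[lo]) % 360.0
        offset = (source_angle - speaker_angles[lo]) % 360.0
        if span > 0.0 and offset <= span:
            fraction = offset / span  # 0 at speaker lo, 1 at speaker hi
            gains[lo] = 1.0 - fraction
            gains[hi] = fraction
            return gains

    raise ValueError("speaker angles must not all be identical")


# Hypothetical 5-speaker layout: centre, front right, rear right, rear left, front left.
layout = [0, 30, 110, -110, -30]
print(pan_between_speakers(layout, 15))   # halfway between centre and front right
print(pan_between_speakers(layout, 180))  # directly behind the listener
```

Whatever the number of speakers, only two of them are active at a time and their gains add up to 1, which is the volume difference raised above.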


There was also a discussion on the IRC channel about attenuation with distance, so I also had a deeper look into these formulas. If you have a look at the OpenAL source code you will find that it has three different distance models. These models all depend on a reference distance (dis_ref) and a roll-off factor (Rolloff). The reference distance can be seen as the minimum distance (dis_min) used in Mumble, and the roll-off factor determines how fast the sound gets quieter with distance. In Mumble the roll-off factor is not given, but is implicitly calculated from the maximum distance (dis_max) and the volume at maximum distance (A_max). The three models are as follows.


Inverse distance model: (proposed in IASIG I3DL2)

A = dis_ref / ( dis_ref + Rolloff * ( dis - dis_ref ) )

Linear distance model: (previously implemented in Mumble)

A = 1 - Rolloff * ( dis - dis_ref ) / ( dis_max - dis_ref )

Exponential distance model:

A = ( dis / dis_ref ) ^ ( -Rolloff )


If one calculates the roll-off factor from the maximum distance and the volume at maximum distance and substitutes it back into the formulas, one gets:


inverse: A = 1 / ( 1 + ( 1 - A_max ) / A_max * ( dis - dis_min ) / ( dis_max - dis_min ) )

linear: A = 1 - ( 1 - A_max) * ( dis - dis_min ) / ( dis_max - dis_min )

exponential: A = ( dis / dis_min ) ^ ( ln(A_max) / ( ln(dis_max) - ln(dis_min) ) ) = e ^ ( ln(A_max) * ( ln(dis) - ln(dis_min) ) / ( ln(dis_max) - ln(dis_min) ) )
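The substitution can be checked with a few lines of code. The sketch below implements the three models with an explicit roll-off factor and then solves each one for the roll-off that yields A_max at dis_max. All function names are my own; only the formulas come from the text above.

```python
import math


def inverse_model(dis, dis_ref, rolloff):
    return dis_ref / (dis_ref + rolloff * (dis - dis_ref))


def linear_model(dis, dis_ref, dis_max, rolloff):
    return 1.0 - rolloff * (dis - dis_ref) / (dis_max - dis_ref)


def exponential_model(dis, dis_ref, rolloff):
    return (dis / dis_ref) ** (-rolloff)


# Roll-off factors obtained by requiring A(dis_max) = A_max in each model.
def rolloff_inverse(dis_min, dis_max, a_max):
    return dis_min * (1.0 - a_max) / (a_max * (dis_max - dis_min))


def rolloff_linear(a_max):
    return 1.0 - a_max


def rolloff_exponential(dis_min, dis_max, a_max):
    return -math.log(a_max) / (math.log(dis_max) - math.log(dis_min))


dis_min, dis_max, a_max = 1.0, 20.0, 0.2
print(inverse_model(dis_max, dis_min, rolloff_inverse(dis_min, dis_max, a_max)))
print(linear_model(dis_max, dis_min, dis_max, rolloff_linear(a_max)))
print(exponential_model(dis_max, dis_min, rolloff_exponential(dis_min, dis_max, a_max)))
```

Each line should print a value equal to A_max up to rounding, and each model returns 1 at dis_min.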


Within mumble a modified version of the exponential distance model is used.


A = 10 ^ ( log10(A_max) * ( dis - dis_min ) / ( dis_max - dis_min ) )


Let's see how these three models behave in Mumble, using minimum distance = 1, maximum distance = 20 and volume at maximum distance = 0.2.

http://www.plaatjesupload.nl/bekijk/2010/09/06/1283791438-590.gif
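The same comparison can be tabulated numerically. This is a small sketch that uses the substituted formulas from above plus Mumble's modified exponential model, with the parameters just mentioned as defaults; the function name and the sampled distances are my own choice.

```python
import math


def attenuation_curves(dis, dis_min=1.0, dis_max=20.0, a_max=0.2):
    """Volume predicted by each model at distance `dis`."""
    t = (dis - dis_min) / (dis_max - dis_min)  # 0 at dis_min, 1 at dis_max
    return {
        "inverse": 1.0 / (1.0 + (1.0 - a_max) / a_max * t),
        "linear": 1.0 - (1.0 - a_max) * t,
        "exponential": (dis / dis_min) ** (math.log(a_max) / math.log(dis_max / dis_min)),
        "mumble": 10.0 ** (math.log10(a_max) * t),
    }


for dis in (1.0, 2.0, 5.0, 10.0, 15.0, 20.0):
    row = attenuation_curves(dis)
    print(f"{dis:5.1f}  " + "  ".join(f"{name}={value:.3f}" for name, value in row.items()))
```

All four curves start at 1 and end at A_max; they only differ in how quickly they drop in between.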

So which one is the one we want? I do not know. Do you have a preference?

Computer specs: AMD FX-8320, 8GB DDR3-SDRAM, AMD Radeon HD 7950, Asus Xonar D1, Windows 7 Ultimate 64bit/Debian Jessie AMD64.


Using the centre of mass can change the virtual position of the speakers. Hence, a side speaker could end up placed more to the back, resulting in less volume over that side speaker when the source comes from the right than when the source comes slightly from the rear. To solve this, one could clamp the dot product so that it does not exceed 1. Another solution would be to scale the initial speaker volumes and leave the listener where he was. The scale would be the distance from the listener to the speaker as if the speaker were in the centre of mass. One could use the variable fSpeakerVolume[], which is used in both the positional and the non-positional case. This can be a good thing, as it will also give a better distribution of the audio when one is not using positional audio.


BTW, the present distance model of Mumble can be rewritten as A = e ^ ( ln(A_max) * ( dis - dis_min ) / ( dis_max - dis_min ) ) or A = 2 ^ ( log2(A_max) * ( dis - dis_min ) / ( dis_max - dis_min ) ). Maybe one of these expressions is better optimized.
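A quick numerical check that the three forms agree. All of them are just A_max raised to the power t, with t = ( dis - dis_min ) / ( dis_max - dis_min ). The function names are made up for this sketch.

```python
import math


def gain_base10(t, a_max):
    return 10.0 ** (math.log10(a_max) * t)


def gain_base_e(t, a_max):
    return math.exp(math.log(a_max) * t)


def gain_base2(t, a_max):
    return 2.0 ** (math.log2(a_max) * t)


a_max = 0.2
for t in (0.0, 0.25, 0.5, 1.0):
    print(t, gain_base10(t, a_max), gain_base_e(t, a_max), gain_base2(t, a_max), a_max ** t)
```

Since A_max is a setting rather than something that changes per sample, its logarithm can be computed once and reused, whichever base is chosen.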


We talked about what has already been implemented in Mumble and can be improved. Now let's talk about enhancements we can add to the positional audio system of Mumble to make it even more realistic. I have a small list with some ideas on how to implement them; maybe someone can see whether these ideas can be implemented in Mumble without loading the CPU too much.


1) In real life sound is muffled by the air as it travels through it; in other words, air acts as a low-pass filter. A low-pass filter can be applied to the audio stream, where the distance defines the cut-off frequency: the further away the source, the lower the cut-off frequency.


2) Sound does not have an infinite speed, meaning that if someone talks to you it takes time before you hear him. For instance, when someone talks to you from a long distance, the lips and the sound will be out of sync. This can be implemented by delaying the sound stream in proportion to the distance. It could be that people will start to see this enhancement as lag; in game worlds this delay is never used.


3) The Doppler effect gives you information about the movement of the sound source, which helps you to pinpoint its location. To implement this we need the relative speed of the sound source. We could probe the game for this info, or we could save the previous position and time and do a first-order approximation.


4) When you talk to a person you are not facing, he will have problems hearing you. This is due to the fact that the sound you produce is directional. This can be implemented if one also has the front vector of the sound source. By calculating the dot product between the direction vector of the sound source and the front vector of the sound source you get a number related to the opening angle between the two vectors. This opening angle can be divided into three zones, for instance:

dot < -0.8 no attenuation, volume = 1.0

-0.8 < dot < -0.5 linear attenuation, volume varies between 1.0 and 0.5

dot > -0.5 constant attenuation, volume = 0.5
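The four ideas above could be sketched roughly as follows. None of this is Mumble code. The speed of sound, the sample rate, the cut-off range (f_near, f_far), the one-pole filter and all function names are assumptions made for illustration, and the listener is assumed to sit at the origin and not to move.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 degrees C (assumed)
SAMPLE_RATE = 48000     # Hz, assumed output sample rate


def cutoff_for_distance(dis, dis_min=1.0, dis_max=20.0, f_near=16000.0, f_far=2000.0):
    """1) Assumed mapping: the cut-off falls geometrically from f_near to f_far."""
    t = min(max((dis - dis_min) / (dis_max - dis_min), 0.0), 1.0)
    return f_near * (f_far / f_near) ** t


def low_pass(samples, cutoff_hz, sample_rate=SAMPLE_RATE):
    """1) One-pole low-pass filter (exponential smoothing of the input)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, state = [], 0.0
    for x in samples:
        state += alpha * (x - state)
        out.append(state)
    return out


def delay_in_samples(dis, sample_rate=SAMPLE_RATE):
    """2) Propagation delay for a source `dis` metres away, in whole samples."""
    return round(dis / SPEED_OF_SOUND * sample_rate)


def radial_speed(prev_pos, pos, dt):
    """3) First-order estimate of the speed at which the source moves away."""
    def distance(p):
        return math.sqrt(sum(c * c for c in p))
    return (distance(pos) - distance(prev_pos)) / dt


def doppler_factor(speed_away):
    """3) Pitch factor for a moving source (slower than sound), listener at rest."""
    return SPEED_OF_SOUND / (SPEED_OF_SOUND + speed_away)


def directional_gain(dot):
    """4) The three zones listed above; dot = -1 means the speaker faces you."""
    if dot <= -0.8:
        return 1.0
    if dot >= -0.5:
        return 0.5
    return 0.5 + 0.5 * (-0.5 - dot) / 0.3  # linear between the two limits


print(cutoff_for_distance(10.0), delay_in_samples(10.0))
print(doppler_factor(radial_speed((0.0, 0.0, 10.0), (0.0, 0.0, 11.0), 0.5)))
print(directional_gain(-0.65))
```

Per audio frame only the filter touches every sample; the delay, the Doppler factor and the directional gain are a handful of operations per source, so the CPU cost is dominated by the low-pass filter and by whatever resampling is used to apply the pitch change.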


Then there are some sound effects which are related to the room you are in, like reflection, reverberation, obstruction and occlusion. But for these effects you need to know something about the game environment, so they are a no-go.


Pick the one you would like to have implemented in Mumble or propose another enhancement, and I will start writing the code when enough people want it.


The enhancements I suggested are not going to be implemented in the client. Most people using Mumble don't want a realistic sound experience, but just something unrealistic that will give them an advantage over non-Mumble users. ;)


Aww, too bad. I would love to hear more realistic 3D voices. But I have to admit that I didn't understand all of the calculations you did.

I'm not sure if we will ever get the chance to give the user a choice between different 3D models. But this might be too much for the average user, not to mention keeping up with the code of such 3D codecs.

Especially the Doppler effect sounded like a fun idea.


Thank you for all the thoughts and efforts you put in here.


Yes, I tend to dive into the formulas too much. But what do you expect from a rocket scientist (M.Sc. Aerospace Engineering) with a tendency towards physics? :D


  • Administrators

I guess what slicer said about users not actually wanting realistic roll-off is kinda true. I guess most if not all of the people using it want to get a feel for direction out of it that helps their in-game orientation.


Also we have to take into consideration that Mumble, due to AGC as well as clipping, cannot support the big range of volumes normal communication has. In a realistic combat scenario this would go from whispers which can't possibly be overheard from a mere meter away to shouts that easily travel tens of meters. _If_ we were to implement realistic roll-off models we would have to compensate for that by, for example, giving the user the possibility to switch between shouting and talking normally.


The whole topic is a very tricky one indeed. Thanks for your investigation and writeup on the topic, learned a lot.


There is this float pfVolume in the AudioOutputUser class; maybe that can be used to make the sound louder when someone talks louder into his mic. Or is that float not coming from the other client through the protocol?


People can still reply to this topic. And when we get enough requests for an enhancement, we will have a chance that it is added to Mumble. So far we have one for the Doppler effect, and one for directional sound, my favourite.
