Mumble forums

Question about WASAPI audio capturing on Windows


billconan

Hello there,


I'm trying to understand WASAPI to see how audio is captured on Windows.


I can't quite understand the following logic in void WASAPIInput::run():

 


if (we want exclusive mode && we don't want echo)
{
    for (int channels = 1; channels <= 2; ++channels) {
        try to create exclusive AudioClient
        if (success)
        {
            break;
        }
    }
}

 


First of all, why can't exclusive mode be combined with "echo"?

What does using exclusive mode for input mean? That other applications won't be able to hear the mic?

What's echo? I think it means we also capture the audio output so that we can cancel it out of the audio input. That way we don't send the sound from the game, only the human voice?

Even then, I still don't understand why using exclusive input requires no "echo". What's the conflict here?


Second, why do we loop over "channels" here? If we successfully create an AudioClient with 1 channel, we break out of the loop. Doesn't that mean we never try to create the AudioClient with 2 channels?


Why is that? Shouldn't we always prefer more channels, since they provide stereo?






This is the actual code I'm referring to:

	if (g.s.bExclusiveInput && ! doecho) {
		// Try mono first; stereo is only attempted if mono is rejected.
		for (int channels = 1; channels<=2; ++channels) {
			// Describe a 48 kHz, 16-bit PCM format with the current channel count.
			ZeroMemory(&wfe, sizeof(wfe));
			wfe.Format.cbSize = 0;
			wfe.Format.wFormatTag = WAVE_FORMAT_PCM;
			wfe.Format.nChannels = channels;
			wfe.Format.nSamplesPerSec = 48000;
			wfe.Format.wBitsPerSample = 16;
			wfe.Format.nBlockAlign = wfe.Format.nChannels * wfe.Format.wBitsPerSample / 8;
			wfe.Format.nAvgBytesPerSec = wfe.Format.nBlockAlign * wfe.Format.nSamplesPerSec;

			micpwfxe = &wfe;
			micpwfx = reinterpret_cast<WAVEFORMATEX *>(&wfe);

			// Ask WASAPI for an exclusive, event-driven stream in that format.
			hr = pMicAudioClient->Initialize(AUDCLNT_SHAREMODE_EXCLUSIVE, AUDCLNT_STREAMFLAGS_EVENTCALLBACK, want, want, micpwfx, NULL);
			if (SUCCEEDED(hr)) {
				eMicFormat = SampleShort;
				exclusive = true;
				qWarning("WASAPIInput: Successfully opened exclusive mode");
				break;
			}
			// ... (excerpt ends here)
		}
	}
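The two derived fields in the excerpt follow directly from the others: nBlockAlign is the size of one frame (one sample per channel) and nAvgBytesPerSec is that frame size times the sample rate. A small portable sketch of the same arithmetic, assuming a hypothetical PcmFormat struct as a stand-in for the Windows SDK's WAVEFORMATEX (which only exists on Windows):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable mirror of the PCM fields the excerpt fills in;
// the real WAVEFORMATEX / WAVEFORMATEXTENSIBLE types come from the Windows SDK.
struct PcmFormat {
    std::uint16_t channels;
    std::uint32_t samplesPerSec;
    std::uint16_t bitsPerSample;
    std::uint16_t blockAlign;      // bytes per frame (one sample for every channel)
    std::uint32_t avgBytesPerSec;  // blockAlign * samplesPerSec
};

PcmFormat makePcmFormat(std::uint16_t channels, std::uint32_t rate, std::uint16_t bits) {
    assert(channels > 0 && bits % 8 == 0);  // whole-byte PCM samples only
    PcmFormat f{};
    f.channels = channels;
    f.samplesPerSec = rate;
    f.bitsPerSample = bits;
    f.blockAlign = static_cast<std::uint16_t>(channels * bits / 8);
    f.avgBytesPerSec = f.blockAlign * rate;
    return f;
}
```

For the two iterations of the loop (1 and 2 channels at 48000 Hz, 16-bit), this gives 2 and 4 bytes per frame, i.e. 96000 and 192000 bytes per second.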


  • Administrators

Exclusive mode is a WASAPI feature that gives you exclusive access to an audio device, bypassing the operating system's built-in mixer to decrease latency. The downside, of course, is that no other application on the system can output sound to or capture sound from that device.


As to why we don't support echo suppression in that mode: IIRC we use a special "loopback" stream provided by WASAPI to get back the actual system output, which is not available in exclusive mode. I guess we could feed back our own output instead in that case (we are exclusive, after all), but implementing that is probably not worth the effort. I don't think there are many people out there running in that mode :lol:
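The loopback stream is the "far-end" reference an echo canceller needs: without a copy of what was played, there is nothing to subtract from the mic signal. As far as I know, Mumble's AudioInput.cpp uses the Speex echo canceller for this; the sketch below is not that code. It is a toy normalized LMS (NLMS) adaptive filter, a standard textbook echo-cancellation technique, that learns the echo path from the reference and subtracts its estimate from the mic. The class name and parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy NLMS (normalized least-mean-squares) echo canceller.
// Hypothetical illustration; not Mumble's (Speex-based) implementation.
class NlmsEchoCanceller {
public:
    NlmsEchoCanceller(std::size_t taps, double mu)
        : w_(taps, 0.0), x_(taps, 0.0), mu_(mu) {
        assert(taps > 0 && mu > 0.0 && mu < 2.0);  // NLMS is stable for 0 < mu < 2
    }

    // far: the sample just sent to the speakers (the loopback reference).
    // mic: the captured sample (near-end speech plus echo of recent `far` samples).
    // Returns the mic sample with the estimated echo subtracted.
    double process(double far, double mic) {
        // Shift the reference history; x_[0] holds the newest sample.
        for (std::size_t i = x_.size() - 1; i > 0; --i) x_[i] = x_[i - 1];
        x_[0] = far;

        // Estimate the echo as an FIR filter over the recent reference samples.
        double estimate = 0.0;
        double energy = 1e-9;  // small regularizer to avoid dividing by zero
        for (std::size_t i = 0; i < w_.size(); ++i) {
            estimate += w_[i] * x_[i];
            energy += x_[i] * x_[i];
        }

        // Move the taps toward the true echo path, normalized by reference energy.
        const double error = mic - estimate;
        const double step = mu_ * error / energy;
        for (std::size_t i = 0; i < w_.size(); ++i) w_[i] += step * x_[i];
        return error;
    }

private:
    std::vector<double> w_;  // adaptive filter taps (echo path estimate)
    std::vector<double> x_;  // recent far-end reference samples
    double mu_;              // step size
};
```

In use, `process(loopbackSample, micSample)` would be called once per sample on time-aligned streams at the same sample rate. The filter only works if it sees the real playback signal, which is why the reference stream matters.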


See https://msdn.microsoft.com/en-us/library/windows/desktop/dd370844(v=vs.85).aspx for more information on exclusive mode streams as well as https://msdn.microsoft.com/en-us/library/windows/desktop/dd316551(v=vs.85).aspx on loopback devices.


With regard to the for loop: I'm pretty much guessing, but it was probably a workaround for stereo microphones that don't offer a mono configuration in this mode. As Mumble's audio input is mono, we first try to open the device that way; if that fails, we try stereo before giving up. As to why we only try up to two? Lack of imagination, maybe :lol:
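A portable sketch of that fallback, plus the fold-down a mono pipeline would need if only stereo could be opened. `tryOpenExclusive` is a hypothetical stand-in for `IAudioClient::Initialize` with `AUDCLNT_SHAREMODE_EXCLUSIVE`, and averaging the two channels is just one simple downmix choice, not necessarily what Mumble does:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Try mono first, then stereo; return the channel count that was accepted,
// or 0 if neither worked (the caller would then fall back to shared mode).
int openWithFallback(const std::function<bool(int)>& tryOpenExclusive) {
    for (int channels = 1; channels <= 2; ++channels) {
        if (tryOpenExclusive(channels))
            return channels;  // the first (smallest) working count wins
    }
    return 0;
}

// Fold interleaved stereo 16-bit frames down to mono by averaging L and R.
std::vector<std::int16_t> downmixStereoToMono(const std::vector<std::int16_t>& interleaved) {
    assert(interleaved.size() % 2 == 0);  // must hold whole L/R frames
    std::vector<std::int16_t> mono(interleaved.size() / 2);
    for (std::size_t i = 0; i < mono.size(); ++i) {
        const int sum = interleaved[2 * i] + interleaved[2 * i + 1];  // int avoids overflow
        mono[i] = static_cast<std::int16_t>(sum / 2);
    }
    return mono;
}
```

This is also why breaking out after a successful mono open is deliberate: since the input path is mono anyway, stereo would only add data that has to be folded back down.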


If you want to understand our normal audio path, it's best to disregard exclusive mode for now. AFAIK it was more of a "let's see if this is actually faster" experiment without much practical use for the vast majority of users and use cases.


Hope that helps.


Thank you so much for the answer. :lol:


I read the Microsoft documentation. It says that if an endpoint is used for loopback, it can't be opened in exclusive mode, but it doesn't say that other coexisting endpoints need to be shared too.


In the code, we create two endpoints: an eCapture endpoint for the mic input and an eRender endpoint for the loopback capture.


My understanding is that the eRender endpoint needs to be shared, but the eCapture endpoint doesn't have to be.


However, the current code logic seems to assume that both the eRender and the eCapture endpoints need to be shared. That's why I asked my first question.


I'm planning to build Mumble myself and hack that part of the code to see what happens if I create an exclusive eCapture endpoint alongside a shared eRender endpoint.


So I built Mumble and modified the code as follows:

 

// Condition forced to true so exclusive mode is always attempted, even with echo cancellation on.
if (/*g.s.bExclusiveInput && ! doecho*/ true) {
	... initialize the exclusive mode ...
}

if (! micpwfxe) {
	//if (g.s.bExclusiveInput)
	qWarning("WASAPIInput: Failed to open exclusive mode. ------------------------------------- ");
	...
}

 


Just as I suspected, exclusive mode doesn't seem to conflict with echo cancellation.


Both seem to work, as the following log shows, and the Audio Wizard works OK too.




 

WASAPIInput: Successfully opened exclusive mode -------------------------------
WASAPIInput: Mic Stream format 0
WASAPIInput: Stream Latency 100000 (480)
WASAPIOutput: Output stream format 1
WASAPIOutput: Stream Latency 116100 (2646)
WASAPIOutput: Periods 10000us 3000us (latency 11610us)
WASAPIInput: Echo Stream format 1
WASAPIOutput: Buffer is 60000us (5)
AudioInput: Initialized mixer for 1 channel 48000 hz mic and 2 channel 44100 hz echo
AudioOutput: Initialized 2 channel 44100 hz mixer
warning: The VAD has been replaced by a hack pending a complete rewrite
AudioInput: ECHO CANCELLER ACTIVE
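For reading the log: WASAPI reports durations as REFERENCE_TIME values in 100-nanosecond units, so "Stream Latency 100000" is 10 ms and 116100 is 11.61 ms (which matches "latency 11610us"). The number in parentheses looks like a buffer size in frames (480 frames at 48 kHz is 10 ms; 2646 frames at 44.1 kHz is 60 ms, matching "Buffer is 60000us"), but that is an assumption from the numbers, not from the source. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// WASAPI durations (REFERENCE_TIME) are counted in 100-nanosecond units.
constexpr std::int64_t kRefTimePerSecond = 10000000;  // 10^7 units of 100 ns

std::int64_t refTimeToMicros(std::int64_t refTime) {
    assert(refTime >= 0);
    return refTime / 10;  // 10 units of 100 ns = 1 microsecond
}

std::int64_t refTimeToFrames(std::int64_t refTime, std::int64_t sampleRate) {
    assert(refTime >= 0 && sampleRate > 0);
    return refTime * sampleRate / kRefTimePerSecond;
}

std::int64_t framesToMicros(std::int64_t frames, std::int64_t sampleRate) {
    assert(frames >= 0 && sampleRate > 0);
    return frames * 1000000 / sampleRate;
}
```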


Thank you.


I'm now trying to understand the echo cancellation code and the resampling part in AudioInput.cpp.
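The log above shows why resampling is involved: the mic runs at 48000 Hz but the echo (loopback) stream at 44100 Hz, so presumably one stream has to be brought to the other's rate before the canceller can compare them sample by sample. The sketch below is a toy linear-interpolation resampler with a hypothetical name; as far as I know Mumble uses the Speex resampler, which filters properly, whereas plain linear interpolation introduces some aliasing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy linear-interpolation resampler (hypothetical; not Mumble's code).
std::vector<float> resampleLinear(const std::vector<float>& in, int inRate, int outRate) {
    assert(inRate > 0 && outRate > 0);
    if (in.empty()) return {};
    const std::size_t outLen = in.size() * static_cast<std::size_t>(outRate) / inRate;
    std::vector<float> out(outLen);
    const double step = static_cast<double>(inRate) / outRate;  // input samples per output sample
    for (std::size_t n = 0; n < outLen; ++n) {
        const double pos = n * step;  // fractional read position in the input
        std::size_t i = static_cast<std::size_t>(pos);
        if (i >= in.size()) i = in.size() - 1;  // guard against rounding at the end
        const double frac = pos - static_cast<double>(i);
        const float a = in[i];
        const float b = (i + 1 < in.size()) ? in[i + 1] : in[i];  // hold last sample at the edge
        out[n] = static_cast<float>(a + (b - a) * frac);
    }
    return out;
}
```

With these rates, 441 input samples at 44100 Hz become 480 output samples at 48000 Hz (both 10 ms).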


I can't be sure yet that this is a bug. My thinking is that the major benefit of using exclusive mode is performance.


If we mix exclusive mode with shared mode, we might lose the performance gain, since both audio buffers need to be available at the cancellation stage.


Maybe the author of the code decided that it wasn't worth the effort in this situation.


Another thought is the audio format. I noticed that when exclusive mode is used, the sample format is "short", whereas in shared mode it is "float".


Mixing exclusive and shared mode might require a format conversion, which would also cost some performance.
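To get a feel for what such a conversion involves, here is a sketch of 16-bit ("short") to float and back. The helper names are hypothetical and this is not Mumble's code; scaling by 32768 is one common convention (some code uses 32767). It is one multiply or divide per sample, likely cheap next to resampling and echo cancellation, though I haven't measured it in Mumble.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// 16-bit PCM -> float in [-1, 1).
std::vector<float> shortToFloat(const std::vector<std::int16_t>& in) {
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = static_cast<float>(in[i]) / 32768.0f;  // -32768 -> -1.0, 32767 -> ~0.99997
    return out;
}

// float -> 16-bit PCM, rounding and clamping out-of-range values.
std::vector<std::int16_t> floatToShort(const std::vector<float>& in) {
    std::vector<std::int16_t> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        assert(!std::isnan(in[i]));  // converting NaN to an integer is undefined behaviour
        float scaled = std::round(in[i] * 32768.0f);
        if (scaled > 32767.0f) scaled = 32767.0f;  // +1.0 would overflow int16
        if (scaled < -32768.0f) scaled = -32768.0f;
        out[i] = static_cast<std::int16_t>(scaled);
    }
    return out;
}
```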

