Question about WASAPI audio capturing on Windows

billconan · September 29, 2015

Hello there,

I'm trying to understand WASAPI to see how audio is captured on Windows.

I can't quite understand the following logic in void WASAPIInput::run():


if (we want exclusive mode && we don't want echo)
{
   for (int channels = 1; channels<=2; ++channels) {
       try to create exclusive AudioClient
       if (success)
       {
             break;
       }
   }
}

First of all, why can't exclusive mode have "echo"?

What does using exclusive mode for input mean? That other applications won't be able to hear the mic?

What is echo? I think it means we also capture the audio output so that we can cancel it from the audio input. That way we don't send the sound from the game, only the human voice?

Even so, I don't understand why using exclusive input requires no "echo". What's the conflict here?

Second, why do we loop on "channels" here? If we successfully create an AudioClient with 1 channel, we break from the loop. That means we won't try to create the AudioClient with 2 channels?

Why is that the case? Shouldn't we always prefer more channels, since they provide stereo?

This is the actual code I'm referring to:

	if (g.s.bExclusiveInput && ! doecho) {
		// Try mono first, then stereo.
		for (int channels = 1; channels<=2; ++channels) {
			ZeroMemory(&wfe, sizeof(wfe));
			wfe.Format.cbSize = 0;
			wfe.Format.wFormatTag = WAVE_FORMAT_PCM;
			wfe.Format.nChannels = channels;
			wfe.Format.nSamplesPerSec = 48000;
			wfe.Format.wBitsPerSample = 16;
			// Bytes per frame (one sample for every channel).
			wfe.Format.nBlockAlign = wfe.Format.nChannels * wfe.Format.wBitsPerSample / 8;
			// Bytes per second of audio.
			wfe.Format.nAvgBytesPerSec = wfe.Format.nBlockAlign * wfe.Format.nSamplesPerSec;

			micpwfxe = &wfe;
			micpwfx = reinterpret_cast<WAVEFORMATEX *>(&wfe);

			hr = pMicAudioClient->Initialize(AUDCLNT_SHAREMODE_EXCLUSIVE, AUDCLNT_STREAMFLAGS_EVENTCALLBACK, want, want, micpwfx, NULL);
			if (SUCCEEDED(hr)) {
				eMicFormat = SampleShort;
				exclusive = true;
				qWarning("WASAPIInput: Successfully opened exclusive mode");
				break;
			}
			// (excerpt truncated here)
		}
	}

hacst · September 29, 2015

Exclusive mode is a WASAPI feature that gives you exclusive access to an audio device, bypassing the operating system's built-in mixer to decrease latency. The downside, of course, is that no other application on the system can output or input sound through that device.

As to why we don't support echo suppression in that mode: IIRC we use a special "loopback" stream provided by WASAPI to get back the actual system output, which is not available in exclusive mode. I guess we could feed back our own output instead in that case (we are exclusive after all), but implementing that is probably not worth the effort. I don't think there are many people out there running in that mode :lol:

See https://msdn.microsoft.com/en-us/library/windows/desktop/dd370844(v=vs.85).aspx for more information on exclusive mode streams as well as https://msdn.microsoft.com/en-us/library/windows/desktop/dd316551(v=vs.85).aspx on loopback devices.

With regards to the for loop: I'm pretty much guessing, but it was probably a workaround for stereo microphones not offering a mono configuration in this mode. As Mumble audio input is mono, we first try to open it that way; if that fails, we try stereo before giving up. As to why we only try up to two? Lack of imagination maybe :lol:

If you want to understand our normal audio path, it's best to disregard exclusive mode for now. AFAIK it was more of a "let's see if this is actually faster" thing without much practical use for the vast majority of users and uses.
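To make the mono-then-stereo fallback concrete, here is a minimal portable sketch of the same idea. It is not Mumble's code: `PcmFormat`, `makePcm16`, `TryOpen` and `openFewestChannels` are hypothetical names, and the callback merely stands in for the real `IAudioClient::Initialize` call so the logic can be followed without Windows headers.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for the WAVEFORMATEX fields the quoted code fills in.
struct PcmFormat {
    std::uint16_t channels;
    std::uint32_t samplesPerSec;
    std::uint16_t bitsPerSample;
    std::uint16_t blockAlign;      // bytes per frame (one sample per channel)
    std::uint32_t avgBytesPerSec;  // bytes per second of audio
};

// Build a 16-bit PCM description using the same arithmetic as the quoted code.
PcmFormat makePcm16(std::uint16_t channels, std::uint32_t sampleRate) {
    assert(channels >= 1);
    PcmFormat f{};
    f.channels = channels;
    f.samplesPerSec = sampleRate;
    f.bitsPerSample = 16;
    f.blockAlign = static_cast<std::uint16_t>(f.channels * f.bitsPerSample / 8);
    f.avgBytesPerSec = f.blockAlign * f.samplesPerSec;
    return f;
}

// Hypothetical callback: returns true if the device accepts the format.
using TryOpen = std::function<bool(const PcmFormat &)>;

// Try mono first, then stereo. On success, store the accepted format in
// `out` and return true; return false if neither layout is accepted.
bool openFewestChannels(const TryOpen &tryOpen, PcmFormat &out,
                        std::uint32_t sampleRate = 48000) {
    for (std::uint16_t channels = 1; channels <= 2; ++channels) {
        const PcmFormat candidate = makePcm16(channels, sampleRate);
        if (tryOpen(candidate)) {
            out = candidate;
            return true;
        }
    }
    return false;
}
```

With a callback that only accepts two channels, the loop skips mono and settles on stereo with a 4-byte frame; with a callback that accepts mono, it never tries stereo at all, which matches the "fewest channels that work" reading of the loop.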

Hope that helps.

billconan · September 30, 2015

Thank you so much for the answer. :lol:

I read the Microsoft documentation. It says that an endpoint used for loopback can't be exclusive, but it doesn't say that other coexisting endpoints need to be shared too.

In the code, we create two endpoints. One is an eCapture endpoint for the mic input, and the other is an eRender endpoint for the loopback capture.

My understanding is that the eRender endpoint needs to be shared, but the eCapture endpoint doesn't have to be.

However, the current code logic seems to suggest that both the eRender and the eCapture endpoints need to be shared. That's why I asked the first question.

I'm planning to build Mumble myself and hack that part of the code to see what happens if I create an exclusive eCapture endpoint with a shared eRender endpoint.

billconan · October 1, 2015

So I built Mumble and modified the code as follows:

if (/*g.s.bExclusiveInput && ! doecho*/ true) {
       ... initialize the exclusive mode ...
}

if (!  micpwfxe) {
	//if (g.s.bExclusiveInput)
	qWarning("WASAPIInput: Failed to open exclusive mode. ------------------------------------- ");

Just as I suspected, exclusive mode doesn't have to conflict with echo cancellation.

Both seem to work, as the following log shows, and the audio wizard functions OK too.

WASAPIInput: Successfully opened exclusive mode -------------------------------
WASAPIInput: Mic Stream format 0
WASAPIInput: Stream Latency 100000 (480)
WASAPIOutput: Output stream format 1
WASAPIOutput: Stream Latency 116100 (2646)
WASAPIOutput: Periods 10000us 3000us (latency 11610us)
WASAPIInput: Echo Stream format 1
WASAPIOutput: Buffer is 60000us (5)
AudioInput: Initialized mixer for 1 channel 48000 hz mic and 2 channel 44100 hz echo
AudioOutput: Initialized 2 channel 44100 hz mixer
warning: The VAD has been replaced by a hack pending a complete rewrite
AudioInput: ECHO CANCELLER ACTIVE
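
For anyone reading the numbers in that log: WASAPI reports durations as REFERENCE_TIME values, which count 100-nanosecond units. The sketch below converts such values to microseconds and to frames. The helper names are hypothetical, and reading the parenthesized figures as buffer sizes in frames is an assumption that happens to fit the arithmetic (100000 units is 10 ms, or 480 frames at 48000 Hz; 60000us is 2646 frames at 44100 Hz).

```cpp
#include <cassert>
#include <cstdint>

// REFERENCE_TIME in WASAPI is a signed 64-bit count of 100-nanosecond units.
using ReferenceTime = std::int64_t;

constexpr std::int64_t kUnitsPerSecond = 10000000;  // 100-ns units per second

// Convert a REFERENCE_TIME duration to microseconds (10 units per microsecond).
std::int64_t toMicroseconds(ReferenceTime t) {
    return t / 10;
}

// Convert a REFERENCE_TIME duration to frames at the given sample rate,
// rounded to the nearest whole frame.
std::int64_t toFrames(ReferenceTime t, std::int64_t sampleRate) {
    assert(t >= 0 && sampleRate > 0);
    return (t * sampleRate + kUnitsPerSecond / 2) / kUnitsPerSecond;
}

// Convert a frame count back to REFERENCE_TIME, rounded to the nearest unit.
ReferenceTime fromFrames(std::int64_t frames, std::int64_t sampleRate) {
    assert(frames >= 0 && sampleRate > 0);
    return (frames * kUnitsPerSecond + sampleRate / 2) / sampleRate;
}
```

For example, `toMicroseconds(116100)` gives 11610, which lines up with the "latency 11610us" figure on the Periods line, and `toFrames(100000, 48000)` gives 480.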

hacst · October 4, 2015

Interesting. You should create an issue or even better a pull request so we can take a closer look. If we don't need that restriction we should drop it.

billconan · October 5, 2015

Thank you.

I'm now trying to understand the echo cancellation code and the re-sampling part in AudioInput.cpp.

I can't be sure yet that this is a bug. I'm thinking that the major benefit of exclusive mode is performance.

If we mix exclusive mode with shared mode, we might lose the speedup, since both audio buffers need to be available at the cancellation stage.

Maybe the author of the code decided that it isn't worth the effort in this situation.

Another thought is the audio format. I noticed that when exclusive mode is used, the sample format is "short", whereas in shared mode it is "float".

Mixing exclusive and shared might require a format conversion, which would also cost some performance.
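
As a rough illustration of what such a format conversion involves, here is a minimal sketch that turns interleaved 16-bit samples into mono float samples. It is not taken from AudioInput.cpp; `shortToMonoFloat` is a hypothetical helper, and scaling by 1/32768 is one common convention for mapping 16-bit PCM to the [-1, 1) range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert interleaved signed 16-bit PCM to mono float samples in [-1, 1).
// Each output sample is the average of that frame's channels.
std::vector<float> shortToMonoFloat(const std::vector<std::int16_t> &interleaved,
                                    std::size_t channels) {
    assert(channels >= 1 && interleaved.size() % channels == 0);
    const std::size_t frames = interleaved.size() / channels;
    std::vector<float> mono(frames);
    for (std::size_t i = 0; i < frames; ++i) {
        float sum = 0.0f;
        for (std::size_t c = 0; c < channels; ++c) {
            // Scale each 16-bit sample into the float range before summing.
            sum += static_cast<float>(interleaved[i * channels + c]) / 32768.0f;
        }
        mono[i] = sum / static_cast<float>(channels);
    }
    return mono;
}
```

The work here is one scale and add per input sample, so the cost grows linearly with the buffer size. Whether that is significant next to the echo canceller itself would need measuring.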
