This is a read-only archive of the Mumble forums.

This website archives and makes accessible historical state. It receives no updates or corrections. It is provided only to keep the information accessible as-is, under their old address.

For up-to-date information please refer to the Mumble website and its linked documentation and other resources. For support please refer to one of our other community/support channels.


Mumble Protocol Question



Hi,

I have a very simple question.

The TCP part of the Mumble protocol states that every packet has a header consisting of a 2-byte type number and a 4-byte payload length.

Every protobuf message has a unique 2-byte type number.

However, I looked through the Mumble sources and can't find where these numbers are defined.

I'm looking for the mapping between each protobuf message name and its unique type number.
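For reference, a minimal sketch of reading the 6-byte prefix described above. It assumes both fields are big-endian (network byte order); the struct and function names are made up for illustration and do not come from the Mumble sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper type: the 6-byte prefix in front of every TCP message. */
typedef struct {
    uint16_t type;    /* message type number */
    uint32_t length;  /* payload length in bytes */
} mumble_prefix;

/* Parses the prefix from buf. Returns 0 on success, -1 if fewer than
   6 bytes are available. Assumes big-endian fields (an assumption). */
static int mumble_parse_prefix(const uint8_t *buf, size_t len, mumble_prefix *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 6)
        return -1;
    out->type = (uint16_t)((buf[0] << 8) | buf[1]);
    out->length = ((uint32_t)buf[2] << 24) | ((uint32_t)buf[3] << 16) |
                  ((uint32_t)buf[4] << 8) | (uint32_t)buf[5];
    return 0;
}
```

A reader loop would call this once 6 bytes have arrived, then wait for `length` more bytes before handing the payload to the protobuf parser selected by `type`.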


Never mind. Found the enumeration:

typedef enum {
    Version = 0,
    UDPTunnel = 1,
    Authenticate = 2,
    Ping = 3,
    Reject = 4,
    ServerSync = 5,
    ChannelRemove = 6,
    ChannelState = 7,
    UserRemove = 8,
    UserState = 9,
    BanList = 10,
    TextMessage = 11,
    PermissionDenied = 12,
    ACL = 13,
    QueryUsers = 14,
    CryptSetup = 15,
    ContextActionAdd = 16,
    ContextAction = 17,
    UserList = 18,
    VoiceTarget = 19,
    PermissionQuery = 20,
    CodecVersion = 21
} MessageType; /* type name assumed; the closing line was missing from the post */
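When debugging a client it helps to print which message arrived. A small sketch of a number-to-name lookup built from the list above; the array and function names are hypothetical, and the table only covers the 22 types quoted in this post.

```c
#include <assert.h>
#include <stddef.h>

/* Message names indexed by type number, in the order quoted in the post. */
static const char *const mumble_type_names[] = {
    "Version", "UDPTunnel", "Authenticate", "Ping", "Reject",
    "ServerSync", "ChannelRemove", "ChannelState", "UserRemove",
    "UserState", "BanList", "TextMessage", "PermissionDenied", "ACL",
    "QueryUsers", "CryptSetup", "ContextActionAdd", "ContextAction",
    "UserList", "VoiceTarget", "PermissionQuery", "CodecVersion"
};

/* Returns the name for a type number, or NULL if it is not in the table. */
static const char *mumble_type_name(unsigned type)
{
    size_t count = sizeof mumble_type_names / sizeof mumble_type_names[0];
    assert(count == 22); /* types 0..21 as listed in the post */
    return type < count ? mumble_type_names[type] : NULL;
}
```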



I'm using the PDF that describes the Mumble protocol:

Mumble protocol 1.2.X reference (WIP)

Stefan Hacker, Mikko Rantanen

November 8, 2010

I'm busy with the client part.

Following that document, the SSL handshake with the server has completed.

I have sent the Version and Authenticate messages, but then nothing happens: I get nothing back from the server.

The connections are open and I have received the certificate. Has something changed in the protocol sequence?


Next problem: the UDP voice packet.

So far I'm reading it from the TCP stream, which is supported.

1 byte: Header byte : type/target Bit 1-3: Type, Bit 4-8: Target

varint : session The session number of the source user

varint : sequence

I read the three fields above successfully. The varint has a variable length.

I noticed that the sequence increases in steps of 2. Is that correct? (0, 2, 4, 6, etc.)

Then the tricky part: the audio data.

I should first get a terminator bit followed by 7 bits indicating the length of the audio data.

However, the length I get exceeds the total remaining message length.

Hence, I must be doing something wrong. Is the above still up to date?

I don't care about the positional audio part.


Would you be so kind as to update the documentation when you are finished with your implementation, so that future devs will not encounter the same problems?

Github is used when one wants to have commits added to the git repo.



  • Administrators

Thank you.

If you have changes ready to be merged back feel free to create a pull request. There is a button/link at the top nav of your repo.

I did so this time.

Pull requests make it easier to see code changes that are (potentially) ready to be merged back.

Also, those not checking these forums can see the changes at a prominent location and pull them back with a single button click on the GitHub website.


Also, I just noticed that your commit does not have author information (unknown). You may want to set your author information in git if you plan to contribute more in the future.

See point 1 of the github help setup


  • Administrators

I didn't have time to look at the whole documentation but it seems like the bit order is assumed to be

MSB 1|2|3|4|5|6|7|8 LSB

The header packet above is described the same way. A bit counterintuitive and definitely worth mentioning. I don't have time right now but, unless I forget, I will take a closer look this weekend.


I stand corrected...

I was assuming network (IP) byte order; however, the fields are little-endian.

The doc states that at the beginning; I overlooked it. Sorry.

I have reverted my change and added some pseudocode. With that added, the next dev should have no problem with the explanation and can practically copy-paste it into his own program.

The change is in my git repo, if you want it. (I don't have TeX, so the PDF is not changed.)

Looking at the Mumble sources, I have not yet been able to determine which version of Speex is currently being used.

(The speex dir is empty in my checkout.)


  • Administrators

Did you check out all submodules?

git submodule init
git submodule update

That should do the trick. It looks like this for my local checkout:


> git submodule
e3d39fec7c44d1841e817d3b1986bfdc4d0863a9 celt-0.11.0-src (v0.11.1)
6c79a9325c328f86fa048bf124ff6a8912a60a3e celt-0.7.0-src (v0.7.1)
a6d05eb5ff9d5062852cdf7df574bec728921ef9 speex (speex-1.2beta2-263-ga6d05eb)


I guess I should mention that since the speex format has been frozen for a while you don't necessarily need the latest and greatest version. Any old version in distros should do the trick.


Thanks, got all sources now.

OK, well, I'm decoding some audio now and I can hear myself when it is saved to WAV.

Only, way too fast.

I have some code taken from Mumble that determines a bit rate of 12000 and 4 frames.

However, how to use that in the Speex decoding is not quite clear to me.

I'm currently decoding at a 48000 Hz sampling rate with a frame size of 480, which seems OK. (I know those are the settings Mumble uses for sound internally.)

I have been playing around with the Speex resampler, which is applied after the decoding.

No success yet. I can hear myself, just too fast. So it must be something to do with the sampling rate.

I want the resulting WAV file to be written at a 48000 Hz sampling rate.


  • Administrators

Speex, unlike CELT, isn't a 48 kHz codec. AFAIK we are using the 16 kHz wideband variant, so you probably have to resample from that to 48 kHz (though honestly, if you are only saving the audio, you might just as well save 16 kHz PCM; any player will resample that for you).
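To illustrate why the recording sounded too fast: 16 kHz samples written into a 48 kHz file play back three times too quickly, so each input sample has to become three output samples. Below is a naive sketch using linear interpolation. It is not the Speex resampler and has no low-pass filtering, so treat it as an illustration only; the function name is made up.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Upsamples 16 kHz mono PCM to 48 kHz by linear interpolation (factor 3).
   out must have room for in_len * 3 samples. Returns samples written. */
static size_t upsample_16k_to_48k(const int16_t *in, size_t in_len, int16_t *out)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < in_len; i++) {
        int32_t a = in[i];
        /* At the end of the buffer, hold the last sample. */
        int32_t b = (i + 1 < in_len) ? in[i + 1] : in[i];
        out[3 * i]     = (int16_t)a;
        out[3 * i + 1] = (int16_t)(a + (b - a) / 3);
        out[3 * i + 2] = (int16_t)(a + 2 * (b - a) / 3);
    }
    return in_len * 3;
}
```

For real use, a proper resampler (such as the one shipped with Speex) gives better quality than this interpolation.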


Thanks! :)

I now get crystal clear sound at 48 kHz.

So whichever codec is used, CELT or Speex, the best possible recording is made.

I can now work on the playback part. :)

After that, I need to work on a jitter buffer implementation before I move to the UDP part. I saw that Speex also has one.

Perhaps it's just as sensible to use that one.

Another question: is the server actually doing anything with the audio data? Suppose I wrap something else in there?

The idea is to have a few bots connecting to the Mumble server simultaneously.

They will talk through Mumble and directly. Each bot will verify whether the Mumble server is performing correctly.

Then the bots will start moving in and out of channels, again checking whether the server correctly reports each bot's position.

After that, a stress test: the number of bots will increase up to the maximum that Mumble can handle. (25 for my setup)

Using this, I hope to tackle the Mumble server connection problems on ARM.


Thanks, that will make the testing framework a bit easier.

Although the generic framework I'm making is not just for testing bots.

Continuing my journey...

On connect, the Mumble server sends a max-bandwidth message.

Based on that I could set the Speex quality. The Speex quality ranges from 1-10.

My client has already determined the outgoing kbit/s rate. (same algorithm as a full-featured Mumble client)

But how do I translate the kbit/s value that this algorithm determines into the Speex quality setting?

A pointer to the place in the Mumble source code where this is determined would also suffice.
