
Develop a BOT to record sessions - protocol documentation


roberthendrickx

Hello,


I'm interested in a bot that could automatically record conversations. The "record" function in the client is not sufficient because I want to record all teams at once (basically, recording from a specific channel that hears everyone).


It should be possible with a dedicated client, but that's hard to manage (an additional computer is needed, as it's not easy to start two clients on the same machine).


I could try to develop it myself, but I can't find any documentation on the actual protocol between the client and the server... The protobuf definition "mumble.proto" is not documented at all, and I have no idea how authentication works, how the streams are received, how the negotiation occurs, ...


I found 2 bots that do part of it (EVE and smoak-mmb), but reverse engineering their own reverse engineering is not the best way to write good code...


Thanks


  • Administrators

We have some limited documentation for the protocol in our repository. If you want to develop a bot, libmumbleclient might be your best bet. There are also some protocol implementations in Python in various stages of completion (rather less complete the last time I looked ;-) ).


I hoped to do it in Python, but I could also wrap libmumbleclient in a Cython extension... Thanks for the hint.


It could be interesting to put some links in the "3rd party" part of the wiki... I only found 2 Python projects (smoak-mmb seems to be the most advanced protocol implementation, but only for the parts they need)...


If you have some other links, I'm interested :D


Anyway, thanks


  • Administrators

Interesting. I didn't even know of mmb before ;-)


I don't really have any links I could give you off the top of my head (there's mumbo in our mumble-scripts repository, but that doesn't do audio yet).


It would be great to have a relatively pure Python implementation (I guess no one wants to do protobuf and encoding/decoding in pure Python ;-) ) to play with.


Anyways. Be sure to report back if you find anything interesting :lol:


  • 2 weeks later...

OK, it's beginning to work... I'm able to connect to the server and receive/decode sound over TCP... It's not yet a generic library, but that's already something...


I still have a question: how does the "sequence" field work in the protocol?


I'm trying to figure out how to put the different users' streams together in a single sound file, but for that I need to keep the streams synchronized. As there is no timestamp with the audio frames, I hope to be able to use the sequence to tell when there is continuous transmission and when there is a "gap".


Thanks


Another question... How does the codec negotiation work?


When authenticating, I have to send the list of the CELT bitstream versions I support, but how is it "negotiated"?


I receive a "CodecVersion" message that always includes the bitstream versions for both CELT 0.7 and CELT 0.11, even if I only declare support for 0.11 in my authenticate message; only the "prefer alpha" bool changes...


Basically, how do I know whether I must use Speex (which is mandatory, as I understand it), CELT alpha (0.7?) or CELT beta (0.11) for outgoing audio?


(By the way, this alpha/beta scheme does not seem very extensible when it comes to supporting new CELT versions... Why not send a list of the codecs supported by all the connected clients, each with a preference, like in the authenticate message?)


thanks


  • Moderators

iirc the sequence number in an audio packet is the number of the first frame in that packet (i.e. each frame is 10ms and it will increase by 2 for each packet with audio per packet set to 20ms) and the counter resets after 500 "silent frames" (i.e. if you're not talking for 5 seconds).


about the codec negotiation: the client sends all supported versions in the authenticate message. The server will tell you which codec you should use via the preferAlpha field. The client always sends audio data encoded with the version selected by preferAlpha: if it's true it uses the version in alpha, if it's false it uses the version in beta. CELT 0.7 is mandatory, Speex and CELT 0.11 are optional (for now), and any client which supports only CELT 0.7 will force all other clients to use CELT 0.7.
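
The selection rule described above can be sketched in a few lines of Python. This is only an illustration: the `CodecVersion` class here is a hand-written stand-in for the server's message, the field names `alpha`, `beta` and `prefer_alpha` are assumed from the description in this thread, and `pick_outgoing_celt` is a hypothetical helper, not part of any existing library.

```python
from dataclasses import dataclass


@dataclass
class CodecVersion:
    """Stand-in for the server's CodecVersion message (field names assumed)."""
    alpha: int          # CELT bitstream version in the "alpha" slot
    beta: int           # CELT bitstream version in the "beta" slot
    prefer_alpha: bool  # True: encode with alpha, False: encode with beta


def pick_outgoing_celt(msg, supported):
    """Return the CELT bitstream version to use for outgoing audio.

    `supported` is the collection of bitstream versions this client
    announced in its authenticate message.
    """
    wanted = msg.alpha if msg.prefer_alpha else msg.beta
    if wanted not in supported:
        # The server picked a version we never announced; refuse to guess.
        raise ValueError(f"cannot encode bitstream version {wanted:#x}")
    return wanted
```

A client would presumably call this each time a new CodecVersion message arrives, since the preferred slot can change as other clients join or leave.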


Thanks

> iirc the sequence number in an audio packet is the number of the first frame in that packet (i.e. each frame is 10ms and it will increase by 2 for each packet with audio per packet set to 20ms) and the counter resets after 500 "silent frames" (i.e. if you're not talking for 5 seconds).

You mean that the counter increases by 1 every 10ms even if no frames are sent? That makes better sense of what I saw while debugging. It would also mean you cannot know whether a packet was lost...

I also suppose this counter is specific to a session and not shared...

> about the codec negotiation: the client sends all supported versions in the authenticate message. The server will tell you which codec you should use using the preferAlpha field. The client always sends audio data encoded with the version depending on the value of preferAlpha, if it's true it will use the version in alpha, if false it uses the version in beta. CELT 0.7 is mandatory, Speex and CELT 0.11 are optional (for now) and any client which supports only CELT 0.7 will force all other clients to use CELT 0.7.

OK, it's clear now


Thanks for the answers.


  • Moderators

> You mean that the counter is increasing by 1 for each 10ms even if no frames are sent ? I understand better what I saw during my debuggings. This also means you cannot know if a packet was lost...
>
> I also suppose this counter is specific to a session and not shared...

The counter is per user and is increased for every audio frame sent to the server. It won't increase if you're not talking. A lost packet should result in missing numbers, i.e. someone with audio per packet 60ms should send packets with the sequence numbers 6, 12, 18, 24 etc. (audio per packet 20ms -> 2, 4, 6, 8 etc.) and there will be a gap if the server didn't receive one of those packets.
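
As a rough illustration of the gap check described above, here is a minimal Python sketch. It assumes 10 ms frames and a fixed "audio per packet" setting for the user, as stated in this thread; `find_gaps` and `frames_per_packet` are hypothetical helper names, and a sequence number that goes backwards is simply treated as a counter reset rather than a loss.

```python
FRAME_MS = 10  # one audio frame is 10 ms, per the explanation above


def frames_per_packet(audio_per_packet_ms):
    """Number of 10 ms frames carried by each packet."""
    return audio_per_packet_ms // FRAME_MS


def find_gaps(sequences, audio_per_packet_ms):
    """Yield (previous_seq, current_seq, missing_ms) for each hole
    in ONE user's stream of sequence numbers (the counter is per user)."""
    step = frames_per_packet(audio_per_packet_ms)
    previous = None
    for seq in sequences:
        # A jump larger than one packet's worth of frames means that
        # something between `previous` and `seq` never arrived.
        if previous is not None and seq > previous + step:
            missing_frames = seq - previous - step
            yield previous, seq, missing_frames * FRAME_MS
        # If seq <= previous the counter was reset (new talk spurt):
        # no gap is reported, we just restart from the new value.
        previous = seq
```

For example, with audio per packet set to 20 ms, `list(find_gaps([2, 4, 6, 10, 12], 20))` gives `[(6, 10, 20)]`: one packet (20 ms of audio) is missing between 6 and 10.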


  • 2 years later...

OK... It took some time to have something I could present publicly (documentation... always documentation), but I have a solution that has been working quite successfully for months now.


The core is a Mumble client library written in Python (with Cython for interfacing with CELT and Opus) that can be found here:

https://github.com/Robert904/pymumble


A lot of features (mainly around server management) are not implemented, but all the basics are there, and it should not be too difficult to extend (at least that's the plan...)


On top of that core, I have a recording application that takes care of putting together the streams from the different users, among other features:

https://github.com/Robert904/mumblerecbot


In the same place, I have a small example application that connects to a server and plays an audio file.


Of course, it doesn't have the polish of a PyPI module... especially on the installation side... You will probably need some development skills...


I hope it can interest someone !
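
To give an idea of what "putting together the streams" can involve, here is a minimal mixing sketch in Python. It is not taken from mumblerecbot and makes several assumptions: every stream is already decoded to signed 16-bit mono PCM at the same sample rate, and each stream comes with a start offset (in samples) on a common timeline. `mix_pcm16` is a hypothetical helper name.

```python
import array


def mix_pcm16(streams):
    """Mix decoded PCM streams into one signed 16-bit buffer.

    `streams` is a list of (start_sample, samples) pairs, where
    `samples` is a sequence of ints in the int16 range and
    `start_sample` is where that stream begins on the shared timeline.
    """
    total = max((start + len(samples) for start, samples in streams),
                default=0)
    mixed = [0] * total  # plain ints, so sums cannot overflow here
    for start, samples in streams:
        for i, sample in enumerate(samples):
            mixed[start + i] += sample
    # Clip to the int16 range instead of letting loud overlaps wrap around.
    return array.array("h", (max(-32768, min(32767, v)) for v in mixed))
```

Summing and clipping is the simplest possible choice; a real recorder may prefer to attenuate or normalise when many users talk at once.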
