Communications: IM, voice, video

From Rex community wiki

Introduction

CommunicationModule offers IM, VoIP, video and presence services. The protocol-specific implementation is done using existing libraries. The module provides interfaces to these services and hides the actual implementation and any threading it requires.

The user interfaces for communication features are implemented separately. These user interface modules use the services of the UserInterface and CommunicationModule modules.

Communication protocols

There are several commonly used IM protocols and plenty of libraries that implement them [3], but our requirements reduce the available choices.

XMPP/Jingle

Extensible Messaging and Presence Protocol (XMPP), also known as the Jabber protocol, was originally intended to be a messaging and presence protocol. Features like VoIP and file transfer were added later. For the basics of XMPP see Appendix A. The Jingle protocol adds streaming and signaling features, enabling peer-to-peer streaming of video, VoIP and other media. [1][2][5]


Pros:

  • Maturity: in use since 1998
  • Open standards
  • Flexibility: custom functionality can be added on top of XMPP, and XMPP gateways can be built for connecting to other messaging protocols
  • Security: XMPP offers strong means against spamming. SASL and TLS are included in the core of XMPP
  • XML based and easily extensible

Cons:

  • Mainly used by IM applications today
  • Presence data traffic is heavy
  • Currently no binary versions of the protocol exist. XML and base64 encoding add overhead to transmission
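
The base64 overhead mentioned in the last item can be estimated with a short sketch. The payload size is an arbitrary assumption chosen for illustration, and the XML wrapping around the encoded data would add further bytes on top of this.

```python
import base64


def base64_overhead(payload: bytes) -> float:
    """Return the relative size increase caused by base64 encoding."""
    encoded = base64.b64encode(payload)
    return (len(encoded) - len(payload)) / len(payload)


# 3000 bytes of binary data (an arbitrary example size).
payload = bytes(3000)
encoded_size = len(base64.b64encode(payload))
overhead = base64_overhead(payload)
print(encoded_size, round(overhead, 3))
```

Every 3 bytes of binary input become 4 base64 characters, so the encoded payload is roughly one third larger than the original.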


Libraries

  • Telepathy framework
  • Loudmouth
  • PyXMPP


Servers:

  • XMPP clients also need server-side software, and there is a lot of open source and proprietary software to choose from
  • Using existing services is also possible

SIP/SIMPLE/MSRP

Session Initiation Protocol (SIP) is, as its name implies, used to initiate sessions. The session initiation itself is done in a client-server manner, but the actual data transmission in a session is done peer to peer. The origins of SIP lie in setting up a protocol for IP-based call and signaling communications, a protocol for "IP phones". [4]

SIMPLE (Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions) is an extension to SIP that provides presence functionality, short text messages and signaling for real-time messaging.


Pros:

  • Performance, extensibility and simplicity

Cons:

  • MSRP is a fairly new protocol and has only one known open source implementation
  • There is no large community support for MSRP
  • The SIP approach has its origins in telecom applications and is intended to address their problems rather than those of web applications


Libraries:

  • PJSIP (C library with Python bindings)
  • Simplesipclient (Python library built on top of the PJSIP C library)


Servers:

SIP clients also need SIP relay server software such as FreeSWITCH, and an MSRP proxy server to deliver messages to clients behind NAT.

OpenSimulator

The OpenSimulator server currently supports text chat and presence status. These services are world specific, and users can only communicate with other users from the same world. The transfer protocol used is specific to OpenSimulator/Second Life.

Pros:

  • No overhead in message transfer

Cons:

  • Lack of voice communication support (*)
  • Text messages do not work properly (delays in transmission and messages can be propagated)
  • The protocol is SL specific. Support for other protocols has to be implemented on the server side.
  • Communication is only possible between users on the same grid (*)

(*) With Modular realXtend this is possible.


RTP

Real-time Transport Protocol (RTP) is used for media streaming, for example real-time video or audio. Specification XEP-0167 [13] defines how RTP is used with Jingle. A Jingle session can contain more than one RTP stream at a time. For example, a video call with another user combines video and audio streams in a synchronized manner. RTP is also useful for establishing group sessions, because multiple streams can be combined into one. A group session can then be established so that every participant receives the same copy of the incoming stream, which is the sum of everyone's outgoing streams.
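
The idea of a group stream that is the sum of everyone's outgoing streams can be illustrated with a minimal sketch. It is not an RTP implementation: it only shows what a mixer might do with one already decoded frame of 16-bit PCM samples per participant. The function name and the sample values are assumptions made for this example.

```python
from typing import Dict, List

INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(frames: Dict[str, List[int]]) -> List[int]:
    """Sum one frame of 16-bit PCM samples per participant, clipping to the int16 range."""
    length = min(len(samples) for samples in frames.values())
    mixed = []
    for i in range(length):
        total = sum(samples[i] for samples in frames.values())
        # Clip so that the mixed sample still fits into 16 bits.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed


# One short frame per participant (made-up sample values).
frames = {
    "alice": [1000, -2000, 30000],
    "bob": [500, 500, 10000],
    "carol": [-100, 0, 0],
}
mixed = mix_frames(frames)
print(mixed)
```

Every participant would receive the same mixed frame. The last sample shows why clipping is needed: the plain sum would exceed the 16-bit range.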

Libraries

Telepathy framework

  • Realtime communications framework from Collabora
  • Instant messaging protocols as plugins
  • Lots of supported messaging protocols like SIP and XMPP
  • Streaming support through GStreamer
  • Supports p2p application specific communication
  • Abstract API for all messaging protocols
  • Multiplatform, currently used by e.g.
    • Empathy (IM application for Linux)
    • Spicebird (collaboration tool for Windows)
    • Nokia tablet devices
  • Written in C
  • Bindings for Qt/C++, Python and C#
  • Active development community
  • Developed for the Linux environment (use on Windows requires quite a bit of work initially)
  • A relatively young project, new features and bug fixes are common

Chosen technologies

We use the Telepathy framework to implement Jabber IM and VoIP/video streaming. It is component based, third-party implementations of different IM protocols already exist, and more might be written in the near future. It also includes components for audio and video streaming.

Telepathy supports both SIP and XMPP as well as many other protocols. XMPP seems to fit better for IM and presence functionality, whereas SIP is more about VoIP-like solutions. However, our implementation does not have to be fixed to a certain IM protocol, because Telepathy supports them as plugins.

Telepathy is a rather young product that is developing rapidly, so changes and fixes are common. The biggest problem is the lack of proper support for Windows platforms. There are no existing build files for Windows, so we have to create them ourselves.

If Telepathy turns out not to be mature enough for Windows development at the moment, we can still implement IM functionality with e.g. the Loudmouth library and use it to test the features and develop the IM user interface.

Design

The CommunicationModule is user interface agnostic and just implements the communication functionality using lower-level, communication-specific libraries. The module runs in its own thread and interacts with the framework through events.

The IM user interface is implemented in CommunicationUIModule and is tied to the main user interface thread.
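
The split between a communication thread and the user interface thread could be sketched as follows. This is only an illustration under assumptions: the class names, the queue-based hand-off and the event payload are hypothetical and not taken from the actual module.

```python
import queue
import threading


class EventManager:
    """Thread-safe event hand-off (hypothetical stand-in for the framework's event system)."""

    def __init__(self):
        self._events = queue.Queue()

    def raise_event(self, name, data=None):
        self._events.put((name, data))

    def poll(self, timeout=1.0):
        """Called from the user interface thread to fetch the next event."""
        return self._events.get(timeout=timeout)


class CommunicationWorker(threading.Thread):
    """Runs the communication logic in its own thread and reports back via events."""

    def __init__(self, event_manager):
        super().__init__(daemon=True)
        self._event_manager = event_manager
        self._commands = queue.Queue()

    def login(self, username, password):
        # Called from the UI thread: only enqueues the request and returns at once.
        self._commands.put(("login", (username, password)))

    def stop(self):
        self._commands.put(("stop", None))

    def run(self):
        while True:
            command, args = self._commands.get()
            if command == "stop":
                break
            if command == "login":
                username, _password = args
                # A real module would contact the IM server here.
                self._event_manager.raise_event("IMCONNECTION_LOGGED_IN", username)


events = EventManager()
worker = CommunicationWorker(events)
worker.start()
worker.login("humptey@example.com", "secret")
received = events.poll()
worker.stop()
worker.join(timeout=1.0)
print(received)
```

Because the user interface thread only enqueues commands and polls for events, it never blocks on network operations.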

Communication between CommunicationModule and the user interface implementations

Use case - client connects to IM server:

  • The user gives a username and password in the viewer login screen.
  • The framework passes these to the CommunicationModule:
communicationModule.login(username, password);
  • CommunicationModule receives an acceptance from the IM server and raises an event:
eventManager.raiseEvent(IMCONNECTION_LOGGED_IN);
  • UserInterface indicates the current status to the user


Use case - client receives incoming IM session request

  • CommunicationModule receives the request from the IM server
  • CommunicationModule raises an event:
eventManager.raiseEvent(INCOMING_CHAT_SESSION_REQUEST);
  • CommunicationUIModule receives the event and notifies the user of the request
  • The user chooses to accept the request
  • CommunicationUIModule calls:
chatSession.accept();
  • CommunicationModule sends an accept message to the IM server
  • The chat session is now established


Use case - User sends a text message during chat session

  • The user types the message and presses enter
  • CommunicationUIModule calls:
chatSession.sendMessage(message);
  • CommunicationModule sends the message to the IM server


Use case - a friend of the user comes online

  • The friend logs in to his IM server
  • The server sends a notification to all presence listeners
  • CommunicationModule receives the presence state update
  • CommunicationModule raises an event:
eventManager.raiseEvent(PRESENCE_STATE_UPDATE, presenceState);
  • CommunicationUIModule receives the event and notifies the user about the change
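
The use cases above could be wired together roughly as in the following sketch. The event names come from the use cases; everything else (the subscribe mechanism, passing the session as the event payload, the class and method names) is an assumption made for illustration.

```python
from collections import defaultdict


class EventManager:
    """Dispatches named events to subscribed handlers (hypothetical)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event, handler):
        self._handlers[event].append(handler)

    def raise_event(self, event, data=None):
        for handler in self._handlers[event]:
            handler(data)


class ChatSession:
    def __init__(self, peer):
        self.peer = peer
        self.accepted = False
        self.sent = []

    def accept(self):
        self.accepted = True

    def send_message(self, message):
        if not self.accepted:
            raise RuntimeError("chat session is not established")
        self.sent.append(message)


class CommunicationUI:
    """Stand-in for CommunicationUIModule: reacts to events only."""

    def __init__(self, events):
        self.presence = {}
        events.subscribe("INCOMING_CHAT_SESSION_REQUEST", self.on_session_request)
        events.subscribe("PRESENCE_STATE_UPDATE", self.on_presence_update)

    def on_session_request(self, session):
        # A real user interface would ask the user before accepting.
        session.accept()

    def on_presence_update(self, state):
        contact, status = state
        self.presence[contact] = status


events = EventManager()
ui = CommunicationUI(events)
session = ChatSession("dumptey@example.net")
events.raise_event("INCOMING_CHAT_SESSION_REQUEST", session)
events.raise_event("PRESENCE_STATE_UPDATE", ("dumptey@example.net", "online"))
session.send_message("Doo doo daa")
print(ui.presence, session.sent)
```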


CommunicationModule design

The image below describes the extensible structure of the communications component model. The exact way of communicating, including the protocols and libraries used (Telepathy, XMPP, Loudmouth), is hidden from the higher-level user interfaces and applications that use the component. However, connection metadata can be inspected, and advanced users can find the configuration settings that define which libraries are used. What the different types of communication protocols have in common are sessions and connections, and sessions usually have identifiers.

Image:RexCommunicationsSharedDataModel.jpg

Image [Comms SD]: Class diagram presenting the shared data structures belonging to the communications component.
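
A protocol-neutral model of connections and sessions might look like the sketch below. It is not derived from the class diagram: all class, attribute and method names are hypothetical and only illustrate how the library in use can stay hidden behind a common interface while remaining visible as metadata.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from itertools import count
from typing import List

_session_ids = count(1)


@dataclass
class Session:
    """A communication session; every session gets its own identifier."""
    peer: str
    session_id: int = field(default_factory=lambda: next(_session_ids))


class Connection(ABC):
    """Protocol-neutral connection; subclasses hide the library actually used."""

    def __init__(self, account: str):
        self.account = account
        self.sessions: List[Session] = []

    @property
    @abstractmethod
    def metadata(self) -> dict:
        """Details about the protocol and library, for advanced users."""

    def open_session(self, peer: str) -> Session:
        session = Session(peer)
        self.sessions.append(session)
        return session


class XmppConnection(Connection):
    @property
    def metadata(self) -> dict:
        return {"protocol": "XMPP", "library": "Telepathy"}


# Application code only depends on the abstract Connection type.
connection: Connection = XmppConnection("humptey@example.com")
first = connection.open_session("dumptey@example.net")
second = connection.open_session("alice@example.org")
print(first.session_id, second.session_id, connection.metadata)
```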



Citations

[1] http://www.ietf.org/rfc/rfc3920.txt
[2] http://www.ietf.org/rfc/rfc3921.txt
[3] http://en.wikipedia.org/wiki/Comparison_of_instant_messaging_protocols
[4] http://en.wikipedia.org/wiki/Session_Initiation_Protocol
[5] http://xmpp.org/extensions/xep-0166.html
[6] http://xmpp.org/extensions/attic/xep-0215-0.1.html
[7] http://code.google.com/apis/talk/jep_extensions/jingleinfo.html
[8] http://www.ietf.org/rfc/rfc3489.txt
[9] http://xmpp.org/extensions/xep-0220.html
[10] http://xmpp.org/extensions/xep-0016.html
[11] http://xmpp.org/extensions/xep-0158.html
[12] http://code.google.com/apis/talk/libjingle/important_concepts.html
[13] http://xmpp.org/extensions/xep-0167.html

Appendix A - How XMPP works

In XMPP, communication data is transferred through XML streams. XML streams are like open XML documents going from client to server and from server to client over a TCP connection. An XML stream starts with an opening stream tag and ends with a closing stream tag. A simple XMPP session could look like the listing below (Listing [XML streams]).

  C: <?xml version='1.0'?>
     <stream:stream
         to='example.com'
         xmlns='jabber:client'
         xmlns:stream='http://etherx.jabber.org/streams'
         version='1.0'>
  S: <?xml version='1.0'?>
     <stream:stream
         from='example.com'
         id='someid'
         xmlns='jabber:client'
         xmlns:stream='http://etherx.jabber.org/streams'
         version='1.0'>
  ...  encryption, authentication, and resource binding ...
  C:   <message from='humptey@example.com'
                to='dumptey@example.net'
                xml:lang='en'>
  C:     <body>Doo doo daa</body>
  C:   </message>
  S:   <message from='dumptey@example.net'
                to='humptey@example.com'
                xml:lang='en'>
  S:     <body>That's all you've gotta say to me?</body>
  S:   </message>
  C: </stream:stream>
  S: </stream:stream>

Listing [XML streams]: Basic XML stream data flow between client and server. C: marks XML tags sent by the client and S: marks XML tags sent by the server. Time runs from top to bottom, and </stream:stream> ends the stream.
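
A message stanza like the ones in the listing can be produced with Python's standard library, as in this minimal sketch. The helper name build_message is an assumption; a real client would use an XMPP library and write the stanza into an already opened stream, which is also where the jabber:client default namespace comes from.

```python
import xml.etree.ElementTree as ET

# Namespace that the predefined "xml:" prefix (as in xml:lang) maps to.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def build_message(sender: str, recipient: str, body: str, lang: str = "en") -> str:
    """Serialize a minimal <message/> stanza with a single <body/> child."""
    message = ET.Element("message", {"from": sender, "to": recipient})
    message.set(f"{{{XML_NS}}}lang", lang)
    ET.SubElement(message, "body").text = body
    return ET.tostring(message, encoding="unicode")


stanza = build_message("humptey@example.com", "dumptey@example.net", "Doo doo daa")
print(stanza)

# Parse the stanza again, as the receiving side would.
parsed = ET.fromstring(stanza)
print(parsed.get("from"), parsed.find("body").text)
```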

Encryption, authentication and resource binding are described in more detail in RFC 3920 [1]. Resource binding binds the stream used between client and server to a specific JID address. An XMPP network uses two types of XML streams: one between client and server, and one between server and server. In the example above, the example.com and example.net servers would have established streams in order to pass messages between humptey and dumptey. The core of XMPP is therefore not a P2P protocol, which simplifies the architecture.

Jingle is an extension to XMPP that allows XMPP clients to create P2P media sessions. The Jingle protocol contains XMPP Information Query (IQ) requests that enable a client to ask for a list of STUN servers (STUN = Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs)). Establishing a session between two XMPP clients in a NATed network is described in the image below (Image [XMPPandP2P]).

When client A wants to establish a P2P media session with client B, it first queries the available STUN servers. Using a STUN server, the client then lists the possible addresses that the media session counterpart can use: local addresses, global addresses (external NAT addresses) and relay server addresses (see Image [JingleAddresses]).


After that, client A starts negotiating with client B over the XMPP network, and both clients offer possible connection points for media streaming. When a suitable connection path is found, the connection is established. In the image below the media session is established using a relay server; however, if it turns out that the clients can communicate directly, a pure P2P connection is established instead. [6,7,8,12]
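
The negotiation can be pictured as trying address pairs in order of preference until one works. The sketch below is a strong simplification and not the Jingle algorithm itself: the preference values, the Candidate type and the reachability check are assumptions, and a real client would perform connectivity checks over the network instead of calling a predicate.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Lower number = more preferred: direct paths first, the relay as a fallback.
PREFERENCE = {"local": 0, "global": 1, "relay": 2}


@dataclass(frozen=True)
class Candidate:
    kind: str      # "local", "global" or "relay"
    address: str


def choose_path(
    ours: List[Candidate],
    theirs: List[Candidate],
    reachable: Callable[[Candidate, Candidate], bool],
) -> Optional[Tuple[Candidate, Candidate]]:
    """Return the most preferred candidate pair that passes the reachability check."""
    pairs = [(a, b) for a in ours for b in theirs]
    pairs.sort(key=lambda pair: PREFERENCE[pair[0].kind] + PREFERENCE[pair[1].kind])
    for a, b in pairs:
        if reachable(a, b):
            return a, b
    return None


client_a = [
    Candidate("local", "192.168.0.2:5000"),
    Candidate("global", "203.0.113.5:5000"),
    Candidate("relay", "198.51.100.1:6000"),
]
client_b = [
    Candidate("local", "10.0.0.7:5000"),
    Candidate("global", "203.0.113.80:5000"),
    Candidate("relay", "198.51.100.1:6001"),
]


def no_local(a: Candidate, b: Candidate) -> bool:
    # Hypothetical check: the clients sit in different private networks,
    # so any pair involving a local address fails.
    return "local" not in (a.kind, b.kind)


def relay_only(a: Candidate, b: Candidate) -> bool:
    # Hypothetical check for restrictive NATs: only the relay path works.
    return a.kind == "relay" and b.kind == "relay"


direct = choose_path(client_a, client_b, no_local)
relayed = choose_path(client_a, client_b, relay_only)
print(direct, relayed)
```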


Image:XMPPandP2Psessions.jpg
Image [XMPPandP2P]: Sequence diagram describing how XMPP creates P2P media sessions. (1.) The client asks its XMPP server for the available STUN and relay servers using the Jingle protocol. (2.) The client determines the addresses it can use for a P2P media session with the other XMPP client. (3.) The clients negotiate which media path to use. (4.) The media session is established. (5.) The media session is ended by client A.

Image:candidates.gif
Image [JingleAddresses]: Determining possible addresses with Jingle protocol (from [12])


One benefit of using XMPP is that spamming in an XMPP network is difficult for several reasons, for example:

  • Clients establish connections through servers, so direct spamming is not possible
  • XMPP servers check each other's identities with the DNS-based "dialback" protocol, so a spammer cannot set up an XMPP server and then steal another server's identity.
  • In an XMPP network chatting is usually tied to a buddy list: if someone is not your buddy, he/she cannot send you messages. Furthermore, you can configure your settings so that anonymous buddy requests are not accepted. In virtual worlds you could, for example, arrange the settings so that buddies can only be added when they are in the same place as you.
  • An XMPP server can set a limit on the number of messages a single client can send.
  • You can always delete annoying buddies from your buddy list
  • XMPP even has ways of discovering whether the sender is a human user or a robot

[1,2,9,10,11]
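
A server-side message limit of the kind mentioned in the list could work along the lines of this sliding-window sketch. It is not taken from any XMPP server: the class name, the limit and the window length are assumptions for illustration.

```python
from collections import defaultdict, deque


class MessageRateLimiter:
    """Allow at most max_messages per sender within a sliding time window."""

    def __init__(self, max_messages: int, window_seconds: float):
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self._sent = defaultdict(deque)  # sender -> timestamps of accepted messages

    def allow(self, sender: str, now: float) -> bool:
        timestamps = self._sent[sender]
        # Forget messages that have fallen out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_messages:
            return False
        timestamps.append(now)
        return True


limiter = MessageRateLimiter(max_messages=3, window_seconds=10.0)
results = [limiter.allow("spammer@example.org", t) for t in (0.0, 1.0, 2.0, 3.0, 11.0)]
print(results)
```

The fourth message is rejected because three messages were already accepted within the window; by the time of the fifth message the oldest ones have expired.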

Appendix B - How SIP works

In SIP, media sessions are negotiated with the INVITE request. In the image below (Image [SipFlow]) the UA client on host A sends an INVITE to host B, which acts as the UA server. The server in between sends a TRYING message, indicating that it has received the INVITE from host A and is trying to reach host B, the target of the INVITE. A RINGING message is sent from host B as an indication that the destination has been reached. When the user on host B accepts the media session, OK is sent back to host A. As a result, a media session is established.

Image:SipCallFlow.jpg
Image [SipFlow]: Sequence diagram describing how a SIP session is initiated.
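
The message sequence can be followed with a small state machine, seen from host A. The state names are invented for this sketch and are not the transaction states of the SIP specification; a real call flow also acknowledges the final OK with an ACK request and may skip the provisional responses.

```python
# (current state, received or sent message) -> next state
TRANSITIONS = {
    ("idle", "INVITE"): "calling",
    ("calling", "100 Trying"): "proceeding",
    ("proceeding", "180 Ringing"): "ringing",
    ("ringing", "200 OK"): "established",
}


def run_call(messages):
    """Walk through the messages of a call setup and return the final state."""
    state = "idle"
    for message in messages:
        key = (state, message)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {message!r} in state {state!r}")
        state = TRANSITIONS[key]
    return state


final_state = run_call(["INVITE", "100 Trying", "180 Ringing", "200 OK"])
print(final_state)
```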


The type of the media session is described with the Session Description Protocol (SDP). The SDP data is carried inside the INVITE request and the OK response. SDP is used to negotiate the protocol and media types between the endpoints of a P2P session. In the simplest case SDP carries the IP addresses of the participating endpoints, and the endpoints establish media sessions using those addresses. However, this does not work in networks that use Network Address Translation (NAT). This is where the Message Session Relay Protocol (MSRP) comes in. With MSRP, the communicating endpoints each establish a connection to an MSRP relay server that is outside the NAT and use those connections to communicate across the NAT. The MSRP addresses in use are communicated to the other party in the SDP data carried inside SIP requests and responses. The image below (Image [MsrpFlow]) visualises the MSRP session media path.

Image:MSRPimage.JPG
Image [MsrpFlow]: MSRP relay routing SIP media session through NAT.
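
To make the role of SDP more concrete, the sketch below extracts the connection address and the offered media from a small session description. The sample description and the parse_sdp helper are assumptions for illustration; the parser handles only the c= and m= lines and ignores everything else, including media-level connection lines.

```python
SAMPLE_SDP = """\
v=0
o=alice 2890844526 2890844526 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 49170 RTP/AVP 0
m=video 51372 RTP/AVP 31
"""


def parse_sdp(text):
    """Return the connection address and the media descriptions of an SDP body."""
    connection = None
    media = []
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition("=")
        if key == "c":
            # c=<network type> <address type> <connection address>
            connection = value.split()[2]
        elif key == "m":
            # m=<media> <port> <transport> <format list>
            kind, port, transport, *formats = value.split()
            media.append(
                {"kind": kind, "port": int(port), "transport": transport, "formats": formats}
            )
    return {"connection": connection, "media": media}


description = parse_sdp(SAMPLE_SDP)
print(description["connection"], [m["kind"] for m in description["media"]])
```

Behind a NAT the address on the c= line would be a private one that the other endpoint cannot reach, which is the problem the relay approach described above works around.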

Appendix C - How Telepathy framework works

In the Telepathy framework the user interface and the actual protocol implementations are separated into different modules. These modules communicate through D-Bus. The protocol implementations are called connection managers, and they provide connections for different protocols. Multiple applications can use the same connection manager instance at the same time.

Communication is divided into connections and channels. A connection is a session between client and server. A connection provides different types of channels. A channel represents the actual data flow between communicating participants. E.g. a Text channel is used to send and receive text messages between users, and a RoomList channel is used to retrieve a list of the chat rooms available on a server.

Channel types:

  • Text: for IM
  • StreamedMedia: for VoIP and video
  • RoomList: for chat rooms
  • Tube: for P2P application-specific communication
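
The relationship between connection managers, connections and channels can be modelled in a few lines. This sketch is a plain Python model and not the Telepathy D-Bus API: the class and method names are hypothetical, and only the channel type names are taken from the list above.

```python
from enum import Enum


class ChannelType(Enum):
    TEXT = "Text"
    STREAMED_MEDIA = "StreamedMedia"
    ROOM_LIST = "RoomList"
    TUBE = "Tube"


class Channel:
    """Represents one data flow, e.g. a text conversation with a contact."""

    def __init__(self, channel_type, target=None):
        self.channel_type = channel_type
        self.target = target


class Connection:
    """A session between the client and one server account."""

    def __init__(self, protocol, account):
        self.protocol = protocol
        self.account = account
        self.channels = []

    def request_channel(self, channel_type, target=None):
        channel = Channel(channel_type, target)
        self.channels.append(channel)
        return channel


class ConnectionManager:
    """One protocol implementation that several applications can share."""

    def __init__(self, protocol):
        self.protocol = protocol
        self._connections = {}

    def request_connection(self, account):
        if account not in self._connections:
            self._connections[account] = Connection(self.protocol, account)
        return self._connections[account]


manager = ConnectionManager("jabber")
# Two applications ask for the same account and share one connection.
im_app_connection = manager.request_connection("humptey@example.com")
voip_app_connection = manager.request_connection("humptey@example.com")
text = im_app_connection.request_channel(ChannelType.TEXT, "dumptey@example.net")
call = voip_app_connection.request_channel(ChannelType.STREAMED_MEDIA, "dumptey@example.net")
print(im_app_connection is voip_app_connection, len(im_app_connection.channels))
```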

Telepathy architecture