In the Google Jingle jungle
I find the presentation of Libjingle on the Google coder page a typical example of misleading documentation. Anyone reading this presentation is bound to assume that Jingle is for peer-to-peer communication only. Many also refer to Google’s proprietary implementation as Jingle audio, which is a second misnomer. How can we create a credible communication protocol if we cannot agree on a common language to describe the protocol usage and the implemented technologies? My visit to the Libjingle pages came the other day as I was trying to separate reality from fiction behind the recent Jingle support announcements made by Asterisk and FreeSwitch. In fact, they only relate to the support of GoogleTalk. And this is not Jingle, whatever Google may say on the subject.
Google Talk uses a proposed extension to XMPP known as Jingle which can traverse many types of NATs, to establish peer-to-peer connections. The Jingle protocol is based on Interactive Connectivity Establishment (ICE), which is very successful in bypassing NAT. The core idea behind ICE is that the only way to determine the best way to connect to another user is to try every way, and use whichever works best. In that vein, when a peer-to-peer session is initiated, each client creates a list of possible addresses for itself. Each potential address is called a "candidate." Then, each client creates connections using every permutation of local and remote candidates possible and transmits data using the highest-quality connection.
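The "try every way and use whichever works best" idea can be sketched in a few lines of Python. This is only an illustration of the principle, not the ICE algorithm itself: the `Candidate` class, the candidate type labels, the preference numbers and the `is_reachable` callback are all assumptions made for the example, and a real implementation would run actual connectivity checks over the network.

```python
from dataclasses import dataclass
from itertools import product

# Illustrative preference per candidate type (higher is better).
# These numbers are assumptions, not values from any specification.
TYPE_PREFERENCE = {"host": 3, "stun": 2, "relay": 1}


@dataclass(frozen=True)
class Candidate:
    address: str
    port: int
    kind: str  # "host", "stun" or "relay" (hypothetical labels)


def candidate_pairs(local, remote):
    """Every combination of one local and one remote candidate."""
    return list(product(local, remote))


def pair_score(pair):
    """Rank a pair by its weaker end, so relayed paths sort last."""
    a, b = pair
    return min(TYPE_PREFERENCE[a.kind], TYPE_PREFERENCE[b.kind])


def choose_pair(local, remote, is_reachable):
    """Try every pair and keep the best-scoring one that connects."""
    working = [p for p in candidate_pairs(local, remote) if is_reachable(p)]
    return max(working, key=pair_score, default=None)


local = [Candidate("192.168.1.10", 5000, "host"),
         Candidate("203.0.113.7", 41000, "stun")]
remote = [Candidate("10.0.0.5", 6000, "host"),
          Candidate("198.51.100.9", 42000, "stun")]

# Pretend that only paths between the two public addresses get through.
best = choose_pair(local, remote,
                   lambda p: p[0].kind != "host" and p[1].kind != "host")
```

With two candidates on each side there are four pairs to test, which hints at why this approach is robust against NAT but costs more setup work than sending RTP to a single known address.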
Let’s have a look at the Jingle framework as proposed by the JSF. To date, the Jingle media stack looks like this:
It comprises the main session management framework, media descriptions such as audio and video, and several transport descriptors. Three different transport negotiation JEPs are available within the Jingle framework. All of them use datagrams over UDP at the network level. Two of them rely on RTP as the transport protocol, and one is based on the Asterisk inter-exchange protocol.
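To make the layering concrete, here is a small Python model of the stack described above. The labels (`rtp-ice`, `rtp-raw-udp`, `iax`) and the `Profile` class are names chosen for this sketch, not identifiers from the Jingle specifications.

```python
from dataclasses import dataclass

# Illustrative labels for the layers described in the text.
MEDIA = ("audio", "video")
TRANSPORTS = {
    "rtp-ice": {"network": "UDP", "media_protocol": "RTP"},
    "rtp-raw-udp": {"network": "UDP", "media_protocol": "RTP"},
    "iax": {"network": "UDP", "media_protocol": "IAX"},
}


@dataclass(frozen=True)
class Profile:
    """One concrete stack: session management + a media type + a transport."""
    media: str
    transport: str

    def __post_init__(self):
        if self.media not in MEDIA:
            raise ValueError(f"unknown media description: {self.media}")
        if self.transport not in TRANSPORTS:
            raise ValueError(f"unknown transport: {self.transport}")


def all_profiles():
    """Every media/transport combination the framework allows in this model."""
    return [Profile(m, t) for m in MEDIA for t in TRANSPORTS]


def rtp_transports():
    """The transports that carry media as RTP."""
    return sorted(t for t, info in TRANSPORTS.items()
                  if info["media_protocol"] == "RTP")
```

In this model, any single implementation picks one or more profiles out of the full set; supporting only `Profile("audio", "rtp-ice")` is one such choice among several.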
Interpreting the description available on Google’s coder site, I am inclined to believe that what they call “Jingle audio” is in fact the combination of Jingle, Audio and the RTP-ICE transport. If this is the case, then the “Jingle support” for Asterisk and FreeSwitch is just a particular subset of the Jingle specification, as highlighted below:
In reality, what has been implemented in the two IPBXs is support for GTalk, or more specifically support for the Libjingle signaling, as it seems to be the library of choice for other XMPP clients in the making. I am certain every XMPP client developer will be eager to implement the specification as defined by the JSF. But they will also be forced to implement the GTalk distortion of the specification. And this will be a long-lasting status quo, Google not being a philanthropy-driven organization. They benefit from the hype created around Jingle and from their use of an ‘open’ protocol such as XMPP. Whether they will spend the money to update all the GTalk clients to support the real Jingle, as they claim, remains to be seen. In the meantime, Google will benefit greatly from these open source PBXs supporting their subset of the Jingle specification. This company has been great at leveraging open source for its own benefit. They will certainly not miss this opportunity of having two platforms allowing them to bridge between GTalk and SIP-based VoIP networks, or to terminate GTalk calls on the PSTN…
But what of the rest of the world? I will try to define what the base of a truly inter-operable Jingle implementation should be. Google justifies the choice of ICE to negotiate and modify the RTP streams by issues related to NAT traversal. This is mentioned several times in their description. NAT traversal is certainly a plague for many VoIP clients, and was identified as such in the recent H325 workshop. When asked what could be done about it, clever geeks came up with the clever idea of using ICE in GTalk. With a single clever move they left out all the VoIP clients using RTP on raw UDP for the media transport. When you think of it, this is just the vast majority of the SIP soft phones, and probably all the IP hard phones out there! Is this showing consideration for inter-operability? I doubt it.
The enterprise market will obviously want to leverage the huge investments it made in SIP as a result of listening to all the big telco vendors. A growing population on the Internet is using SIP-based soft phones as an alternative to the proprietary Skype. I would have thought that one of the ways to show how superior XMPP is would have been to enable voice communication with these SIP soft phones. I do not consider it such a distant dream to have a Jingle client holding a voice conversation with one of those SIP soft phones or IP hard phones. But that requires real support for raw UDP in the Jingle libraries. It would mark the beginning of the kind of media inter-operability I mentioned in a previous post. To remain a player as a standard, Jingle must extend the GTalk-only model to include RTP/UDP transport. And as good protocol standards must define minimum mandatory requirements, I believe the base Jingle audio stack must look like this:
Implementing this base Jingle stack in clients and servers will greatly facilitate the emergence of XMPP as a strong VoIP player. Just look at the Libjingle media components: they are used by a number of public IM clients and by several IP phones. At least there is already compatibility at this level. Let’s make sure we do not miss out on media transport compatibility. In doing so, Jingle will both protect existing investments through its support for legacy UDP/RTP transport and remain extensible through its early adoption of ICE. On the server and IPBX side, I believe this extended support will make it easier for the community to quickly develop simple proxies or gateways to SIP, the other prominent signaling protocol.
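The interoperability argument can be illustrated with a toy transport negotiation in Python. Everything here is an assumption made for the sketch: the transport labels, the preference order (ICE first, raw UDP as fallback) and the three example endpoints. It is not how any existing Jingle library negotiates.

```python
# Preference order is an assumption for illustration: prefer ICE when both
# sides have it, otherwise fall back to plain RTP over UDP.
PREFERENCE = ("rtp-ice", "rtp-raw-udp")
MANDATORY = "rtp-raw-udp"  # the baseline argued for in this post


def negotiate_transport(offered, supported):
    """Return the most preferred transport both endpoints share, or None."""
    common = set(offered) & set(supported)
    for transport in PREFERENCE:
        if transport in common:
            return transport
    return None


def is_base_compliant(supported):
    """A client meets the proposed baseline if it can do raw UDP."""
    return MANDATORY in supported


# Hypothetical endpoints.
jingle_full = ["rtp-ice", "rtp-raw-udp"]  # client with the proposed base stack
gtalk_only = ["rtp-ice"]                  # ICE-only client
sip_gateway = ["rtp-raw-udp"]             # gateway towards SIP phones

print(negotiate_transport(jingle_full, gtalk_only))   # rtp-ice
print(negotiate_transport(jingle_full, sip_gateway))  # rtp-raw-udp
print(negotiate_transport(gtalk_only, sip_gateway))   # None
```

The last line is the whole point: an ICE-only client has no transport in common with a gateway that speaks plain RTP over UDP, while a client that implements both can talk to either side.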