Tuesday, May 16, 2006

Jingling in conversation space

Being a good listener, I am always surprised, when starting a conversation, to realize how often different interpretations of meaning lead to a lengthy preliminary exchange before a common understanding of the context is reached. As explained by Seth Ladd,

I believe meaning is relative, and that the reader, with their unique experiences and perspectives, will always interpret things in their own way. The readers, or consumers, will always have more power with regards to information interpretation.

Communication is a paradox of a subject, and what he says about readers’ attitudes can easily be extended to mailing list participants. I particularly like Seth’s final statement:

It’s very possible that conversations between semantic web agents will be required to come to a sort of shared understanding.

I don’t know if participating in a mailing list thread makes us “semantic web agents”, but we definitely need this “shared understanding”. This is why I am adding to my previous post, to further build that shared understanding around Jingle. I was prompted to do so by remembering the difficulty I had on the JSF mailing list in conveying the difference between Jingle as a framework and its application to setting up multimedia sessions. In fact, the confusion came from one word: “session”. Every time I used this word, I was referring to a communication taking place between participants in a conversation space, whereas my interlocutor was referring to the underlying RTP sessions…

I believe Jingle has all the qualities to become a widely adopted way to manage sessions in conversation spaces. We must be careful not to let Jingle be positioned as “yet another signaling” protocol, or it will only be perceived as a SIP imitation. This would be the worst positioning. Jingle is larger than just P2P VoIP, in the same way presence is larger than just instant messaging. Referring to my previous definition of “conversation space”, from a protocol standpoint Jingle should allow the complete management of conversations in a given space, from the simple one-to-one phone call to the more complex collaborative meeting mixing MUC, document sharing and multimedia streams.

This is unfortunately not the case at present, as the Jingle glossary is both inconsistent and very limiting. For example, sessions are defined in JEP-166 as “a negotiated transport method and media description format connecting two entities”. The use of the singular points to a single media/transport relationship. Under this definition, we obviously need several Jingle sessions to take part in a video conference: at least one for audio and one for video. Even SIP does better than this. A Jingle negotiation must allow several media descriptions to be negotiated at the same time. I believe we could even negotiate several media/transport relationships in a single Jingle negotiation.

As an illustration, imagine a multimedia mobile phone becoming part of the previous video conference while roaming. The user has joined the conference while sitting in their favorite Internet cafe, with adequate network connectivity. The conversation space is provided by a Jingle multimedia conference server, and the negotiation takes place between the mobile client and the server. Remembering their next appointment, the user decides to drive to a meeting outside town. Being a careful driver, they decide to mute their microphone, hold the video streams, and leave only the incoming audio on, so they can listen while driving. The user also updates their presence state and PEP profile to indicate that they are driving and cannot be disturbed. At some point during the conference, documents are shown. The user decides to stop and have a look, which they do by re-enabling the video stream. They may decide to intervene in the conversation after un-muting their microphone. I believe the user experience will be greatly enhanced if all these media are part of a single conversation space, presented and managed as such.

From an implementation perspective, it is easy to understand how control operations are more efficient when all media negotiations have already been carried out and are referenced by a single session ID. Adoption requires ease of use and low client complexity. It is better to maintain state on a server and only pass a reference to the client. It is easier to group media streams together if they have a common root session identifier.
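
To make the single-session-ID idea concrete, here is a minimal Python sketch of a server that keeps all negotiated media under one root session identifier, so the client only needs that reference to mute, hold or resume individual streams. Everything in it is an assumption made for illustration: the names `ConversationServer`, `ConversationSession`, `MediaStream`, `negotiate` and `set_state`, the media labels, and the three stream states are hypothetical and are not taken from JEP-166 or from any Jingle implementation.

```python
from dataclasses import dataclass, field
from typing import Dict
import uuid


@dataclass
class MediaStream:
    # One negotiated media/transport relationship (hypothetical model).
    media: str              # e.g. "audio-in", "audio-out", "video"
    transport: str          # e.g. "rtp"
    state: str = "active"   # "active", "muted" or "held"


@dataclass
class ConversationSession:
    # All streams of one conversation, grouped under a single root ID.
    session_id: str
    streams: Dict[str, MediaStream] = field(default_factory=dict)


class ConversationServer:
    """Keeps session state server-side; clients only hold the session ID."""

    def __init__(self) -> None:
        self._sessions: Dict[str, ConversationSession] = {}

    def negotiate(self, offers: Dict[str, str]) -> str:
        # Accept several media descriptions in one exchange and
        # return a single reference covering all of them.
        session_id = uuid.uuid4().hex
        session = ConversationSession(session_id)
        for media, transport in offers.items():
            session.streams[media] = MediaStream(media, transport)
        self._sessions[session_id] = session
        return session_id

    def set_state(self, session_id: str, media: str, state: str) -> None:
        if state not in ("active", "muted", "held"):
            raise ValueError(f"unknown state: {state}")
        self._sessions[session_id].streams[media].state = state

    def states(self, session_id: str) -> Dict[str, str]:
        streams = self._sessions[session_id].streams
        return {name: stream.state for name, stream in streams.items()}


server = ConversationServer()
sid = server.negotiate({"audio-in": "rtp", "audio-out": "rtp", "video": "rtp"})

# The driving scenario: mute the microphone, hold video, keep incoming audio.
server.set_state(sid, "audio-out", "muted")
server.set_state(sid, "video", "held")
driving = server.states(sid)

# Later, the user stops to look at the shared documents.
server.set_state(sid, "video", "active")
```

In this sketch every control operation is a small update keyed by the same `sid`; no renegotiation is needed when the user moves between listening-only and full participation.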

From a protocol definition standpoint, I believe Jingle’s purpose must be widened beyond “peer-to-peer sessions” to the management of conversation space sessions. With this definition, we open the perception of Jingle to multi-peer and client-server sessions. I also propose to rewrite the definition of session as “the relationship linking together a dynamic set of multimedia senders and receivers and the data streams flowing from senders to receivers”. A wider definition in the protocol will allow real multimedia (i.e. several media) management for a single conversation space session.
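
The proposed definition can be read as a small data model. The following Python sketch is only one possible interpretation: the `Session` class, its fields, and the example addresses are hypothetical, and nothing here reflects how Jingle itself represents sessions.

```python
from dataclasses import dataclass, field
from typing import Set, Tuple


@dataclass
class Session:
    """Hypothetical model of the proposed definition: a dynamic set of
    senders and receivers plus the streams flowing between them."""
    senders: Set[str] = field(default_factory=set)
    receivers: Set[str] = field(default_factory=set)
    # Each stream is a (sender, receiver, media) triple.
    streams: Set[Tuple[str, str, str]] = field(default_factory=set)

    def add_stream(self, sender: str, receiver: str, media: str) -> None:
        self.senders.add(sender)
        self.receivers.add(receiver)
        self.streams.add((sender, receiver, media))

    def leave(self, participant: str) -> None:
        # Drop every stream the participant sends or receives, then
        # recompute who is still sending and receiving.
        self.streams = {s for s in self.streams if participant not in s[:2]}
        self.senders = {s[0] for s in self.streams}
        self.receivers = {s[1] for s in self.streams}

    def media(self) -> Set[str]:
        return {m for _, _, m in self.streams}


session = Session()
session.add_stream("alice@example.org", "conference.example.org", "audio")
session.add_stream("alice@example.org", "conference.example.org", "video")
session.add_stream("conference.example.org", "bob@example.org", "audio")

# Membership is dynamic: bob leaves, the session and its other streams remain.
session.leave("bob@example.org")
```

The point of the sketch is that one session carries several media at once and survives participants joining and leaving, which the “two entities, one media/transport” wording cannot express.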
