Tuesday, October 31, 2006

Multi-homing Jingle

I described in an earlier post how some edge cases of NAT traversal for media streams could be solved using a media relay proxy. This kind of proxy is widely used among well-known peer-to-peer VoIP implementations, such as Skype or SIP-based devices. SIP-based applications in most cases use media relay proxies co-located with SIP border controllers (for service providers) or SIP proxies. Skype uses its clients that are not behind NATs to proxy data for clients behind NATs.
While discussing this subject on the JSF mailing lists, an important question drew my attention: How would Jingle allow a media relay proxy to be reached when it is not co-located with the client's home server?

The current Jingle specification has been designed with direct peer-to-peer communication in mind, and does not provide built-in support for intermediary relays on either the media path or the signaling path. The Jingle syntax provides a way to redirect a session to a different remote party if the original target is unavailable for the appropriate media communication. This feature is handy when several devices with different capabilities are online for a given user JID, or when the user has set his client to re-route calls to a voice/video mailbox.

However, nothing in Jingle allows for the use of a relay or an intermediary media proxy of any sort. An intermediary proxy has several application use cases. One is the media relay proxy that performs NAT traversal, as described in my previous post. But any simple IPBX would equally benefit from this extension. The current flavor of Jingle does not allow referring to an IPBX as a separate server, which is the most common architecture found either at service providers or in the enterprise. The benefit of integrating with an IPBX is easy to see. The "numbering" plan for the IPBX could, for example, be defined purely in terms of URIs rather than extension numbers. And obviously all the expected exchange functions would be provided by a specialized application, including transparent bridging with other signaling and media protocols.

To that end, the Jingle specification can easily be extended by adding:

  • A discovery mechanism for intermediary media proxies, which will as usual leverage XEP-0030 Service Discovery. It boils down to defining the proper categories and features in the XMPP registrar.
  • A mechanism in the protocol to allow defining a remote party URI independently from the destination address of the wrapping stanza.
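
The discovery step could look roughly like the following sketch. The `disco#info` namespace is the one XEP-0030 defines, but the feature name `urn:example:jingle:media-relay` is purely hypothetical: no such feature is registered, and the function names are mine.

```python
import xml.etree.ElementTree as ET

DISCO_INFO_NS = "http://jabber.org/protocol/disco#info"
# Hypothetical feature name, used for illustration only (not registered anywhere).
RELAY_FEATURE = "urn:example:jingle:media-relay"


def build_disco_query(from_jid, to_jid, stanza_id):
    """Build an XEP-0030 disco#info request addressed to a candidate relay."""
    iq = ET.Element("iq", {"type": "get", "from": from_jid,
                           "to": to_jid, "id": stanza_id})
    ET.SubElement(iq, "{%s}query" % DISCO_INFO_NS)
    return iq


def advertises_relay(result_iq):
    """Return True if a disco#info result lists the (hypothetical) relay feature."""
    query = result_iq.find("{%s}query" % DISCO_INFO_NS)
    if query is None:
        return False
    features = query.findall("{%s}feature" % DISCO_INFO_NS)
    return any(f.get("var") == RELAY_FEATURE for f in features)
```

A client would send the query to each service it knows about and keep the ones for which `advertises_relay` returns true.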

Contrary to a redirection, which is an indication given by the remote party, relaying has to be requested by the initiating party. Once again we can simply leverage existing XMPP extensions and implement the forwarding by using XEP-0033 Extended Stanza Addressing with Jingle.
In the now traditional context of XMPP examples, assuming Romeo's client has discovered that Juliet is only reachable through a relay media proxy, it would issue a request similar to this:

<iq to='relay.capulet.com' from='romeo@montague.net/orchard' id='jingle1' type='set'>
   <addresses xmlns='http://jabber.org/protocol/address'>
       <address type='to' jid='juliet@capulet.com/balcony'/>
   </addresses>

   <jingle xmlns='http://jabber.org/protocol/jingle'
          action='session-initiate'
          initiator='romeo@montague.net/orchard'
          sid='a73sjjvkla37jfea'>
    <content name='this-is-the-audio-content'>
      <description xmlns='http://jabber.org/protocol/jingle/description/audio'>
        ...
      </description>
      <transport xmlns='http://jabber.org/protocol/jingle/transport/ice'>
        ...
      </transport>
      <transport xmlns='http://jabber.org/protocol/jingle/transport/raw-udp'>
        ...
      </transport>
    </content>
        ...
  </jingle>
</iq>

Just including an extended address in the Jingle stanza opens up a host of possible new applications without modifying the actual Jingle negotiation. It exemplifies once again the flexibility of XMPP.
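
To make the relay's side of the request above concrete, here is a minimal sketch of what it might do on receipt. It assumes the relay simply re-addresses the stanza to the first `to` address and marks that address with the `delivered` attribute XEP-0033 defines. The function name is mine, and a real relay would also have to rewrite the transport candidates.

```python
import copy
import xml.etree.ElementTree as ET

ADDRESS_NS = "http://jabber.org/protocol/address"


def forward_through_relay(iq):
    """Return a copy of the stanza re-addressed to its extended 'to' address.

    The original element is left untouched. Raises ValueError when the
    stanza carries no usable <address type='to'/> element.
    """
    forwarded = copy.deepcopy(iq)
    for address in forwarded.iter("{%s}address" % ADDRESS_NS):
        jid = (address.get("jid") or "").strip()
        if address.get("type") == "to" and jid:
            # Mark the address as handled, then point the stanza at the real target.
            address.set("delivered", "true")
            forwarded.set("to", jid)
            return forwarded
    raise ValueError("no <address type='to'/> with a jid found")
```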

Sunday, October 29, 2006

Jingle media relaying

In an ideal Internet, each device would have a routable IP address and all devices would be able to communicate end to end without any intermediaries except routers. In reality, devices connected to the Internet commonly sit behind a NAT (Network Address Translation) function present in the border router. Using NAT, it becomes possible to connect multiple devices to the Internet using only one public IP address. On the other hand, it becomes impossible to initiate connections from the Internet. Traversing NAT in both directions becomes an issue for point-to-point communications. This is particularly true when using RTP for multimedia communications.

A device behind NAT does not know much about how it will be seen from the Internet; it only knows its own IP address and the ports where the application runs. When communication with the Internet is established, the NAT function maps the IP:port combination of the device on the private NAT interface to a temporary public IP:port combination on the public interface connected to the Internet. Furthermore, the RTP transport protocol usually uses a random port. This means that users cannot just open a port on their NAT device for RTP.
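
The mapping just described can be pictured with a toy model. This is only a sketch under simplifying assumptions: the class name, the starting port and the addresses are invented, and a real NAT also expires mappings and often filters inbound packets by remote endpoint.

```python
import itertools


class Nat:
    """Toy NAT: maps private IP:port pairs to ports on one public address."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)
        self._outbound = {}  # (private_ip, private_port) -> public_port
        self._inbound = {}   # public_port -> (private_ip, private_port)

    def outbound(self, private_ip, private_port):
        """Return the public IP:port seen by the Internet, creating a mapping if needed."""
        key = (private_ip, private_port)
        if key not in self._outbound:
            port = next(self._ports)
            self._outbound[key] = port
            self._inbound[port] = key
        return (self.public_ip, self._outbound[key])

    def inbound(self, public_port):
        """Return the private IP:port for a public port, or None if nothing maps there."""
        return self._inbound.get(public_port)
```

Until the device has sent something out, `inbound` returns `None`: nobody on the Internet can reach it, and the device itself has no way to know which public port it will be given.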

Media consists of one or multiple streams which are negotiated through an associated signaling protocol, such as SIP or Jingle. The signaling protocol allows devices to negotiate a set of common media. The negotiation is performed by conveying information about the media streams, such as the address where the media will be received, codec types, bandwidth, etc. The problem is that the signaling conveys information about the private IP of the device when it is behind NAT. There are two ways to solve this issue.

One is to use a signaling protocol able to dynamically negotiate a communication path for the media even after the initial session has been set up. ICE (Interactive Connectivity Establishment) is such a protocol, which allows devices to probe for multiple paths of communication by trying different ports and STUN techniques. With ICE support, devices have a good chance of handling point-to-point communication without any intermediary media relay. But ICE is awaiting full specification, and therefore only experimental support is provided. In addition, depending on the type of NAT, the communication might not be established even when using ICE. In this case a media relay proxy with a public Internet address must be used.

To transparently establish a multimedia session through a media relay proxy, it is best to use a service that associates the media proxy with a signaling proxy. The media relay proxy does the actual RTP traffic forwarding between the parties involved in the conversation. Upon request from the signaling proxy, it allocates sockets for each media stream of a session. The signaling proxy will use the media relay proxy's IP address and socket's port to replace the original values in the signaling payload. For SIP this would be achieved by modifying the SDP payload; for Jingle this would require changing the transport candidate. After this is done, the parties involved in the conversation will contact the media relay proxy thinking they are contacting the other party.

This approach is needed because the media relay proxy would then be able to determine the addresses from where the media streams originate. This information is unknown when the signaling takes place, and can only be determined when the RTP streams actually start.
After the media relay proxy has allocated the sockets for each stream, it will listen for an incoming packet from each of the two parties. Once these are received, the media relay proxy is able to know where the packets should be forwarded and can start relaying them between the parties. However, if one party has a public Internet address, the media relay proxy is able to send packets to it before it receives a packet from it, since the party's IP:port is already known. Because of this, it becomes possible to chain media relay proxies between them.
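
The forwarding rule described above can be sketched without any real sockets. The class and method names are my own, and the sketch ignores timeouts, RTCP and any check that a packet really comes from the expected party.

```python
class RelaySession:
    """One relayed media stream between two parties, labelled 'a' and 'b'.

    Each party's address is learned from the first packet it sends,
    unless it was already known (a party with a public address).
    """

    def __init__(self, known_a=None, known_b=None):
        self.peers = {"a": known_a, "b": known_b}

    def on_packet(self, side, source, payload):
        """Handle a packet received on the socket allocated for `side`.

        Returns (destination, payload) when the packet can be forwarded,
        or None while the other party's address is still unknown.
        """
        if self.peers[side] is None:
            self.peers[side] = source  # learn where this party really sends from
        other = "b" if side == "a" else "a"
        destination = self.peers[other]
        if destination is None:
            return None
        return (destination, payload)
```

When one party's address is supplied up front, the very first packet from the other side is forwarded immediately, which is what makes chaining relays possible.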

It is interesting to note that the media relay proxy solution is independent of the actual signaling protocol. Several solutions already exist for SIP, with the added complexity that SIP itself will require a NAT traversal solution when transported over UDP. I will describe how a media relay proxy can be implemented in XMPP using the Jingle signaling. As explained above, the media relay proxy must be on the public Internet. The easiest approach would, in my opinion, consist of implementing the media relay proxy as a component of an XMPP server and installing it in a DMZ. Doing so has the advantage of leveraging the trust relationship between the Jingle client and the XMPP server and extending it to the media relay proxy.

The XMPP server would have to be modified to route the incoming Jingle traffic through the component, which will in turn intercept and modify the Jingle transport negotiation payloads:

  • For raw UDP transport, the component will replace the original transport candidate using the IP:port of a newly created socket.
  • For an ICE transport, the component will create a new candidate using the IP:port of a newly created socket, and discard any other candidate offered directly by either party. As the proxy is always reachable, this connection will always be established.
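
In both cases the component ends up leaving a single candidate that points at the relay. A minimal sketch of that rewrite follows. It assumes candidates are child `candidate` elements carrying `ip` and `port` attributes, which is an assumption for illustration rather than something taken from the transport specifications, and the function name is mine.

```python
import xml.etree.ElementTree as ET

TRANSPORT_PREFIX = "http://jabber.org/protocol/jingle/transport/"


def rewrite_transport(transport, relay_ip, relay_port):
    """Replace every candidate of a Jingle transport with one relay candidate.

    Works in place on a raw-udp or ICE <transport/> element and returns it.
    The ip/port attribute names are assumed for illustration.
    """
    namespace = transport.tag[1:].split("}")[0]
    if not namespace.startswith(TRANSPORT_PREFIX):
        raise ValueError("not a Jingle transport element: %s" % transport.tag)
    candidate_tag = "{%s}candidate" % namespace
    # Discard whatever the client offered directly.
    for candidate in transport.findall(candidate_tag):
        transport.remove(candidate)
    # Offer only the socket the relay has just allocated.
    ET.SubElement(transport, candidate_tag,
                  {"ip": relay_ip, "port": str(relay_port)})
    return transport
```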

This example demonstrates that existing multimedia NAT traversal techniques can easily be adapted for Jingle, with the added advantage that the Jingle signaling itself is NAT and firewall friendly, which is not the case for SIP. This use case can also be extended by supporting both SIP and Jingle on the same media relay proxy component to provide seamless media connectivity between SIP and Jingle clients.

Thursday, October 26, 2006

Feedback enabled presence

Alec Sanders deplores the rather crude state of today's presence systems where only a limited availability awareness is supported. I would add that, in the context of human communications, the mono-directional propagation of presence states without feedback does not help improve an already minimalist replacement for face-to-face communication.

In everyday life, people have an expectation that information about them is ephemeral and subject to the vagaries of human memory. This ambiguity gives people the ability to influence others' memory and manage the way in which they are perceived. With the recording and analysis of presence information, it becomes more difficult for individuals to adjust how they are “read”. Capture and propagation of presence states complicates one's ability to provide only the information reflecting positively on one's image. A presence system may be disclosing undesired information. Moreover, because distributed presence systems do not allow people to watch how others interpret this information, they decrease one's ability to detect variations in how others accept the image one is attempting to project.

I believe successful presence systems will be those dealing with this kind of “impression management” concern by providing more control over people's self-image projection and interpretation.
These systems will have to carefully balance the recipient's desire for control with the caller's desire to understand the recipient's context, while maintaining trust and usefulness. Today's approach of putting control of accessibility exclusively in the hands of the recipient is not consistent with face-to-face communication. In real life, the caller and the recipient share the context, and can both feel when starting or holding a conversation is not appropriate.

It is interesting to note that experiments conducted recently on awareness systems did not reveal a decrease in the frequency of interruptions. When callers take the recipient's context into account, it has been observed that they adopt a more polite approach to interruption. For example, a caller might say, "I see you're busy, but I have a quick question," or "can you call me when you're free?" At this point, a recipient has an opportunity to provide feedback on how appropriate the interruption is.
Real-time communication systems do not support the many subtle cues people use in face-to-face conversation to convey such feedback. Callers are unaware of the recipient's context and cannot be held accountable for any disruption this may cause. Providing presence information in return will allow some amount of accountability.

Overall, I believe a deeper understanding of how presence supports and breaks social feedback mechanisms would greatly benefit designers and architects.

Real-time utterances

Interestingly, Janne asks if the web is not slowly but surely discovering the other side: real-time communication. I find the question interesting because it confirms a changing mindset. The web as it has been defined and implemented is about references and location of resources. Obviously, the recent development of feed technologies over HTTP has allowed the web to venture to the fringe of real-time communication. But this is, in my opinion, more a technology 'long tail' than long-term engineering and efficient use of resources.

If the web is to become more than just a clumsy avatar of real life, it must encompass our inherent thinking and communicating capabilities. On the web, communication and references are two complementary sides of the way we express ourselves. It is the presence of different contexts that makes information, events or transitions in information states meaningful.

As communication and references differ and serve different but equally important purposes, they need to be supported by appropriate tool sets. HTTP is the undisputed champion on the reference side. On the real-time communication and presence side, I believe XMPP is gaining enough traction to become the next protocol that counts: open, free (i.e., not supported by any commercial consortium) and easy to implement and extend.

Wednesday, October 25, 2006

Seeing is not believing...

Announcements in the video-conference space have been flourishing lately. I find them interesting as they attempt to provide answers to the crude rendering of virtual conversation spaces when compared to the rich person-to-person interactions of real life. A large telecom vendor has dubbed its system "Telepresence" and claims that

You can think of those technologies as precursors to true telepresence that replicates the experience of "being there".

Well, the advent of large flat-screen technologies has certainly made it easier to cover an entire wall with screen panels, but the result remains far from the participants' individual holographic representations in the Jedi meetings of "Star Wars"… In my opinion, these video-conference systems fall short in the way they provide viewpoints, usually through a limited number of video cameras. And this latest offering of a well-known software vendor claiming to provide

a 360-degree, panoramic video of side-by-side images of everyone who is taking part in the conference.

will not make me change my opinion...

Real life shared conversation spaces allow people to maintain instant knowledge about others' interaction with the space. In a virtual world, the concept of conversation space awareness is key for systems wanting to approach the fluid interaction of face-to-face communication.
In physical conversation spaces, participants often shift their attention back and forth between individual and shared activity. In these moments, they gather lightweight information through quick glances at another participant or at that participant's personal area. This information helps maintain a sense of awareness of where other persons are and what they are doing. For example, in a work environment, this space awareness would help coordinate tasks and resources. People can use the space awareness to anticipate others' actions, help them with their tasks, and interpret references to objects in context.

Conversation space awareness comes naturally in face-to-face communication, but it is far more difficult to render in real-time communication systems. In a video conference, only a fraction of the space may be seen, and each participant often does not see the same part as others. More generally, real-time communication systems reduce the richness of communication, and their user interfaces hide many actions that are visible in a real-life space. Moreover, the perceptual and physical abilities we use to maintain this awareness, such as glances, are replaced with mechanisms that are both slow and clumsy.

As I explained in a previous post, awareness is an essential component of presence. Designers of presence based systems will face two problems to integrate conversation space awareness. First, they will need to know what information should be captured about a person's interaction with the conversation space. Second, they will have to decide how this information should be presented to other participants.

The constituents of conversation space awareness fall into two groups. The first group concerns what is happening to participants:

  • Amount of activity (How active are the participants?)
  • Changes in progress (What changes are participants making?)
  • Expectations (What do participants need me to do next?)
  • Nature of actions (What are the participants doing?)

The second group deals with where it is happening in the space:

  • Focus (Where are the participants?)
  • Influence (Where can participants make changes?)
  • Objects in use (What objects are participants using?)
  • View extents (What can participants see?)

Setting explicit status in a presence system provides a first level of awareness. A second level of awareness can be inferred from events observed inside the conversation space, such as the visible or audible signs of interaction with the space or its artifacts, or the participants' activity behavior. But capturing intentionally public utterances, expressions, or gestures that are not explicitly directed at other participants may prove difficult.

In the end, even if a system integrates information from a variety of sensors and other sources, presence indicators still have a long way to go before they reflect true human nature. The “flat” implementations of today's user interfaces certainly work against a realistic rendering. And then there is the user behavior itself that comes into play. Just because my office door is open and I happen to be looking outside, don't take it for granted you can come in and interrupt me.

Monday, October 23, 2006

The three legs of presence

Researchers have extensively explored the fields of "social presence" and "awareness". However, in my opinion, the emergence of "social networking" and "virtual community" requires adding the concept of “connectedness” to the feature mix of any effective real-time communication system. I believe it is important to review these concepts and see how they inter-relate. These relations will have to be considered carefully when building what is commonly called "presence" into these communication systems.

The concept of "awareness" has been used in many ways. Fifteen years ago, academics defined it as

an understanding of the activities of others, which provides a context for your own activity.

Awareness is usually classified into four types:

  • availability awareness, which relates to the availability of people and objects.
  • contextual awareness, which includes physical, social and mental context.
  • group awareness, which promotes the feeling of belonging to a group.
  • workplace awareness, which is knowledge of tasks within the virtual environment.

In these definitions, awareness is used in the sense of feeling what is believed to be an external perception, whether synchronous or near-asynchronous. It encompasses both a perception of the users of a system, and a feature of a system that facilitates that perception.

The concept of “social presence” is older. Thirty years ago Short et al. in “The Social Psychology of Telecommunications” defined it as:

the degree of perception of the other person in a mediated communication and the consequent perception of their interpersonal interaction.

More recently, in “Criteria and Scope conditions for a Theory and Measure of Social Presence”, Biocca et al. depicted social presence as pertaining to the user, but closely related to the interaction and the medium:

it is a temporary judgment of the nature of interaction with the other, as limited or augmented by the medium.

Social presence theory studies efficiency and satisfaction in the use of different communication media. Short et al. consider social presence a subjective dimension of a medium in its capacity to transmit information about facial expression, direction of looking, posture and non-verbal cues as they are perceived to be present in the medium. This dimension affects the level to which a medium is perceived as sociable, warm, sensitive, personal or intimate when it is used to interact with other people. Social presence varies between different media; it affects the nature of the interaction and influences the choice of a medium by an individual wishing to communicate.

The concept of “connectedness” is one of the basic principles which underlie social behavior. In psychology, the fundamental needs for belonging and connectedness are described as powerful drivers to promote social relationships.

Virtual multi-media communication can create a sense of connectedness or “feeling of being in touch”. In awareness systems this may be more important than the content of the communication. Even without direct information exchange, people want to maintain connection with others. Look how instant messaging users monitor the availability of their buddies, and exchange greetings without any need for a real information exchange. Similarly, witness how mobile phone users exchange SMS and share a common, although asynchronous, experience.
There are also situations where connectedness does not imply direct awareness of another person, but rather of an object. Receiving a post card may create a feeling of connectedness although there is no direct awareness of the other person.

Real-time communication systems aim at reducing the spatial constraint on people's conversations. Presence has the capacity to convey additional context attributes pertaining to a conversation. If the experience of connectedness is a basic human need, understanding it may help design communication systems that enable connectedness without imitating face-to-face communication, and that facilitate "immediacy" and "intimacy" while minimizing intrusiveness.

Tuesday, October 17, 2006

Walled gardens redux

In a parallel post I described the landscape in which I consider that walled gardens, such as those still implemented by the incumbent consumer IM services, are a flawed strategy.

In short, for any community-based service, switching costs remain a critical success factor for building market share and defending against competition. However, as people themselves are increasingly becoming the sources of content and the owners of distribution, we are increasingly seeing an inversion of control, where service providers benefit from customer-provided competitive advantages. For a service provider, basing a strategy on directly increasing switching costs by using a walled garden runs against the users' aspirations. Instead, as control shifts towards the community, a new environment appears where people themselves willingly create their own high switching costs.

When translating this to real-time communication technologies, it becomes evident that open protocols offering transparent and ubiquitous communications are the only acceptable way. As a consequence, only widely accepted and implemented open standards will remain. The same shift will apply to communication as it did to content generation and distribution. End users themselves will want to create their own communication services, and will not accept anything but self-imposed constraints. As a result of this new dynamic, they will only choose the easiest to implement and widest ranging protocols. In the end, the most successful services will be those that manage to hide the inherent complexity of any multimedia communication system, while providing unlimited interoperability.

Sunday, October 15, 2006

Barrier to exit

A noteworthy "collateral" effect of the Internet is its ability to turn entrenched business positions upside down. As a consequence, taking a contrarian approach to business strategy and experimenting with all sorts of bizarre combinations of services has become the norm for entrepreneurs wanting to grab a piece of the web 2.0 pie. The hope is that, out of the "mashups" chaos, a few viable models will eventually emerge.

Amongst these experiments, the most in favor are those dealing with "social" [insert your own hype word here] services, where the intended audience is some sort of "community" of interests. The underlying concept is to capture the natural tendency of human beings to gather and talk. Some people will choose to go to "Ye Olde Cheshire Cheese" and do this with a glass of ale; others will log in to MySpace… Ultimately though, the most lively and sticky places are those where people communicate whatever the subject, even without a subject.

It is important to keep in mind that communication alone does not necessarily act as the primary draw for gathering. Back in the days of consumer online services, email was not effective at acquiring new users, mainly because most people had no idea what email was and how useful it could be. New users were acquired by showing unique benefits, such as unique content. Yet once they discovered the benefits of email, the mailbox became the common, ubiquitous reason to join the community. As a result, it is critical to understand that what attracts people initially is often not what keeps people on your network interested in the long run. When all is said and done, people still hang out in their favorite pub or [insert your own favorite place here] to have conversations.

At the same time, a community generates its own content, and uses its own style in widely expressing its interests. When transferred to the Internet, in a context where people themselves are increasingly becoming the sources of content and the owners of distribution, it becomes clear that any strategy based on increasing switching costs runs against the users' aspirations. In effect, we are increasingly seeing an inversion of control, where service providers benefit from customer-provided competitive advantages. Switching costs remain a critical success factor for building market share and defending against competition. However, who creates and controls them is fundamentally different from previous services.

Walled gardens are the antithesis of this new dynamic. As the control shifts towards community, a new environment appears where, instead of a service provider locking its customers into a walled garden, people themselves willingly create their own high switching costs. For instance, for an auction service, the switching cost is not the relationship with the service itself, but the reputation and trust a user has spent time building with other community members. For bloggers, the switching cost is not the time spent constructing the blog, but rather the social network of "friends" accumulated over time.

Ultimately, and it may sound like a paradox, the most successful services will offer their users complete freedom to churn while giving them unlimited communication capabilities. The most successful communities will be those that have understood that self-imposed constraints on switching are the only ones willingly accepted by members.

Friday, October 13, 2006

Jingling the MUC

I discussed previously how the XEP-0045 specification already contains all the management features to handle Multi User Communication (MUC), beyond simple text chat rooms. At the same time, I was interested to see that discussions are taking place at the Jabber Software Foundation about ways to integrate screen sharing and remote control into XMPP, as well as performing shared XML editing or SVG based white boarding.

MUC provides a very appropriate framework in which these different applications can be aggregated. I have already explained how MUC exposes the required components of generic conversation space management. One of the most important characteristics of conversation spaces is the possibility of hosting public and private conversations in the same space. In essence, conversation spaces simultaneously allow:

  • point-to-point exchanges between participants,
  • client-server exchanges, the conversation space being the server.

I believe people have finally come to realize that considering XMPP as a universal multi-media transport is utopian, although there are still bursts of "can I do VoIP over XMPP" here and there… When it comes to multi-media, Jingle is obviously the appropriate answer in the XMPP world.

I nonetheless feel there are still some misunderstandings about Jingle’s concept of separating session establishment operations (signalling) from communication (transport) when a conversation involves more than two parties. In fact this is rather simple. In a Jingle-enabled MUC, in addition to the existing XEP-0045 specification, the service will only need to implement:

  • Jingle support for each multi-media conversation space, in order to answer session requests made to the space JID.
  • Forwarding of Jingle stanzas for session requests made to a space participant's JID.
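
The two cases amount to a routing decision on the destination JID, which a service might make along the lines of this sketch. The function name, the occupant table and the use of the `item-not-found` error condition for an unknown nickname are assumptions of mine.

```python
def route_jingle(to_jid, occupants):
    """Decide what a Jingle-enabled MUC service does with a session request.

    `occupants` maps room nicknames to the real full JIDs of participants.
    Returns a (decision, value) pair:
      ("space", bare_jid)    the space itself should answer the request
      ("forward", real_jid)  relay the stanza to that participant's client
      ("error", condition)   the nickname is not present in the space
    """
    bare_jid, separator, nickname = to_jid.partition("/")
    if not separator:
        # Addressed to the space JID: the shared, multi-party session.
        return ("space", bare_jid)
    real_jid = occupants.get(nickname)
    if real_jid is None:
        return ("error", "item-not-found")
    # Addressed to space/nick: a private session between two participants.
    return ("forward", real_jid)
```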

It is easy to understand that private Jingle session requests made through a MUC are strictly equivalent to session requests between two clients made through a standard XMPP IM server. It is equally easy to understand that the point-to-point media transport between the two participants' clients will take place out of band using the appropriate negotiated binary protocol.

The same scenario applies in a conversation space, when several participants share the same media during the conversation. The difference from the above case lies in the type of end points involved in the point-to-point communication. In order to share a particular medium, a mediator system has to combine the individual media flows produced by the various participants into a single media flow. This mixer role is usually played by specialized bridges, or even an IPBX, when it comes to voice and video. In the particular case of VNC, I am under the impression this protocol has always been used for pure point-to-point communication between workstations. As the typical use in the context of a MUC would mostly involve one-to-many communication (one master screen shown to many participants' clients), adapting VNC screen sharing would require implementing some sort of repeater system.
This same one-to-many communication also applies to streaming media servers that could be used to broadcast inside the conversation space.

In the end, I believe the MUC protocol bears much more than its current limited usage for text chat rooms. I have just tried to illustrate some of the extended possibilities. I am not saying it will be all done in one day, but the proper bricks are already in place.

Thursday, October 12, 2006

Multi user communication

In the XMPP parlance, MUC refers to "Multi User Chat". The specification foreword highlights the textual aspect of the communication by referring to other similar text based systems.

Traditionally, instant messaging is thought to consist of one-to-one chat rather than many-to-many chat, which is called variously "groupchat" or "text conferencing". Groupchat functionality is familiar from systems such as Internet Relay Chat (IRC) and the chatroom functionality offered by popular consumer IM services.

In the wake of the growing use of XMPP as an application platform, I am convinced XEP-0045 goes far beyond text chat rooms. The specification can easily be extended beyond text to any form of conversation space involving several parties. The word "room" is used throughout the document, and may appear somewhat tied to "chat room". But, leaving this restrictive interpretation aside, we can easily see that almost every concept it uses can be applied to the wider scope of conversation spaces.

The first important aspect of the specification is that its definitions can be applied, unchanged, outside the narrow scope of text communication.
The specification provides a complete list of space types, such as public (Open, Public) or private (Members-Only, Hidden) spaces, identified (Non-Anonymous) or anonymous (Fully-Anonymous) spaces, temporary (Temporary) or permanent (Persistent) spaces. One can refer to the specification for a full description of these types. The important point is that these types of spaces are largely independent of the actual means of communication used inside the space. They would apply equally well to text, voice or video, or any combination thereof.
Similarly, the specification goes to great lengths to define the roles (Moderator, Participant, Visitor) and affiliations (Owner, Admin, Member, Outcast) any participant in a conversation space may acquire, as well as the rules and privileges a compliant implementation must provide in relation to these roles or affiliations. Once again, none of these definitions is specific to text communication. They remain valid for any kind of conversation space.

The second important point is the reliance on presence for the actual functioning of conversation spaces. Entering or leaving a space is entirely driven by presence, and the XEP takes great care to define the associated presence broadcast behavior.
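
As an illustration of how presence drives membership of a space, the two stanzas below follow the pattern XEP-0045 defines for entering and leaving a room. The addresses and the nickname are placeholder values in the style of the specification's own examples, not taken from a real deployment.

```xml
<!-- Enter the space: directed presence sent to room@service/nickname -->
<presence from="hag66@shakespeare.lit/pda"
          to="darkcave@chat.shakespeare.lit/thirdwitch">
  <x xmlns="http://jabber.org/protocol/muc"/>
</presence>

<!-- Leave the space: unavailable presence sent to the same address -->
<presence from="hag66@shakespeare.lit/pda"
          to="darkcave@chat.shakespeare.lit/thirdwitch"
          type="unavailable"/>
```

Nothing in either stanza says what kind of media will flow inside the space.
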
The third important point is the definition of all the management actions that may be associated with a conversation space. For a space owner, it defines a wide range of configuration options. It allows a space administrator to modify persistent information about user affiliations (e.g., banning) and to grant or revoke moderator privileges. Here again, there is nothing specific to text-only communication.
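
To make the administrative side concrete, here is a sketch of the two actions mentioned above, using the muc#admin namespace from XEP-0045. The JIDs, nickname and id values are placeholders borrowed from the style of the specification's examples.

```xml
<!-- Ban a user: the affiliation tied to the bare JID becomes "outcast" -->
<iq from="kinghenryv@shakespeare.lit/throne"
    to="southampton@henryv.shakespeare.lit"
    type="set" id="ban1">
  <query xmlns="http://jabber.org/protocol/muc#admin">
    <item affiliation="outcast" jid="earlofcambridge@shakespeare.lit"/>
  </query>
</iq>

<!-- Grant moderator privileges: the role is set on an occupant's nickname -->
<iq from="crone1@shakespeare.lit/desktop"
    to="darkcave@chat.shakespeare.lit"
    type="set" id="mod1">
  <query xmlns="http://jabber.org/protocol/muc#admin">
    <item nick="thirdwitch" role="moderator"/>
  </query>
</iq>
```

Both requests manipulate affiliations and roles only, so they would read the same for a voice or video space.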

The last important point lies in the possibility of using an XMPP address to refer to any conversation space, and to any participant in the space. This fine-grained addressing is the key to extending "Multi User Chat" into "Multi User Communication". It is easy to imagine that a multimedia conversation space will use Jingle to extend its media capabilities beyond text. As the conversation space is presence-enabled, it will broadcast its capabilities to any client entering the space, while receiving the client's own. Similarly, the space will also broadcast the new participant's client capabilities to all existing participants, thus allowing multimedia private conversations to take place. A Jingle-enabled client would then be able to negotiate a voice or video session with a multimedia space seamlessly. It does not matter whether the multimedia implementation is entirely provided by the MUC service, or handed over to specialized audio or video bridges.
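
The two levels of addressing can be sketched with ordinary message stanzas as XEP-0045 describes them: the bare room address reaches the whole space, while room@service/nickname reaches a single participant. The addresses and message bodies are placeholders.

```xml
<!-- Addressed to the space itself: delivered to every participant -->
<message from="hag66@shakespeare.lit/pda"
         to="darkcave@chat.shakespeare.lit"
         type="groupchat">
  <body>Harpier cries: 'tis time, 'tis time.</body>
</message>

<!-- Addressed to one participant in the space: a private exchange -->
<message from="hag66@shakespeare.lit/pda"
         to="darkcave@chat.shakespeare.lit/firstwitch"
         type="chat">
  <body>I'll give thee a wind.</body>
</message>
```

A Jingle session request would presumably be addressed in the same way, to either of these two kinds of address; the Jingle payload itself is left out of this sketch.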

In the end, the MUC specification is a very important enabler for building advanced communication applications, beyond text only. It provides a ready-made framework to manage conversation spaces and participants. It is another perfect example of the maturity of the protocol and of the growing importance of presence in applications. As I have mentioned earlier, XMPP has reached a stage that allows many application usages from its existing feature set. It is just a matter of imagination and creativity.


Wednesday, October 11, 2006

Moving up the stack

I have always heard bright people argue that the real value lies in "applications" and not in the underlying "plumbing". Although some will surely disagree, there is some truth in the saying…

The recent submission of a new XMPP extension proposal prompted this reflection. The list of XMPP extensions is nearing 200 and still growing. The purpose of this proposed addition is quite clear and legitimate:

While a protocol has been described for initiating a file transfer from one user to another, there is not yet a protocol allowing for one user to designate a set of files as available for retrieval by other users of their choosing.

The proposal goes even further by listing the functionalities provided by the protocol extension:

  • Obtain a list of other client's publicly available files, which match given search criteria. The search protocol is similar to that described in XEP-0055.
  • Request the transfer of one of those files. The transfer itself would function as described in XEP-0096.
  • Place requests into a sender-side queue, such that files are sent at a later time.

Unless I am missing something, these functionalities are shared by many resource management applications. Once the service has been discovered, most of these applications follow the same feature pattern:

  • Traversal of the resource store a.k.a listing,
  • Finding a particular resource or resource set a.k.a search,
  • Performing some action on a particular resource.

It so happens that we already have all the building blocks among the existing XMPP extensions. In this particular case we would use:

I believe this example shows that XMPP as a protocol has reached the right maturity level, allowing many application usages from its existing feature set. There are still unexplored areas where the protocol will need entirely new constructs. This will probably be the case for very specific application domains, as the latest discussions around gaming extensions show. In the majority of cases, though, we should see more and more proposals defining "mashups" of existing extensions instead.


Tuesday, October 03, 2006

YOIP (yawns over IP)

Stirring an old stew doesn't turn it into a gourmet meal. And frankly, I think the latest babble-chatter about which VoIP start-up is more voice 2.0 than the other is kind of boring.

I recently left a simple comment on a series of posts describing the "bad practices" dictating the success or failure of new voice and video over IP enterprises: "what should they do to succeed?" Well, the tentative answer has only heightened my perplexity. In short, no one seems to know, but many are talking…

after all is said, more is said than done

The unease grew even stronger when I read that voice 2.0 in a nutshell sums up as "be different and give the customer control". What a great discovery! That sounds so marketing 0.1, dear. It just leaves me wondering if anyone is seeing the obvious: until the toll-based business model ceases to exist, there is little chance for large-scale, innovative and compelling voice applications to emerge. The promised land of converged applications remains a mirage. Unfortunately for all these new enterprises, the issue lies with the networks' business model, not with technology. On the positive side though, these VoIP start-ups are helping an ongoing customer education process. Whatever their fate may be, by providing end users with a test bed of other possibilities, they are more valuable than it appears at first sight. Obviously, trying to quantify this value in business terms is open to interpretation. Like any return on people's education…

The troubling feeling I had was further reinforced when I read the self-congratulating inventor of the so-called Voice 2.0 meme suggesting that switching from one's current telecom provider to AOL, Google or [insert your favorite walled garden keeper name here] is the next panacea and will open a wonderful era of user-driven voice applications. I doubt that refurbishing second-hand ideas which have been around since the beginning of the nineties will make this meme a reality any time soon. If, like me, you have seen so many of these layered voice/directory/application diagrams in every single telecom operator presentation, and so little change over the same period of time, you remain somewhat skeptical…

As Aesop, this fine observer of humanity, declared, "after all is said, more is said than done". I believe the VoIP meme in its current voice 2.0 disguise, along with all its “propagandists”, unfortunately does not escape this great saying.


Sunday, October 01, 2006

Let's archive FUD.

We can draw a parallel between protocols and men: confidence comes with maturity. I believe the latest versions of the public key publishing and message archiving extensions to XMPP are to be interpreted as a concrete sign of maturity. In what way, you may ask? In leveraging other existing protocols to their own advantage, in this case the established XML digital signature and encryption standards.
The process of standardizing a protocol often implies walking a narrow path between the precipices of "not invented here" (let's redo everything) and "narcissistic elitism" (we can do everything). In my opinion, when the authors of protocol extensions rely on somebody else's work, it gives a strong signal that they, and by extension the protocol, have reached an "adult" level of confidence. And in turn, the early natural fears and doubts linked to this kind of endeavor are clearing away.

Going back to the proposed extensions, although I find the public key publishing proposal ready for implementation, I believe the message archiving proposal requires further improvement. Don't get me wrong, I am in no way diminishing the great work provided by the authors. I am just saying that, with a few modifications, this archiving proposal could open a much wider perspective in our days of hyper-communication.

The introduction to JEP-0136 states the purpose of the extension:

Many XMPP clients implement some form of client-side message archiving. However, it is not always convenient or even possible to archive messages locally, e.g., because it is easier to keep all archives in one universally accessible place (not scattered around on multiple computers or devices) or because the client operates in a web browser or resides on a mobile device that does not have sufficient local storage for message archiving. In addition, server-side archiving makes it possible to offer new services such as integration of IM and email. Therefore it is beneficial to define methods for server-side archiving of XMPP messages.

In essence, this describes an application process, either manual or automatic, where different conversation threads are recorded to a store for later processing. When you think of it, this is a very natural extension to any communication system. It is also a typical storage application, with typical application subsystems:

  • Recording rules management (preferences)
  • Conversations management (collections)
  • Conversations' items management (messages)
  • Content replication management

In the end, this is not very different from a mailbox or a blog post management application. The transport protocol for interacting with the application will have to be adapted to the appropriate client, but the operations will remain identical: modify, add, delete collections; modify, add, delete items; list store content. And all this is independent of the actual content of the messages or collections. To reinforce this point, just note how the JEP describes the two cases of clear and encrypted content, and provides adequate methods for both content types. I believe the JEP would greatly benefit from using Atom and its extensions instead of a home-grown content format.
From a functional standpoint, there is no difference between using the content format proposed by the JEP and a standardized Atom content, as represented below.

<iq type="set" to="montague.net" id="up2">
 <store xmlns="http://jabber.org/protocol/archive"
        with="juliet@capulet.com/chamber"
        start="1469-07-21T02:56:15Z"
        subject="She speaks!">
  <feed xmlns="http://www.w3.org/2005/Atom"
        xmlns:thr="http://purl.org/syndication/thread/1.0">
   <title type="text">She speaks!</title>
   <updated>1469-07-21T02:56:15Z</updated>
   <generator uri="romeo@montague.net" version="1.0">Verona client</generator>
   <id>60a76c80-d399-11d9-b91C-0003939e0af6</id>
   <entry>
    <id>60a76c80-d399-11d9-b91C-0003939e0af6:0</id>
    <published>1469-07-21T02:56:15Z</published>
    <author>
     <name>Juliet</name>
     <uri>juliet@capulet.com/chamber</uri>
    </author>
    <content type="xmpp" xml:lang="en">
     <body>Art thou not Romeo, and a Montague?</body>
    </content>
   </entry>
   <entry>
    <id>60a76c80-d399-11d9-b91C-0003939e0af6:1</id>
    <published>1469-07-21T02:56:26Z</published>
    <author>
     <name>Romeo</name>
     <uri>romeo@montague.net/orchard</uri>
    </author>
    <thr:in-reply-to ref="60a76c80-d399-11d9-b91C-0003939e0af6:0"/>
    <content type="xmpp" xml:lang="en">
     <body>Neither, fair saint, if either thee dislike.</body>
    </content>
   </entry>
   <entry>
    <id>60a76c80-d399-11d9-b91C-0003939e0af6:2</id>
    <published>1469-07-21T02:56:29Z</published>
    <author>
     <name>Juliet</name>
     <uri>juliet@capulet.com/chamber</uri>
    </author>
    <thr:in-reply-to ref="60a76c80-d399-11d9-b91C-0003939e0af6:1"/>
    <content type="xmpp" xml:lang="en">
     <body>How cam'st thou hither, tell me, and wherefore?</body>
    </content>
   </entry>
   <entry>
    <id>note:60a76c80-d399-11d9-b91C-0003939e0af6:0</id>
    <published>1469-07-21T03:04:35Z</published>
    <author>
     <name>Romeo</name>
     <uri>romeo@montague.net/orchard</uri>
    </author>
    <content type="xhtml" xml:lang="en">
     <div xmlns="http://www.w3.org/1999/xhtml">I think she might fancy me.</div>
    </content>
   </entry>
  </feed>
 </store>
</iq>
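
One caveat on this sketch: RFC 4287 requires every Atom entry to carry a title and an updated element, and it restricts the type attribute of content to text, html, xhtml or a media type. A minimal variant of the second entry that respects those two rules might look as follows. The title text and the choice of type="text" are assumptions made for illustration; the id, author and thr:in-reply-to values are kept unchanged from the example.

```xml
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:thr="http://purl.org/syndication/thread/1.0">
  <id>60a76c80-d399-11d9-b91C-0003939e0af6:1</id>
  <!-- title and updated are mandatory on every entry in RFC 4287 -->
  <title type="text">Re: She speaks!</title>
  <updated>1469-07-21T02:56:26Z</updated>
  <published>1469-07-21T02:56:26Z</published>
  <author>
    <name>Romeo</name>
    <uri>romeo@montague.net/orchard</uri>
  </author>
  <thr:in-reply-to ref="60a76c80-d399-11d9-b91C-0003939e0af6:0"/>
  <!-- type="text": plain character data, no child elements -->
  <content type="text" xml:lang="en">Neither, fair saint, if either thee dislike.</content>
</entry>
```

Whether the message body is better carried as plain text or as an embedded XMPP element is a design choice the extension would have to settle.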

I believe in re-using what has already been thoroughly and critically discussed when it comes to standards covering the same field of application. In this case, with all due respect to the JEP authors, I think most aspects related to expressing web content are already incorporated in the Atom specification, including standardized extension methods. In comparison, an XMPP-specific content format will almost certainly be limiting. I am not certain the JEP authors have had the time to consider all the possible content use cases, which is perfectly understandable. On the other hand, they know XMPP well, and they came up with a good management envelope around the content. I have no doubt they would find it easy to adapt the query part of the extension to Atom content.

I know that some of the JEP authors are particularly cautious about the size of what is sent over the network when it comes to wireless clients, and they may object to the increased size of the XMPP payload. In this precise case the objection would not apply, as the JEP is initially meant to serve exactly those clients with limited local storage. In practice, the kind of content I describe will never be transmitted to these clients.

In the end, the same content information is conveyed, with a much greater content extensibility built into Atom, and the flexibility of XMPP for the transport. And this alone offers a number of possible applications beyond message archiving. For example, the exact same protocol can be used to post to a blog. Using Atom as the exchange format immediately increases the complementarity between XMPP servers and traditional web servers. XMPP servers could expose conversations as feeds, and so on. The end user's imagination becomes the only limit, not the JEP. And this in turn is invaluable.
