Friday, June 02, 2006

World wide chat

Earlier this year, there were debates on the JSF mailing lists about ways to reduce the traffic flowing between XMPP servers that host multi-user chat (MUC) rooms. With the way chat rooms are commonly provided over XMPP today, a particular community's room can attract users hosted on many other servers. The existing MUC protocol design assumes a chat room is a hub and considers every participant to be directly connected to the room. In practice, participants often reside on home servers other than the one hosting the chat room. The discussion started from the observation that message distribution between servers is sub-optimal. Because the room itself is in charge of sending a copy to every participant, a single posted message generates heavy traffic on the server-to-server links as it is relayed to all participants.

To illustrate this phenomenon, let's use the now traditional Shakespearean setting. In our days of reality shows, Juliet's balcony has morphed into the latest chat room in town, so that all the lovers' fans can follow the intrigue in real time. When their story was still private, messages flowed unnoticed between the capulet.net and montague.com servers. Ever since Juliet opened the balcony chat room, the capulet.net server has had difficulty forwarding the lovers' messages to all their eagerly awaiting fans.

Romeo and Juliet have just run into an ever-repeating truth: hub-and-spoke architectures simply do not scale to the size of the Internet. Only distributed architectures can take full advantage of it. After all, the Internet itself is built from distributed routers and server nodes.

If we take a step back, XMPP can be broadly described as a publish-subscribe protocol. It has built-in mechanisms to notify subscribers of events occurring in three particular contexts, namely:

  • at the core, users can subscribe to receive presence or personal event state changes from contacts residing on any XMPP server,
  • in MUC, users can enter a chat room and, by doing so, subscribe to receive all messages posted to the room,
  • in PubSub, users can subscribe to different objects of interest and receive notifications whenever a publication matches the subscription filter.

In essence, XMPP as a protocol provides two specialized publish-subscribe mechanisms and one generic one. The problem of scaling publish-subscribe systems has long been studied in the academic world, and the consensus from that work is that distribution is the only architecture that scales such systems to the size of the Internet. In short, the best practices for distributing a publish-subscribe system consist of:

  • building a meshed overlay network comprised of core publish-subscribe routers and brokers,
  • connecting subscribers and publishers to the edge of the overlay network.

I believe it is possible to build a scalable, distributed MUC implementation without modifying XMPP. We can achieve Internet-size scalability by organizing chat rooms by subject and creating MUC peering agreements between the rooms. Going back to the best practices in publish-subscribe architecture, we want to keep broadcast traffic to a minimum. Translated into the context of a MUC overlay network, this requirement can be met by:

  • having users connect to the nearest MUC room of interest. This way, the traffic forking needed to reach every individual user only occurs on the users' home server.
  • having all the rooms that share the same interest subscribe to each other. This ensures that a message posted in any room across the overlay network is forwarded only once between rooms, thus achieving the expected traffic reduction.
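The traffic reduction described in the two points above can be checked with simple counting. The following is a minimal sketch, not a model of any real server: the fan names, the verona.example domain, and both helper functions are made up for illustration, and it only counts the copies of one posted message that cross server-to-server links, assuming every peered room is subscribed directly to every other.

```python
def hub_s2s_copies(participants, room_server):
    """Copies of one message crossing server-to-server links in the hub model.

    The single room sends one copy to every participant, so each participant
    whose home server differs from the room's server costs one remote copy.
    """
    return sum(1 for home in participants.values() if home != room_server)


def peered_s2s_copies(participants, origin_server):
    """Copies crossing server-to-server links when rooms are peered.

    Assumes each home server hosts a local room subscribed to the others:
    the origin room sends one copy per remote server, and each remote room
    then fans the message out to its own local users.
    """
    remote_servers = set(participants.values()) - {origin_server}
    return len(remote_servers)


# Hypothetical audience of the balcony room, keyed by JID, valued by home server.
participants = {
    "juliet@capulet.net": "capulet.net",
    "romeo@montague.com": "montague.com",
}
for i in range(4):
    participants[f"fan{i}@montague.com"] = "montague.com"
for i in range(5):
    participants[f"fan{i}@verona.example"] = "verona.example"

hub = hub_s2s_copies(participants, "capulet.net")
peered = peered_s2s_copies(participants, "capulet.net")
print(f"hub model: {hub} remote copies, peered model: {peered} remote copies")
```

In the hub model the cost grows with the number of remote participants, while in the peered model it grows only with the number of remote servers, which is the point of connecting users to their nearest room.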

As a first step, this would be best applied to widespread, common-interest public communities. Since no dedicated mechanism exists yet, the cross-subscriptions between rooms will have to be manually configured and set up across any group of servers wishing to share a common interest. A natural extension would be to build a cross-subscription mechanism into the MUC implementations themselves. As a room is identified by a JID, there is no protocol limitation preventing a room from being a participant in another room. It is a matter of building into MUC implementations the ability for a room to join another room on another MUC server as a participant.
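To make the idea of a room joining another room concrete, here is a minimal sketch of the presence stanza such a room could send. Entering a MUC room is done by sending presence to room@service/nick with an x element in the http://jabber.org/protocol/muc namespace; the room JIDs, the nickname, and the build_room_join helper below are hypothetical, and whether a given MUC server accepts a room's JID as the sender is an implementation question this sketch does not answer.

```python
import xml.etree.ElementTree as ET

MUC_NS = "http://jabber.org/protocol/muc"


def build_room_join(from_room_jid, to_room_jid, nick):
    """Build the presence stanza one room would send to enter another room.

    The stanza has the same shape as a normal user join: presence addressed
    to <room JID>/<nickname>, carrying an empty <x/> in the MUC namespace.
    """
    presence = ET.Element(
        "presence",
        {"from": from_room_jid, "to": f"{to_room_jid}/{nick}"},
    )
    # The namespace is written as a plain attribute so that only <x/> carries it.
    ET.SubElement(presence, "x", {"xmlns": MUC_NS})
    return presence


# Hypothetical peering: the Capulet balcony room joins the Montague one.
join = build_room_join(
    "balcony@chat.capulet.net",
    "balcony@chat.montague.com",
    "capulet-balcony",
)
print(ET.tostring(join, encoding="unicode"))
```

Once joined, the local room would receive every message posted in the remote room, just like any other participant, and could relay it to its own local occupants.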

A second step would be to extend the existing MUC service discovery definitions with the vocabulary needed to expose these distributed rooms of interest as such to clients. This is a matter of registering new items for the XMPP disco protocol.
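As a rough illustration of what such a discovery extension could look like from the client side, the sketch below checks a disco#info result for a peering feature. The feature name urn:example:muc:peering is purely hypothetical (no such feature is registered), as are the sample JIDs and the advertises_peering helper; only the disco#info namespace and the existing MUC feature are taken from the current protocols.

```python
import xml.etree.ElementTree as ET

DISCO_INFO_NS = "http://jabber.org/protocol/disco#info"
# Hypothetical feature name; a real one would have to be registered first.
PEERING_FEATURE = "urn:example:muc:peering"

# Hypothetical disco#info result returned by a peered room.
SAMPLE_RESULT = """
<iq type='result' id='disco1'
    from='balcony@chat.capulet.net' to='romeo@montague.com/orchard'>
  <query xmlns='http://jabber.org/protocol/disco#info'>
    <identity category='conference' type='text' name='The Balcony'/>
    <feature var='http://jabber.org/protocol/muc'/>
    <feature var='urn:example:muc:peering'/>
  </query>
</iq>
"""


def advertises_peering(iq_xml):
    """Return True if a disco#info result lists the hypothetical peering feature."""
    iq = ET.fromstring(iq_xml)
    query = iq.find(f"{{{DISCO_INFO_NS}}}query")
    if query is None:
        return False
    features = query.findall(f"{{{DISCO_INFO_NS}}}feature")
    return any(f.get("var") == PEERING_FEATURE for f in features)


print(advertises_peering(SAMPLE_RESULT))
```

A client that finds the feature could then present the room as one entry point into a distributed room rather than as an isolated one.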

This solution would greatly decrease unnecessary server-to-server traffic, and it can be done without relaxing any of MUC's built-in moderation features. This approach makes XMPP a good candidate for building a robust, Internet-wide multi-user chat network able to supersede other technologies such as IRC. And all of this can be achieved without modifying the protocol. Isn't this what re-use and leveraging are all about?
