Thursday, November 30, 2006

The six oldest new ideas in chat…

Ever since I came across this post, I have been looking for the proper illustration before coming back to it. In the end, the simplest and first impression is always the right one: this author does not have the slightest clue of what he is talking about…

First of all, he starts by being confused by the subtle difference between "chat" and "instant messaging". In effect, most of the "exemplifying" services or products he mentions deal with instant messaging, and some of them offer multi-user chat facilities. But instant messaging is not really news.

Interoperability. Ever since the invention of the telephone, interoperability has been the single requirement for any mediated communication system. In short, we cannot decently consider this to be a new idea. Because of the early calcification forming in the innovative region of most corporate executives' brains, we unfortunately face a walled garden landscape for instant messaging (amongst other things). So suddenly discovering that "clearly, open standards are here to stay" is not properly visionary. Well, as they say in China, "when the wise man points at the moon, the fool only sees the finger".
As a passing remark, unless I am mistaken, Trillian, Gaim, Adium and Miranda are multi-protocol instant messaging client applications, not "services".

In-Browser Chat. This one is misleading because of the confusion between chat and instant messaging. But a simple search on the term chat will bring back 551,000,000 results which tends to indicate that chat has been in the news for quite some time. And when you carefully look at the results, you will find that many point to in-browser chat for so called "adults" services. But isn't this the oldest business in the world?
There are a number of browser-based instant messaging clients which are trying to solve the interoperability issues mentioned above. For the same reasons that lead to the ineluctable disappearance of communication silos, only those services that rely on open protocols will remain. At that point in time, the end user will have the choice of installing a standards-based client application on his workstation or using the same application hosted at a provider. The debate between hosted and non-hosted applications has been part of the IT landscape ever since its beginning. This is not new either.

Location Based Chat. Geo location services have existed in mobile phone services since the early days of GSM. Before the web 1.0 bubble, some vendors were even touting geo location as a "killer application". And it is still part of several mobile application services trying to bring contextual search to road warriors.
As I explained earlier, geo location can participate in bringing a sensation of "place" into mediated communication. But initial research on the subject dates back to the mid nineties. It is fair to say that most of the research was concerned with "virtual reality" worlds at the time. Obviously, instant messaging was not yet considered mainstream. This simply underlines the length of the road leading from ideas to their applications in different domains, but this has always been the case.

Flexible Identities. A quick look at this post will convince you that multiple personae have one of the longest research histories in the context of the Internet. You will also notice that there is a subtle distinction between "multiple personae" and "multiple facets" of one's identity. The same as between Doctor Jekyll and Mister Hyde…
As a remark, the provided examples only implement precepts from the now defunct Presence and Availability Management forum on how granular presence was to be used to manage communication address handles, and when thinking of it, all the call forwarding features available in a PBX were already a way to "separate your private and professional faces". After ten years of "converged communication" talks, applying the same principles to instant messaging should not come as a real surprise.

Contextual Chat. In this case we are really talking about chat. This section presents a blurred mix of two different concepts. On one end there is the chat room client that can be embedded in different web pages, such as blog posts, and provides a fixed context for discussion. On the other end we have the "virtual presence" clients, browser add-ins which will work on any web page. The idea of being instantly aware of other users reading the same web page emerged early in the history of the Web. Implementation projects appeared in the first half of the nineties with the advent of Virtual Places and other co-browsing initiatives. Ten to fifteen years on the Internet time scale are more like a century to me.

Rich Media Chat. As the author puts it, "web cams and microphones have been on the web for a while", and it is only the commoditization of broadband network access and computing power that made "rich media", read internet audio and video, accessible to the masses. Already in the early days of video-conferencing, when only ISDN and leased lines were available, several systems were offering in-band instant messaging and even document sharing. Once again the idea dates back fifteen years. As for the previous section, speaking of novelty is not appropriate. That said, the trend to combine audio with instant messaging is natural, as it mimics a real world behavior. The challenge will be to provide a seamless experience moving from one medium to the other across different devices.

The Internet has an immense magnifying power. It even allows us to quickly search a vast knowledge repository for information that would be relevant to make a post valuable. But using a flashy title is still far easier than providing relevant content: you only need to utter six words instead of a few hundred. Unfortunately the lazy approach based on making noise has been around since the beginning of time.


The speech act of replying to a question

It is somewhat interesting to note that the Google Search result for the definition of "answer" gives a large majority of answers related to the legal practice. For a company so certain it can decrypt the vast complexity of human nature by using algorithms, the finding must have been devastating! From then on, the reaction to the stimulus was ineluctable. An automatic correlation with the recent increase of the number of legal proceedings against Google was obvious and could only lead to a single answer: bring down Google Answers…

Although the announcement eulogizes the innovative nature of the service, it did not protect it from a down to earth consideration: it was not making money. Many have been and still are questioning the company's pretence to innovation. Google is no different from any other large incumbent, it just chose another motto: the "non evil" company. But these are just words. Google is a master at coercing and, like many businesses, will use its power to smooth its way to domination.

More simply, Google is not about people: its business model is only to become the largest advertising agency ever. Advertising only works by flattering basic instincts, lowering critical barriers and feeding distorted and subjective information. No wonder there is no space for real human answers in this context…

Tuesday, November 28, 2006

XMPP.fm

The latest announcement of an XMPP based tool to support music fan communities got me thinking about how easily this can be implemented by putting together the proper XEPs. Let us consider how to implement an XMPP internet music radio similar to Last.fm. Obviously we would want to put some "social" focus into it. So the scope is simple:

  • Broadcast music to anyone joining the service, even on a temporary basis.
  • Provide a music recommendation system.
  • Provide discussion spaces amongst listeners.

From a protocol standpoint, the basic extensions we would need to provide this type of service are MUC for the community part, and PEP for the announcement part, obviously complemented by the user tune extension.

A very straightforward implementation route would be to create a presence enabled service to which any user could subscribe. The service will expose a number of "stations" implemented as MUC rooms corresponding to musical subjects of interest for the end users. So you may have rooms such as acoustic, alternative, ambiance, classical, dance, dark, experimental, folk, funk, groove, hip-hop, instrumental, jazz, lounge, metal, pop, punk, rap, reggae, rock, ska, techno, world, [add your own genre here]… To add the required "social" trend, one can imagine a system where users are able to tag and rate the music pieces, and rooms are automatically created and added from these user preferences.

…kind of a lite social network system, featuring chat and commenting throughout…

The service "identity" is provided through some kind of DJ bot. End users subscribe to the presence of the bot, and are also automatically subscribed to the bot's personal eventing notifications. The bot also requests to be subscribed to the user's tune events. Whenever a user comes online, he will receive XEP-0118 notifications from the bot describing what is currently playing in each active room. Obviously each notification will be augmented to include the URI of the corresponding room, to allow creating a "stations" list for the user to choose from.
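A minimal sketch of such an augmented notification payload, using only the Python standard library. The `tune`, `artist` and `title` elements and their namespace come from XEP-0118; the `station` child, its `urn:example:xmppfm:station` namespace and the function names are hypothetical, invented here only to show where the room URI could travel.

```python
import xml.etree.ElementTree as ET

TUNE_NS = "http://jabber.org/protocol/tune"   # XEP-0118 User Tune
STATION_NS = "urn:example:xmppfm:station"     # hypothetical extension namespace


def build_tune_payload(artist, title, room_jid):
    """Return a <tune/> element describing the track playing in one station."""
    tune = ET.Element(f"{{{TUNE_NS}}}tune")
    ET.SubElement(tune, f"{{{TUNE_NS}}}artist").text = artist
    ET.SubElement(tune, f"{{{TUNE_NS}}}title").text = title
    # Hypothetical addition: the MUC room acting as the "station".
    station = ET.SubElement(tune, f"{{{STATION_NS}}}station")
    station.set("uri", f"xmpp:{room_jid}?join")
    return tune


def station_list(payloads):
    """Collect (room URI, artist, title) tuples for a client-side stations list."""
    stations = []
    for tune in payloads:
        station = tune.find(f"{{{STATION_NS}}}station")
        stations.append((
            station.get("uri"),
            tune.findtext(f"{{{TUNE_NS}}}artist"),
            tune.findtext(f"{{{TUNE_NS}}}title"),
        ))
    return stations
```

A client receiving one such payload per active room can feed them to `station_list` to build its list of stations.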

Tuning to a station boils down to joining the associated MUC room, providing instant conversation between music lovers with similar tastes. These MUC rooms are augmented to allow direct presence subscription in case end users prefer to tune directly to a specific musical genre. Every "station" has its own DJ bot to which participants in the room can make suggestions as to the next piece of music to be broadcast. The DJ bot is also in charge of notifying the room participants of the currently playing piece. It does so by simply posting a notification to the room and relying on the standard MUC mechanisms to take over the distribution. This has the advantage of having the "station" past program automatically handled through the room history. The notification will be extended with the information about the physical connection URI in order to enable listening to the current track using the user’s workstation features.
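The two stanzas involved can be sketched as follows, again with the standard library only. The join presence follows XEP-0045; carrying the stream location in the XEP-0118 `uri` element is an assumption made for this sketch, as are the function names and the example addresses.

```python
import xml.etree.ElementTree as ET

MUC_NS = "http://jabber.org/protocol/muc"     # XEP-0045 Multi-User Chat
TUNE_NS = "http://jabber.org/protocol/tune"   # XEP-0118 User Tune


def join_station(room_jid, nickname):
    """Presence stanza that joins a MUC room, i.e. tunes to a station."""
    presence = ET.Element("presence", {"to": f"{room_jid}/{nickname}"})
    ET.SubElement(presence, f"{{{MUC_NS}}}x")
    return presence


def now_playing(room_jid, artist, title, stream_uri):
    """Groupchat message posted by the DJ bot; the room distributes and archives it."""
    message = ET.Element("message", {"to": room_jid, "type": "groupchat"})
    ET.SubElement(message, "body").text = f"Now playing: {artist} - {title}"
    tune = ET.SubElement(message, f"{{{TUNE_NS}}}tune")
    ET.SubElement(tune, f"{{{TUNE_NS}}}artist").text = artist
    ET.SubElement(tune, f"{{{TUNE_NS}}}title").text = title
    # Assumption: the <uri/> child is reused to point at the audio stream.
    ET.SubElement(tune, f"{{{TUNE_NS}}}uri").text = stream_uri
    return message


if __name__ == "__main__":
    print(ET.tostring(join_station("jazz@stations.example.org", "listener"), encoding="unicode"))
```

Because the bot's notification is an ordinary groupchat message, a plain MUC client still shows the human-readable body even if it ignores the tune payload.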

Scrobbling tracks is directly available when users set their own user tune state. The notification is sent to the service DJ bot for collection and update of the appropriate statistics.
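On the bot side, the collection step could be as simple as the sketch below. The `ScrobbleStats` class and its methods are hypothetical names; the only protocol fact relied upon is that an empty tune payload in XEP-0118 means playback has stopped.

```python
from collections import Counter


class ScrobbleStats:
    """Play counts the service DJ bot could keep from incoming user tune events."""

    def __init__(self):
        self.by_track = Counter()   # (artist, title) -> total plays
        self.by_user = {}           # user JID -> Counter of (artist, title)

    def record(self, user_jid, artist, title):
        """Count one play; an empty tune (playback stopped) is ignored."""
        if not artist and not title:
            return
        track = (artist, title)
        self.by_track[track] += 1
        self.by_user.setdefault(user_jid, Counter())[track] += 1

    def top_tracks(self, n=10):
        """Most played tracks across all users, as ((artist, title), count) pairs."""
        return self.by_track.most_common(n)
```

The per-user counters are what a recommendation step would start from; the global counter gives the charts.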

Additional “social” features could be implemented at each “station” level through the appropriate use of mood and activity publications and notifications. If participants are also able to publish their current room information through XEP-0194, the service could be enriched to provide what Wikipedia describes as:

The most-used community feature within Last.fm is the formation of user groups between users with something in common (for example, membership of another internet forum). Last.fm will generate a group profile similar to the users' profiles, showing an amalgamated set of data and charting the group's overall tastes.

I have already mentioned the lack of interest by the web 2.0 crowd in leveraging XMPP killer features. As one can see from the above quick jotting down, many of the puzzle pieces are readily available. In addition, they can be used from standard XMPP clients supporting MUC and PEP, which will be mainstream by the end of this year. Those wanting to add all the bells and whistles of a multimedia UI can do so by building a Flash-based client with embedded audio and video players, and they will be ready to compete in the Media 2.0 broadcasting space without much technology investment. XMPP has reached a stage where a large number of applications can be built from its existing feature set. It is still just a matter of imagination and creativity.


XMPP rocks...

I used to smile when Peter was writing this kind of phrase, but this time it is real ...

“The Virtual Ticket Media Player Chat Room is written and operated under Extensible Messaging and Presence Protocol (XMPP) standards. This means that other users will be able to use their existing messaging software (Trillian, GTalk, etc…) to connect, authenticate, and log in to the Artist’s chatroom. Currently, the built-in Virtual Ticket Media Player chat interface is the only public way to enter the Artist’s chatroom though support for other clients is planned.”
A great example of what can be achieved by combining MUC and multimedia.


Monday, November 27, 2006

Sensing activities in MUC

I have discussed before why conversation spaces are not places themselves, but rather for people to make places in them. In physical places as well as virtual ones, adaptation and appropriation of the associated technology by users is a critical element in the emergence of a sense of place and appropriate behavior. In short, the sense of place cannot be inherent in the system itself.

Within a place, social navigation is navigation through information collections on the basis of information derived from the activity of others. In the real world, we act where we are. We talk to people around us, because voices can only be heard at a short distance; we get closer to things to view them clearly. Understanding proximity helps us relate people to activities and to each other. When we see a group gathered around a meeting table, we understand something about these people's activity, and we know that another person standing off to one side is likely to be less involved.

Just like in real life, place aware presence systems should allow users to move to areas where others are clustered, to join the crowd and see what's going on. Since actions and interactions fall off with distance, distance can be used to partition activities and the extent of interaction. I have described here how "social proxies" can be used as abstract artifacts to convey additional social information. Amongst the "social proxies" that have been studied, I find the Babble experiments particularly interesting, as they could be applied to multi-user conversations such as the MUC rooms available on many XMPP servers. The proxy's role is to provide cues about the presence and activity of participants in the current conversation. It is graphically represented by two concentric circles similar to the drawing herewith. The outer circle symbolizes the conversation room border, the inner circle the conversation subject. Every participant is represented by a colored dot. The way it works is that participants in a particular room are shown within the proxy's outer circle. People in other rooms are positioned outside the circle. When people are active in the conversation, meaning they either "talk" or "listen", their dots move towards the inner circle, and then gradually drift back out to the edge when their activity decreases. What is interesting is the way test users have reported their experience of using this type of proxy:

…our users report the social proxy is engaging and informative. They speak of seeing who is "in the room," noticing a crowd "gathering" or "dispersing," and seeing that people are "paying attention" to what they say (when other dots move into the center of the proxy after they post).
On the practical implementation side, XMPP provides a number of extensions that can be put to use to enhance existing MUC implementations to support this type of social proxy. Beyond the specificity of the social proxy, the expected enhancement falls under what I have been writing about as presence feedback.
  • XEP-0085 chat state notifications, combined with moving averages of message stanzas over time calculated at the MUC room level, could provide sensible indications about the "talking" activity of every participant. A "listening" activity indication could be derived from the automatic presence status generated by the client.
  • XEP-0163 could be used at the MUC room level to notify the MUC clients of changes in the relative position of each participant's dot, to be displayed on the client interface. If we limit the proxy geometry to a Cartesian representation, we could easily derive an appropriate format for the associated data similar to the Geo Location XEP.
  • Other MUC rooms' global activity could also be provided to further accentuate the overall places context. Obviously, the notion of proximity could also be put to use to induce the notion of semantically related room discussion contexts.
In the end, I believe it is not overly difficult to assemble all these XMPP extensions together in a MUC implementation and, as a result, give a better sense of other people's presence and the ongoing awareness of activity into the conversation space. All in all it would be an interesting step toward better structuring our activity in the rooms, and better integrating communication and collaboration.
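As a rough illustration of the "talking" measure, here is a sketch that uses an exponentially decaying score as a simple stand-in for the moving average mentioned above. The class name, the half-life, the weights and the mapping to a radius are all assumptions chosen for the example, not values taken from the Babble work or from any XEP.

```python
import math


class ActivityTracker:
    """Exponentially decaying activity score per room occupant."""

    def __init__(self, half_life=60.0):
        self.decay = math.log(2) / half_life   # per-second decay rate
        self.scores = {}                       # nick -> (score, time of last update)

    def _decayed(self, nick, now):
        """Score of the occupant at time `now`, after decay since the last update."""
        score, last = self.scores.get(nick, (0.0, now))
        return score * math.exp(-self.decay * (now - last))

    def record(self, nick, now, weight=1.0):
        """Register a message (weight 1.0) or a weaker cue such as a chat state."""
        self.scores[nick] = (self._decayed(nick, now) + weight, now)

    def radius(self, nick, now, inner=0.2, outer=1.0):
        """Distance of the occupant's dot from the centre; active dots move inward."""
        activity = 1.0 - math.exp(-self._decayed(nick, now))   # squashed into [0, 1)
        return outer - (outer - inner) * activity
```

A room component would call `record` for every groupchat message (and, with a smaller weight, for "composing" chat states), then publish the resulting radii to the occupants at regular intervals. Dots drift back to the edge on their own as the scores decay.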


Sunday, November 26, 2006

Visibility, Awareness, and Accountability

I have already presented the concept of "social translucence" while discussing the benefit of adding "place" based information to presence for a better regulation of mediated communications. A "socially translucent" system enhances two important dimensions of communications. First, by making social information visible it enables participants to be aware of what is happening, and to be held accountable for their actions as a consequence of public knowledge of that awareness. Second, the fact that the real world is translucent to social information, and that people have a sophisticated understanding of the consequences of the visibility of their social interactions, helps structure interactions in a mediated communication.

While the "social translucence" perspective is unique, it is not the only concept to be concerned with making the activities of communication systems' users visible to others. For the past ten years, considerable research work has been targeted at video-mediated communication (Finn et al., 1997), and has led to the concept of "awareness". A number of researchers have constructed systems attempting in various ways to provide cues about the presence and activity of their users (Benford et al., 1994). This research has highlighted three design approaches to representing social cues in a digital system: the realist, the mimetic, and the abstract.

  • The realist approach tries to project social information from the physical domain into or through the digital domain. This work is exemplified in teleconferencing systems and media space research.
  • The mimetic approach tries to reproduce social cues from the real world as literally as possible in the digital domain. The mimetic approach is exemplified by graphical games and virtual reality systems. It uses virtual environments and avatars to mimic the real world.
  • The abstract approach involves portraying social information in ways that are not closely tied to their physical analogs. It could use abstract sonic cues to indicate social activity, or abstract visual representations. This approach also includes the use of text or simple graphics to convey social information.

Large-scale deployment and adoption of systems based on the realist or mimetic approaches have faced substantial pragmatic hurdles, such as their cost, the required infrastructure, and the constraints of user support. On the other hand, I believe that the abstract approach has not received sufficient attention, particularly with respect to graphical representations. Text and simple graphics have many powerful characteristics: they are easy to produce and manipulate; they persist over time, leaving interpretable traces; and they enable the use of technologies such as search and visualization engines. In this last category we find "social proxies" such as those depicted here.

A social proxy is an abstract dynamic graphical representation that portrays socially salient information about the presence and activities of a group of people participating in an online interaction. It is one technique for providing online, multi-user systems with some of the cues so prevalent in the face to face world. Social proxies are intended to be visible to all those portrayed in them, thus providing a common ground from which users can draw inferences about other individuals, or about the group as a whole.

Typically, a social proxy shows participants in a particular "place", as well as some of their activities in that "place". The choice of which aspects of activity are visible, and which remain private, depends on the particular context. Social proxies have four basic characteristics:

  • A social proxy typically consists of two components: a large geometric shape with an inside and an outside that represent the online "place", and much smaller shapes positioned relative to the larger shape that represent participants.
  • The presence and activities of participants in an online "place" are represented by the location and movement of the smaller shapes relative to the larger one. The relationships and movements of the visual elements have a metaphoric correspondence to the position and movement of people in a similar face-to-face situation.
  • Social proxies are public representations, and everyone looking at a social proxy for a given "place" sees the same thing. It is not possible for participants to customize their views of a social proxy. This is important because I know that if I see something in the social proxy all other participants can see it as well. This is what supports mutual awareness and accountability.
  • Social proxies are represented from a third-person perspective. Looking at a social proxy, all participants see themselves represented in it in the same way other participants are represented. This enables learning. Participants can see how their actions are reflected in their own representation, and thus begin to make inferences about the activities of others.
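These four characteristics translate fairly directly into a small layout routine. The sketch below is only one possible reading: the function name, the 0.8 and 1.3 factors and the circular arrangement are arbitrary choices made for illustration. What matters is that the output depends only on the shared state, never on who is looking.

```python
import math


def layout_proxy(participants, room, room_radius=1.0):
    """Place one dot per participant, identically for every viewer.

    participants maps a name to (current_room, activity in [0, 1]).
    Returns name -> (x, y) with the circle of the "place" centred on the origin.
    """
    names = sorted(participants)   # fixed order: everyone sees the same picture
    positions = {}
    for index, name in enumerate(names):
        current_room, activity = participants[name]
        angle = 2 * math.pi * index / len(names)
        if current_room == room:
            # Inside the circle: more activity means closer to the centre.
            distance = room_radius * (1.0 - 0.8 * activity)
        else:
            # People in other places are drawn outside the border.
            distance = room_radius * 1.3
        positions[name] = (distance * math.cos(angle), distance * math.sin(angle))
    return positions
```

Since every participant, including the viewer, goes through the same code path, the third-person perspective comes for free.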

The shared nature of a “social proxy” is critical. The knowledge that activity depicted in the social proxy is visible to all participants makes it “public”, and transforms it into a resource for the participants. It is this visibility that supports people's accountability for their actions, and underlies the social phenomena, such as feelings of obligation, peer pressure, and imitation, that enable coherence in group interactions.

On the Internet we are socially blind, and our attempts to communicate are often awkward. Even when others are clearly present, as in a chat room or on a conference call, it is difficult to see who is present, who is paying attention, or who wishes to speak. Things that require little effort in real world "places", such as taking turns when speaking, noticing when someone has a question, or seeing who is responding to whom, require a lot of effort in online "places", when they are possible at all. I think introducing "social proxies" in widely used presence enabled applications, such as IM or VoIP clients, would allow us to progress towards a better sensitivity to the actions and interactions of those around us in virtual "places".


Presence and going places

Mike Gotta has jotted down a series of notes about his train of thought regarding presence technologies. In my opinion, his segmenting of the subject strongly reflects the constraints of an analyst's work, but he nevertheless brings up many interesting points. I like in particular the way he widens the scope of the reflection beyond current implementations:

... food for thought and for consideration as to how some of these items relate to assumptions currently made around presence systems. How many assumptions based on instant messaging, IP telephony and so on will get in the way of a more expansive view of presence?

From his points, I would like to focus on the relations of presence to "location", "environment", "activity" and "role", which are often referred to as parts of a context. In that respect, because of their strong relationship to physical reality, spatial metaphors and spatial organization have been favored by many mediated communication and collaboration systems to model context. I believe this approach does not properly capture the complexity of real human social interactions. In real life, we are located in "space", but we act in "places". If the structure of a world is spatial, by comparison a “place” is a space invested with social meaning, such as behavioral appropriateness or cultural expectations. Furthermore "places" are valued "spaces". The distinction is like that between house and home: a house is where we shelter, but a home is where we live. In order to get contextually closer to the complexity of social communications, integrating “place” based information would greatly improve presence technologies.

Presence technologies make applications augment physical reality rather than replace it. Current implementations fall short in adapting to the vast variety of social communication contexts any human being experiences in real life. For example, the nature of relations and interactions with one's friends and family differs significantly from the nature of relations and interactions in the workplace. Even these simple differences are only superficially addressed by today's presence enabled applications. As an illustration, we can cite:

  • The way of establishing trusted presence relations scales poorly. To establish individual full trust between 10 persons, 10×9/2=45 bilateral agreements need to be established. Trust groups would scale much better and be easier to manage.
  • The availability status does not provide much added value. Many community systems provide a list of who is logged on to the community website, using a group-based trust model where presence information not only indicates "who is online" but also "who is here". This model may work well when using a community’s website as the primary shared resource. However, many groups and communities in workplaces often use a variety of shared resources. In such cases, for example when co-workers are always online at the same time, more detailed presence information than just online/offline status is needed.
  • The trust model is very crude. Either one establishes a trust relation and can always observe the other's presence information, or one does not establish a trust relation, in which case one can never observe the other's presence information and cannot engage in conversations. This might not be problematic when dealing with friends and family, with whom you expect to resolve unwanted interruptions easily. It becomes a problem when dealing with a larger set of co-workers in a multi-project environment.
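The scaling argument in the first point is easy to check numerically. In this sketch the function names are mine, and the group model is simplified to one membership grant per person per group.

```python
def pairwise_agreements(n):
    """Bilateral agreements needed for full individual trust among n people."""
    return n * (n - 1) // 2


def group_memberships(n, groups=1):
    """Membership grants needed when trust is managed per group instead."""
    return n * groups


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, pairwise_agreements(n), group_memberships(n))
```

For 10 people this gives the 45 agreements quoted above against 10 memberships; for 100 people it is already 4,950 against 100. The individual model grows quadratically, the group model linearly.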

Presence technologies need to be augmented to provide information not only about people but also about “places”. Unlike what is currently offered in IM and VoIP applications, more advanced presence mechanisms must allow exchange of information only with a certain subset of people, and only some of the time, depending on real-time context information that can be derived from the virtual or real “places” people visit. Today, ordinary presence systems only give answers to person-oriented questions:

  • Who is online, or is this person online?
  • What is this person doing?

Advanced presence systems will have to provide answers and notifications about “place” oriented questions, such as:

  • Who is here?
  • Who is near?
  • Where is that person?
  • What is that person doing there?

It happens that these questions can be answered by combining differently scoped attributes of presence information, including the trust relation between parties, their real or virtual locations, their activities at these locations, and presence and awareness scopes.

Trust scope. Some presence systems allow anyone who has access to a presence server to see presence information of others, other systems are more restrictive. The establishment of trust is distinguished by four model aspects:

  • Opt-in / opt-out / managed: In an opt-in trust model, others can only see presence information if you explicitly give them permission. In an opt-out model, others can see presence information, unless you explicitly denied them permission. In a managed model, a third party instead of the users determines who can see presence information.
  • Individual / group: In an individual trust model, each person's rights to presence information are managed separately. In a group trust model, rights to see presence information are managed for an entire group.
  • Reciprocal / non-reciprocal: In a reciprocal trust model, if A has the rights to presence information of B, then B also has the rights to the presence information of A. In a non-reciprocal trust model, this may not be the case.
  • Permanent / blockable / contextual: In a permanent trust model, presence information is available as long as the rights to do so exist. In a blockable trust model, presence information can temporarily be denied. If the rights to presence information are based on location or place the trust model is contextual.
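Part of this taxonomy can be written down as a small policy object. The sketch below covers the opt-in / opt-out / managed, reciprocal and blockable aspects only; group and contextual trust are left out to keep it short. All names (`Consent`, `TrustPolicy`, `may_watch`) are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Consent(Enum):
    OPT_IN = "opt-in"
    OPT_OUT = "opt-out"
    MANAGED = "managed"


@dataclass
class TrustPolicy:
    consent: Consent
    reciprocal: bool = False
    granted: set = field(default_factory=set)   # (owner, watcher) pairs explicitly allowed
    denied: set = field(default_factory=set)    # (owner, watcher) pairs explicitly refused
    blocked: set = field(default_factory=set)   # temporary blocks ("blockable" model)

    def may_watch(self, watcher, owner):
        """True if watcher currently has the right to see owner's presence."""
        if (owner, watcher) in self.blocked:
            return False
        if self.consent is Consent.OPT_OUT:
            return (owner, watcher) not in self.denied
        # Opt-in and managed both need a grant; they differ in who records it.
        if (owner, watcher) in self.granted:
            return True
        return self.reciprocal and (watcher, owner) in self.granted
```

In a managed model the `granted` set would be edited by an administrator rather than by the owner, which is why the check itself is the same as for opt-in.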

Location scope and virtual distance. When users browse the web, edit files from a shared storage, or read or post in blogs, they are present at a "location" in cyberspace. That said, many characteristics of physical space, such as being aware of someone's presence, and being able to initiate contact and communicate with that person, do not necessarily exist in cyberspace.
Location information is expressed by coordinates, but in cyberspace, unlike in the real world, users can be at multiple coordinates simultaneously. Place-based presence systems that need to answer the question "Who is near?" have to calculate the virtual distance between these coordinates. Virtual distance is then used to determine who can and who cannot be seen. To calculate virtual distance, presence location coordinates need to be laid out in a space, such as a topology, a virtual world or any directed graph. In place-based presence systems, location information constitutes a primary form of presence information. Not only the fact that someone is online somewhere in cyberspace, but also which resource that person is accessing provides presence information that can be made available to trusted parties.
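One simple way to obtain such a distance, assuming locations are laid out as a graph given as an adjacency mapping, is a breadth-first search counting links. The choice of hop count as the metric, and of the minimum over all location pairs for users who are in several places at once, are assumptions of this sketch.

```python
from collections import deque


def hops(graph, start, goal):
    """Shortest number of links between two locations, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def virtual_distance(graph, locations_a, locations_b):
    """Smallest distance between any location of A and any location of B."""
    distances = [hops(graph, a, b) for a in locations_a for b in locations_b]
    reachable = [d for d in distances if d is not None]
    return min(reachable) if reachable else None
```

With a site modelled as a tree of pages, two users on the same page are at distance 0, and a user who also has a second page open is as near as the closest of the two.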

Presence scope. A presence scope specifies the maximum virtual distance at which a trusted party can watch presence information. One may use multiple presence scopes, e.g., "people on the same website can see me, but cannot see the page I am on" and "people on the same web page can see if I am focusing on that page".

Awareness Scope. An awareness scope specifies the maximum virtual distance at which a user wants to get notified of presence information of trusting parties. One may use multiple awareness scopes, e.g., "are there people with me on the same web page?" and "are people with me on the same web page focusing on the page?".

Activity scope. What a user is doing at a location is also presence information. For example, in addition to browsing a web page, this may involve whether the user is actually focusing on this page or not, or whether the user is editing it.
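Given a virtual distance, the presence and awareness scopes reduce to two threshold checks. In this sketch the detail names and the distance values (0 for the same page, 1 for the same site) are illustrative assumptions; trust is supposed to have been checked beforehand.

```python
def visible_details(distance, presence_scopes):
    """Details the owner discloses to a trusted watcher at this virtual distance.

    presence_scopes maps a detail name to the maximum distance at which the
    owner is willing to reveal it.
    """
    return {detail for detail, limit in presence_scopes.items() if distance <= limit}


def should_notify(distance, awareness_scope):
    """True if the watcher wants to hear about presence this far away."""
    return distance <= awareness_scope


if __name__ == "__main__":
    # "People on the same website can see me, but cannot see the page I am on;
    #  people on the same web page can see if I am focusing on that page."
    scopes = {"online": 1, "focus": 0}
    print(visible_details(1, scopes))
    print(visible_details(0, scopes))
```

A notification is delivered only when both sides agree: the detail falls within the owner's presence scope and the distance falls within the watcher's awareness scope.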

Ultimately, by relaying “place” based information, presence technologies will enable three important building blocks of social interaction (visibility, awareness, and accountability) and thus become "socially translucent" systems. We can illustrate a "socially translucent" system by the following example. Consider a door with a design problem: when opened quickly, it is likely to slam into anyone about to enter from the other direction. An attempt to fix this problem would be to place a "Please open slowly" sign on the door. As one might guess, the sign is not a particularly effective solution. But we could also put a glass window in the door. As people approach the door they see whether anyone is on the other side and, if so, they modulate their actions appropriately. The sign is no longer required. While this solution works, it is useful to examine the reasons for the effectiveness of the glass window:

  • Firstly, the glass window makes visible socially significant information. As humans, we notice and react to movement and human faces and figures more quickly than we notice and interpret a printed sign.
  • Secondly, the glass window supports awareness. One does not open the door quickly because one knows that someone is on the other side. Our social rules come into play to govern our actions, as we have been raised not to slam doors into other people.
  • Lastly, there is another, subtler reason. Even if one does not care about hurting others, one will nevertheless open the door slowly because one knows that the other knows that one knows the other is there, and therefore one will be held accountable for one's actions. While awareness and accountability usually occur together in the physical world, they do not necessarily do so in a virtual context. It is through such individual feelings of accountability that norms, rules, and customs become effective social control mechanisms.

Note that "social translucence" is not only about acting according to social rules, but more about facilitating different types of communication and collaboration. Using presence information, it is possible today to observe that another party is likely to be available for communication. In return for giving up some privacy, the other party expects to be contacted at suitable moments, can screen incoming messages, can plausibly deny being present by not responding or by responding later, or can simply initiate the conversation at a time of its choosing. With "socially translucent" presence technologies it becomes easier for users to have coherent discussions, to observe and imitate others' actions, to engage in peer pressure, and to create, notice, and conform to social conventions.


Wednesday, November 22, 2006

Jingling call control

Third party call control is what makes applications such as "click-to-call" possible. Although I would not call "click-to-call" a killer application, its potential in traditional commerce or support applications is undeniable. In essence, third party call control is a must-have when communication sessions are managed by more than just individuals.

Although until recently third party call control was the guarded property of large telecom vendors, a new breed of call control gateway has made its appearance. These devices bridge Microsoft's LCS world with the open source world of Asterisk, and provide a way for the Office Communicator client to control the open source IPBX:

  • Use Office Communicator as a soft-phone to place calls, deflect calls, forward calls through Asterisk.
  • Receive incoming call notifications, see who is calling and reroute to an alternate number.

Obviously I do not find this kind of device important because of what it does for Microsoft's closed products, but rather because it does so through a standard protocol. Office Communicator has built-in support for the ECMA-323 standard, which is also known as CSTA XML. CSTA in its binary guise has been around in telecom vendors' equipment for a while. But I am ready to bet that its XML version will gain more and more traction, as it allows much easier and quicker integration between communication equipment and business applications.

In the context of Jingle, supporting different forms of call control is mandatory if the protocol is to see adoption beyond the narrow context of peer-to-peer direct communications. And I believe that looking to integrate CSTA XML and Jingle is the way to go.


Down with the phone number tyranny

Martin Geddes is the latest in a long line of heroes to have a whack at killing the phone company. The phone company is like the fabulous Hydra of Lerna. For each of its heads that is cut off, another one or even two spring forth. In addition, like the Hydra, which is half snake, the phone company has a very long tail…

Besides the underlying saga, this post also joins the growing chorus advocating what Ken Camp summarizes as
… presence and availability, or context, or whatever they become as facets of our digital identity and persona will be a huge piece of the evolution of unified communications.
Martin speaks of context as a driver for communication. I would say that context is the only driver of communication. We never communicate outside of a context, and whatever media we use must be able to take the context into account. But this has nothing to do with the technical context Martin describes. His context is merely the legacy of the antiquated operating systems and user interfaces in use today. The context driving the communication is a temporal cross reference intersecting several social groups and encompassing one or several spatial environments. The post is interesting as it goes on to develop concepts tightly related to presence technologies.

I like the way he describes what Outlook might look like if it were "socially" enabled, but I don't think he has grasped the full reach of presence technologies in upcoming communication systems. Describing an "address book" the way he does shows he simply did not take into account that the actual communication context is in effect part of one's own presence. So tomorrow's "address book" should only show what is relevant in this context.
There is another point where the post slightly misses the target. Or maybe this is a matter of wording. When talking of "collaborative" I am more inclined to use it with regard to inter-individual collaboration, whereas Martin seems to emphasize inter-application collaboration. Beyond this semantic digression, I have previously described my frustration with current user interfaces, and this has since been further developed by Giacomo Vacca when he also deplores the crude state of these interfaces and says

It's not about how presence technologies provide information on users' availability, but rather how much presence information can be truly dynamic and reflect users' habits and personalities.
To illustrate my point, let's go back to phone communication. The success of the phone lies in its ability to mediate the most important means of communication common to all human beings: voice. And voice has the intrinsic capability to convey intonation as well as articulated semantic meaning. As such, a voice communication system providing decent sound quality will compete on fair ground with face-to-face communication where only sound is available. This quality makes the phone system unique, as it introduces almost no perturbation in the medium. And this quality is also what makes the phone system successful. Every other means of communication introduces much higher perturbations in the medium.

I believe the phone is inexorably moving toward a wireless mobile device. This is a journey of no return, and in a few years we won't see any fixed phones left. Hopefully, at the same time, the phone device interface will have evolved well beyond using

  • a touch tone keypad that was the ultimate invention in the early sixties
  • a display trying to simulate a miniature windowing system that became common in the early eighties.
Just look at all these poor mobile phone victims running through every airport corridor, bent forward, pulling their roller bags with one hand while furiously thumb hammering their mobile phone with the other. Don't you feel they look strangely like the common representation of our Neanderthal cousins on the human evolution charts?

To conclude, I agree with Martin that we need to have a more integrated experience when we communicate, for the simple reason that technology should get out of the way. But, unfortunately, every example he gives still bears a strong influence from the limitations of today's (or rather yesterday's) devices. The most important of these is the use of phone numbers. Current phone devices are so closely associated with numbers that they are de facto unfriendly to any other means of communication found on the Internet. A keyboard is already unfriendly, but a keyboard where you have to press the same key three times to obtain a single character is a hundred times more unfriendly. We are so used to this approach for voice calls that we seem unable to think outside this limitation. See how Martin only describes "address books" as directories of phone numbers…

Until the two words "phone" and "number" have been taken far apart, I believe we will unfortunately still see a lot of the phone company, both inside people’s minds and outside.


Saturday, November 18, 2006

Presence is irrational by nature

Technologies are rational by design, and they tend to rationalize human activity when used. I came across some interesting reading (Luhmann, 1993, 1995) which emphasizes the oversimplification often introduced by technology. I think this is particularly true in complex human related applications, such as those found in mediated communications.

The flipside of technological simplification is a loss of flexibility and contingent response, which then have to be re-instituted through artificial mechanisms. Technological sequences cannot handle (i.e. absorb, ignore, forget or dissimulate) unforeseen incidents at the level on which they operate, even though technologists currently attempt to construct systems that respond to emergent events on the basis of learning from experience (e.g. neural networks). Such simple behavioral characteristics as forgetfulness, dissimulation and indifference, which we often assume to be part and parcel of the limitations of humans, play an extremely important and adaptive role under conditions of emergence, complexity and unpredictability.

Human communications and interactions are neither rational nor designed. Furthermore, temporal regularity is important in human experience. Communication technologies create perturbations in the regularity of time that characterizes a life made of personal habits and social routines. Habits and routines are more than repetition. They are often unique and spontaneous human experiences, where each repetition is different from the last.
By comparison, immediacy and access, as well as the constant flow of information, command that we attend to whatever is nearest and most urgent. In doing so, we trade a line of continuity for a dashed line of distraction. In the end, we pay attention, but in spurts of sameness that contribute little to a healthy experience.

The adaptation between the technical and the human takes place at what is called the "interface." In the case of communication, this is not just a user interface, but also a social interface. It is social because it mediates communication while facilitating the exchange of interpersonal cues and acknowledgments.

Because communication and presence technologies can stretch our relationships across time and space, they produce proximities involving rhythms of interaction, coordination of activity, ways of communicating, and ways of offering and protecting our availability. They do so by creating a kind of virtual proximity in which we become "equidistant" to one another. Unlike physical proximity, temporal proximity can be described as having qualities of speed, duration, acceleration, rhythm, and synchronization. Amongst the major challenges for communication and presence technologies we will find

  • respect for habits and social routines without reducing them to simple functional repetitions,
  • seamless flexibility and adaptability of user interaction,
  • mediation of rhythms and time, as a complement to space, to induce a more human impression of proximity.

Today's communication and presence technology interfaces often create a recurring sameness. The functions codified in the technologies reproduce the same abstracted operation and the same simplified representation with each repetition. This functional repetition displaces the spontaneity of social tradition. And we begin to think that repetition itself is dull, when it is the technical implementation of the procedure that is dull. Just look at a mobile phone to get a sense of what I am driving at…


Thursday, November 16, 2006

Streamlining remote probes

I wanted to wrap up my original topic on improving the XMPP presence handling model for subscribers hosted on distributed home servers by looking at presence probes.
XMPP differs from presence protocols such as SIP/SIMPLE by using persistent presence subscriptions, instead of transient subscriptions that have to be renewed for every session. In this model, an XMPP client publishes its presence state changes to its home server, which in turn generates the appropriate presence notifications.
Furthermore, an XMPP presence server is only responsible for, and above all knowledgeable about, its own user constituency.

When a user wants to initiate a presence enabled session, it publishes an initial presence after login. This is intercepted by its home server, which is in charge of

  • Responding to the client with the initial known presence state of every watched contact,
  • Notifying every contact subscribed to receive the user's presence.

The home server then triggers two processes:

  • A notification of the user's presence to all the watchers of its presence state,
  • A probe for each of the user's subscriptions to receive the contacts' presence states.

When every contact in a user's buddy list is co-located on the same home server, the server has a complete view of each contact's presence state, and using a probe stanza is not necessary. But if the contact resides on a remote server, a probe stanza is sent to that server to trigger a presence state stanza in return. Mridul rightly points out that the current specification leaves open the possibility for a server to cache remote contacts' presence states and to derive the initial presence for additional instances of these contacts from the cache. I believe the specification must avoid remote presence state caching. The probing mechanism is a guarantee for the home server to always have the latest initial presence state of remote users.

I can now come back to the earlier concern of minimizing the network traffic generated by presence handling when subscribers are hosted on distributed home servers. It is to be noted once again that the two processes of notifying all watchers and probing to receive contacts' presence states are asymmetrical. I have shown previously how notifications could be optimized by using transient remote users' lists. Conversely, probing multiple contacts' presence states is a one-time operation.
XMPP requires that probes be sent to a global URI (bare JID). In my opinion, unlike for notifications, the expected gain of grouping several JIDs in a single stanza is more limited. Leveraging XEP-0033 Extended Stanza Addressing to group JIDs in a single stanza may result in a slight traffic improvement if the number of remote contacts located on a single server is large. But for a limited number of remote contacts, the gain will certainly be marginal, so the complexity of implementing this kind of mechanism should be carefully weighed against the actual traffic gain.

An illustration of this mechanism is given below, assuming the multi-probe support has previously been discovered through XEP-0030:

<presence to="prober.denmark.lit" from="horatio@denmark.net/palace" type="probe">
  <addresses xmlns="http://jabber.org/protocol/address">
    <address type="to" jid="rosencrantz@denmark.lit"/>
    <address type="to" jid="guildenstern@denmark.lit"/>
  </addresses>
</presence>

<presence to="horatio@denmark.net/palace" from="rosencrantz@denmark.lit/on-the-road"/>

<presence to="horatio@denmark.net/palace" from="guildenstern@denmark.lit/motel"/>

It is rather obvious that the overhead of discovering the support and using multiple addresses in a single stanza will offset the gain when contacts are spread a few at a time across many servers. That said, if a significant number of contacts are located on a remote server, the mechanism may prove valuable, not primarily because of the traffic reduction, but rather because all the probe targets are delivered in one single stanza. For the remote server it will generally be more efficient to process a series of JIDs in a single transaction without context switching, rather than processing several atomic stanzas each in its own transaction. In most other cases, the added complexity may not be worth implementing.
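The trade-off can be sketched from the home server's side as follows. This is only an illustration under my own assumptions: the function name, the `multi_probe_services` mapping (remote domain to the address of a service that advertised multi-probe support through discovery) and the `min_batch` threshold are all hypothetical, and the grouped probe itself is the proposal discussed here, not something defined by the XMPP specifications.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ADDRESS_NS = "http://jabber.org/protocol/address"  # XEP-0033 namespace


def domain_of(bare_jid):
    """Return the domain part of a bare JID such as user@domain."""
    return bare_jid.rpartition("@")[2]


def build_probes(user_jid, contacts, local_domain,
                 multi_probe_services, min_batch=3):
    """Build probe stanzas for the remote contacts of a user.

    Local contacts are skipped, since the home server already knows their
    state. Remote contacts are grouped per domain. A grouped probe is used
    only when the domain advertised support AND enough contacts live there
    (min_batch is an arbitrary, hypothetical threshold).
    """
    by_domain = defaultdict(list)
    for jid in contacts:
        domain = domain_of(jid)
        if domain != local_domain:
            by_domain[domain].append(jid)

    stanzas = []
    for domain, jids in by_domain.items():
        service = multi_probe_services.get(domain)
        if service is not None and len(jids) >= min_batch:
            # One stanza carrying every probe target for this domain.
            probe = ET.Element(
                "presence",
                {"to": service, "from": user_jid, "type": "probe"})
            addresses = ET.SubElement(
                probe, "addresses", {"xmlns": ADDRESS_NS})
            for jid in jids:
                ET.SubElement(addresses, "address",
                              {"type": "to", "jid": jid})
            stanzas.append(probe)
        else:
            # Fall back to one regular probe per bare JID.
            for jid in jids:
                stanzas.append(ET.Element(
                    "presence",
                    {"to": jid, "from": user_jid, "type": "probe"}))
    return [ET.tostring(s, encoding="unicode") for s in stanzas]
```

With a threshold of three, a buddy list holding three contacts on denmark.lit and one on another server would yield one grouped probe plus one regular probe. Where exactly the threshold should sit is precisely the question that would need real traffic measurements.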


Tuesday, November 14, 2006

Presence calls out attention

Every new piece of information put on the web becomes available to millions, almost instantaneously. Drawing a parallel with material goods, information production can be virtually infinite, and consequently, be in oversupply. Around ten years ago several definitions started appearing for what is now known as the "attention economy". The main concept was that overabundant information could only get value from the attention any one of us was willing to devote to specific pieces of information.

To get attention you must emit what is technically identifiable as information; likewise for information to be of any value, it must receive attention. Therefore an information technology is also an attention technology, or in other words, a transfer of information is only completed when there is also a transfer of attention proceeding in the opposite direction.

In economics, property is the ownership of wealth. If attention has become a new kind of wealth, then one gains property whenever one attracts and holds it. One attracts attention by making oneself, and whatever one wants attention for, as visible as possible. Thus one holds best onto this form of property by being most open. In fact, this property is in the minds of one's beholders, and there need to be as many minds as possible.
If one is good enough at attracting attention, it may create a temporary "enslavement", where those giving attention turn over control of a large part of their mind and even body. Attracting attention also means acquiring recognition, identity, and meaning in the eyes of those around. One's store of attention can sustain spirit, mind and body, in just about any form.
At the same time, those paying attention will also want to get attention for themselves by quoting, citing, criticizing, parodying, gossiping about, or referring to an attention grabber as if to a star. In the extreme case which governs fans' relationships to their stars, giving attention will take multiple forms such as listening to them, heeding what they say, doing what they ask, waiting on them, waiting for them, serving them, loving them, in short doing anything and everything for them.
In an "attention economy" it becomes possible to benefit from revealing as much as possible about oneself, including weaknesses, and just about anything else. That way, humanizing oneself not only stirs up interest, but makes it easier for others to imagine themselves in one's shoes, which means turning their minds to see through one's eyes, a key part of any "paying of attention". Conversely, hiding away will likely turn attention elsewhere and create a risk of losing at least some of one's attention capital.

When they are not in the same place, people maintain presence and proximity through communication. Not through images, or appearance, but by maintaining communication. Beyond this, I think that presence technologies influence and enhance our proximity to one another.

Presence occurs when part or all of an individual's experience is mediated not only by the human senses and perceptual processes but also by human-made technology (i.e., "second order" mediated experience) while the person perceives the experience as if it is only mediated by human senses and perceptual processes (i.e., "first order" mediated experience).

Early conceptions limited presence to its spatial and physical context, partly because of the technical nature of the mediation. But timing, rhythm, speed, and continuity, despite having a temporal quality that is easily disrupted by technical mediation, are critical to human communication and social interaction. They are more difficult for us to model and render, but nonetheless, I believe that temporal distortions participate fundamentally in one's sense of being on the same page, being in synch, having or sharing time together.
As a consequence, proximity should not only be based on spatial co-presence anymore, but instead tuned to the frequencies of virtual presence. Proximity in the age of its technical production becomes temporal. Proximity mediated through presence technologies produces continuity in spite of physical separation from one another.
In the "social" web, one trades physical presence for virtual presence negotiation in order to get access to people, obtain their attention, and know whether a person is there, and there for oneself. Presence technologies provide a temporal continuity through discontinuous participation, creating a sense of being with others who aren't there by projections of oneself in the virtual world. By doing so, presence technologies have the capacity to bring connectedness to people. They help span time and weave a social fabric whose consistency simulates a "being there" for one another in time, but not space.

The real promise of the "social" web is to help satisfy the ever more pressing desire for attention. It's not the associated communication technologies which are important, but rather the individual and social practices into which the technology becomes embedded: messaging, talking, trading, dating, buying, selling, etc… They all contribute to how one is perceived as present in the "virtual world". And when everything else has become boring, only "social" presence remains. In this world, I see presence as a social involvement, one that calls out attention, or to put it another way, in the "social" web it is presence that drives the way people trade attention.

In spite of this simple economic equation, none of the so called "social" networks has yet embarked on a re-architecture based on real-time presence technologies; instead they keep using overhauled legacy web techniques.


Monday, November 13, 2006

Of compartments and silos...

Brad Casemore earlier presented the upcoming integration of Yahoo! Messenger into the Yahoo! Mail interface as follows:

In another example of how online applications are becoming richer and more useful, Yahoo announced today that it will embed instant messaging into its web-based email program within the next few months, allowing  users to partake in live chats from Yahoo Mail and to obviate the need for installation of a desktop IM application.

I generally appreciate Brad's comments for the way they keep a positive tone. In this particular case, I am less inclined than he is to find only positive aspects in this upcoming integration. First of all, there is this persistence by one of the remaining consumer instant messaging incumbents in sticking to proprietary protocols, rather than embracing open and documented standards. But more generally, as I pointed out earlier, I don't think Yahoo's current "me too" attitude is in any way a sign of "online applications becoming richer and more useful". If there is an industry trend here, it exemplifies the inability of this industry in general, and of Yahoo! in this particular case, to properly grasp the current usage trends, and to provide relevant solutions to the problems at hand.

Mike Gotta has perfectly analyzed the growing disaffection for email observed in "the current set of digital natives (those that have grown up using computers)". With the commoditization of tightly interwoven communication tools, the challenge of technology is not to offer a single command point trying to aggregate a "pot-pourri" of legacy and current communication channels, but rather to enable the proper use of the most appropriate channel and to move seamlessly between channels/devices at any time, while keeping the conversation active and rich.

I believe this is the kind of evolution we notice when we observe how teenagers prefer their IM client to an email client. But I would not call this "generational". In my opinion, it results simply from a different "learning" context and different "social" priorities. The important point is that IM is nothing but another channel for communication, although more in line with the natural real-time nature of face-to-face communication than email in the current incarnation of online "social" spaces.
Looking at the upcoming UI mock-up, which does not provide the slightest hint of a presence-enabled contact list, makes me wonder if anyone at Yahoo! has even noticed that this evolution has already started. From the look of it and the justifications provided in the original announcement, email and IM are still living in far away silos at Yahoo! But can we really expect a walled garden proponent not to keep the few neurons at its disposal in separate cubicles?

As I hinted before, a true improvement would be to provide a generalized messaging interface, and leave the final routing decision to a combination of presence and user action. On the surface, one could argue that the proposed possibility of copying an email being drafted into an IM provides the same functionality. But this would be missing the true nature of asynchronous communication: it is a special case of synchronous communication, not the other way round. And to notice this subtle difference requires thinking outside the silos.


Non verbal presence

I have reported earlier how different media types can affect the nature of people's interaction and consequently influence the medium chosen by an individual who wishes to communicate. Many factors affect the level to which a medium is perceived as sociable, warm, sensitive, personal or intimate when it is used to interact with other people. People in distributed environments adjust to perceived physical contact and closeness, even if it is not possible, just as people strive for intimacy equilibrium in the real world. They are also eager to retain the possibility of performing interactive acts in this environment, even if they would not perform them due to social rules in the real world or the mediated context.
They will thus look for technical mediations involving impressions as much as they involve expression. For technologies that mediate presence, this means:

  • Assisting the medium in its production of perception.
  • Minimizing the distortion or amplification of affective movements.
  • Allowing users to structure their actions.

I have previously presented views on how some form of feedback, when added to presence technologies, was desirable to minimize the intrusive nature of notifications and, consequently, to limit some of the medium induced distortion. But I believe presence systems have a natural bias towards machine to machine communication, and most UIs (user interfaces) still fall short of conveying non-verbal contextual cues. Although we find a host of presence attributes that can be aggregated and used to infer "enhanced" availability states, the available rendering techniques to assist the medium and enrich the user's perception of its environment still fall short of providing an adequate answer.

For example, in text based communication systems, commonly used UI substitutes for non-linguistic cues, such as gestures and facial expressions, are avatars, smileys and other fixed design elements. Clearly these "signs" have at best a reduced correlation to the user's intended meanings. And where a user's facial expressions are directly expressive, these are indirectly expressive. In particular, their appearance doesn't vary from user to user. Furthermore, they have to be interpreted in context. When a smiley is used in a UI, the reader has to resort not only to their knowledge of the author, but also to the context of email, IM, chat, or whatever communication tool is in use. In that respect, I don't agree with the argument that these design features enhance presence. They just slightly increase the palette of expressions, and minimally assist the medium in producing an impression.

I believe current human facing presence systems, although they like to call themselves "enhanced" presence providers, still exhibit a very primitive capacity to render information about posture and non-verbal cues as they are perceived by the individual to be present in the medium. Maybe this is further amplified by the strong application (vs activity) oriented nature of today’s windowing UIs, and the difficulty many designers have in freeing themselves from "best practices" when these are nothing but entrenched habits. I think our UIs are showing their age and definitely lack the dynamics and ingredients necessary for the required production of perception.


Saturday, November 11, 2006

VoIP is a series of tubes...

But unlike the Internet, these tubes can be filled without allowing inter-tube communication. Unless I am mistaken, we are getting yet another replay of the "my little walled garden" show, this time with point to point voice communication as the guest star. Just take the same incumbents as in the case of consumer instant messaging services, add eBay with Skype, and Google with GTalk and you have the new landscape. What is also interesting is the parallel uptake of VoIP hard phone devices for use with these closed services. We already knew the series of Skype phones, but more recently the movement has accelerated with examples of Yahoo! and GTalk phones.

I will resist enumerating the reasons why walled gardens and non interconnected “tubes” cannot provide a sustainable model, as I have done so here before. But this multiplication of closed voice services and associated devices raises two questions.

Firstly, the technological context in which these services are being created is different from the context of instant messaging. At the time when IM services were originally created, there was no established standard for instant messaging, which in turn led to the development of proprietary and incompatible protocols. In comparison, SIP was available as an open VoIP standard. So why wasn’t SIP chosen by all these players to incorporate in their soft clients? Apart from AOL, which included a SIP stack in its latest AIM clients, the other incumbents, including Microsoft, as well as Skype and Google, came up with their own VoIP solutions. Several reasons come to mind that could explain why SIP was not chosen, among them:

  • Inability of the players to conceive voice services outside the existing traditional telco model, where the only conceivable option for them is to play the role of clearing house. Using a proprietary VoIP system would force the end users to go through the incumbent's gateways to reach out to other VoIP services.
  • Fear that adopting SIP would give an unfair advantage to Microsoft, which had clearly endorsed this standard in its enterprise products. The current situation is proving that this possibility was grossly exaggerated, as MSN does not use SIP for VoIP, and there is no concrete proof that SIP is used to a large extent by MSN yet.
  • Inherent complexity of a SIP solution, ranging from the phone configuration, a recurring subject cited by many VoIP observers, to the associated infrastructure necessary to support and control the service. Furthermore, SIP is a documented standard, but its standardization process is similar to a vendors' consortium initiative. As a result, most SIP equipment is only manufactured by traditional telecom vendors, and the cost of a SIP infrastructure may have been excessive.
  • Scarce SIP expertise amongst the developers that have been put in charge of implementing the service and integrating voice in the IM clients. Skype and GTalk are typical examples, where developers had a protocol at hand that was solving many issues SIP is facing with NAT traversal. In such conditions, it was easier to just add signaling and voice transport to the existing protocol than to learn SIP in order to embed a full stack.

Secondly, why do all the VoIP pundits remain silent before the obvious rise of these new walls? I find this silence deafening and profoundly puzzling. I don't know if you are like me, but I don't think it is difficult to infer that all these voice services are of no real value to the end user. I was under the impression we were in an era of users' empowerment and "social" communications. Furthermore, if the attention economy is indeed the natural economy of the Internet, then voice is the first natural way to draw someone else's attention. After all, what does a baby do immediately after it is born but cry to draw attention? Voice is the ultimate open medium, and is the first and most ubiquitous human means of communication, far older than writing. But when someone can only use words in the limited context of a closed community, without the possibility to be heard outside, it looks strangely similar to being deprived of basic freedom of speech.

I can hazard a possible explanation for their silence. Maybe the said pundits, having been exposed too long to the traditional telco business, are only able to conceive of a single type of voice application: toll gates, which are the natural complement of non interconnected “tubes”.

Thursday, November 09, 2006

Streamlining remote notifications

Mridul's recent post highlights a point where XMPP could improve its presence notification model when presence subscribers are distributed amongst many servers. That said, the described behavior is by no means specific to XMPP and is also found in the early SIP/SIMPLE specifications. As the issue exhibits some commonalities in both protocols, it is not surprising that the proposed solutions to decrease the network traffic generated by notifications revolve around notifying lists of subscribers instead of notifying each subscriber in turn. This is a rather common design in many publish/subscribe systems, where subscription lists are advertised to the brokers nearest to the subscribers, which are then tasked with performing the final notification. When applied in the context of XMPP, the driving idea is to delegate the atomic notifications to the home presence server of the concerned domain.

Obviously the expected gain only results from streamlined traffic between servers. The most noticeable gains would only be achieved when a particular user has more than one contact located on the same remote server. In an adverse case presenting a widely spread distribution of remote subscribers (i.e. many remote subscribers each on different home servers), the gain would certainly be lower than expected.
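To put rough numbers on this, here is a minimal sketch in Python that compares the inter-server stanza count for one presence change under both models. It assumes one stanza per remote subscriber in the current model and one stanza per remote domain in the list-based model; the function names and the `*.example` JIDs are mine, for illustration only.

```python
from collections import Counter


def domain_of(jid: str) -> str:
    """Return the domain part of a bare or full JID (user@domain/resource)."""
    bare = jid.split("/", 1)[0]
    return bare.split("@", 1)[-1]


def interserver_stanzas(remote_watchers: list[str]) -> tuple[int, int]:
    """Return (per-subscriber count, per-domain count) for one presence change.

    Assumption: the list-based model sends exactly one stanza to each remote
    domain, whatever the number of watchers hosted there.
    """
    domains = Counter(domain_of(jid) for jid in remote_watchers)
    return len(remote_watchers), len(domains)


# Favorable case: all remote watchers share one home server.
clustered = [
    "rosencrantz@denmark.lit",
    "guildenstern@denmark.lit",
    "hamlet@denmark.lit",
]
# Adverse case: every remote watcher lives on a different home server.
scattered = ["a@one.example", "b@two.example", "c@three.example"]

print(interserver_stanzas(clustered))  # (3, 1)
print(interserver_stanzas(scattered))  # (3, 3)
```

In the adverse case the two counts are identical, so the list mechanism brings nothing apart from its own management overhead.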

The very nature of XMPP, with its persistent presence subscriptions, implies that the two cases in which a large number of presence packets will be exchanged are at the beginning and end of a user’s session:

  • When the initial presence is published after a user login,
  • When the final presence is published at log out.

At a user login, an XMPP client delegates the appropriate presence notifications to the user’s home server; an initial presence will combine two separate processes:

  • A notification of the user's presence to all the watchers of its presence state,
  • A probe of each of the user's subscriptions to receive the contacts' presence states.

At logout time, only the notification of the final “unavailable” presence to all watchers takes place.

I disagree with the way the problem is presented in the post, as the described use case assumes that the presence subscriptions between users on each server are symmetrical. Although that may account for a majority of cases, this cannot be extended as a generic case.

I also disagree with the underlying assumption that the two processes of notifying all watchers and probing to receive contacts' presence states are symmetrical. XMPP specifically differentiates between "presence-out" (outgoing notifications) and "presence-in" (incoming notifications). In effect, probes only occur once at the beginning of a user's session, whereas notifications happen for every change of the user's presence state. As a result, I think these two processes have to be handled differently. I also believe support for these two processes would be best discovered and implemented independently by servers. Consequently:

  • Probing multiple contacts' presence states is a one-time operation and can leverage XEP-0033 Extended Stanza Addressing to group all target JIDs in a single stanza. The probe result can be returned by the contacts' home server using the same mechanism.
  • Notifying watchers requires establishing a transient dynamic watchers list on the contacts' home server. This list's address would then be used when sending notifications in order to minimize the inter-server traffic. The contacts' home server will then be responsible for the local notification of every contact's resources.

In this latter process, I believe that XEP-0144: Roster Item Exchange would be more appropriate than XEP-0033 for the watcher list management. After all, the watcher list we are trying to build is nothing but a roster extract for the subscribers of a particular domain. The XEP-0144 extension provides all the necessary management features to add/delete entries in the list. In our particular use case, I think the proper container is best provided by an InfoQuery stanza. An illustration of this approach is given below, which assumes the multi-notification service has previously been discovered through XEP-0030:

<iq to="notifier.denmark.lit" from="horatio@denmark.net"
    type="set" id="1234">
  <x xmlns='http://jabber.org/protocol/rosterx'>
    <item action="add" jid="rosencrantz@denmark.lit"/>
    <item action="add" jid="guildenstern@denmark.lit"/>
  </x>
</iq>

<iq to="horatio@denmark.net" from="notifier.denmark.lit/a341ff7cd2"
    type="result" id="1234"/>

<presence to="notifier.denmark.lit/a341ff7cd2" from="horatio@denmark.net/palace"/>

You will note how the notification service returns the list address in the InfoQuery result, and how this qualified address is used for sending the presence stanza. If the list needs to be later updated during the user's session, it can be achieved incrementally:

<iq to="notifier.denmark.lit/a341ff7cd2" from="horatio@denmark.net"
    type="set" id="3456">
  <x xmlns='http://jabber.org/protocol/rosterx'>
    <item action="add" jid="hamlet@denmark.lit"/>
  </x>
</iq>

<iq to="horatio@denmark.net" from="notifier.denmark.lit/a341ff7cd2"
    type="result" id="3456"/>

In order to keep the management of the list simple, I would make the list transient for the duration of a user's session only. The overhead of re-creating it at every user login is largely offset by a management scheme that does not have to deal with permanent remote lists. The watcher list can be removed either when the final presence "unavailable" is sent to the multi-notification service, or by using a specific InfoQuery request extended with XEP-0144.
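The lifecycle I have in mind on the remote side can be sketched as follows. This is only an illustration in Python of the transient list idea, not an implementation of any existing server component: the `MultiNotifier` class, its method names and the way the list address is generated are all assumptions of mine.

```python
import secrets
from typing import Optional


class MultiNotifier:
    """Hypothetical multi-notification service holding transient watcher lists."""

    def __init__(self, host: str) -> None:
        self.host = host
        self._lists: dict[str, set[str]] = {}  # list address -> local watcher JIDs

    def create_list(self, watchers: list[str]) -> str:
        """Handle the initial IQ set: store the watchers, return the list address."""
        address = f"{self.host}/{secrets.token_hex(5)}"
        self._lists[address] = set(watchers)
        return address

    def add(self, address: str, jid: str) -> None:
        """Handle an incremental IQ set carrying action="add"."""
        self._lists[address].add(jid)

    def on_presence(self, address: str, sender: str,
                    ptype: Optional[str] = None) -> list[tuple]:
        """Fan one inbound presence out to every local watcher.

        The list is dropped once the final "unavailable" presence has been
        relayed, which keeps it transient for the duration of the session.
        """
        watchers = sorted(self._lists[address])
        if ptype == "unavailable":
            del self._lists[address]
        return [(sender, watcher, ptype) for watcher in watchers]

    def has_list(self, address: str) -> bool:
        return address in self._lists


notifier = MultiNotifier("notifier.denmark.lit")
addr = notifier.create_list(["rosencrantz@denmark.lit", "guildenstern@denmark.lit"])
notifier.add(addr, "hamlet@denmark.lit")

# One inter-server presence stanza becomes three local deliveries.
deliveries = notifier.on_presence(addr, "horatio@denmark.net/palace")
print(len(deliveries))

# The final presence removes the list.
notifier.on_presence(addr, "horatio@denmark.net/palace", "unavailable")
print(notifier.has_list(addr))
```

Keeping the state in a plain in-memory map is deliberate: nothing has to survive a restart, since the list is rebuilt at the next login anyway.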

The possible side effects related to glitches in communication between the servers will have to be addressed separately using the mechanisms that are currently under discussion at the JSF.

Wednesday, November 08, 2006

Media relay hidden query

I received additional information following my last experiment on Google's use of media relaying. It turns out that the GTalk service also implements a full XMPP extension to provide information about its relay servers. This extension, which replaces the "google:relay" query, is hidden behind the STUN extension documented on their developers' site.

Using my favorite client, I was able to check that the described query gives the result shown below:

<iq type="get" to="romeo@gmail.com" id="1" ><query xmlns='google:jingleinfo'/></iq>

<iq from="romeo@gmail.com" type="result" to="romeo@gmail.com/psiC3BE970F" id="1" >
<query xmlns="google:jingleinfo">
<stun>
<server host="stun.l.google.com" udp="19302" />
<server host="stun1.l.google.com" udp="19302" />
<server host="stun4.l.google.com" udp="19302" />
<server host="stun3.l.google.com" udp="19302" />
<server host="stun2.l.google.com" udp="19302" />
</stun>
<relay>
<token>CAESHgoVamxzZWd1aW5lYXVAZ21haWwuY29tEOCJq4TsIRoQRWsKcIug9O8RySUrR05+tw==</token>
<server host="relay.l.google.com" udp="19295" tcp="19294" tcpssl="443" />
<server host="relay2.l.google.com" udp="19295" tcp="19294" tcpssl="443" />
<server host="relay3.l.google.com" udp="19295" tcp="19294" tcpssl="443" />
<server host="relay1.l.google.com" udp="19295" tcp="19294" tcpssl="443" />
<server host="relay4.l.google.com" udp="19295" tcp="19294" tcpssl="443" />
</relay>
</query>
</iq>
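For anyone wanting to consume such a result programmatically, here is a minimal sketch in Python using the standard library XML parser. The element and attribute names are those visible in the result above; the function name, the shape of the returned dictionary and the placeholder token are my own choices, and the sample is deliberately shortened.

```python
import xml.etree.ElementTree as ET

NS = "{google:jingleinfo}"  # namespace seen in the query result

SAMPLE = """
<iq type="result" id="1">
  <query xmlns="google:jingleinfo">
    <stun>
      <server host="stun.l.google.com" udp="19302"/>
      <server host="stun1.l.google.com" udp="19302"/>
    </stun>
    <relay>
      <token>PLACEHOLDER-TOKEN</token>
      <server host="relay.l.google.com" udp="19295" tcp="19294" tcpssl="443"/>
    </relay>
  </query>
</iq>
"""


def parse_jingleinfo(iq_xml: str) -> dict:
    """Extract STUN servers, relay servers and the relay token from an IQ result."""
    root = ET.fromstring(iq_xml.strip())
    query = root.find(f"{NS}query")
    if query is None:
        raise ValueError("no google:jingleinfo query in stanza")

    stun = [
        (server.get("host"), int(server.get("udp")))
        for server in query.findall(f"{NS}stun/{NS}server")
    ]
    relays = []
    for server in query.findall(f"{NS}relay/{NS}server"):
        # Keep only the port attributes actually present on the element.
        ports = {
            name: int(server.get(name))
            for name in ("udp", "tcp", "tcpssl")
            if server.get(name) is not None
        }
        relays.append({"host": server.get("host"), **ports})

    token = query.findtext(f"{NS}relay/{NS}token")
    return {"stun": stun, "relays": relays, "token": token}


info = parse_jingleinfo(SAMPLE)
print(info["stun"][0])    # ('stun.l.google.com', 19302)
print(info["relays"][0])  # {'host': 'relay.l.google.com', 'udp': 19295, 'tcp': 19294, 'tcpssl': 443}
```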

To conclude, it seems that current GTalk clients (1.0.0.100) are using the HTTP/XMPP combination I described earlier, and that future versions may use this XMPP only query to discover the relay servers. In the meantime, I suppose they are waiting for the extension to be final to add it to the existing documentation…

Morphing conversation UI

I am not a typical "example" when it comes to user interface, and I have my own idiosyncratic preferences, especially for communication applications. In particular, I despise the way we have been forced by the incumbent desktop software players to use mail clients to manage information. Their influence is so insidious, that even their strongest open-source contenders stick to the same screen real estate concept.

I came across this short post by Mike Gotta elaborating on the ineluctable death of the email client as we know it.

Concerning enterprise environments, at some point a few years down the road, shifting demographics as Danah Boyd points out in this post, will have an interesting impact on the future of e-mail clients. Once "digital natives" become a large part of the workforce, it's likely that we'll see a tipping point where users will prefer real-time communication front-ends to async front-ends. Yes, e-mail clients will support unified messaging and will morph to provide a real-time communication (RTC) user experience but younger workers may prefer to live in more natively-designed RTC clients (such as Microsoft Office Communicator and IBM Sametime), especially as those clients support both social and work-related capabilities.

I am a bit disappointed because my relief is not readily in sight, but this analysis announces what I believe to be the irreversible evolution of the way we interact with desktop communication applications. There was a not so distant time when the latest hype was about getting IP to every workstation. The natural evolution would be to get conversations to the workstations, any type of conversation, and have all the associated communication stacks available as local servers to any application wishing to use them. At that point, it will become obvious that email is just an asynchronous channel for real time text exchange in a wide presence enabled communication system, instead of instant messaging being a real time version of email, as the current generation of "office" software would like us to believe. And I hope this difference will bring UIs more to my taste.

Tuesday, November 07, 2006

VoIP is irrelevant

At least for the purpose and in the context of this poll! When your [truck, ship, plane, add your own transport here] is stuck in the middle of a distant nowhere following a breakdown, the first thing you think of is getting in touch with the [driver, captain, pilot, etc…]. At that point, your only intention is to communicate, to exchange and gather information, to finally come to some decisions and trigger some actions. And during that process, the last thing you will be worrying about is whether part of your communication is carried out using VoIP. It is irrelevant. Even the cost of the communication will become irrelevant. What you would want though, once the communication has been established, is to take photos or videos of the incident to be able to share them with the appropriate expert for deciding on the best course of action, and at the same time to start the claim procedure with your insurance company. What you would want is to quickly gain access to the nearest repair facilities and arrange for your transport to be taken care of, because your business depends on it.

Even if the transport of multiple data streams using the same underlying technology has become increasingly common, it does not make VoIP relevant in itself. The induced fall of the wall between the voice and data silos is relevant. The ensuing cross domain knowledge dissemination between these two areas of expertise is relevant. The growing awareness of the possibilities offered by merging multiple communication types into applications is relevant. And only at that point will it become relevant to use VoIP, not the other way round. Obviously, all these positive consequences will only happen at the slow pace of human behavior changes. And this is much too slow for the buzz makers.

More profoundly, what is relevant is "to impart, to share, to make common". And in that respect, the current incarnations of VoIP are far from providing this basic communication attribute, as they are repeating the creation of walled gardens in the same way early email providers did, in the same way consumer IM providers still do. And by taking this approach, their proponents are as good as the poll's author and other VoIP meme builders: clueless…

Monday, November 06, 2006

Relay or no relay, that was the question

The miracle of the blogosphere happened again. My little rant about Jingle media relaying produced the expected effect, and I now have the answer: Google is indeed using media relaying in GTalk to cater for the 8% of NAT traversal cases not covered by their implementation of ICE. According to the information source,

the client discovers the relay's host, port and other information using a proprietary XMPP extension. The client communicates with the relay service to allocate ports.

It's a proprietary protocol. The team would like to replace the proprietary protocol with TURN.

Earlier today I conducted a simple experiment, placing a call between two GTalk clients with diagnostic logging enabled. Then I looked for an indication of media relaying in the resulting log, and here are my findings.

Immediately after initializing the XMPP session, the client goes on sending the following stanzas before any IM application requests:

[007:201] [8a8] SEND >>>>>>>>>>>>>>>>>>>>>>>>> : Mon Nov 06 10:48:29 2006
[007:201] [8a8]    <presence type="unavailable"/>
[007:201] [8a8] SEND >>>>>>>>>>>>>>>>>>>>>>>>> : Mon Nov 06 10:48:29 2006
[007:211] [8a8]    <iq type="get" id="7">
[007:211] [8a8]      <query xmlns="google:relay"/>
[007:211] [8a8]    </iq>

The client eventually gets an answer from the server:

[007:581] [8a8] RECV <<<<<<<<<<<<<<<<<<<<<<<<< : Mon Nov 06 10:48:30 2006
[007:581] [8a8]    <iq to="romeo@gmail.com/Talk.v98734E454A" id="7" type="result">
[007:581] [8a8]      <query xmlns="google:relay">
[007:581] [8a8]        <token>
[007:581] [8a8]          CAESHgoVamxzZWd1aW5lYXVAZ21haWwuY29tEIO2uPPrIRoQEWhSGqW0sC45unw91a8uNg==
[007:581] [8a8]        </token>
[007:581] [8a8]      </query>
[007:581] [8a8]    </iq>

Later on during the exchange, it appears that the client is attempting a connection through HTTP to a relay.l.google.com host:

[138:820] [8a8] HTTPPortAllocator: starting request 1
[138:820] [8a8] HTTPPortAllocator: sending to host relay.l.google.com
[138:840] [8a8] HtmlWindow::GetHostInfo

[138:910] [38c] ReuseSocketPool - Creating new socket
[138:910] [38c] Resolving addr in PhysicalSocket::Connect
[138:910] [38c] === DNS RESOLUTION (relay.l.google.com) ===

[139:080] [38c] relay.l.google.com resolved to 216.239.37.126
[139:090] [38c] ReuseSocketPool - Opening connection to: relay.l.google.com:80
[139:391] [8a8] HTTPPortAllocator: HTTP request 1 succeeded with code 200

At this stage we enter Google's Jingle transport candidate negotiation. The client prepares the various candidates and then goes on trying them in sequence:

[139:401] [c8c] Jingle:Net[0:192.168.254.100]: Allocation Phase=Udp (Step=0)
[139:401] [c8c] Jingle:Port[rtp:local:Net[0:192.168.254.100]]: Added port to allocator
[139:401] [c8c] Jingle:Port[rtp:stun:Net[0:192.168.254.100]]: Added port to allocator
[139:401] [8a8] SEND >>>>>>>>>>>>>>>>>>>>>>>>> : Mon Nov 06 10:50:41 2006
[139:401] [8a8]    <iq to="juliet@gmail.com/Talk.v100C3349241" type="set" id="37">
[139:401] [8a8]      <session xmlns="http://www.google.com/session" type="transport-info"
                                          id="3854034496"
                                          initiator="romeo@gmail.com/Talk.v98734E454A">
[139:401] [8a8]        <transport xmlns="http://www.google.com/transport/p2p">
[139:401] [8a8]          <candidate name="rtp" address="192.168.254.100"
                                                 port="1780" preference="1"
                                                 username="GYPRLTG33vQGFOTZ"
                                                 protocol="udp" generation="0"
                                                 password="47CqsrM5wDUyYbiK"
                                                 type="local" network="0"/>
[139:401] [8a8]        </transport>
[139:401] [8a8]      </session>
[139:401] [8a8]    </iq>

Following the specification, the client determines the best connection from the candidates:

[140:462] [c8c] Jingle:Channel[rtp|__]: New best connection: Conn[0:rtp:local:192.168.254.100:1780->rtp:local:192.168.254.101:1237|C-w]
[140:462] [8a8] SEND >>>>>>>>>>>>>>>>>>>>>>>>> : Mon Nov 06 10:50:42 2006
[140:462] [8a8]    <iq to="juliet@gmail.com/Talk.v100C3349241" id="83" type="result"/>

At this point, the client has established a preferred connection for direct RTP. It nevertheless goes on to prepare a further connection through the relay but, as my call was local, it did not issue the corresponding candidate, instead connecting point-to-point through the local UDP candidate:

[141:464] [c8c] Jingle:Net[0:192.168.254.100]: Allocation Phase=Relay (Step=1)
[141:464] [c8c] Jingle:Port[rtp:relay:Net[0:192.168.254.100]]: Added port to allocator
[141:464] [c8c] Connecting to relay via udp @ 216.239.37.126:19295

[142:525] [c8c] Jingle:Net[0:192.168.254.100]: Allocation Phase=Tcp (Step=2)
[142:525] [c8c] Jingle:Port[rtp:local:Net[0:192.168.254.100]]: Added port to allocator
[142:535] [c8c] Jingle:Conn[0:rtp:local:192.168.254.100:1783->rtp:local:192.168.254.101:1240|--w]: set_connected
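If you want to repeat the experiment on your own logs, the sequence of allocation phases can be pulled out with a few lines of Python. This is a minimal sketch that assumes the line layout seen in the excerpts above (a bracketed timestamp, a bracketed thread identifier, then the message); the function name and regular expressions are mine, and other client versions may well log differently.

```python
import re

# "[139:401] [c8c] message" as seen in the diagnostic log excerpts.
LINE = re.compile(r"^\[(\d+):(\d+)\] \[([0-9a-f]+)\] (.*)$")
PHASE = re.compile(r"Allocation Phase=(\w+) \(Step=(\d+)\)")


def allocation_phases(log_text: str) -> list[tuple[str, int]]:
    """Return the (phase, step) pairs in the order they appear in the log."""
    phases = []
    for raw in log_text.splitlines():
        line = LINE.match(raw.strip())
        if line is None:
            continue  # continuation lines and blank lines carry no prefix
        phase = PHASE.search(line.group(4))
        if phase is not None:
            phases.append((phase.group(1), int(phase.group(2))))
    return phases


SAMPLE = """\
[139:401] [c8c] Jingle:Net[0:192.168.254.100]: Allocation Phase=Udp (Step=0)
[139:401] [c8c] Jingle:Port[rtp:local:Net[0:192.168.254.100]]: Added port to allocator
[141:464] [c8c] Jingle:Net[0:192.168.254.100]: Allocation Phase=Relay (Step=1)
[141:464] [c8c] Connecting to relay via udp @ 216.239.37.126:19295
[142:525] [c8c] Jingle:Net[0:192.168.254.100]: Allocation Phase=Tcp (Step=2)
"""

print(allocation_phases(SAMPLE))  # [('Udp', 0), ('Relay', 1), ('Tcp', 2)]
```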

From this scenario, I can hazard a hypothesis about the way a GTalk client negotiates media relaying.

  • The client uses a proprietary XMPP extension to query the relay service and receives an opaque token if the request is successful. I believe the token is intended to authorize a later use of the media relay by the client that made the request.
  • The client then retrieves the appropriate relay parameters through HTTP, probably presenting the previous token as an authorization reference, and not through an XMPP interface as my source stated.
  • The client then creates a transport candidate from the received parameters, and assigns it a lower priority than the UDP and TCP candidates. It uses this candidate in the transport negotiation sequence.
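The last step of this hypothesis can be pictured with a small Python sketch. Only the local UDP candidate (address, port and preference "1") and the relay address come from my log; the other preference values, the TCP port and the `Candidate` structure itself are assumptions chosen to express "relay ranks below UDP and TCP", nothing more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    kind: str          # "local" or "relay" in this sketch
    protocol: str      # "udp" or "tcp"
    address: str
    port: int
    preference: float  # higher is tried first


def negotiation_order(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates from most to least preferred."""
    return sorted(candidates, key=lambda c: c.preference, reverse=True)


candidates = [
    # Relay candidate built from the parameters fetched over HTTP (assumed preference).
    Candidate("relay", "udp", "216.239.37.126", 19295, 0.5),
    # Local UDP candidate as advertised in the transport-info stanza.
    Candidate("local", "udp", "192.168.254.100", 1780, 1.0),
    # Local TCP candidate (illustrative port and assumed preference).
    Candidate("local", "tcp", "192.168.254.100", 1783, 0.8),
]

for candidate in negotiation_order(candidates):
    print(candidate.kind, candidate.protocol, candidate.preference)
```

With these values the relay candidate always comes last, which matches what I observed: on a local call the point-to-point UDP path wins before the relay is ever needed.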

Although I am missing the proper test environment to verify my hypothesis, my first finding concerns the inaccuracies in the description of an otherwise rather standard and expected technical solution. I leave you to decide why the mechanism has been described as XMPP, when in fact it appears to be a mix of HTTP and XMPP. I personally find this disturbing, as it could lead to more inaccuracies of the sort finding their way into the Jingle specification. I am ready to accept that Google is concerned, but I do not believe disclosing this mechanism will ever put that company at risk.

But when I reflect on the position taken publicly by the authors of the XMPP Jingle specification in favor of using TURN to deal with media relaying negotiation, I am worried about the possible impact of their stand on the specification completion time and on the ensuing implementation by developers.

As I stated earlier, ICE and TURN are "work in progress" drafts, which have already been lingering at the IETF for over a year since Google announced the GTalk service. During that past year, the activity around the XMPP Jingle specification has been very slow to pick up.

  • On one hand, the specification has only recently been published toward a last call before JSF council approval.
  • On the other hand, I believe the inherent complexity of media support libraries slows down the implementation of Jingle at the client level because developers need to master these new and unfamiliar concepts. 

Apart from notable support in open source IPBX, implemented by developers well versed in the intricacies of media communication programming, I only know of the Jabbin project as a tentative implementation of a free Jingle client...

I believe it would not be realistic to push at all costs a Jingle specification highly dependent on forthcoming RFCs without any foreseeable publication date for the said standards. The community would benefit more from an interim specification, leveraging instead existing RFCs, such as raw RTP, STUN and media relaying for NAT traversal, to be later updated when the ICE draft is made into an RFC. This approach would have the added advantage of being less complex from a programming standpoint, and would provide a more gradual learning curve for the developers to get accustomed to the subtleties of multi-media communication.
