Saturday, May 27, 2006

Breaking free from topic names

Through the ages, classification has gone the way of authority: men have always favored hierarchy. The latest discussion surrounding the publication of JEP-0060, the XMPP publish-subscribe standard extension, unfortunately gives us another example of this tendency. We have learned many times in the past, when implementing publish-subscribe systems, that referencing objects of interest by identifiers with embedded structural information makes life exceptionally difficult. This is commonly found in "topic name" based systems. Such systems are extremely easy to design and implement, and usually appear to be useful early in their lives. Over time, however, the usefulness decreases, and the inconvenience grows...

In a distributed information organization such as the Internet, using a hierarchy becomes inappropriate as soon as

  • the information has to be federated between multiple organizations,
  • the information source classification structures change over time.

We all know how slowly hierarchical organizations change, and history has taught us many times that they usually change by revolution rather than evolution. Bob Wyman sums up the inadequacy of the "topic name" approach in publish-subscribe systems very well:

Historically, pubsub systems that have embedded structural or classification information in their topic names have found that it is very, very difficult to implement reorganizations since either there is no mechanism to notify subscribers of the changes in structure or the protocols to do so are very expensive and unreliable.

We can avoid this problem with JEP-0060 by ensuring that everyone knows that the node identifiers associated with topics must remain opaque.
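To make the point concrete, here is a minimal sketch of what "opaque" buys us. It is not JEP-0060 itself: the `Broker` and `Classification` classes, their methods, and the paths are all hypothetical names invented for illustration. Subscriptions are keyed by a meaningless node identifier, while the human-readable structure lives in a separate mapping that can be reorganized freely.

```python
import uuid
from collections import defaultdict


class Broker:
    """Toy in-memory pubsub broker keyed by opaque node IDs (hypothetical)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # node_id -> list of callbacks

    def create_node(self):
        # The ID carries no structural meaning; it is just a unique token.
        return uuid.uuid4().hex

    def subscribe(self, node_id, callback):
        self._subscribers[node_id].append(callback)

    def publish(self, node_id, item):
        for callback in self._subscribers[node_id]:
            callback(item)


class Classification:
    """Separate, mutable structure mapping readable paths to node IDs."""

    def __init__(self):
        self._paths = {}  # path -> node_id

    def place(self, path, node_id):
        self._paths[path] = node_id

    def move(self, old_path, new_path):
        self._paths[new_path] = self._paths.pop(old_path)

    def resolve(self, path):
        return self._paths[path]


broker = Broker()
catalog = Classification()

node = broker.create_node()
catalog.place("/news/technology/xmpp", node)

received = []
broker.subscribe(catalog.resolve("/news/technology/xmpp"), received.append)

# Reorganize the classification; the existing subscription is untouched.
catalog.move("/news/technology/xmpp", "/standards/messaging/xmpp")
broker.publish(node, "JEP-0060 updated")
```

Had the path itself been the subscription key, the `move` would have required notifying and re-subscribing every subscriber, which is exactly the expensive and unreliable step described above.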

Hierarchical "topic names" are a residue of "topic-based" systems, which make up the bulk of incumbent publish-subscribe systems. In those systems, the expressiveness of a subscription is limited to specifying a topic name. Users subscribe to a topic and are notified of every change published to that topic. However, subscribers are often interested in only a subset of a topic's information and wish to avoid unnecessary notifications. In "topic-based" systems, the usual solution to this problem is to create a sub-topic that notifies only a subset of the changes published to the main topic. But since other users will have a different point of view on the world, the system ends up using a classification best represented as a graph. Unfortunately, as the understanding of the areas of interest improves over time, new topics are likely to be created to reflect that broader understanding, and existing topics will be moved from one point of the graph to another. Academic research has proposed several answers to this issue, but everyone agrees that it is more robust to use opaque topic names and provide a different mechanism for expressing the structure of the topic-space. Bob pointed to one such research area, the work done at ISO on "topic maps" and classification systems. He further explained:

The names chosen by XTM are a bit confusing but, what we have in JEP-0060 as a nodeID for a topic should be treated as a "subject indicator". Thus, it should be an unambiguous identifier for a particular subject or topic. (In XTM systems, the "subject indicator" is often just a unique URI or URN). In XTM systems, the "structure" of the space of "subjects" is provided via the "topic". In such a system, the "topic" is meta-data for the subject.

RDF would also provide an appropriate answer for decoupling an object of interest from its reference. RDF and topic maps are both identity-based technologies. That is, the key concept in both is "symbols" representing identifiable "things", about which statements can be made. This document explains how one could use either technology and still provide the expected separation of concerns. The author also presents a method to map one technology onto the other. Those interested in the subject can find exhaustive coverage of RDF/Topic Maps interoperability at the W3C.

Again, as in the case of addressing, the important point is that the reference to the object of interest is distinct from the meta-data that describes that object's relationship to other things. These identifiers can be reused in multiple places in the overall classification structure, and it is even possible to share references across possibly incompatible systems. Systems would only have to agree on using the same "subject indicators", even though they use different classification structures and different names for the various levels in their structures.
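As a rough illustration of that last point, the sketch below shows two systems that classify the same subjects differently yet interoperate because they agree on the identifiers. The URNs, category names, and helper functions are made up for the example; they are not taken from XTM, RDF, or JEP-0060.

```python
# Hypothetical subject indicators: opaque, and agreed upon by both systems.
XMPP_PUBSUB = "urn:example:subject:7f3a"
ATOM_FEEDS = "urn:example:subject:c21e"

# System A classifies by technology; system B by audience,
# each with its own level names.
system_a = {
    "protocols/xmpp/extensions": {XMPP_PUBSUB},
    "protocols/syndication": {ATOM_FEEDS},
    # The same identifiers are reused in a second place of the structure.
    "publish-subscribe": {XMPP_PUBSUB, ATOM_FEEDS},
}
system_b = {
    "developers/messaging": {XMPP_PUBSUB},
    "publishers/feeds": {ATOM_FEEDS},
}


def subjects(classification):
    """Collect every subject indicator referenced in a classification."""
    found = set()
    for members in classification.values():
        found |= members
    return found


def categories_of(classification, subject):
    """List the categories under which one system files a given subject."""
    return sorted(
        name for name, members in classification.items() if subject in members
    )


# The two systems can exchange references to these subjects
# without knowing anything about each other's structure.
shared = subjects(system_a) & subjects(system_b)
```

Either system can rename or rearrange its categories at will; as long as the subject indicators stay stable, references exchanged between the two remain valid.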
