In case you hadn't noticed, when I'm not getting hit by flying trout, I like to think of myself as a bit of a real-time web guy. However, there's one thing I'd like clarification on, and I'm confident somebody will be able to shed some light on this:
What are the differences between XMPP PubSub and PubSubHubbub, and in what situations should each be used?
I think I know some of the details, but I’d like what I think I know to be confirmed or, indeed, corrected.
My knowledge of the real-time web, and specifically real-time client push, comes from working for Caplin Systems, who pioneered building Comet servers around 10 years ago. Working with Caplin Systems and with Comet servers taught me how connections are made using different technologies and how they are maintained in different scenarios. I also believe I have a good understanding of the best connection method for delivering real-time data, whether the delivery is from server to client using real-time client push or from server to server using PubSubHubbub or XMPP PubSub.
Persistent Connections
If you want to deliver data in true real time, and unless the frequency of data updates is very low, the best type of connection is a persistent one.
As well as delivering data faster, a persistent connection means that your data subscriber does not have to handle a new HTTP connection for each piece of data it receives. Under a heavy load, that per-connection overhead can be a really big thing.
As part of building Kwwika I wrote a demo which integrated the real-time server push capabilities of Superfeedr with the real-time client push capabilities of Kwwika to build a real-time news reader (RTNR) (blog article). In order to receive updates from Superfeedr I had to implement a publisher-subscriber outside of the Kwwika infrastructure. I decided to write a PubSubHubbub implementation in ASP.NET so that I could use a few other .NET Kwwika components to easily integrate with Kwwika.
So, each time Superfeedr has an update for the RTNR it establishes an HTTP connection to the RTNR PubSubHubbub server and sends the update, which is then parsed and pushed into Kwwika to be distributed to subscribing web clients. If the RTNR is subscribing to something that updates really frequently (such as “google” using Superfeedr track) then the RTNR server has to handle a lot of HTTP requests. Since I am running the demo on a small Amazon Windows instance running IIS, it doesn’t take all that long, under a heavy load, for the server to stop responding.
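To make the flow above a little more concrete, here is a minimal Python sketch of the two things a PubSubHubbub subscriber has to do: echo the hub's verification challenge, and parse each POSTed Atom feed. This is not the ASP.NET code from the demo. The function names are hypothetical, and the `Queue` is only a stand-in for whatever pushes data on to the web clients (Kwwika, in my case).

```python
import xml.etree.ElementTree as ET
from queue import Queue
from urllib.parse import parse_qs

ATOM = "{http://www.w3.org/2005/Atom}"


def handle_verification(query_string, expected_topic):
    """Answer the hub's GET verification request by echoing hub.challenge."""
    params = parse_qs(query_string)
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if topic != expected_topic or not challenge:
        # Unknown topic or missing challenge: refuse the subscription.
        return 404, ""
    return 200, challenge


def handle_notification(body, outbound):
    """Parse one POSTed Atom feed and queue each entry title for delivery.

    Called once per HTTP request from the hub, which is exactly why a
    busy topic turns into a lot of requests.
    """
    feed = ET.fromstring(body)
    count = 0
    for entry in feed.iter(ATOM + "entry"):
        outbound.put(entry.findtext(ATOM + "title", default=""))
        count += 1
    return count


# Hypothetical usage: one notification carrying two entries.
outbound = Queue()  # stand-in for the client push layer
sample = (
    b'<feed xmlns="http://www.w3.org/2005/Atom">'
    b"<entry><title>First</title></entry>"
    b"<entry><title>Second</title></entry>"
    b"</feed>"
)
queued = handle_notification(sample, outbound)
```

In a real deployment these functions would sit behind an HTTP endpoint, and every single update from the hub would arrive as its own request.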
There must be a better way!
Real-time Client Push
By way of comparison, let’s first take a look at real-time client push.
WebSockets now provide the simplest way to achieve real-time push to a web browser, although not yet the most reliable, since the WebSocket standard is not set in stone and isn’t supported cross-browser. Before WebSockets, the best connection method was to maintain a persistent, streaming HTTP connection from the publisher to the client subscriber (using an IFRAME, XMLHttpRequest or XDomainRequest). Because the connection has already been established, the publisher can push new data to the subscriber the instant it has any. This means that the data can be delivered to the web browser as close to real-time as possible.
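One common way to keep a streaming HTTP response open is chunked transfer encoding, where each update is written as its own chunk on the same response. Whether a given Comet server does it this way is an assumption on my part, so treat the following as a rough Python sketch of the framing only. The function names are hypothetical.

```python
def encode_chunk(payload: bytes) -> bytes:
    """Frame one update as an HTTP/1.1 chunk: hex length, CRLF, data, CRLF."""
    return f"{len(payload):X}".encode("ascii") + b"\r\n" + payload + b"\r\n"


def stream_updates(updates):
    """Yield one chunk per update over a single, already-open response."""
    for update in updates:
        yield encode_chunk(update.encode("utf-8"))
    # A zero-length chunk ends the response. A real streaming server
    # would hold the connection open for as long as possible instead.
    yield b"0\r\n\r\n"


frames = list(stream_updates(["price=101", "price=102"]))
```

The point is that every update after the first reuses the connection that is already there, so nothing has to be negotiated before the data is sent.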
Real-time Server Push
PubSubHubbub
PubSubHubbub turns things on its head in comparison to real-time client push (it actually uses HTTP as it was designed; it’s HTTP streaming that uses things in reverse. See Reverse Ajax). In this case the publisher still pushes the new data to the subscriber, but it does so using an HTTP request. The problem is that each push requires a new HTTP connection to be established before the data can be transferred. Establishing a connection takes time and resources, so a single persistent connection is clearly the better solution. This is where I think XMPP PubSub has the advantage.
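The connection cost argument can be put as a back-of-the-envelope model. The Python sketch below is a deliberately simplified illustration: the function name is hypothetical, and the 50 ms connection setup and 5 ms transfer figures are made-up assumptions, not measurements from the demo.

```python
def total_delivery_time(updates, connect_ms, transfer_ms, persistent):
    """Estimate total time spent delivering `updates` messages.

    With a persistent connection the setup cost is paid once; with a
    connection per push it is paid for every update.
    """
    if updates == 0:
        return 0
    connections = 1 if persistent else updates
    return connections * connect_ms + updates * transfer_ms


# Assumed, illustrative figures: 1000 updates, 50 ms setup, 5 ms transfer.
per_push = total_delivery_time(1000, 50, 5, persistent=False)
persistent = total_delivery_time(1000, 50, 5, persistent=True)
```

With these assumed numbers the per-push approach spends 55,000 ms in total against 5,050 ms for the persistent connection, and the gap grows with the update rate.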
XMPP PubSub
If you need your data to be delivered in true real time (or as close to real-time as web technology will allow) then you have to use a persistent connection between the publisher and the subscriber. It’s my understanding that, because XMPP was developed as an instant messaging protocol, it uses a persistent connection.
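To show what "persistent" buys you, here is a small Python sketch in which several updates travel over one long-lived socket. It is not XMPP: a local `socket.socketpair()` and newline-delimited messages stand in for the real stream, and the function names are hypothetical.

```python
import socket


def send_updates(sock, updates):
    """Publisher side: write every update onto the same open connection."""
    for update in updates:
        sock.sendall(update.encode("utf-8") + b"\n")


def read_updates(sock, expected):
    """Subscriber side: read newline-delimited updates from one connection."""
    buffer = b""
    received = []
    while len(received) < expected:
        chunk = sock.recv(4096)
        if not chunk:
            break  # the other end closed the connection
        buffer += chunk
        while b"\n" in buffer and len(received) < expected:
            line, buffer = buffer.split(b"\n", 1)
            received.append(line.decode("utf-8"))
    return received


# One connection is opened once and then reused for every update.
publisher, subscriber = socket.socketpair()
send_updates(publisher, ["update 1", "update 2", "update 3"])
messages = read_updates(subscriber, 3)
publisher.close()
subscriber.close()
```

Compare this with the PubSubHubbub case, where each of those three updates would have needed its own HTTP request.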
Conclusion
Here are things as I see them:
- If you want your data in real-time you should use a persistent connection between the publisher and subscriber.
- If you are making a server to server subscription to data that doesn't update all that often and instant real-time doesn't matter then PubSubHubbub is fine.
- If you are making a server to server subscription to data that updates very frequently then you need to use a persistent connection and XMPP PubSub is a must.
One more thing to consider: if you are subscribing to multiple sources which update frequently, then maintaining a persistent connection to each of those sources, assuming they even allow you to, takes a considerable amount of effort and potentially resources. This is where a service like Superfeedr comes into its own. They manage the subscriptions and connections to the sources for you, which means you only need to maintain one persistent connection to Superfeedr. They do all the heavy lifting so you don't have to.