Blue Sky
On the face of it, it’s a great idea. The benefits of pub/sub are well-understood. Pub/sub (push) can be far more efficient than polling (pull) in many cases, especially when it saves the client from either having to make frequent connections to the server or having to analyze the response to find what has changed.
Unfortunately, the way rssCloud needs to be implemented has what I fear to be significant and potentially fatal flaws. Before I dig into them, it’s helpful to provide a little background on best practices using push or pub/sub architectures.
Best Push Practices with HTTP
- Polling: Client connects to server and asks for an update. The connection ends when the server has responded. Repeat ad nauseam at the frequency needed by the client. This is equivalent to what most RSS readers do today: periodically check a feed for updates.
- Piggybacking: Client connects to the server and asks for something else. The server responds, notes the client’s subscriptions, and tacks any pending notifications onto the response to the request. Combined with polling in an Ajax or RIA app, this can be a really efficient way to handle notifications. (But, for RSS readers, there probably isn’t much to glean from this pattern.)
- Long-polling: Client connects to server and asks for updates. The connection stays open as long as the server allows, and the server will send multiple notifications while the connection is open. In layman’s terms, this is akin to being put on hold but not hanging up the phone. When the connection is closed or times out, the client will reconnect. Sort of like what you do when you get disconnected from tech support. Again.
- Streaming: Client connects to server and asks for updates. The connection stays open ad infinitum using a dedicated socket, pretty much like you would get from a streaming media server. Whereas you can usually implement the other approaches with a standard web server, this approach typically requires your server and network infrastructure to be above the norm.
Tenet #1: It is the client’s responsibility to make the HTTP connection. We all take it for granted, but establishing a network connection involves a fair bit of work. With all of these patterns, the burden of “connectability” is put on the client. A server is supposed to be highly-available, but clients come and go. If the client can’t contact the server, it is usually the client’s problem. The work of establishing a connection and handling failed connections, retries, etc. is thus distributed to the clients. This is an important feature as the volume of concurrent clients increases.
Tenet #2: Make as few connections as possible and keep them open as long as possible. This is because the connection handshake itself, along with authentication, is usually the most expensive part of the networked operation. The fewer times a client/server has to jump through the connection hoops, the better.
A noteworthy aside: Apple’s iPhone push architecture uses long-lived (persistent) sockets to push App notifications to each iPhone. This is roughly equivalent to the long-polling or streaming patterns, but uses raw binary data instead of HTML or JSON. In any case, both of the Tenets are still employed: your iPhone connects to the APNS and keeps the connection open as long as possible.
So, with these best practices in mind, let’s take a look at rssCloud…
Clouded Flaw #1: Don’t Call Me, I’ll Call You
But if you really want instant-message-like updates on an RSS feed, it isn’t practical to make your reader check the feed every 5 seconds. The connection overhead of contacting the server and waiting for the response would be obnoxious and just shy of a denial-of-service attack. Recall Tenet #2 discussed earlier: “Make as few connections as possible.” Those connections are expensive, after all.
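To put rough numbers on that overhead, here is a back-of-the-envelope sketch in Python. Only the 5-second polling interval comes from the example above; the one-hour window and the 60-second long-poll timeout are assumptions picked for illustration, and the long-polling figure is the worst case where every connection simply times out.

```python
import math

def connections_needed(window_s: float, reconnect_every_s: float) -> int:
    """Connections a client opens in a window if it reconnects at a fixed interval."""
    return math.ceil(window_s / reconnect_every_s)

WINDOW_S = 3600  # one hour (assumed observation window)

polling = connections_needed(WINDOW_S, 5)        # check the feed every 5 seconds
long_polling = connections_needed(WINDOW_S, 60)  # assumed 60-second server timeout
streaming = 1                                    # one connection, held open throughout

print(polling, long_polling, streaming)
```

Under these assumptions a single reader opens 720 connections an hour when polling, 60 when long-polling, and one when streaming. Multiply by the number of readers and the appeal of Tenet #2 is obvious.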
So the idea behind rssCloud is great: The server can just tell your reader when the feed has updated. No unnecessary connections are made, right?
This level of work is similar to sending a mass email via SMTP. The handshaking / retry work from the mailing daemon to each unique SMTP relay is a well-known problem. Unless your server has a lot of distributed processing available, the time-of-delivery between the first recipient and the last recipient will grow as the list of subscribers grows. The delay is exacerbated by every failed connection that must time out before the next connection is attempted. Performance degrades linearly: the more subscribers, the longer it takes to deliver all the messages. These delays are common in the world of SMTP: We’ve all gotten an email 15 minutes after a coworker got the same email.
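The linear growth is easy to see with a toy model. This is only a sketch: it assumes the server contacts subscribers strictly one after another, and the per-connection time, timeout, and failure rate are made-up values rather than measurements of any real rssCloud server.

```python
def delivery_time_s(subscribers: int, failure_rate: float,
                    connect_s: float = 0.2, timeout_s: float = 30.0) -> float:
    """Seconds until the last subscriber is notified when the server
    contacts subscribers sequentially (no parallelism).

    connect_s and timeout_s are assumed values for illustration only.
    """
    failed = round(subscribers * failure_rate)  # unreachable readers
    reachable = subscribers - failed
    # Each reachable reader costs one connection; each failure costs a full timeout.
    return reachable * connect_s + failed * timeout_s

for n in (1000, 2000, 4000):
    print(n, "subscribers:", round(delivery_time_s(n, 0.05) / 60), "minutes")
```

With these assumed numbers, 1,000 subscribers and a 5% failure rate already mean roughly 28 minutes between the first and last notification, and doubling the subscribers doubles the wait. Parallel delivery shrinks the constant but not the shape of the curve.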
With email, we’re used to this kind of delay. But if the goal of rssCloud is to have near-realtime updates, this kind of delay is a deal-killer. Unfortunately, there’s more….
Clouded Flaw #2: Your Fat Reader Got Fatter, Not Smarter
And you had better leave your reader running at all times, even when you’re not at the computer. If the server tries to send you an update and can’t connect to your reader… strike 1. Three strikes, and you’re off the subscription list for notifications. The rssCloud server will (presumably) try again later, but if you didn’t leave your reader running over the weekend, you might be auto-unsubscribed by Monday morning. Which means the developers who make your reader will have to build in contingency plans for re-subscribing every time it starts up.
Also, let’s say your reader was running, but there was some temporary network disruption right when the server was trying to connect to your reader. You’ll have missed the notifications and never know it. You’ll have a strike and never know it. This is what happens when you make the server responsible for establishing the connection and break Tenet #1: your client never knows when something goes wrong.
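Here is a minimal sketch of the three-strikes bookkeeping as I have described it, not as any particular rssCloud server implements it. The class and method names are hypothetical, and resetting the count after a successful delivery is my assumption.

```python
MAX_STRIKES = 3  # "three strikes" as described above

class SubscriptionList:
    """Hypothetical server-side record of readers to notify."""

    def __init__(self):
        self.strikes = {}  # reader callback address -> consecutive failed deliveries

    def subscribe(self, address):
        self.strikes[address] = 0

    def is_subscribed(self, address):
        return address in self.strikes

    def record_delivery(self, address, succeeded):
        if address not in self.strikes:
            return
        if succeeded:
            self.strikes[address] = 0  # assumption: a success clears old strikes
            return
        self.strikes[address] += 1
        if self.strikes[address] >= MAX_STRIKES:
            del self.strikes[address]  # silently dropped; the reader is never told

subs = SubscriptionList()
subs.subscribe("reader.example:5337")
for _ in range(3):  # reader was switched off over the weekend
    subs.record_delivery("reader.example:5337", succeeded=False)
print(subs.is_subscribed("reader.example:5337"))
```

The important part is the `del` line: nothing in this flow informs the reader that it was removed, which is why a reader would have to call something like `subscribe` again on every startup.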
If the responsibility were on your RSS reader (client) to connect to the server and ask for notifications, it could react to network disruptions and reconnect as soon as possible. You could know how long you’ve been missing notifications. But since rssCloud puts the connection burden on the server, the only way for your reader to know if you’re still “current” is to … do a GET or a HEAD on the RSS feed itself … Which is what RSS readers are already doing without rssCloud.
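For reference, that fallback check looks roughly like this in Python. It is a sketch that assumes the feed server returns the standard `ETag` or `Last-Modified` headers; the function names are mine, and a real reader would more likely send a conditional GET.

```python
import urllib.request

def fetch_validators(url: str, timeout: float = 10.0) -> dict:
    """Issue a HEAD request and return the feed's cache validators."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return {
            "etag": response.headers.get("ETag"),
            "last_modified": response.headers.get("Last-Modified"),
        }

def feed_changed(previous: dict, current: dict) -> bool:
    """Compare validators from two checks of the same feed."""
    if current.get("etag") is None and current.get("last_modified") is None:
        return True  # nothing to compare, so assume the feed must be re-fetched
    return previous != current
```

A reader would call `fetch_validators` on a timer, keep the last result, and download the full feed only when `feed_changed` returns `True`. In other words: polling.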
Back to square one.
Is There a Silver Lining?
On the bright side, the need is real, and a lot of really sharp people are looking at the problem.
- No embedded server in RSS readers required
- No firewall configuration for users
- Reduced server requirements for rssCloud server hosts
- Potentially reduced lag time in notifications, especially for large numbers of clients
- Reduced long-term state on the server; there won’t need to be subscriptions, just sessions
- Improved data integrity by making the client aware and responsible for reconnects/refreshes
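The last point, making the client aware of and responsible for reconnects, can be sketched as a small client-initiated loop. Everything here is hypothetical: `connect` stands in for whatever opens a long-poll or streaming connection, and the retry policy is deliberately naive.

```python
import time

def listen(connect, on_notification, max_attempts=5, retry_delay_s=0.0):
    """Client-initiated notification loop (hypothetical interface).

    `connect` is any callable that opens a connection and returns an
    iterable of notifications; it raises ConnectionError on failure.
    Returns the number of failed attempts, so the client knows it may
    have missed updates and should refresh the feed.
    """
    failures = 0
    for _ in range(max_attempts):
        try:
            for notification in connect():
                on_notification(notification)
        except ConnectionError:
            failures += 1  # the client sees the outage, unlike with server push
            time.sleep(retry_delay_s)
    return failures
```

Because the reader opens the connection itself, a failure shows up as an exception it can count and act on, instead of a strike recorded on a server it never hears from.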
What do you think?



