TechBrew

Informative geekery on software and technology

Is rssCloud All Wet?

September 8th, 2009 by Mark Woodman

Rogers Cadenhead recently posted his thoughts on why the rssCloud concept “failed to catch on.”    This is a response to his post and an attempt to foster further discussion in the community.

Blue Sky

The main idea behind rssCloud is that your RSS reader can ask to be notified when a feed changes (push), rather than having to check the feed on a regular basis (pull).  This is accomplished by having your RSS reader subscribe to updates with the feed server.  When the feed is updated, the server will then notify your RSS reader accordingly.  This message pattern is known as publish/subscribe or just “pub/sub.”
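To make the pattern concrete, here is a minimal in-process sketch of publish/subscribe. It is not the rssCloud protocol: the `FeedHub` class, its method names, and the placeholder feed URL are all assumptions made up for illustration, and a real server would notify readers over the network rather than by calling a function.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Callback = Callable[[str], None]


class FeedHub:
    """Toy stand-in for a feed server that accepts subscriptions (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, feed_url: str, callback: Callback) -> None:
        # The reader registers interest once instead of checking repeatedly.
        self._subscribers[feed_url].append(callback)

    def publish(self, feed_url: str) -> int:
        # When the feed changes, every subscriber of that feed is notified.
        callbacks = self._subscribers.get(feed_url, [])
        for callback in callbacks:
            callback(feed_url)
        return len(callbacks)


notified: List[str] = []
hub = FeedHub()
hub.subscribe("http://somehost/somefeed", notified.append)
delivered = hub.publish("http://somehost/somefeed")
print(delivered, notified)
```

The reader does nothing between `subscribe` and the callback firing; that idle time is exactly what push is meant to buy.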

On the face of it, it’s a great idea.   The benefits of pub/sub are well-understood.    Pub/sub (push) can be far more efficient than polling (pull) in many cases, especially when it saves the client from having to make frequent connections to the server and then analyze each response looking for what has changed.

Unfortunately, the way rssCloud needs to be implemented has what I fear to be significant and potentially fatal flaws.   Before I dig into them,  it’s helpful to provide a little background on best practices using push or pub/sub architectures.

Best Push Practices with HTTP

The patterns for providing frequent updates to a client over HTTP are fairly well-known in Ajax / RIA arenas.  (See:  Comet for DHTML, or BlazeDS channels for Flash/Flex)   Here are the main ones:
  1. Polling:  Client connects to server and asks for an update.  The connection ends when the server has responded.   Repeat ad nauseam at the frequency needed by the client.  This is equivalent to what most RSS readers do today:  periodically check a feed for updates.
  2. Piggybacking:  Client connects to the server and asks for something else.  The server responds, notes the client’s subscriptions, and tacks on any pending notifications on the response to the request.  Combined with polling in an Ajax or RIA app, this can be a really efficient way to handle notifications.  (But, for RSS readers, there probably isn’t much to glean from this pattern.)
  3. Long-polling:  Client connects to server and asks for updates.  The server holds the connection open until it has a notification to send or its timeout expires.  In layman’s terms, this is akin to being put on hold but not hanging up the phone.  Once the server responds or the connection times out, the client reconnects and waits again.   Sort of like what you do when you get disconnected from tech support.  Again.
  4. Streaming: Client connects to server and asks for updates.  The connection stays open ad infinitum using a dedicated socket, pretty much like you would get from a streaming media server.  Whereas you can usually implement the other approaches with a standard web server, this approach typically requires  your server and network infrastructure to be above the norm.
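
As a rough illustration of long-polling, the sketch below simulates the common variant in which each held request returns at most one notification. There is no real HTTP here: a `queue.Queue` stands in for the server's pending updates, and `long_poll`, `run_client`, and the item names are hypothetical.

```python
import queue
from typing import List, Optional


def long_poll(updates: "queue.Queue[str]", hold_seconds: float) -> Optional[str]:
    """Simulate one held request: wait for an update or give up after hold_seconds."""
    try:
        return updates.get(timeout=hold_seconds)
    except queue.Empty:
        return None  # the "server" timed out; the client should reconnect


def run_client(updates: "queue.Queue[str]", hold_seconds: float,
               max_requests: int) -> List[str]:
    """Reconnect after every response or timeout, collecting whatever arrives."""
    received: List[str] = []
    for _ in range(max_requests):
        item = long_poll(updates, hold_seconds)
        if item is not None:
            received.append(item)
    return received


pending: "queue.Queue[str]" = queue.Queue()
pending.put("item-1")
pending.put("item-2")
received = run_client(pending, hold_seconds=0.05, max_requests=3)
print(received)
```

The third request finds nothing waiting and simply times out, which is the "on hold" case; a plain polling client would have had to guess how often to ask.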
Among these solutions, two main tenets emerge:

Tenet #1:
It is the client’s responsibility to make the HTTP connection. We all take it for granted, but establishing a network connection involves a fair bit of work.   With all of these patterns, the burden of “connectability” is put on the client.  A server is supposed to be highly-available, but clients come and go.  If the client can’t contact the server, it is usually the client’s problem.  The work of establishing a connection and handling failed connections, retries, etc. is thus distributed to the clients.  This is an important feature as the volume of concurrent clients increases.

Tenet #2:
Make as few connections as possible and keep them open as long as possible.  This is because the HTTP handshake itself – along with authentication – is the most expensive part of the networked operation.   The fewer times a client/server has to jump through the connection hoops, the better.

A noteworthy aside:  Apple’s iPhone push architecture uses long-lived (persistent) sockets to push App notifications to each iPhone.  This is roughly equivalent to the long-polling or streaming patterns, but uses raw binary data instead of HTML or JSON.    In any case, both of the Tenets are still employed: your iPhone connects to the APNS and keeps the connection open as long as possible.

So, with these best practices in mind, let’s take a look at rssCloud…

Clouded Flaw #1: Don’t Call Me, I’ll Call You

Currently, your RSS reader just uses polling to check a feed on a regular basis.  A decent RSS reader will do this using  HTTP HEAD requests, which are a very efficient way to check for updates.   A HEAD request just checks the timestamp on the feed without having to get the feed contents.  This is a nice bandwidth saver.
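
A minimal sketch of that kind of check, using Python's standard `urllib.request`. The helper names and the placeholder URL are assumptions, and the network call itself is left as a comment so the example runs offline. Many readers reportedly use a conditional GET with `ETag` or `Last-Modified` instead of HEAD; the comparison step is the same idea either way.

```python
import urllib.request
from typing import Mapping, Optional


def build_head_request(feed_url: str) -> urllib.request.Request:
    """Build a HEAD request: the server returns headers only, not the feed body."""
    return urllib.request.Request(feed_url, method="HEAD")


def feed_changed(last_seen: Optional[str], headers: Mapping[str, str]) -> bool:
    """Compare the Last-Modified header against the value seen on the previous check."""
    current = headers.get("Last-Modified")
    if current is None or last_seen is None:
        return True  # nothing to compare against: assume changed and fetch the feed
    return current != last_seen


request = build_head_request("http://somehost/somefeed")  # placeholder URL
# A real reader would now do something like:
#     with urllib.request.urlopen(request) as response:
#         changed = feed_changed(previous_value, response.headers)
stamp = "Tue, 08 Sep 2009 13:00:00 GMT"
print(request.get_method(), feed_changed(stamp, {"Last-Modified": stamp}))
```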

But if you really want instant-message-like updates on an RSS feed, it isn’t practical to make your reader check the feed every 5 seconds.   The connection overhead of contacting the server and waiting for the response would be obnoxious and just shy of a denial-of-service attack.  Recall Tenet #2 discussed earlier: “Make as few connections as possible.”  Those connections are expensive, after all.
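
To put a rough number on that, here is the arithmetic for the 5-second interval mentioned above. The reader count of 10,000 is a made-up figure for illustration only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def polls_per_day(interval_seconds: int, readers: int = 1) -> int:
    """Connections per day if every reader polls one feed at a fixed interval."""
    return (SECONDS_PER_DAY // interval_seconds) * readers


print(polls_per_day(5))                  # one reader, one feed
print(polls_per_day(5, readers=10_000))  # hypothetical audience size
```

One reader alone makes 17,280 connections a day to a single feed, nearly all of which will find nothing new.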

So the idea behind rssCloud is great:  The server can just tell your reader when the feed has updated.  No unnecessary connections are made, right?

Unfortunately, there’s a major problem lurking in the shadows.  Recall Tenet #1 discussed above: “It is the client’s responsibility to make (and keep open) the HTTP connection.”   The connection/retry work should be distributed to the clients.  But with rssCloud, the burden of making the connections, handling retries, etc. is now shouldered completely by the server.

This level of work is similar to sending a mass email via SMTP.   The handshaking / retry work from the mailing daemon to each unique SMTP relay is a well-known problem.  Unless your server has a lot of distributed processing available, the time-of-delivery between the first recipient and the last recipient will grow as the list of subscribers grows.  The delay is exacerbated with every failed connection that must time out before the next connection is attempted.   Performance degrades on a linear scale:  The more subscribers, the longer it takes to deliver all the messages.  These delays are common in the world of SMTP:  We’ve all gotten an email 15 minutes after a coworker got the same email.
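
A back-of-the-envelope model of that linear growth, assuming the server works through its subscriber list strictly one connection at a time. The per-connection time, failure rate, and timeout below are invented numbers, not measurements of any rssCloud server.

```python
def last_delivery_delay(subscribers: int, connect_seconds: float,
                        failure_rate: float, timeout_seconds: float) -> float:
    """Seconds until the last subscriber is reached when notifying sequentially."""
    failed = round(subscribers * failure_rate)
    succeeded = subscribers - failed
    # Each failed connection costs a full timeout before the next one starts.
    return succeeded * connect_seconds + failed * timeout_seconds


for count in (100, 1_000, 10_000):
    delay = last_delivery_delay(count, connect_seconds=0.2,
                                failure_rate=0.05, timeout_seconds=10.0)
    print(f"{count:>6} subscribers: last one notified after ~{delay / 60:.1f} min")
```

Under these assumptions most of the delay comes from the small fraction of dead subscribers, because each one holds up the queue for a full timeout. Parallel delivery would shrink the constant, but the total work still grows with the list.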

With email, we’re used to this kind of delay.  But if the goal of rssCloud is to have near-realtime updates, this kind of delay is a deal-killer. Unfortunately, there’s more….

Clouded Flaw #2: Your Fat Reader Got Fatter, Not Smarter

In order to support rssCloud server connections, your desktop RSS reader will have to run an embedded web server at all times to get notifications.  This is some significant development work for the folks who make your reader.  And for it to work correctly, you will need to have your firewall configured to allow incoming connections from the rssCloud server.   That requirement alone makes it a non-starter for many, many people in both residential and corporate environments.

And you had better leave your reader running at all times, even when you’re not at the computer.  Because if the server tries to send you an update and can’t connect to your reader… strike 1.   Three strikes, and you’re off the subscription list for notifications.     The rssCloud server will (presumably) try again later, but if you didn’t leave your reader running over the weekend, you might be auto-unsubscribed by Monday morning.  Which means the developers who make your reader will have to build in contingency plans for re-subscribing every time it starts up.
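
The bookkeeping described above might look something like the sketch below. It only illustrates the "three strikes" idea from this post: the class, the callback URL, and the rule that a successful delivery clears earlier strikes are all assumptions, not details taken from the rssCloud spec.

```python
from typing import Callable, Dict, List

MAX_STRIKES = 3  # "three strikes" as described in the post


class SubscriptionList:
    """Hypothetical server-side list that drops readers it cannot reach."""

    def __init__(self) -> None:
        self._strikes: Dict[str, int] = {}

    def subscribe(self, callback_url: str) -> None:
        self._strikes[callback_url] = 0

    def is_subscribed(self, callback_url: str) -> bool:
        return callback_url in self._strikes

    def notify_all(self, try_connect: Callable[[str], bool]) -> List[str]:
        """Attempt delivery to every reader; return the ones dropped this round."""
        dropped: List[str] = []
        for url in list(self._strikes):
            if try_connect(url):
                self._strikes[url] = 0  # assumption: success clears earlier strikes
                continue
            self._strikes[url] += 1
            if self._strikes[url] >= MAX_STRIKES:
                del self._strikes[url]
                dropped.append(url)
        return dropped


reader_url = "http://reader.example:5337/notify"  # made-up callback address
subs = SubscriptionList()
subs.subscribe(reader_url)
# The reader is switched off for the weekend: every connection attempt fails.
results = [subs.notify_all(lambda url: False) for _ in range(3)]
print(results, subs.is_subscribed(reader_url))
```

Note that nothing in this exchange ever tells the reader it was dropped, which is why it would need to re-subscribe on every startup.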

Also, let’s say your reader was running, but there was some temporary network disruption right when the server was trying to connect to your reader.  You’ll have missed the notifications and never know it.  You’ll have a strike and never know it.  This is what happens when you make the server responsible for establishing the connection and break Tenet #1: your client never knows when something goes wrong.

If the responsibility were on your RSS reader (client) to connect to the server and ask for notifications, it could react to network disruptions and reconnect as soon as possible.  You could know how long you’ve been missing notifications.  But since rssCloud puts the connection burden on the server, the only way for your reader to know if you’re still “current” is to … do a GET or a HEAD on the RSS feed itself … Which is what RSS readers are already doing without rssCloud.

Back to square one.

Is There a Silver Lining?

Personally, I have grave misgivings about the current flavor of rssCloud.   It is trying to solve a hard problem, but has created some even harder problems for itself in the process.

On the bright side, the need is real, and a lot of really sharp people are looking at the problem.

My advice is this: I believe that if rssCloud is to succeed on any significant scale, the server will need to get out of the call-the-client business.  Taking a page from Comet best practices, the server API should be fleshed out to accommodate long-polling, and let RSS clients do the connection / retry / recovery work.   This course correction would have several strong advantages:
  1. No embedded server in RSS readers required
  2. No firewall configuration for users
  3. Reduced server requirements for rssCloud server hosts
  4. Potentially reduced lag time in notifications, especially for large numbers of clients
  5. Reduced long-term state on the server; there won’t need to be subscriptions, just sessions
  6. Improved data integrity by making the client aware and responsible for reconnects/refreshes
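
As a sketch of the last point, here is how a client that owns the connection could notice an outage and measure how long it lasted. Everything in it is hypothetical: `fetch` stands in for one long-poll request, and the scripted responses and clock values exist only to make the example run without a network.

```python
from typing import Callable, List, Optional


class LongPollClient:
    """Hypothetical reader-side loop that tracks its own connection outages."""

    def __init__(self, fetch: Callable[[], Optional[str]],
                 clock: Callable[[], float]) -> None:
        self._fetch = fetch
        self._clock = clock
        self._down_since: Optional[float] = None
        self.outages: List[float] = []  # length of each outage, in seconds
        self.items: List[str] = []

    def poll_once(self) -> None:
        try:
            item = self._fetch()
        except ConnectionError:
            if self._down_since is None:
                self._down_since = self._clock()  # remember when trouble started
            return
        if self._down_since is not None:
            # Back online: the client knows exactly how long it was out of touch,
            # so a real reader could refresh the full feed at this point.
            self.outages.append(self._clock() - self._down_since)
            self._down_since = None
        if item is not None:
            self.items.append(item)


script = iter(["item-1", ConnectionError(), ConnectionError(), "item-2"])
ticks = iter([10.0, 25.0])  # fake clock readings, in seconds


def fake_fetch() -> Optional[str]:
    step = next(script)
    if isinstance(step, Exception):
        raise step
    return step


client = LongPollClient(fake_fetch, lambda: next(ticks))
for _ in range(4):
    client.poll_once()
print(client.items, client.outages)
```

With server-initiated connections the reader has no equivalent of `outages`: a failed delivery looks exactly like a quiet feed.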
The RSS world needs what rssCloud is offering, so my hope is that the specs will evolve to make this all possible.

What do you think?

Tags: Feeds · News · Opinion

13 responses so far ↓

  • 1 Rogers Cadenhead // Sep 8, 2009 at 7:25 pm

    Great post. There are a lot of technical considerations that are being glossed over in the excitement of dusting off RSSCloud to provide “real-time RSS.”

    There’s another that I wonder about. If clouds make Twitter-style services possible over RSS, that means people posting dozens of short messages a day.

    Under RSSCloud, each one will trigger a request of the full RSS feed.

    That’s a lot of bandwidth just to get one short update after each notification. Wouldn’t it be preferable to pass the new item to the client instead of sending a notification to trigger RSS polling?

  • 2 Ken Stewart | ChangeForge // Sep 8, 2009 at 8:33 pm

    It is probably a fair assessment to note that I am viewing this from an end-user perspective, so the attention to helping describe the impact of technical decisions was needed and welcome – so thank you for that.

    I had read through Dave’s site, and was admittedly a bit lost on why “real time RSS” is considered so important. After reading your article, I’m not entirely sure I see the benefits above and beyond what is presently defined – but neither am I the most educated in this area of expertise.

    So with that said, I felt your parallel to SMTP handling (of which I know a bit more about) to be very useful and on-point as I attempted to discern client vs. server calls. Overall, I have to say I don’t really understand why we would need to adopt a push mechanism for this type of technology given as it would seem to begin to mirror other protocols already in place – so why not leverage those in a conjunctive fashion (again, without understanding the full technical scope).

    In summary, I agree with your assertions because I usually attempt to balance my background within business network administration, project management, and line of business focus to draw conclusions as to whether and how technology can be leveraged to complete a given business objective.

    That’s where the rubber leaves the road for me, as I’m hard pressed to see how the energy exerted to move to a new standard supersedes the blossoming adoption of a technology which still seems mysterious to many out there.

    Your thoughts are appreciated, and thank you for taking some time to outline this. I welcome your feedback and assistance in helping me understand if I might be missing something that could shift the perspective.

    Warmest Regards,
    K

  • 3 Is rssCloud All Wet? | sull is vocally active // Sep 8, 2009 at 8:38 pm

    [...] Is rssCloud All Wet?. [...]

  • 4 sull // Sep 8, 2009 at 10:17 pm

    nice post.

    let me throw some vague stuff out there and i’d like to hear your thoughts.

    html5 web sockets

    html5 offline apps and/or google gears

    webapps (wrapped in mozilla prism, fluid or other app wrappers for single-site-apps)

    p2p / client / server apps – transfer rss sub status log file that lists all feeds that have been updated (since last transfer) and client pulls down those feeds or only those new feed items (fragmented feeds).

    webhooks

    just random thoughts. maybe you can elaborate for me ;)

  • 5 Mark Woodman // Sep 8, 2009 at 11:53 pm

    Sull,

    I had mulled over the HTML 5 Web Sockets API when writing this post, but decided I was being long-winded enough without bringing it up. :) While I think the concept is pretty cool, I’m not sure it’s a good fit for RSS. My primary hesitation is that it brings in a whole dependency chain on HTML 5 and DOM handling. That’s a heavy stack on top of “plain old XML”.

    As for “fragmented feeds”, I’ve been kicking around the term “PartialRSS” lately. As Rogers has commented, the best scenario for push notifications would be to actually push the deltas (new items) rather than just a “something changed” message.

    I’ll let you expand on the rest of your own vagaries. Let’s hear it!

  • 6 Mark Woodman // Sep 9, 2009 at 12:14 am

    Rogers,

    “Wouldn’t it be preferable to pass the new item to the client instead of sending a notification to trigger RSS polling?”

    Absolutely. Sending delta updates would be a better use of the notification because it cuts the communications overhead in half… the new content is the notification.

    Along those lines, it is easy to think up an RSS server API that would facilitate “PartialRSS” feeds which only include what you want. For example:

    The entire feed: http://somehost/somefeed
    Only items since a given time: http://somehost/somefeed?since=09082009,1300
    Only the items before a given time: http://somehost/somefeed?before=09082009,1300
    Only the most recent item: http://somehost/somefeed?limit=1

    And so on. A server with this kind of flexibility could then more easily support notifications that pushed out the newest items to clients.

  • 7 Martin Probst // Sep 9, 2009 at 2:56 am

    Good analysis. One thing you fail to mention is the overhead caused at the server by managing the persistent connections and reader state.

    Currently a webserver handing out static XML files can serve RSS. It doesn’t need any special knowledge, storage, etc. for the clients.

    With this proposal, and all pub/sub things, the clients subscribe at the server, so now the server needs to remember subscribed clients, and for each of them potentially the list of things they haven’t seen yet (this is probably why rssCloud apparently avoids partial updates – they are too heavy for the server). And the server has to remember which clients seem to be dead, queue up changes for slow clients, etc.

    The individual storage needed for one client is certainly not much, but if you have a popular page with 100′000s of subscribers, it’s going to kill you, while handing out a static file with conditional GETs (I think no one uses HEAD for that) is at least possible, and can be easily supported by proxies etc.

    I think pub/sub will only make sense in certain EAI scenarios where you have a very limited number of clients, and they all have the resources and technical skill to set up web servers.

  • 8 James Holderness // Sep 9, 2009 at 2:10 pm

    First, a technical nit-pick: RSS readers typically use ETags or Last-Modified headers to check for updates efficiently – not HTTP HEAD.

    As for some of the other criticisms of RSS cloud that have been raised:

    Firewalls aren’t necessarily a show-stopper. Many modern routers include support for UPnP which enables a client to open up ports on the firewall as needed. So while this may limit the potential users of RSS cloud on the desktop, it’s not like everyone behind a firewall is screwed.

    Clients don’t have to fetch the full feed when they receive an update notification. Many feed readers have had support for partial feed updates via RFC 3229 for years now. If necessary a server could limit its cloud access to only those clients that supported partial feeds.

    As for long-polling and streaming, I don’t believe those techniques are practical when you have lots of feeds. There is an upper limit to how many TCP connections you can keep open at once, and even if you don’t hit that limit, it’s wasteful keeping a bunch of connections idling.

    Now if you had a tiered system, where you connected to a small number of hubs that in turn connected to your hundreds of feeds, a permanent connection makes more sense. I’m assuming Google’s PSHB works something like that. However, there need to be a large number of hubs to make this kind of thing palatable, otherwise it’s just another centralised point-of-failure.

    I don’t think RSS cloud is without its problems, but I’m not yet convinced that it’s completely unworkable. As a client developer, the only reason I never considered implementing it in the past was because there was no significant server support. With Wordpress backing, though, I would think it’s at least worth a look now.

    Regards
    James

  • 9 Mark Woodman // Sep 9, 2009 at 2:45 pm

    James,

    Thanks for the clarifications re: ETags/headers. I did a little hand-waving on that part, and I appreciate the nitpick.

    And thanks as well for bringing up RFC 3229. I haven’t seen much lately on RSS reader support for that spec. Anyone know of a reasonably current survey?

    Your point on the practicality of long-polling/streaming is well-taken : it isn’t practical to do that with a large number of feed providers. Just like Apple’s Push Notification Service, this is really only practical with a centralized service provider. FeedBurner and Pingomatic both filled similar niches… highly-available and centralized services.

    There is a trade-off between having decentralized service endpoints with degraded reliability VS a centralized service endpoint and higher reliability.

    It’s important to note that Dave Winer doesn’t want the centralization, and doesn’t sound concerned about reliability. He writes: “I designed this to be decentralized … I’m not risking anything, because we know that polling works for RSS. rssCloud is an optimization, its purpose is to make RSS faster. But if it fails RSS still works.”

    If rssCloud is really just intended to be an optional “nice if it works, oh well if it doesn’t” feature, then the implementation details probably don’t matter that much. It will be interesting to see how/if it catches on… my experience is that users get pretty upset when they come to rely on something that starts to degrade in performance. Maybe it will take some initial successes before the pain drives rssCloud to evolve.

  • 10 Rogers Cadenhead // Sep 9, 2009 at 10:10 pm

    “Many feed readers have had support for partial feed updates via RFC 3229 for years now.”

    They do? Where? Where? I had no idea.

  • 11 James Holderness // Sep 10, 2009 at 2:37 pm

    I haven’t checked recently, but from the tests I’ve done in the distant past I know that FeedDemon, IE7 (and thus the Windows Feed Platform), JetBrains Omea, NewzCrawler, RSS Bandit, and Snarfer all support RFC 3229.

    Also, there’s an old list of implementations (both client and server) here:
    http://wyman.us/main/2004/09/implementations.html

    I actually expected support to be higher than that (and perhaps these days it is), because for a client, supporting RFC3229+feed is dead simple – like your cat walking over the keyboard and accidentally implementing it (assuming you already support etags/last-modified).

    And Rogers, I’m surprised you had no idea, considering RFC3229 was at one point on the todo list for the RSS BPP. I kind of thought it was common knowledge for RSS geeks.

    Back on the subject of RSS cloud – like Dave, I think of it as a neat optimization, rather than a vital component. However, for people for whom real-time communication is critical, I can see how RSS cloud would not be adequate.

    Still, I’m keen to play with it when I have some spare time. I know it’s pointless, considering I typically read feed articles hours, if not days, after they’ve been published, but semi-real-time updates would just be cool in a geeky kind of way.

  • 12 Rogers Cadenhead // Sep 15, 2009 at 2:20 pm

    Regarding the evolution of RSSCloud, I think it’s highly unlikely the changes you suggest will be considered *unless* the revision of the interface becomes a public process.

    I’ve written up some thoughts on that regard on my blog today:

    http://workbench.cadenhead.org/news/3559

  • 13 The rssCloud Debate | Dramatic Bloggger // Dec 19, 2009 at 1:51 pm

    [...] There’s a Reason RSSCloud failed to Catch On. I necessary read. Then Mark Woodman wrote Is rssCloud All Wet? Another necessary read. Both Rogers and Mark made valid points why rssCloud cannot succeed. Dave [...]
