TechBrew

Informative geekery on software and technology

Is rssCloud All Wet?

September 8th, 2009 by Mark Woodman

Rogers Cadenhead recently posted his thoughts on why the rssCloud concept “failed to catch on.”    This is a response to his post and an attempt to foster further discussion in the community.

Blue Sky

The main idea behind rssCloud is that your RSS reader can ask to be notified when a feed changes (push), rather than having to check the feed on a regular basis (pull).  This is accomplished by having your RSS reader subscribe to updates with the feed server.  When the feed is updated, the server will then notify your RSS reader accordingly.  This message pattern is known as publish/subscribe or just “pub/sub.”
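To make the pattern concrete, here is a minimal in-process sketch of publish/subscribe. It is not rssCloud itself: the `FeedHub` class, its method names, and the example URL are all hypothetical, and a plain function call stands in for the network notification.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class FeedHub:
    """Toy stand-in for a feed server that accepts subscriptions (hypothetical)."""

    def __init__(self) -> None:
        # Maps a feed URL to the callbacks that want to hear about changes.
        self._subscribers: DefaultDict[str, List[Callable[[str], None]]] = defaultdict(list)

    def subscribe(self, feed_url: str, callback: Callable[[str], None]) -> None:
        """Register interest in a feed (the 'subscribe' half)."""
        self._subscribers[feed_url].append(callback)

    def publish(self, feed_url: str) -> int:
        """Notify every subscriber of feed_url (the 'publish' half).

        Returns the number of subscribers that were notified.
        """
        callbacks = self._subscribers.get(feed_url, [])
        for callback in callbacks:
            callback(feed_url)
        return len(callbacks)


hub = FeedHub()
notifications: List[str] = []
hub.subscribe("http://example.com/feed.xml", notifications.append)
notified_count = hub.publish("http://example.com/feed.xml")
```

After the `publish` call, the reader's `notifications` list holds the URL of the feed that changed, without the reader ever having asked "anything new?"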

On the face of it, it’s a great idea.   The benefits of pub/sub are well-understood.    Pub/sub (push) can be far more efficient than polling (pull) in many cases, especially when it saves the client from having to make frequent connections to the server and from having to analyze each response looking for what has changed.

Unfortunately, the way rssCloud needs to be implemented has what I fear to be significant and potentially fatal flaws.   Before I dig into them,  it’s helpful to provide a little background on best practices using push or pub/sub architectures.

Best Push Practices with HTTP

The patterns for providing frequent updates to a client over HTTP are fairly well-known in Ajax / RIA arenas.  (See:  Comet for DHTML, or BlazeDS channels for Flash/Flex)   Here are the main ones:
  1. Polling:  Client connects to server and asks for an update.  The connection ends when the server has responded.   Repeat ad nauseam at the frequency needed by the client.  This is equivalent to what most RSS readers do today:  periodically check a feed for updates.
  2. Piggybacking:  Client connects to the server and asks for something else.  The server responds, notes the client’s subscriptions, and tacks on any pending notifications on the response to the request.  Combined with polling in an Ajax or RIA app, this can be a really efficient way to handle notifications.  (But, for RSS readers, there probably isn’t much to glean from this pattern.)
  3. Long-polling:  Client connects to server and asks for updates.  The connection stays open as long as the server allows, and the server will send multiple notifications while the connection is open.  In layman’s terms, this is akin to being put on hold but not hanging up the phone.  When the connection is closed or times out, the client will reconnect.   Sort of like what you do when you get disconnected from tech support.  Again.
  4. Streaming: Client connects to server and asks for updates.  The connection stays open ad infinitum using a dedicated socket, pretty much like you would get from a streaming media server.  Whereas you can usually implement the other approaches with a standard web server, this approach typically requires  your server and network infrastructure to be above the norm.
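The difference between patterns 1 and 3 above can be sketched without any networking at all. In the toy model below, `FeedState`, `poll`, and `long_poll` are hypothetical names, a thread condition stands in for a held-open HTTP request, and the held request returns on the first change rather than carrying several notifications.

```python
import threading
from typing import Optional


class FeedState:
    """In-memory stand-in for a feed server (hypothetical, no real HTTP)."""

    def __init__(self) -> None:
        self._version = 0
        self._changed = threading.Condition()

    def update(self) -> None:
        """Publish a new version of the feed and wake any waiting requests."""
        with self._changed:
            self._version += 1
            self._changed.notify_all()

    def poll(self, since: int) -> Optional[int]:
        """Polling: answer immediately, whether or not anything changed."""
        with self._changed:
            return self._version if self._version > since else None

    def long_poll(self, since: int, timeout: float) -> Optional[int]:
        """Long-polling: hold the request until a change arrives or the timeout hits."""
        with self._changed:
            self._changed.wait_for(lambda: self._version > since, timeout=timeout)
            return self._version if self._version > since else None


feed = FeedState()
nothing_yet = feed.poll(since=0)                 # returns at once with no news
publisher = threading.Timer(0.05, feed.update)   # the feed changes a moment later
publisher.start()
new_version = feed.long_poll(since=0, timeout=2.0)  # wakes as soon as it changes
publisher.join()
```

With polling, the client would have had to come back and ask again to learn about the update. With long-polling, the single held request delivers it the moment it happens.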
Among these solutions, two main tenets emerge:

Tenet #1:
It is the client’s responsibility to make the HTTP connection. We all take it for granted, but establishing a network connection involves a fair bit of work.   With all of these patterns, the burden of “connectability” is put on the client.  A server is supposed to be highly-available, but clients come and go.  If the client can’t contact the server, it is usually the client’s problem.  The work of establishing a connection and handling failed connections, retries, etc. is thus distributed to the clients.  This is an important feature as the volume of concurrent clients increases.

Tenet #2:
Make as few connections as possible and keep them open as long as possible.  This is because the HTTP handshake itself – along with authentication – is the most expensive part of the networked operation.   The fewer times a client/server has to jump through the connection hoops, the better.

A noteworthy aside:  Apple’s iPhone push architecture uses long-lived (persistent) sockets to push App notifications to each iPhone.  This is roughly equivalent to the long-polling or streaming patterns, but uses raw binary data instead of HTML or JSON.    In any case, both of the Tenets are still employed: your iPhone connects to the APNS and keeps the connection open as long as possible.

So, with these best practices in mind, let’s take a look at rssCloud…

Clouded Flaw #1: Don’t Call Me, I’ll Call You

Currently, your RSS reader just uses polling to check a feed on a regular basis.  A decent RSS reader will do this using  HTTP HEAD requests, which are a very efficient way to check for updates.   A HEAD request just checks the timestamp on the feed without having to get the feed contents.  This is a nice bandwidth saver.
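As a rough illustration of the HEAD check described above, the sketch below uses Python's standard `urllib`. The function names are my own, and it assumes the server sends a `Last-Modified` header, which not every feed server does.

```python
import urllib.request
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def head_request(url: str) -> urllib.request.Request:
    """Build a HEAD request: headers only, no feed body is transferred."""
    return urllib.request.Request(url, method="HEAD")


def feed_changed(headers: Mapping[str, str], last_seen: Optional[str]) -> bool:
    """Compare the Last-Modified header with the value remembered from the last check."""
    current = headers.get("Last-Modified")
    if current is None or last_seen is None:
        # Nothing to compare against, so assume the feed has to be fetched.
        return True
    return parsedate_to_datetime(current) > parsedate_to_datetime(last_seen)


def check_feed(url: str, last_seen: Optional[str]) -> bool:
    """Perform the HEAD request over the network and report whether the feed looks newer."""
    with urllib.request.urlopen(head_request(url), timeout=10) as response:
        return feed_changed(response.headers, last_seen)
```

A reader would call something like `check_feed(feed_url, remembered_timestamp)` on its polling interval and only issue a full GET when it returns `True`.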

But if you really want instant-message-like updates on an RSS feed, it isn’t practical to make your reader check the feed every 5 seconds.   The connection overhead of contacting the server and waiting for the response would be obnoxious and just shy of a denial-of-service attack.  Recall Tenet #2 discussed earlier: “Make as few connections as possible.”  Those connections are expensive, after all.

So the idea behind rssCloud is great:  The server can just tell your reader when the feed has updated.  No unnecessary connections are made, right?

Unfortunately, there’s a major problem lurking in the shadows.  Recall Tenet #1 discussed above: “It is the client’s responsibility to make (and keep open) the HTTP connection.”   The connection/retry work should be distributed to the clients.  But with rssCloud, the burden of making the connections, handling retries, etc. is now shouldered completely by the server.

This level of work is similar to sending a mass email via SMTP.   The handshaking / retry work from the mailing daemon to each unique SMTP relay is a well-known problem.  Unless your server has a lot of distributed processing available, the time-of-delivery between the first recipient and the last recipient will grow as the list of subscribers grows.  The delay is exacerbated with every failed connection that must time out before the next connection is attempted.   Performance degrades on a linear scale:  The more subscribers, the longer it takes to deliver all the messages.  These delays are common in the world of SMTP:  We’ve all gotten an email 15 minutes after a coworker got the same email.
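A back-of-the-envelope model shows how quickly this adds up. It assumes a single worker that contacts subscribers one at a time. The numbers plugged in (200 ms per successful connection, 5 unreachable subscribers per 100, a 10-second timeout) are purely illustrative, not measurements of any real rssCloud server.

```python
def last_delivery_delay_ms(subscribers: int,
                           connect_ms: int,
                           failures_per_100: int,
                           timeout_ms: int) -> int:
    """Milliseconds until the last subscriber has been tried, contacting one at a time.

    Every reachable subscriber costs connect_ms; every unreachable one
    costs a full timeout_ms before the server can move on.
    """
    failed = subscribers * failures_per_100 // 100
    reachable = subscribers - failed
    return reachable * connect_ms + failed * timeout_ms


small_list = last_delivery_delay_ms(1_000, connect_ms=200, failures_per_100=5, timeout_ms=10_000)
large_list = last_delivery_delay_ms(10_000, connect_ms=200, failures_per_100=5, timeout_ms=10_000)
```

Under those assumptions, `small_list` comes to 690,000 ms (about 11.5 minutes) and `large_list` to 6,900,000 ms (about 115 minutes): ten times the subscribers, ten times the wait for whoever is last in line. Parallel workers would shrink the constant, but not the linear shape.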

With email, we’re used to this kind of delay.  But if the goal of rssCloud is to have near-realtime updates, this kind of delay is a deal-killer. Unfortunately, there’s more….

Clouded Flaw #2: Your Fat Reader Got Fatter, Not Smarter

In order to support rssCloud server connections, your desktop RSS reader will have to run an embedded web server at all times to get notifications.  This is some significant development work for the folks who make your reader.  And for it to work correctly, you will need to have your firewall configured to allow incoming connections from the rssCloud server.   That requirement alone makes it a non-starter for many, many people in both residential and corporate environments.

And you had better leave your reader running at all times, even when you’re not at the computer.  Because if the server tries to send you an update and can’t connect to your reader… strike 1.   Three strikes, and you’re off the subscription list for notifications.     The rssCloud server will (presumably) try again later, but if you didn’t leave your reader running over the weekend, you might be auto-unsubscribed by Monday morning.  Which means the developers that make your reader will have to build in contingency plans for re-subscribing every time it starts up.
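Here is a small sketch of the server-side bookkeeping this implies. It models only the three-strikes behavior described above. The class name, the callback URL, and the choice to reset the count after a successful delivery are my assumptions, not something taken from the rssCloud spec.

```python
from typing import Dict


class SubscriptionList:
    """Toy model of a server dropping readers it cannot reach (hypothetical)."""

    MAX_STRIKES = 3  # "three strikes" as described in the text

    def __init__(self) -> None:
        self._strikes: Dict[str, int] = {}

    def subscribe(self, callback_url: str) -> None:
        """Add (or re-add) a reader with a clean record."""
        self._strikes[callback_url] = 0

    def is_subscribed(self, callback_url: str) -> bool:
        return callback_url in self._strikes

    def record_attempt(self, callback_url: str, delivered: bool) -> bool:
        """Record one notification attempt; return True if the reader stays subscribed."""
        if callback_url not in self._strikes:
            return False
        if delivered:
            self._strikes[callback_url] = 0  # assumption: a success clears old strikes
            return True
        self._strikes[callback_url] += 1
        if self._strikes[callback_url] >= self.MAX_STRIKES:
            del self._strikes[callback_url]  # silently dropped; the reader is never told
            return False
        return True


subscriptions = SubscriptionList()
reader = "http://192.0.2.10:5337/notify"  # hypothetical callback address
subscriptions.subscribe(reader)
for _ in range(3):  # reader was switched off over the weekend
    subscriptions.record_attempt(reader, delivered=False)
still_subscribed = subscriptions.is_subscribed(reader)
```

After three missed notifications `still_subscribed` is `False`, and nothing in this exchange ever informed the reader. The only remedy is for the reader to call `subscribe` again on startup, just in case.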

Also, let’s say your reader was running, but there was some temporary network disruption right when the server was trying to connect to your reader.  You’ll have missed the notifications and never know it.  You’ll have a strike and never know it.  This is what happens when you make the server responsible for establishing the connection and break Tenet #1: your client never knows when something goes wrong.

If the responsibility were on your RSS reader (client) to connect to the server and ask for notifications, it could react to network disruptions and reconnect as soon as possible.  You could know how long you’ve been missing notifications.  But since rssCloud puts the connection burden on the server, the only way for your reader to know if you’re still “current” is to … do a GET or a HEAD on the RSS feed itself … Which is what RSS readers are already doing without rssCloud.
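To illustrate the first half of that paragraph, here is a sketch of a client that owns the connection and therefore notices its own outages. `LongPollClient`, the injected `fetch` function, and the fake clock are all hypothetical scaffolding so the example runs without a network.

```python
import time
from typing import Callable, Iterator, List, Optional, Union


class LongPollClient:
    """Client-side loop that tracks how long it has been cut off (hypothetical)."""

    def __init__(self, fetch: Callable[[], Optional[str]],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch            # performs one long-poll request
        self._clock = clock
        self._down_since: Optional[float] = None
        self.last_outage_seconds = 0.0

    def step(self) -> Optional[str]:
        """Make one attempt; return the notification, or None if the attempt failed."""
        try:
            result = self._fetch()
        except OSError:  # includes ConnectionError and socket timeouts
            if self._down_since is None:
                self._down_since = self._clock()  # the outage starts here
            return None
        if self._down_since is not None:
            # Back online: the client knows exactly how long it was in the dark.
            self.last_outage_seconds = self._clock() - self._down_since
            self._down_since = None
        return result


# Simulate two failed attempts followed by a successful one, one minute apart.
scripted: Iterator[Union[Exception, str]] = iter(
    [ConnectionError(), ConnectionError(), "feed updated"])
ticks = iter([100.0, 160.0])


def fake_fetch() -> Optional[str]:
    item = next(scripted)
    if isinstance(item, Exception):
        raise item
    return item


client = LongPollClient(fake_fetch, clock=lambda: next(ticks))
results: List[Optional[str]] = [client.step() for _ in range(3)]
```

Once the third attempt succeeds, `client.last_outage_seconds` is 60.0, so the reader can decide whether a gap that long warrants re-fetching the whole feed. A server-initiated design gives the reader no equivalent signal.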

Back to square one.

Is There a Silver Lining?

Personally, I have grave misgivings about the current flavor of rssCloud.   It is trying to solve a hard problem, but has created some even harder problems for itself in the process.

On the bright side, the need is real, and a lot of really sharp people are looking at the problem.

My advice is this: I believe that if rssCloud is to succeed on any significant scale, the server will need to get out of the call-the-client business.  Taking a page from Comet best practices, the server API should be fleshed out to accommodate long-polling, and let RSS clients do the connection / retry / recovery work.   This course correction would have several strong advantages:
  1. No embedded server in RSS readers required
  2. No firewall configuration for users
  3. Reduced server requirements for rssCloud server hosts
  4. Potentially reduced lag time in notifications, especially for large numbers of clients
  5. Reduced long-term state on the server; there won’t need to be subscriptions, just sessions
  6. Improved data integrity by making the client aware and responsible for reconnects/refreshes
The RSS world needs what rssCloud is offering, so my hope is that the specs will evolve to make this all possible.

What do you think?

Tags: Feeds · News · Opinion

ROME 1.0 Released

March 11th, 2009 by Mark Woodman

It has been a long time coming, but the Java library for RSS and Atom utilities called ROME has finally made it to version 1.0 (changelog). Thanks to all of the contributors and to the dev team for the hard work that made it possible!

New to ROME? For a quick tutorial on how to get started, check out my piece on XML.com: “ROME in a Day: Parse and Publish Feeds in Java.”


Tags: News

President Obama – Lost in Translation

January 20th, 2009 by Mark Woodman

Fox News has an “Automatic Transcription” feature for its videos.  The disclaimer of “may not be 100% accurate” is understated by 95%.  Here’s the presidential oath of Barack Obama, the way their transcription software heard it:

“I. — Barack Hussein Obama I solemnly swear Barack Hussein Obama do solemnly swear. That I will execute the office of president of the United States faithfully — execute … Get off faithfully the president the office of president and — I just — the United States — wheels. — the best of — ability and will miss my children. Preserve protect and defend the constitution of the United States. Preserve protect and defend the constitution of the United States so help you got so homey — congratulations mr.”

"I will miss my children."

Woops.  To be fair, there were two people talking at the same time, which is a nightmare for voice recognition software.  

But you’ve got to love phrases like “Get off faithfully the president”, “will miss my children”, and “so help you got so homey.”   Maybe Fox News should hire out the transcription software as a writer for Saturday Night Live… it’s funnier than most of the people they’ve hired.

Tags: Humor · News