Incrementally serving RSS feeds

We’re living in a Web 2.0 world where more and more content is served through RSS feeds. As more people track your site through RSS, these feeds will start to account for a rather big portion of your monthly bandwidth (especially if you consider that most readers refresh each feed once an hour).

Fortunately, there’s something we can do about this! You can use a service like rsscache.com or feedburner, but neither is half as smart as it could be (rsscache.com comes close, but it keeps track of your IP address, which just doesn’t work in a world of laptops). Here’s the smart way:

The browser cache: say hello to an old friend.
If your website serves RSS feeds through a static file, your webserver will take care of most of this. However, most RSS feeds are generated on-the-fly. This has advantages: it’s easy to do and your feed is always brand new. There’s a disadvantage though: unless you explicitly add caching support yourself, you lose all the advantages of the browser cache. Why?

To explain that, we have to make a short (I promise) trip into the world of HTTP.

When you try to view a webpage, your browser will send a request to the webserver. The webserver will then send back the webpage you requested, but it will also send back some headers, invisible to the user. These headers control a lot of things, like the filetype of the served file and the way this file should be cached (starting to see where we’re going?).

One of these headers is the ETag header. It’s basically a label for the file you just received. Every time the file changes on the server, it gets a new ETag. Now if your browser requests that file again, it will also send the ETag of the cached copy back to the webserver (in the If-None-Match request header). The webserver will check that ETag, and if it’s still valid, it’ll simply respond to the browser: 304 Not Modified. And that’s it. No file transferred, and next to no bandwidth used. Your browser will use its cached copy, which is way faster than receiving it again. Everybody happy!

Unless you’re serving from a PHP script… You see, PHP scripts are dynamic: their output is regenerated on every request, so the webserver can’t calculate an ETag for it. Which means we have to do it ourselves. Thankfully, this isn’t very hard.
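To make that concrete, here’s a minimal sketch of doing the ETag work by hand in PHP. It is not the class mentioned further down; the function names (feed_etag, etag_matches, respond_to_feed_request) are made up for illustration, the ETag is simply an MD5 hash of the generated feed, and the type hints assume PHP 7.1 or newer.

```php
<?php
// Hypothetical helper: derive an ETag from the generated feed body.
// ETag values must be wrapped in double quotes.
function feed_etag(string $body): string
{
    return '"' . md5($body) . '"';
}

// True when the client's If-None-Match header names our current ETag.
// The header may carry several comma-separated tags, or "*".
function etag_matches(?string $ifNoneMatch, string $etag): bool
{
    if ($ifNoneMatch === null) {
        return false;
    }
    foreach (explode(',', $ifNoneMatch) as $candidate) {
        $candidate = trim($candidate);
        // Drop the weak-validator prefix so W/"abc" compares equal to "abc".
        if (strncmp($candidate, 'W/', 2) === 0) {
            $candidate = substr($candidate, 2);
        }
        if ($candidate === '*' || $candidate === $etag) {
            return true;
        }
    }
    return false;
}

// Decide what to answer; the caller is responsible for actually sending it.
function respond_to_feed_request(string $body, ?string $ifNoneMatch): array
{
    $etag = feed_etag($body);
    if (etag_matches($ifNoneMatch, $etag)) {
        // The client already has this version: no body needed.
        return ['status' => 304, 'headers' => ['ETag: ' . $etag], 'body' => ''];
    }
    return [
        'status'  => 200,
        'headers' => ['ETag: ' . $etag, 'Content-Type: application/rss+xml; charset=utf-8'],
        'body'    => $body,
    ];
}
```

In a real feed script you would pass `$_SERVER['HTTP_IF_NONE_MATCH'] ?? null` as the second argument, then send the status with `http_response_code()`, each header with `header()`, and finally echo the body.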

Now do it in pieces
But wait, don’t start cheering yet, we can do it even better! RSS has a nice property: the feed reader (which works just like a web browser and also uses the ETag headers) will store all the RSS items it has received. This means we only have to send it the new blog posts. And that’s a lot better than sending out everything you wrote in a month every time someone asks for it.

So how can we do this? *crickets* You guessed it! ETag! If we get a request with an ETag, we know exactly which version the client has. So we can easily find out which posts we have to send. Now isn’t that even better? Instead of sending out the entire feed again when you’ve posted a new part of your adventures, only the new content will be downloaded. And most of the time, that saves you about 90% in bandwidth.
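Here’s a rough sketch of how that selection could work. It is one possible approach under stated assumptions, not the class linked below: every post is assumed to have a numeric id that only ever goes up, the ETag format "feed-N" is invented for this example, and the function names are hypothetical.

```php
<?php
// Build an ETag that names the newest post in the feed.
// Assumes each post is an array with an increasing numeric 'id'.
function etag_for_posts(array $posts): string
{
    $newest = 0;
    foreach ($posts as $post) {
        $newest = max($newest, $post['id']);
    }
    return '"feed-' . $newest . '"';
}

// Recover the newest post id the client has seen from its If-None-Match
// value, or null when the header is missing or not in our format.
function last_seen_id(?string $ifNoneMatch): ?int
{
    if ($ifNoneMatch !== null && preg_match('/^"feed-(\d+)"$/', trim($ifNoneMatch), $m)) {
        return (int) $m[1];
    }
    return null;
}

// Decide which posts to send back for this request.
function select_posts(array $posts, ?string $ifNoneMatch): array
{
    $etag = etag_for_posts($posts);
    $seen = last_seen_id($ifNoneMatch);

    if ($seen === null) {
        // Unknown client state: send the whole feed.
        return ['status' => 200, 'etag' => $etag, 'posts' => $posts];
    }

    // Keep only the posts that are newer than what the client already has.
    $new = array_values(array_filter($posts, function (array $post) use ($seen): bool {
        return $post['id'] > $seen;
    }));

    if ($new === []) {
        // Nothing new: the client's copy is still current.
        return ['status' => 304, 'etag' => $etag, 'posts' => []];
    }
    return ['status' => 200, 'etag' => $etag, 'posts' => $new];
}
```

The returned posts would then be rendered into the feed XML and sent along with the new ETag header. Because the tag only tracks the newest id, an edit to an older post would go unnoticed in this sketch.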

Give it to me!
Now here’s the best part: even though it all sounds quite complicated, it’s actually dead simple to add this. I’ve written a simple PHP class to easily add this to your website. For those interested, the code is over here. It’s a bit of a hack (I use a slightly different version to serve my own website), but it works. Usage instructions are inside the script. Let me know if you run into any problems. But: no money back when it scares your cat. Enjoy!

June 2, 2007 22:23 #php
