Tom Says: Code something crazy every day you feel like it!
I know that my Atom and RSS feeds have been misbehaving. It bothered me, too. But today I pushed some changes that should normalize things, and I think they could be useful to other people as well. They're written in Ruby for my Rails site, but they work outside Rails too.
But take note! These are hacks that work for me because I hand-code the HTML for my articles and understand the trade-offs of the string-replacement approach. It seems robust, but not that robust.
Broken images and links in the feed? When URLs are relative and the HTML is rendered anywhere but on the site – like in an RSS reader – relative URLs aren't relative to the site any more. I use this naïve helper to resolve the difference by making the URLs absolute.
require "uri"

# Rewrite every double-quoted src/href attribute so its URL is absolute.
def absolutify_urls(html)
  site = "http://alltom.com/"
  html.gsub(/(src|href)="([^"]*)"/) do
    $1 + '="' + URI.join(site, $2).to_s + '"'
  end
end
Note: Perhaps this function is a bit too naïve. URI failed with the exception "URI::InvalidURIError (bad URI(is not URI?): http://groups.google.com/group/comp.os.plan9/browse_thread/thread/0337a915236e0379 )" because the URL has a space character at the end (see it?). This took out my Atom feeds for a little while.
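One way to tolerate a stray space like that one is to strip whitespace from each attribute value before resolving it, and to leave the attribute untouched if URI still rejects it. The following is only a sketch under those assumptions: the name safe_absolutify_urls and the site parameter are hypothetical, introduced here for illustration.

```ruby
require "uri"

# Hypothetical, more forgiving variant of absolutify_urls.
# Assumes double-quoted src/href attributes, as in the original helper.
def safe_absolutify_urls(html, site = "http://alltom.com/")
  html.gsub(/(src|href)="([^"]*)"/) do |attribute|
    name = $1
    url = $2.strip # drop stray leading/trailing whitespace
    begin
      %(#{name}="#{URI.join(site, url)}")
    rescue URI::Error
      attribute # leave anything URI cannot parse exactly as written
    end
  end
end
```

The trade-off is that a URL URI cannot parse stays relative or broken in the feed, but a single bad link no longer takes the whole feed down.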
When I want to split an article into sections using header tags, I need them to start at <h4> on the site to fit in with my layout, but <h1> in RSS feeds where I have no layout. I choose to base headers at <h1> and shift them down when needed. Here is my naïve code for shifting headers down by three.
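# Shift <h1>..<h3> down by three levels to <h4>..<h6>.
# Deeper headers are left alone, since HTML has no <h7>.
def adjust_header_levels(html)
  html.gsub(/<(\/?)h([1-3])>/) do
    "<" + $1 + "h" + ($2.to_i + 3).to_s + ">"
  end
end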
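For layouts that need a different offset, the same idea can be sketched with the shift passed in as a parameter. The name shift_header_levels, the offset parameter, and the choice to clamp at <h6> (the deepest header HTML defines) are assumptions made for illustration. Unlike the helper above, this version also matches opening tags that carry attributes.

```ruby
# Hypothetical generalization of adjust_header_levels.
# Shifts every <h1>..<h6> tag by `offset` levels, clamping the result to 1..6.
def shift_header_levels(html, offset = 3)
  html.gsub(/<(\/?)h([1-6])\b([^>]*)>/) do
    slash, level, rest = $1, $2.to_i, $3
    new_level = (level + offset).clamp(1, 6)
    "<#{slash}h#{new_level}#{rest}>"
  end
end
```

Clamping keeps the output valid HTML, at the cost of flattening any headers that would otherwise land below <h6>.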
Who else can't wait for the <section> and <h> tags?
Posted Mar 24, 2007, in the afternoon. Updated Jan 13, 2008, in the evening: Note about fragility of absolutify_urls.