A Lovely Harmless Monster

Really Simple Syndication ain’t so simple

Okay, I figured out what was causing the morass of RSS feed traffic in Robot Wars. It's not the RSS clients that are at fault; it's the way kiki generates the feed. It's quite, shall we say, idiosyncratic.

The way kiki builds the RSS feed is this: it looks at every page on the site in alphabetical order by URL slug, and if a page has the designated tag, it adds that page to rss.xml. I had the tag set to blog, so every page carrying the blog tag ends up in the feed. Note that, despite being the oldest entry in the feed, A foolish blunder is listed first because it starts with the letter A. So that's inconvenient.
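
In rough pseudo-PHP, the behavior is something like this (a minimal sketch; the filenames and helper functions are made up, not kiki's actual code):

    <?php
    // Sketch of the behavior described above. read_page_meta() and
    // build_rss_item() are hypothetical stand-ins for kiki's internals.
    $pages = glob('pages/*.html');
    sort($pages); // alphabetical by filename, i.e. by URL slug

    $items = '';
    foreach ($pages as $page) {
        $meta = read_page_meta($page); // hypothetical: tags, title, date...
        if (in_array('blog', $meta['tags'], true)) {
            $items .= build_rss_item($meta); // one <item> per tagged page
        }
    }
    file_put_contents('rss.xml', wrap_rss_channel($items)); // hypothetical wrapper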

It's okay though, because each entry has a <pubDate> element with the date the entry was posted, so some RSS readers will take all the entries and put them in reverse chronological order by publication date. Emphasis on some. Some of them just look at the posts and display them in whatever order they're in. If you're one of the unlucky holdouts still using the most recent version of FeedSlurp from 2009 (or a newer one that just ignores this element, idk): sorry! You get to see all the posts in alphabetical order by URL slug. How does an RSS reader set up in this manner determine which entry, if any, is the "new one"? The answer is yes.
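
For the readers that do sort properly, the logic presumably amounts to something like this (a sketch, not any particular client's code):

    <?php
    // Parse the feed and order items newest-first by <pubDate>.
    $feed  = simplexml_load_file('rss.xml');
    $items = iterator_to_array($feed->channel->item, false);
    usort($items, function ($a, $b) {
        // strtotime() understands the RFC 822-style dates RSS uses
        return strtotime((string) $b->pubDate) <=> strtotime((string) $a->pubDate);
    });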

Kiki rebuilds the rss.xml file every time anyone accesses index.php, for any reason. That includes bots, so when I had my URL verified on my Mastodon profile, and a post of mine got boosted to 600 instances, all 600 instances pinged my website to make sure the verification was still valid, and the rss.xml file got rebuilt 600 times. Kiki really wants to be sure everything's included!

When RSS clients check to see if the feed's been updated, they don't check the file size, they check whether the file is byte-for-byte identical to the last time they checked. When rss.xml is rebuilt hundreds of times a day, the answer to that question is "no", so RSS clients will re-download the file however often they're set up to check it. Could be a couple times a day, could be 10 times an hour.
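
The check itself is simple enough; a client could hash the body and compare it to last time. A sketch (real clients more commonly lean on HTTP caching headers like ETag or Last-Modified, to similar effect, and every name below is hypothetical):

    <?php
    // Did the feed change since the last poll? Compare content hashes.
    $lastSeenHash = load_last_hash(); // hypothetical persistence helper
    $body = file_get_contents('https://example.com/rss.xml'); // hypothetical URL
    if (sha1($body) !== $lastSeenHash) {
        // Any rebuild that changes even a timestamp inside the file lands
        // here and forces a full re-download and re-parse.
        process_feed($body);         // hypothetical: surface whatever looks new
        save_last_hash(sha1($body)); // hypothetical
    }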

If a client re-downloads a 250k XML file 10 times an hour, that's 60 MB a day, 1.8 GB a month, 21.9 GB a year. Nearly a Blu-ray's worth of data every calendar year, mostly just to verify that some text is the same as the text it saw last time.
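
The arithmetic, if you want to check my work:

    <?php
    // Back-of-the-envelope traffic for one overeager client.
    $fileKB     = 250;      // size of rss.xml in kilobytes
    $fetchesDay = 10 * 24;  // 10 fetches an hour, around the clock
    $mbPerDay   = $fileKB * $fetchesDay / 1000; // 60 MB
    $gbPerMonth = $mbPerDay * 30 / 1000;        // 1.8 GB
    $gbPerYear  = $mbPerDay * 365 / 1000;       // 21.9 GB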

There are two elements you can add to an RSS feed that supposedly control how often feed readers fetch it: <lastBuildDate> and <ttl>. The former is the date the feed was generated, and the latter is how many minutes the client is supposed to wait before trying to fetch the feed again. I don't know how widely these elements are respected, but kiki doesn't include them in the RSS feed by default, possibly because they're not widely used, or possibly because the developer was unsure how clients would behave if the <lastBuildDate> is always now.

What I done

First of all, kiki no longer generates rss.xml. It now generates an intermediary XML file, and when I want to update the blog, there's a script I can manually invoke that will copy the contents of the freshly generated intermediary file into rss.xml. You should receive the benefits of this new system immediately, with no change in subscription required on your end.
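
The publish step is deliberately dumb; roughly this, with hypothetical filenames:

    <?php
    // publish.php: run by hand when I actually want the live feed to change.
    $src = 'feed-intermediate.xml'; // hypothetical name for kiki's output
    $dst = 'rss.xml';
    if (!copy($src, $dst)) {
        fwrite(STDERR, "publish failed\n");
        exit(1);
    }
    echo "rss.xml updated\n";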

I added a <lastBuildDate> element set to whenever the feed was last modified, which seems sensible, and a <ttl> of 180. I have occasionally posted two blog entries in the same day, but it's rare, and a three-hour gap should be more than enough whenever it happens. I don't know if this will help RSS readers put things in the proper order if they're not already doing that with <pubDate>, but I figure it couldn't hurt.
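
Generating both values is a couple of lines in PHP; a sketch, with a hypothetical filename for the intermediary file:

    <?php
    // lastBuildDate: the intermediary file's modification time, formatted
    // the RFC 822-ish way RSS dates want; ttl: minutes between fetches.
    $built  = date('r', filemtime('feed-intermediate.xml'));
    $extras = "<lastBuildDate>$built</lastBuildDate>\n<ttl>180</ttl>";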

I also changed which entries get included in the feed. Instead of anything with the blog tag, it's now anything with the new tag, which I will manually ensure is limited to the three most recent entries. As much as I like being able to read an entire blog archive within an RSS client, I figure it'll be less annoying for clients unable to determine newness if only 3 "new" entries show up every time the blog is updated instead of, like, a billion.
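
In terms of the earlier sketch, the change is one condition (helper names still hypothetical):

    <?php
    // Same loop as before, new rule: only pages tagged 'new' get an <item>,
    // and I keep that tag on at most three pages by hand.
    foreach ($pages as $page) {
        $meta = read_page_meta($page);
        if (in_array('new', $meta['tags'], true)) { // was: 'blog'
            $items .= build_rss_item($meta);
        }
    }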

I've been using the Neocities RSS guide to help me format everything correctly, which has been very helpful. It's intended for use on a free Neocities account, which doesn't allow scripting languages, so the only way to do an RSS feed is to either generate it on your own machine with a static site generator or write it by hand, like the guide suggests. Doing it by hand seems a little absurd to me, though. Like, you're already theoretically doing the whole blog by hand in HTML; are people really out there also doing the XML by hand? Way too much friction for me; I need it to be automated. That's why I was on bearblog before kiki was a thing. I want that stuff taken care of for me. The guide was still helpful for letting me know what my feed was missing, though.

The guide says this:

If you want to add HTML in the description that's possible as well, but you first have to run it through a tool that escapes special characters like <.

So, uh, kiki doesn't do that? It just blats the raw HTML of the post into the description field. It seems fine so far, but if this makes your RSS reader break, let me know; there must be a ready-made PHP function I can plug in that'll sanitize the output.
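
There does in fact seem to be a ready-made one; assuming the post body lives in a variable, the escaping would be a single call:

    <?php
    // htmlspecialchars() escapes <, >, &, and quotes, so raw post HTML can
    // sit safely inside <description>. $postHtml is a hypothetical variable.
    $description = htmlspecialchars($postHtml, ENT_XML1 | ENT_QUOTES, 'UTF-8');
    // Alternative: wrap the raw HTML in <![CDATA[ ... ]]> instead of escaping.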

Second, the guide doesn't really have any advice for making sure posts show up in the proper order other than manually placing the newest entries at the top of the XML file. I don't know how I'd go about doing this, other than adding a completely custom sorting algorithm, and I'm afraid that's above my pay grade at the moment. I have no doubt I could learn how to do this, given enough time, but I don't have that. Time, I mean.
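
If I ever find the time, I suspect PHP's built-in usort() could do most of the heavy lifting without anything too custom; a sketch, assuming each entry carries its post date:

    <?php
    // Newest-first by post date; $entries and its 'date' key are hypothetical.
    usort($entries, function ($a, $b) {
        return strtotime($b['date']) <=> strtotime($a['date']);
    });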

One "fix", I suppose, would be to rename all of my blog entries so the file and URL slug starts with a decreasingly large number. The slug for A foolish blunder would be 9999-a-foolish-blunder, the slug for Coding as a craft would be 9998-coding-as-a-craft, etc. But in my opinion, this is a hideous solution. There's no craft there, other than in the Frankenstein sense.

There is one thing you, as the reader, can do if the regular feed isn't working for you, and you just want to be notified of new posts as they arrive with no fuss. Every time I update the blog, I make a post on the fediverse using a specific hashtag. You can subscribe to the RSS feed of that hashtag and it'll all be taken care of. You lose out on the ability to read the whole post from within your RSS reader, but hey. That's the price of progress I guess.

You may also, of course, follow me on the fediverse to receive the same updates; but I can see you wince at that suggestion. I get it. The fediverse is still social media. You're probably here, in the home of Indiana Webb, specifically because social media sucks. And I'm with you! I like blogs more than Mastodon, I like owning my own stuff and using open protocols and having full control over the experience. It's just that, well, having full control sometimes means there are a few pies that need juggling. And without the benefit of the pie-juggling machine, sometimes I'll drop one on my face. If you can handle me at my pie-covered, you definitely deserve me at my blog-doing. $$pat$$

💡 Addendum

Oh my God, I just tracked down a bug that was driving me bananas. I couldn't put a proper timestamp on posts because, somewhere along the line, time was being converted from 24-hour format to 12-hour. It looked correct to me, but I was looking in the wrong place: rss.php had a rogue function call that used a lowercase h instead of an uppercase H in its time format. Of course. Literally no one cares but me, but I'm so happy to have fixed it, let me have my little victory dance.
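
For anyone else who's ever been bitten by this, the difference in miniature:

    <?php
    // PHP date() format characters: lowercase h is the 12-hour clock,
    // uppercase H is the 24-hour one.
    echo date('h:i'); // e.g. "03:07"
    echo date('H:i'); // e.g. "15:07"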

Thoughts? Leave a comment