Download the whole IF Archive

Wednesday, February 21, 2024


Tagged: if, interactive fiction, ifarchive, archiving

I help run the IF Archive. I have for, oh, about 25 years now.

It's not a demanding job. Mostly the server just runs itself. We have a cadre of volunteers who file the games and write up the descriptions. (Thank them!)

Occasionally we change out some of the underlying server configuration, like when we started using a CDN for load balancing. But that's, like, once every few years.

Low maintenance is great. However, it means that we don't respond to feature requests very quickly. Or at all, sometimes.


Here's one we've never had a good answer for:

Dear IF Archive: I would like to download all your files so I can play all the games. How do I do that? Love, Suzie.

(Simulated request on closed track. Real-life Suzies may vary in their IF enthusiasm.)

When people ask this question, we mostly shrug and point at web-scraper tools. It's not hard to find all the files by browsing the folders. Or you can look at Master-Index.xml, which lists every individual file in easy-to-parse form. (Well, "easy" -- that's 13 megabytes of XML right there. Sorry, I hadn't heard of JSON at that point.)
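For anyone who does go the index route, here is a minimal Python sketch of walking a locally saved copy of Master-Index.xml without loading all 13 megabytes into a tree at once. It deliberately assumes nothing about the element names in the index; it only tallies whichever tags turn up, which is a reasonable first step before writing a real parser. The function name `count_tags` is made up for this illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_tags(source):
    """Stream through an XML document and count how often each tag appears.

    `source` may be a filename or a binary file object.
    """
    counts = Counter()
    # "end" events fire once an element (and all its children) is complete.
    for _event, elem in ET.iterparse(source, events=("end",)):
        counts[elem.tag] += 1
        # Discard the element's text and children now that it is counted,
        # so memory use stays modest on a large file.
        elem.clear()
    return counts
```

Calling `count_tags("Master-Index.xml").most_common(10)` on a downloaded copy should show which elements dominate the file, and from there you can decide which ones to read in detail.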

For a while in the early 2000s we allowed a few people to use rsync to copy the Archive files and offer mirror servers. However, this was a serious hassle (early-2000s Linux firewall config? Not friendly) and it never entirely seemed worth the trouble. The CDN is much easier.

But then, how hard would it be to shove all the files into one big package and make that available for downloading? Disk space is cheap.


Long story short, on an experimental basis, we did that. The documentation is here, but it's short so I'll just play you the chorus.

If you want to download everything on the IF Archive in a single massive chunk, use this URL:

https://iftf-ifarchive-download.s3.amazonaws.com/ifarchive-all.tar.gz

That's 30 gigabytes, no foolin', which is why I haven't made that a hyperlink. If you grab that puppy, it should be on purpose. (And I don't particularly want automated web crawlers to grab it either. I mean, they will, but I'm not going to encourage them. AWS download cost is pennies but the pennies add up.)

That 30 GB file is updated weekly. It will grow over time, of course, but it's hard to say how fast.
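If you do grab it on purpose, it's worth streaming the file to disk rather than reading it into memory. The following is a rough sketch using only the Python standard library; the `download` function and its `chunk_size` parameter are hypothetical names chosen for this example, and it makes no attempt at retries or resuming an interrupted transfer.

```python
import urllib.request

# URL as given in the post above.
ARCHIVE_URL = "https://iftf-ifarchive-download.s3.amazonaws.com/ifarchive-all.tar.gz"


def download(url, dest_path, chunk_size=1024 * 1024):
    """Copy `url` to `dest_path` one chunk at a time; return bytes written."""
    written = 0
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break  # end of stream
            out.write(chunk)
            written += len(chunk)
    return written
```

Running `download(ARCHIVE_URL, "ifarchive-all.tar.gz")` would fetch the whole thing, so make sure you have the disk space (and the patience) first.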

Availability is subject to future review, as they say. It's an experiment! We'll see how the AWS fees stack up against utility.

I realize that very few of my loyal readers will have need of this feature. The number of people who web-scrape the Archive out of personal interest (rather than, you know, being a web-scraper bot) is probably countable on a grue's molars. But if that's you, feel free to try this new one-stop freebie.

Speaking of which: Would it be useful to have another download link for recent files? Say, all files touched in the past 30 days. That wouldn't be too hard to arrange.
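In the meantime, something similar can be approximated on the client side once you have the big tarball. The sketch below lists files whose modification time falls within the last N days. It rests on an assumption worth stating loudly: that the timestamps stored in the tarball reflect when each file was last touched on the Archive, which the post does not confirm. The function name `recent_members` is invented for this example.

```python
import tarfile
import time


def recent_members(tar_path, days=30, now=None):
    """Return names of regular files in the tarball modified in the last `days` days.

    ASSUMPTION: member mtimes in the tarball are meaningful "last touched" times.
    """
    now = time.time() if now is None else now
    cutoff = now - days * 86400  # 86400 seconds per day
    names = []
    # "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            if member.isfile() and member.mtime >= cutoff:
                names.append(member.name)
    return names
```

Note that this still has to read through the entire compressed file to find the member headers, so it is no substitute for a smaller download; it only shows that the filtering itself is simple.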

