Forum Backups and ATOM Feed

I thought this was mentioned once before, but in case it wasn’t: backups of all public posts are stored at ifarchive.org. A new backup is made once per quarter. Each backup includes every post since the beginning (not just the new ones), which makes it possible to get historical views even if old posts are later edited or deleted.

Private messages, private forums (such as the one for IFComp authors) and user account information are not included. The .zip archive for each quarter contains a single large text file with all posts.

http://ifarchive.org/indexes/if-archiveXinfoXintficforum.html

Also, the ATOM feed for the forum is available here:

feed.php

As with the backup, this includes only the public message boards.

Thanks! This is very valuable.

I downloaded the 20131001 version because I want to search inside code blocks locally[*], but… the backup is definitely broken.

Just compare this post with the one backed up (line 224990).

Some code is lost (angle brackets confuse the thing, I think). Even the text after the first code block is lost.

Who can fix this?

Is the script used to generate the backup available?

Thank you very much.

[*] The search function of the forum doesn’t seem to index code, am I right about this?

Yep, definitely an angle-bracket problem. Thanks for catching that.

Merk will have to be the one to look into it.

I will have to look into what you’re talking about and see what might be causing that. Thanks.

(Update) I’m working on it now. There’s a bit in the export that removes HTML in a brute-force way (by taking out everything between a < and the next >). That’s what’s causing code to go missing, and why the rest of the text then just vanishes (an opening < with no closing >).
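
To illustrate the failure mode described above, here is a minimal Python sketch. It is not the actual export script (which is written in Perl and not shown in this thread); the function name strip_tags_naively and the sample strings are made up for the example, and the character-by-character approach is only an assumption about how such brute-force stripping might behave.

```python
def strip_tags_naively(text: str) -> str:
    """Drop everything from a '<' up to and including the next '>'.

    Hypothetical reconstruction of brute-force HTML removal; not the
    real export code.
    """
    out = []
    in_tag = False
    for ch in text:
        if ch == "<":
            in_tag = True          # start discarding characters
        elif ch == ">" and in_tag:
            in_tag = False         # stop discarding after the closing bracket
        elif not in_tag:
            out.append(ch)         # keep ordinary text
    return "".join(out)


# Code that merely looks like a tag is removed along with real markup.
print(repr(strip_tags_naively("Include <stdio.h> here")))

# An opening '<' with no closing '>' swallows the rest of the post.
print(repr(strip_tags_naively("if (a < b) then the rest of the post")))
```

In the second call nothing after the lone < survives, which matches the symptom of text disappearing after the first code block.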

I don’t remember why I’m removing HTML. I think it was to try to keep things text-oriented. But obviously that’s a problem in situations like this, so I’m going to see what things look like without trying to remove HTML. It looks like there are also other places where HTML was posted intentionally to show examples, and that would be a problem as well. Probably just a bad and unnecessary idea on my part.

Try this: http://www.intfiction.org/intfic-archive-20131025.zip

This seems to fix the problem. I still had to leave a little translating in place so that there wouldn’t be a bunch of unnecessary HTML markup for all the emoticons in posts, but I think that’s working too. If all’s well, I can either upload a new version to the archive now, or just wait and let it kick off on January 1st.

Perfect!
That was fast. Thanks, Merk!

Pardon my insistence. Is there any possibility of obtaining the script? It would be fantastic to dump a backup like this from our Spanish community forum database (foro.caad.es).

Yeah, I don’t mind. It’s written in Perl. There’s also a Python piece that zarf provided, and since it doesn’t appear to be anything copyrighted or overly complex, I’ll include that too. (I couldn’t figure out how to get the UTF-8 encoding right, so the Python script undoubles the encoding that my export spits out.)

I’ve attached it to a PM for you.

Got it. I’ll study and try the script combo (with the .txt you kindly included) in a few days.

Thank you both.

Please do. As you say, all it does is take double-UTF8-encoded text and unscrew it back to correctly-UTF8-encoded.
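
For anyone curious what "undoubling" means in practice, here is a minimal Python 3 sketch. It is not zarf's script, which is not shown in this thread; the function name undouble_utf8 is made up, and the sketch assumes the doubling happened because UTF-8 bytes were misread as Latin-1 and then encoded as UTF-8 again. If the misreading used a different code page (Windows-1252, say), the codec name would need to change.

```python
def undouble_utf8(text: str) -> str:
    """Reverse one round of double UTF-8 encoding.

    Assumption: the original UTF-8 bytes were wrongly decoded as Latin-1.
    Re-encoding as Latin-1 recovers those bytes, which are then decoded
    properly as UTF-8.
    """
    return text.encode("latin-1").decode("utf-8")


correct = "café"
# Simulate the bug: UTF-8 bytes misread as Latin-1 characters.
doubled = correct.encode("utf-8").decode("latin-1")

print(doubled)                 # the mangled form
print(undouble_utf8(doubled))  # the repaired form
```

Text that was never doubled will usually raise an error or come out wrong, so this should only be run on output known to be double-encoded.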