I thought this was mentioned once before, but in case it wasn’t: backups of all public posts are stored at ifarchive.org. A new backup is made once per quarter. Each backup includes every post since the beginning (not just the new ones), so you can get a historical view even if old posts are later edited or deleted.
Private messages, private forums (such as the one for the IFComp authors) and user account information are not included. The .zip archive for each quarter contains a single large text file with all the posts.
I will have to look into what you’re talking about and see what might be causing that. Thanks.
(Update) I’m working on it now. There’s a bit in the export where it removes HTML in a brute force way (by taking out everything between a < and the next >). That’s what’s causing code to end up missing, and in some cases the rest of the text just vanishes (an opening < with no closing >).
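For anyone curious what that failure looks like, here is a minimal Python sketch of the kind of brute-force stripping described above. The export itself is written in Perl and its code isn’t shown in this thread, so the function name `strip_angle_brackets` and the character-by-character approach are assumptions for illustration only, not the actual export code.

```python
def strip_angle_brackets(text: str) -> str:
    """Naively drop everything from each '<' up to the next '>'.

    Hypothetical stand-in for the export's HTML removal; it has no idea
    whether a '<' really starts an HTML tag.
    """
    out = []
    inside = False  # True while we are between a '<' and its '>'
    for ch in text:
        if ch == "<":
            inside = True
        elif ch == ">" and inside:
            inside = False
        elif not inside:
            out.append(ch)
    return "".join(out)


# Real markup is removed as intended.
print(strip_angle_brackets("<b>bold</b> text"))         # bold text

# Code that happens to use angle brackets loses its content.
print(strip_angle_brackets("Include <stdio.h> first"))  # Include  first

# A '<' with no closing '>' swallows the rest of the post.
print(strip_angle_brackets("if x < 3 then say hi"))     # if x
```

The last case is the one where the remainder of a post disappears: once a lone < is seen, nothing ever switches the stripper back off.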
I don’t remember why I’m removing HTML. I think it was to try to keep things text-oriented. But obviously that’s a problem in situations like this, so I’m going to see what things look like without trying to remove HTML. It looks like there are also other places where HTML was posted intentionally to show examples, and that would be a problem as well. Probably just a bad and unnecessary idea on my part.
This seems to fix the problem. I still had to leave a little translating in place so that there wouldn’t be a bunch of unnecessary HTML markup for all the emoticons in posts, but I think that’s working too. If all’s well, I can either upload a new version to the archive now, or just wait and let it kick off on January 1st.
Pardon my insistence. Is there any possibility of obtaining the script? It would be fantastic to dump a backup like this from our Spanish community forum database (foro.caad.es).
Yeah, I don’t mind. It’s written in Perl. There’s also a Python piece that zarf provided; it doesn’t appear to be anything copyrighted or overly complex, and the export needs it, so I’ll include that too. (I couldn’t figure out how to get the UTF encoding right, so the Python script undoubles the encoding that my export spits out.)
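In case it helps while you wait for the real thing, this is roughly what "undoubling" usually means. It is only a sketch, not zarf’s script: it assumes the export’s text was already UTF-8, got misread as Latin-1, and was then encoded to UTF-8 a second time. The function name `undouble_utf8` is made up for this example, and if the intermediate encoding was something else (cp1252, say) the middle step would need to change.

```python
def undouble_utf8(data: bytes) -> str:
    """Reverse one round of accidental UTF-8 double encoding.

    Assumption: the bytes were UTF-8, misread as Latin-1, then encoded
    as UTF-8 again. If the data does not fit that pattern, the text is
    returned after a single normal UTF-8 decode.
    """
    once = data.decode("utf-8")
    try:
        # Map each character back to the single byte it came from,
        # then decode those bytes as the UTF-8 they originally were.
        return once.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not double encoded (or not in the assumed way): leave as is.
        return once


# Build a double-encoded sample the same way the bug would.
original = "café ¿qué?"
doubled = original.encode("utf-8").decode("latin-1").encode("utf-8")

print(undouble_utf8(doubled) == original)                   # True
print(undouble_utf8(original.encode("utf-8")) == original)  # True
```

The fallback is there so that text which was only encoded once passes through untouched. It is a heuristic, so it is worth spot-checking the output on a forum with a lot of accented characters.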