Now testing: Somewhat faster version of Counterfeit Monkey

I think this is enough of an improvement to warrant a new release:

https://dl.dropboxusercontent.com/u/24609432/Counterfeit%20Monkey%20Test.gblorb

At least for me it is now slow-but-playable with Lectrote. A full-game regtest takes about 40 seconds for this version versus a little over two minutes for the old one. I’ve also fixed a couple of bugs.

It would be nice if people could test it in web browsers and on portable devices. Report any bugs on Github or in this thread.

There are still some really low hanging fruit in terms of rewriting a little of the template code. I still haven’t gotten around to building a profiling Glulxe yet, but I’d appreciate seeing any profiles anyone can generate.

Here is a profile report of a run through the entire game, with the default ten slowest functions. Here is a zip with all the files used to create it, including a game compiled without Ultra Undo (otherwise the profiling will not work), in case anyone wants to have a proper look at it. It is interesting that it seems to spend so much time in VM_Save_Undo.

In general, it seems that the slowest remaining parts are approaching and conversing.

Approaching is when you type GO TO HOSTEL rather than N. The slow part isn’t really so much the actual pathfinding as the fact that the game will walk you to your destination silently, step by step. GO TO HOSTEL might translate to “Silently try going N, E, N, E, E”. The single slowest command in the testing script for the game is asking Higgate how we might return a book. That will make both you and her approach the Language Studies Seminar Room, step by step. For each of these steps (one step for you, one for her) the game makes basically the same checks as when you walk into a room normally. That is what makes it slow. A lot of these checks might be unnecessary; there is already code that skips certain parts depending on “if the player is hurrying,” i.e. approaching. There might also be situations where it makes sense to just teleport the player to the destination rather than walking step by step.

The conversation system still runs through a lot of quips every turn to see which ones are viable, i.e. allowed to be said. In theory that should not be necessary: it is basically a conversation tree of interconnected quips. It is analogous to room connections, and how the game doesn’t have to look through every room in the game each turn to see which ones you can go to. This might be quite a bit of work to change, though.

Is there perhaps a depth limit being hit when tracing VM_Save_Undo (which is from Ultra Undo, is it not)? It looks like it's 5 ops and a whopping 220 ms per call, but I'd assume it's the child calls, involving file I/O and serialising the entire game state, that make it take that long.

VM_Save_Undo does its work with a VM opcode (@save or @saveundo), and is therefore counted as a single op. The work is implemented in the interpreter (C code) so it can’t be traced this way.

It’s easy to explain what it’s doing: the interpreter is saving the current RAM state. You speed this up by re-engineering the game to use less RAM.

Just to clarify: the profiled game is not using Ultra Undo. The profiling code will not work with the @restart, @restore, @restoreundo, or @throw opcodes, so for testing purposes Ultra Undo was commented out.

Sorry, I don’t quite understand this. An undo save state of the game seems to be 128 K. The interpreter is copying this in memory 549 times. That is the kind of thing you’d think would be almost instant on a modern computer, but here it takes over two minutes. Is it doing some kind of compression on this or something? Is serializing the game state really that CPU-intensive?

The VM does very simple (run-length encoding) compression. Keep in mind that when you see a 128K undo state, that’s after compression. CF’s RAM use is a bit over 4 megabytes. The interpreter is running through that much data, comparing it to the original game file, and squishing it down to 128K.

Could you try adding “#define SERIALIZE_CACHE_RAM (1)” to the glulxe.h header file and recompiling the interpreter? That might improve that.

Yes, that knocked VM_Save_Undo down to eighth place, cutting total running time by about 104 seconds.

EDIT: Commenting out the #define VERIFY_MEMORY_ACCESS (1) line seems to knock another 75 seconds off the running time.

Okay. The idea of SERIALIZE_CACHE_RAM is that the interpreter saves the original game file in memory for comparison, instead of reading it off the disk each time.

I thought this was an unnecessary optimization, but obviously it’s worthwhile for large games. (And it doesn’t hurt for small games.) I’ll add that to the default Glulxe config.

Also, using Dannii Willis’s patch to make Glulxe profiling compatible with Ultra Undo, it is clear that it makes no difference performance-wise to write the game states to disk instead of keeping them in memory.

I recommend keeping VERIFY_MEMORY_ACCESS active. It costs a bit of time, but it prevents game bugs from possibly crashing the interpreter, which is strongly desirable even for an end-user interpreter.

It sounds useful for debugging, but is writing to “ROM” likely enough to warrant the runtime cost? Shouldn’t a game with under/overrun bugs only crash itself? Why should the interpreter crash?

The interpreter is written in C. An overrun bug can do anything.

I should shoulder my share of the arghh here. The VERIFY_MEMORY_ACCESS bounds-checking code was only added in 2008, and I didn’t configure it on by default until 2012.

But, to be clear, that’s embarrassingly sloppy – it should have been turned on in 2008. Or, actually, in 2000.

How much of a performance hit would it be? Pretty negligible, I’d assume? And much less than compiling Inform with strict error checking on?

I think the discussion was based on Angstsmurf’s observation that the performance hit amounted to 75 seconds on the Counterfeit Monkey test case. It may be a case of a tiny per-access cost adding up, since it seems the check is paid on every read or write of a byte, 16-bit value, or 32-bit value in the game’s address space.

As I understand it, VERIFY_MEMORY_ACCESS changes the interpreter rather than the game, so is it out of scope for this project? Also, I note that github.com/DavidKinder/Git claims to be a faster interpreter; is it? And if the interpreter can be changed, what kinds of solutions can be considered? Could hardware memory protection be leveraged, at least where available?

Yes, making the Glulxe interpreter faster is of course worthwhile, but Counterfeit Monkey really doesn’t have that many performance problems on Glulxe, and none at all on Git.

My main goal has been to make the game run well on Parchment or Lectrote, and that would likely mostly involve changing the code of the game, not the interpreters.

If changing the game source is an option, what about translating/transpiling some of the I6 code NI spits out directly to JavaScript or asm.js? It seems to me that I6 functions could be mapped onto JavaScript better than arbitrary Inform bytecode can, which could potentially bring massive gains, though it would be a lot of work.