I7: The hashing code of Counterfeit Monkey

Angstsmurf · May 22, 2016, 8:05am

I’m trying to port Counterfeit Monkey to Inform 7 6M62, and this is giving me a hard time. On 6G60 it produces:

01011000000000000000000000 is the hash of <bed> . 01011000000000000000000000 is the hash of <bede> . 00001000000010010000000000 is the hash of <temptress> . 00001000000010010000000000 is the hash of <empress> . 00001000000010010000000000 temptress with t removed.
6M62 (with the change to TEXT_TY_CharacterLength) gives a “Glulxe fatal error: Memory access out of range: (92A00)”.

I’ve experimented with different variants of this and read the release notes of 6L02, but I still have no idea how to fix this.

[code]“Hashtest” by Author.

The observatory is a room.

To say (N - number) as letter-hash:
(- ShowLetterHash({N}); -).

To decide which number is the letter-hash of (T - indexed text):
(- LetterHash({T}) -).

To decide which number is (X - number) with (Y - number) removed:
(- ({X} & (~{Y})) -).

To demonstrate letter-hash of (T - indexed text):
say “[fixed letter spacing][letter-hash of T as letter-hash] is the hash of <[T]> .”;

When play begins:
demonstrate letter-hash of “bed”;
demonstrate letter-hash of “bede”;
demonstrate letter-hash of “temptress”;
demonstrate letter-hash of “empress”;
let t be the letter-hash of “temptress”;
let r be the letter-hash of “t”;
say “[fixed letter spacing][t with r removed as letter-hash] temptress with t removed.”;

Include (-
[ LetterHash T i n c d b m;
! n = TEXT_TY_CharacterLength(T) post 6L02, I think?
n = IT_CharacterLength(T);
for (i=0: i<n: i++) {
d = -1;
c = BlkValueRead(T, i);
if ((c >= ‘a’) && (c <= ‘z’)) d = c-‘a’;
if ((c >= ‘A’) && (c <= ‘Z’)) d = c-‘A’;
if (d >= 0) {
b = 1;
while (d > 0) { d–; b = b2; }
m = m | b;
}
}
return m;
];
[ ShowLetterHash h i b;
for (i=0, b=1; i<26; i++, b=b2) {
if (h&b) print “1”; else print “0”;
}
];
-);[/code]

vlaviano · May 22, 2016, 8:42am

This produces the output that you want, but I haven’t looked deeply into the TEXT_TY/Blk stuff, so there might be lurking issues.

[ LetterHash T i n c d b m cp pk;
	cp = T-->0;
	pk = TEXT_TY_Temporarily_Transmute(T);
	n = TEXT_TY_CharacterLength(T);
	for (i=0: i<n: i++) {
		d = -1;
		c = BlkValueRead(T, i);
		if ((c >= 'a') && (c <= 'z')) d = c-'a';
		if ((c >= 'A') && (c <= 'Z')) d = c-'A';
		if (d >= 0) {
			b = 1;
			while (d > 0) { d--; b = b*2; }
			m = m | b;	
		}
	}
	TEXT_TY_Untransmute(T, pk, cp);
	return m;
];

Angstsmurf · May 22, 2016, 8:53am

It works! A million thanks!

Now we’ll just have to wait for the test runs to finish to see if the game is finally finishable.

What does TEXT_TY_Untransmute actually do?

vlaviano · May 22, 2016, 9:00am

From what I gather, strings can be packed or unpacked, and the transmute/untransmute switches between the formats.

It seems as though the text was in the wrong format for iterating over its characters the way the hash function wanted to.

Angstsmurf · May 22, 2016, 10:53am

It goes all the way to the ending now. Yay!

EDIT: Much slower than the 6G60 version, but still!

Peter_Piers · May 22, 2016, 11:00am

Oy vey. That’s disheartening. I was placing high hopes in your Open-Source-Counterfeit-Monkey project eventually resulting in something actually playable on mobile devices.

Angstsmurf · May 22, 2016, 11:06am

Don’t lose hope yet. I’d bet at least some of the slowness is due to bugs, and there’s still plenty of room for optimization.

Peter_Piers · May 22, 2016, 11:26am

Fingers crossed, best of luck to all you brave souls!

Draconis · May 22, 2016, 3:17pm

If the hashing is expensive, could most of it instead be done at compile-time? Have a script parse through the Inform 6 intermediate code, take the name of each object, hash it, and put the hash in as a literal property? The run-time hash changes for hard mode could be pre-computed as well, or even “now the hash of the pear is the hash of the prickly-pear” to let the script do it.

Angstsmurf · May 22, 2016, 4:00pm

That’s definitely an interesting idea. I think that it would mostly have an effect on the startup time, but that could certainly use a little shortening.

At least some of the slowness I saw was apparently caused by bad interaction between Threaded Conversation and Smarter Parser. I just replaced the lastest version of Smarter Parser with a modified older one, and it already seems faster.

EDIT: if anyone wants to try out my latest version, it can be found here:
https://github.com/angstsmurf/counterfeit-monkey/tree/6L38

zarf · May 22, 2016, 4:16pm

Don’t guess, measure!

(Running the game in Quixe or Lectrote is an easy way to do this. It dumps timing information to the Javascript console on every command.)

Other thoughts:

Compile-time hashing is a good plan. Supplement it with a debug command which recomputes all the hashes and checks them against the precomputed ones.

The stanza

b = 1;
while (d > 0) { d--; b = b*2; }
m = m | b;

…can be replaced with a single Glulx @astorebit opcode. (But maybe the whole LetterHash() function goes away when you have compile-time hashing? I don’t know if it’s re-called during play.) (I guess it will have to be if you implement that debug command!)

You don’t want a lot of transmute/untransmute operations during play. If that’s happening, you’ll want to create a permanent (precomputed) table of I6 arrays containing the name text in character form.

Angstsmurf · May 22, 2016, 7:13pm

I’m afraid my Glulx assembly skills are not quite up to snuff. How would I use @astorebit in this case? I tried to read the opcode reference but realized I don’t even know how to get the address of a variable.

vlaviano · May 22, 2016, 9:00pm

So, I got the @astorebit working, but I had to rewrite ShowLetterHash because of bit-numbering differences. This is the output that I get:

When comparing it to the original output in the OP, it’s different, but, thinking about it, this makes more sense. Each bit position from left to right corresponds to the presence of a letter of the alphabet from A to Z. So, both bed and bede start with 01011 (=ABCDE, with the B, D and E positions set). If that’s the case, then the bit patterns for temptress, empress, and temptress w/ t’s removed ought to be as above, right?

Here’s the code that I’m using:

Include (-
Array m1 --> 1;
[ LetterHash T i n c d cp pk;
	cp = T-->0;
	pk = TEXT_TY_Temporarily_Transmute(T);
	n = TEXT_TY_CharacterLength(T);
	m1-->0 = 0;
	for (i=0: i<n: i++) {
		d = -1;
		c = BlkValueRead(T, i);
		if ((c >= 'a') && (c <= 'z')) d = c-'a';
		if ((c >= 'A') && (c <= 'Z')) d = c-'A';
		if (d >= 0) {
			@astorebit m1 d 1;
		}
	}
	TEXT_TY_Untransmute(T, pk, cp);
	return m1-->0;
];
[ ShowLetterHash h i j n b;
	for (i=0: i<26: i++) {
             n = (3 - i / 8) * 8 + i % 8;
             for (j=0, b=1: j < n: j++, b=b*2);
             if (h & b) print "1"; else print "0";           
	}
];
-);

(This code is Glulx-only.)

Angstsmurf · May 22, 2016, 9:09pm

All right, so it has to be an array. Excellent work! I wonder how the old code ever worked at all.

vlaviano · May 22, 2016, 9:22pm

Actually, I think it did, but that we were both compiling to z-code where the words are 16-bits long, the problem being that 26 > 16.

When I compile the earlier code to glulx, it produces this same correct output.

(Also, it took me longer than I’d like admit to realize that the a in astorebit stands for array.)

Angstsmurf · May 22, 2016, 9:32pm

That makes a lot of sense.

If I’m reading the right number, the @astorebit version cuts the startup time from 40503 ms to 27273 ms in Lectrote. Not too shabby!

zarf · May 22, 2016, 11:15pm

If I’d been thinking when I posted earlier, I would have recommended @shiftl instead.

b = 1;
while (d > 0) { d--; b = b*2; }
m = m | b;

can be written as

@shiftl 1 d b;
m = m | b;

Then you don’t need the array.

Angstsmurf · May 23, 2016, 12:34am

Unfortunately, the slow 40503 ms version tested above turned out to be the old 6G60-compiled version.

A new compile with the original while-loop shows negligible difference between the three versions. Not that anything else was to be expected, really.

While-loop version: startup 25446 ms, restart 25047 ms, test it-construction 81209 ms.
Astorebit version: startup 25331 ms, restart 24776 ms, test it-construction 83168 ms.
Shiftl version: startup 26393 ms, restart 25915 ms, test it-construction 83497 ms.

Of course these are just single runs and not proper measurements in any way. If nothing else it shows the 6M62 version is actually faster in some respects. Now to investigate why the “don’t talk to no one rule” is checked 139 times every turn.

Dannii · May 23, 2016, 2:01am

Alternatively you could use a hard coded lookup array and replace five comparisons and a subtraction with one array access.

Angstsmurf · May 24, 2016, 2:21pm

Just so that you don’t think I’ve given up: I did write a script that inserted the hash codes into the i6 source and switched off the calculations at the start of the game. It shaved off about 2.5 seconds of the pause after answering yes to the initial question. Which is fine, but I think we can conclude that the hashing isn’t particularly expensive in this game. The major slowness comes from somewhere else.

EDIT: It is somewhat annoying that you can’t just have a long list of lines like “Garage has hash code 1358955008” in the Inform 7 code – the compiler will always get confused trying to work out exactly which object I mean, especially regarding generic things like drawers and chairs. Of course it is doable in theory, but too much work. Much easier to insert it into the i6 intermediary code.

It also annoyed me that I was unable to work out how to generate the expected codes on the fly in Perl – in the end I had to resort to reading from a pre-made list. Probably the fastest way, though.