Experimental parser

fiziwig · January 22, 2012, 9:07pm

In connection with my project to put together a lean, mean IF machine that runs in PHP on any web server (without any special server-side software), I’m testing out my “IIB Parser”. IIB stands for “Ignorance Is Bliss” because the parser doesn’t know anything at all about grammar and can’t tell a verb from an adjective, yet it still gets the job done.

The description of my project is here: fiziwig.com/intfic/design.html

and you can play in the parser sandbox with the live parser here: fiziwig.com/intfic/parser.php

The parser is 83 lines of PHP, and the entire source code is listed at the bottom of the parser sandbox page, along with all the object tables and command format tables used by the parser.

I also have a partial demo game (Hunt the Wumpus) at: fiziwig.com/intfic/wumpus/

The functionality is not complete yet, and I haven’t put the IIB parser into it yet either, but it does show how game state can persist without using cookies or server-side data files.

–gary

George · January 22, 2012, 9:59pm

I don’t understand that statement, since the explanation of the parser in the design document relies on grammatical concepts. What do you mean here?

fiziwig · January 22, 2012, 10:12pm

The first page of that document is old stuff. I went back and tossed out all the grammar-related stuff since I didn’t end up using that approach anyway. The document is now up to date with the code. (You may have to refresh the page in your browser if it still looks the same as before.)

George · January 23, 2012, 12:15am

Yes, that makes more sense now. That’s a very interesting technique. I think at heart most advanced IF parsers can be considered tokenizing scope parsers, for example mud.co.uk/richard/commpars.htm
and groups.google.com/group/rec.arts … ode=source.

fiziwig · January 23, 2012, 12:37am

Thanks for those links. It’s always interesting to study other approaches. Many years ago I played around with chatbots and did a lot of parsing code, but mostly it was built up from linguistic principles, and I borrowed the whole Brill Tagger for part-of-speech tagging. This time around I’m looking to strip everything down to the bare essentials. Provided that it still works, of course.

–gary

George · January 23, 2012, 12:44am

Cool, I’d never heard of that. It does raise the question of backtracking (not that it would be significant) – how does the split/stat technique deal with possible backtracking based on changes in world state as the parser processes a command?

edit to add: and scope too, now that I think about it. Like the ‘cat in the hat’ vs. the ‘cat in the hat’. For example, in a room with a ‘cat in the hat’ and a cat in a hat, consider the command “give the cat in the hat to the cat in the hat”. Pretty dumb, but not totally out of the question!

fiziwig · January 23, 2012, 1:41am

Hmmm. Food for thought.

I just uploaded a newer version of the parser sandbox that now handles things like “Take the marble out of the box with the tongs”, or even: “using the long tongs, remove the red marble from the tin container.” I’m up to 135 lines of code now. I also added an “Anti-Alias” function. Basically, this simplifies the split words tables, which are now much shorter, by reducing the vocabulary. It changes words into their “normalized” word. For example, if I type “remove the red marble out of the tin box using the tongs” it re-writes that as “get the red marble from the tin box with the tongs.” before it even starts parsing.
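To make the “Anti-Alias” idea concrete, here is a minimal sketch of that kind of normalization pass. It is written in Python rather than the PHP of the actual parser, and it is not the sandbox code: the `ALIASES` table and the `anti_alias` function name are assumptions made up for illustration, with entries taken only from the example sentence above.

```python
import re

# Hypothetical alias table (not the real parser's data): each phrase on the
# left is rewritten to the "normalized" word on the right.
ALIASES = {
    "out of": "from",
    "remove": "get",
    "using": "with",
}

def anti_alias(command: str) -> str:
    """Rewrite a command into normalized vocabulary before parsing starts."""
    # Lowercase and drop surrounding whitespace and a trailing period.
    text = command.lower().strip().rstrip(".")
    # Replace longer phrases first so multi-word aliases such as "out of"
    # are handled before any single-word alias.
    for phrase in sorted(ALIASES, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", ALIASES[phrase], text)
    return text

print(anti_alias("remove the red marble out of the tin box using the tongs"))
# prints: get the red marble from the tin box with the tongs
```

Because every alias collapses to one normalized word, the tables consulted later only need an entry for the normalized form, which is why they can be shorter.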

As for your first question, about changes of state “as the parser processes a command”: I’m not sure I follow what you mean. Because this is a PHP script, the action is strictly turn-based; nothing can happen between moves. The script is not loaded and run until the server gets a command from the browser.

As for the second question, Wow! That would confuse a human. However, he is called “the cat in the hat” because he’s a cat, and he’s in a hat, so in a sense “the cat in the hat” is synonymous with “the cat which is in a hat.” So either they are both the same object, or they are indistinguishable objects. On the other hand, if you wanted to give the book titled “the cat in the hat” to the actual “cat in the hat” then maybe you’d have to put quotation marks around the book title? Or maybe hyphenate the name the-cat-in-the-hat? I don’t really know.

That sounds like a problem for the story author to grapple with. (Passing the buck, I know. But if the code is going to stay lean and mean I have to stay on the right side of the 80/20 line.)

But that does bring up one valid point I had neglected to consider: disambiguation by location. Given that the red marble is in the wood box and the blue marble is in the leather bag, “take the marble out of the bag” will currently return “ambiguous”, because there are two marbles in the world. But it really should know that the only marble in the bag is the blue one, so that has to be it. To add that to the parser I will need to add code to keep track of the world state, which I don’t currently have in there.
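One way disambiguation by location might look, as a rough Python sketch rather than the parser's PHP: the `WORLD` table, its layout, and the `find_objects` and `resolve` helpers are all hypothetical and exist only to illustrate the marble example above.

```python
# Hypothetical world state: object name -> (noun, name of what it is inside).
WORLD = {
    "red marble": ("marble", "wood box"),
    "blue marble": ("marble", "leather bag"),
    "wood box": ("box", "room"),
    "leather bag": ("bag", "room"),
}

def find_objects(noun, container=None):
    """Return names of objects matching noun, optionally only those inside container."""
    return [
        name
        for name, (kind, location) in WORLD.items()
        if kind == noun and (container is None or location == container)
    ]

def resolve(noun, container_noun=None):
    """Resolve a noun to one object name, or report 'ambiguous' or 'unknown'."""
    if container_noun is None:
        containers = [None]  # no location constraint given
    else:
        containers = find_objects(container_noun)
    # Collect every matching object inside any of the candidate containers.
    matches = [m for c in containers for m in find_objects(noun, c)]
    if len(matches) == 1:
        return matches[0]
    return "ambiguous" if matches else "unknown"

print(resolve("marble"))         # prints: ambiguous
print(resolve("marble", "bag"))  # prints: blue marble
```

The only extra information the parser needs here is where each object currently is; the location filter is applied before the ambiguity check, so “the marble” only counts as ambiguous if more than one marble is in the named container.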

Back to the drawing board. That gives me something to do tomorrow.

Anyway, the latest version of the parser is here: fiziwig.com/intfic/parser.php

–gary

joningold · February 4, 2012, 12:45pm

Interesting. I like the approach. I can’t say I really understand it, though. My main question - how scalable is it? For instance, could you extend it to handle things like:

“open up the box”
“lift up the marble.”
“open the box up”
“climb up the ladder”
“climb down the ladder”

where that extra word - up - can be placed more or less anywhere in the sentence (but could potentially be significant, as in the last two cases).

jon

Alex · February 4, 2012, 8:23pm

This parser approach sounds fundamentally similar to Quest’s parser. The concept of “split words” is analogous to the command patterns that Quest uses, as you have to have something to separate the “variable” parts of the player input.

The example of “boil soup in a kettle over a medium flame” would be handled with a Quest command pattern of “boil #object1# in #object2# over #object3#”, and you can specify alternate patterns by separating them with semicolons - so the full three lines of the “Expanding the Vocabulary of Commands” example on the linked page would be equivalent to:

boil #object1# in #object2# over #object3#; boil #object1# over #object3# in #object2#; over #object3# boil #object1# in #object2#

You can handle extra optional words by specifying alternative forms of the command pattern in a similar way. For example:

open #object#; open up #object#; open #object# up

would be sufficient to handle “open the box”, “open up the box” and “open the box up” (as well as versions without “the”).

Personally I think this is an easier-to-read syntax than the suggested function-calling syntax, but the approach to parsing sounds very similar.
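For anyone curious how semicolon-separated patterns like the ones above could be matched, here is a small Python sketch. It is not Quest's implementation: `compile_patterns`, `match_command`, and the rule of preferring the alternative that leaves the least text in the `#...#` slots are all assumptions made for illustration.

```python
import re

def compile_patterns(spec):
    """Turn 'open #object#; open up #object#' into regexes with named groups."""
    regexes = []
    for pattern in spec.split(";"):
        # Splitting on #name# alternates literal text (even indexes)
        # and slot names (odd indexes).
        parts = re.split(r"#(\w+)#", pattern.strip())
        pieces = []
        for i, part in enumerate(parts):
            if i % 2 == 0:
                pieces.append(re.escape(part))
            else:
                pieces.append(f"(?P<{part}>.+?)")
        regexes.append(re.compile("^" + "".join(pieces) + "$"))
    return regexes

def match_command(spec, command):
    """Return the slot values of the best-matching alternative, or None."""
    text = " ".join(command.lower().split())
    best = None
    for regex in compile_patterns(spec):
        m = regex.match(text)
        if m:
            slots = m.groupdict()
            # Assumed tie-break: prefer the alternative whose slots capture
            # the least text, i.e. the one that matched the most literal words.
            size = sum(len(value) for value in slots.values())
            if best is None or size < best[0]:
                best = (size, slots)
    return best[1] if best else None

OPEN = "open #object#; open up #object#; open #object# up"
print(match_command(OPEN, "open up the box"))  # prints: {'object': 'the box'}
print(match_command(OPEN, "open the box up"))  # prints: {'object': 'the box'}
```

Without some tie-break, “open up the box” would also match the plain “open #object#” alternative with “up the box” as the object, so a real implementation needs a rule of this kind or a careful ordering of the alternatives.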