intfiction.org

The Interactive Fiction Community Forum
All times are UTC - 6 hours [ DST ]




 Post subject: Replacing the Parser?
Posted: Tue Jun 19, 2018 3:02 pm

Joined: Fri Oct 18, 2013 10:13 am
Posts: 2669
Location: The Midwest
During a discussion in a different thread, it was mentioned that Inform's parser doesn't deal well with syntactic ambiguity. That, along with Graham Nelson's recent (and exciting!) talk on the future of Inform, and the fact that my new job involves parsing natural language, has made me wonder…

Why doesn't the Inform parser use some standard parsing algorithm, like Earley, and produce proper parse trees?

This seems like it could improve disambiguation by orders of magnitude, and Glulx's memory model offers plenty of room for the data structures involved.

Is there a strong reason why this is a bad idea, aside from the fact that it wouldn't fit in the Z-machine? And how much hubris would be involved in trying to craft a replacement?

_________________
Daniel Stelzer


Posted: Tue Jun 19, 2018 3:55 pm

Joined: Sat Jan 23, 2010 4:56 pm
Posts: 5840
Quote:
Why doesn't the Inform parser use some standard parsing algorithm, like Earley, and produce proper parse trees?


Because it was written in 1992 and incrementally upgraded since then, never rewritten from scratch.

Quote:
And how much hubris would be involved in trying to craft a replacement?


Writing an IF parser is always hubris. :) Many have started, few have finished the job. If you're aiming at a "mature" equivalent of TADS/Inform, I mean.

Glulx's memory model offers plenty of room, but trying to use it for heap allocation is no fun. It wasn't designed for that, and my retrofits aren't very good.

Other than that, you've just got the usual problem of writing an IF parser, which is that it's about a year's intensive work. That's my rule of thumb.


Posted: Wed Jun 20, 2018 6:30 pm

Joined: Sat Jun 25, 2016 12:13 pm
Posts: 251
Plugging in an off-the-shelf parser isn't totally straightforward. You have to deal with resolution of context against the world model. If you do this too eagerly, you can fail.

Consider a "simple" world with just "on" and "in":

> put the red cup and saucer in the bowl on the table.

Code:
put (the red (cup and saucer)) in (the bowl on the table).
put (the (red cup) and saucer) in (the bowl on the table).
put (the (red (cup and saucer)) in the bowl) on (the table).
put (the (red cup) and (saucer in the bowl)) on (the table).
put (the ((red cup) and saucer) in the bowl) on (the table).
put (the (red (cup and saucer in the bowl))) on (the table).


So it depends on which things are "red" and whether there's a bowl on the table or not, amongst other things. It's worse when words can be multiple parts of speech, and when words like "and" can also be used to break up commands:

"in the kayak"
> eat spare food and paddle.
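
To make the dependence on the world model concrete, here is a minimal Python sketch (not Inform code, and not how any real IF parser is implemented). It only handles the fixed shape "put A in B on C" and only two of the readings listed above; the `WORLD` dictionary, the `readings` function and the action names `insert` and `put-on` are all assumptions invented for illustration.

```python
# Hypothetical world model: maps each object to (relation, holder).
WORLD = {
    "cup": ("in", "cupboard"),
    "bowl": ("on", "table"),
    "table": ("in", "kitchen"),
}

def is_located(obj, relation, holder, world):
    """True if the world model says obj is `relation` holder (e.g. bowl on table)."""
    return world.get(obj) == (relation, holder)

def readings(command, world):
    """Return the readings of 'put A in B on C' that the world model supports.

    Reading 1: put A in (B on C)  -- requires B to be on C already.
    Reading 2: put (A in B) on C  -- requires A to be in B already.
    """
    words = command.split()
    # This sketch expects exactly: put A in B on C
    if len(words) != 6 or words[0] != "put" or words[2] != "in" or words[4] != "on":
        raise ValueError("this sketch only handles 'put A in B on C'")
    a, b, c = words[1], words[3], words[5]
    supported = []
    if is_located(b, "on", c, world):
        supported.append(("insert", a, b))   # put A into the B that is on C
    if is_located(a, "in", b, world):
        supported.append(("put-on", a, c))   # put the A that is in B onto C
    return supported
```

With the sample `WORLD`, `readings("put cup in bowl on table", WORLD)` keeps only the `insert` reading, because the bowl is on the table but the cup is not in the bowl. If the cup were already in the bowl, both readings would survive and the parser would have to ask. Resolving too eagerly means committing to one bracketing before these checks have been made.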


Posted: Thu Jun 21, 2018 11:07 am

Joined: Mon Aug 28, 2017 12:07 pm
Posts: 61
Is there an up-to-date article anywhere that compares and contrasts the existing parsers used in IF systems today?

If attempting a new parser, I would think the goals are neither to reinvent the wheel nor to solve all the problems of natural language processing. Instead, the focus should be simply on making something that is at least a little bit better than what exists today.

I guess what I'm looking for is a table of all the different types of sentences different systems can process today.

_________________
"She is not refined. She is not unrefined. She keeps a parrot."


Posted: Thu Jun 21, 2018 10:24 pm

Joined: Sat Jan 23, 2010 4:56 pm
Posts: 5840
That question is hard to answer in a table. I don't know how to describe I7's capabilities without a long lecture with a lot of examples.

Here's a quote from Inform's Standard Rules. This gives you an idea of what the Inform parser can handle:

Quote:
Understand "get in/on" as entering.
Understand "get out/off/down/up" as exiting.
Understand "get [things]" as taking.
Understand "get in/into/on/onto [something]" as entering.
Understand "get off/down [something]" as getting off.
Understand "get [things inside] from [something]" as removing it from.
Understand "pick up [things]" or "pick [things] up" as taking.
Understand "stand" or "stand up" as exiting.
Understand "stand on [something]" as entering.
Understand "put [other things] in/inside/into [something]" as inserting it into.
Understand "put [other things] on/onto [something]" as putting it on.
Understand "put on [something preferably held]" as wearing.
Understand "put [something preferably held] on" as wearing.
Understand "put down [things preferably held]" or "put [things preferably held] down" as dropping.
Understand "insert [other things] in/into [something]" as inserting it into.
Understand "drop [things preferably held]" as dropping.
Understand "drop [other things] in/into/down [something]" as inserting it into.
Understand "drop [other things] on/onto [something]" as putting it on.
Understand "drop [something preferably held] at/against [something]" as throwing it at.


This tells you that Inform's parser understands a VERB followed by zero or more tokens, each of which is a PREPOSITION (perhaps with variations) or a NOUN-PHRASE. This is simple and basically true. (Leave aside numbers and text topics for the moment.)
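
As a rough illustration of that model, here is a Python sketch of matching a command against one grammar line. It is not how Inform's parser is implemented. `parse_line` and `match` are hypothetical names, a bracketed token is treated as "one or more arbitrary words", and no world-model lookup is done.

```python
import re

# A grammar-line token is either a bracketed slot or a literal word
# (possibly with slash-separated alternatives, e.g. in/inside/into).
TOKEN = re.compile(r"\[([^\]]+)\]|(\S+)")

def parse_line(pattern):
    """Turn 'put [other things] in/inside/into [something]' into a token list."""
    tokens = []
    for slot, literal in TOKEN.findall(pattern):
        if slot:
            tokens.append(("slot", slot))
        else:
            tokens.append(("words", frozenset(literal.split("/"))))
    return tokens

def match(tokens, words):
    """Yield every way to match `words` against `tokens`.

    Each slot takes one or more words; each result lists (slot name, text) pairs.
    """
    if not tokens:
        if not words:
            yield []          # everything consumed: a complete match
        return
    kind, value = tokens[0]
    if kind == "words":
        # A literal token must match the next word exactly.
        if words and words[0] in value:
            yield from match(tokens[1:], words[1:])
    else:
        # A slot may take any non-empty prefix of the remaining words.
        for n in range(1, len(words) + 1):
            for rest in match(tokens[1:], words[n:]):
                yield [(value, " ".join(words[:n]))] + rest

line = parse_line("put [other things] in/inside/into [something]")
result = list(match(line, "put red cup into bowl".split()))
```

Here `result` holds a single match, with "red cup" filling `[other things]` and "bowl" filling `[something]`. Because `match` tries every slot length, a command containing the preposition twice would produce more than one match, which is where the noun-phrase and disambiguation questions start.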

What this *doesn't* tell you is how noun phrases work. This is rather more complicated -- most of I7's advancement over I6 is in the area of noun phrase support. Conditional synonyms, property- and relation-based synonyms.

The other thing it doesn't tell you is how disambiguation is handled between possibilities.

The improvements mentioned in this thread are basically the intersection of all these domains. "Drop the plant in the pot in the garbage" is either "DROP [noun]" or "DROP [noun] IN [noun]", depending on how you slice out the noun phrases.


Posted: Sat Jun 23, 2018 11:46 am

Joined: Mon Aug 28, 2017 12:07 pm
Posts: 61
Thanks, I think I'm finally getting it. It's not enough to figure out what the list of noun phrases is. You also have to figure out how the noun phrases nest, in order to find the dividing line between the direct object and the indirect object. That is only knowable in the context of the world model, and it also depends on whether the verb takes 0, 1, or 2 objects.

_________________
"She is not refined. She is not unrefined. She keeps a parrot."


Posted: Sat Jun 23, 2018 8:45 pm

Joined: Sat Jan 23, 2010 4:56 pm
Posts: 5840
Also, the work of building the parser includes allowing the author to tie it into the world model. Stuff like that.


Posted: Fri Jul 06, 2018 2:02 pm

Joined: Mon Aug 28, 2017 12:07 pm
Posts: 61
zarf wrote:
This tells you that Inform's parser understands a VERB followed by zero or more tokens, each of which is a PREPOSITION (perhaps with variations) or a NOUN-PHRASE. This is simple and basically true. (Leave aside numbers and text topics for the moment.)

What this *doesn't* tell you is how noun phrases work. This is rather more complicated -- most of I7's advancement over I6 is in the area of noun phrase support. Conditional synonyms, property- and relation-based synonyms.

The other thing it doesn't tell you is how disambiguation is handled between possibilities.

The improvements mentioned in this thread are basically the intersection of all these domains. "Drop the plant in the pot in the garbage" is either "DROP [noun]" or "DROP [noun] IN [noun]", depending on how you slice out the noun phrases.


I tried some experiments in Inform 7, and the example I wrote didn't seem to do a very good job of handling noun phrases or disambiguation. Most likely I'm making a beginner's mistake, but the Inform parser seems less powerful than I expected.

Code:
The Empty Lot is a room.
A garbage heap is in the Empty Lot.
The box is on the garbage heap.
A blue tin can is in the box.
A green tin can is in the Empty Lot.


[transcript]

Empty Lot
You can see a garbage heap (on which is a box (in which is a blue tin can)) and a green tin can here.

>put tin can in box on garbage heap.
I only understood you as far as wanting to put the green tin can in the box.

[/transcript]

It did not ask me to disambiguate between "putting the tin can, which is in the box, on the garbage heap" and "putting the tin can in the box, which is on the garbage heap."

So, did I just write bad code? Or does the Inform parser just not understand commands with noun phrases modifying other noun phrases?

_________________
"She is not refined. She is not unrefined. She keeps a parrot."


Posted: Fri Jul 06, 2018 3:42 pm

Joined: Tue Mar 09, 2010 2:34 pm
Posts: 5452
Location: Burlington, VT
By default, the Inform parser doesn't understand that "tin can in box" means the tin can that is in the box. The only construction of that kind understood by default is "take can from box" and a few synonyms--and that involves the dreaded "[things inside]" token, which has a truly incredible amount of parser code devoted to it, for just that one case.

To make "in" and "on" understood by default you have to add Understand lines for them, as discussed here:

Code:
Understand "in [something related by reversed containment]" as a thing.
Understand "on [something related by reversed support]" as a thing.


So the parser isn't that powerful by default, but you can build in some extra power using understanding by relations.
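
The idea behind understanding by relations can be sketched in Python like this. This is an illustration only, not what the Inform compiler generates. The `THINGS` table mirrors the Empty Lot example from earlier in the thread, and `resolve` and the `held_by` field are hypothetical names.

```python
# Hypothetical world: each thing's name words and its (relation, holder).
THINGS = {
    "blue tin can":  {"words": {"blue", "tin", "can"},  "held_by": ("in", "box")},
    "green tin can": {"words": {"green", "tin", "can"}, "held_by": ("in", "Empty Lot")},
    "box":           {"words": {"box"},                 "held_by": ("on", "garbage heap")},
    "garbage heap":  {"words": {"garbage", "heap"},     "held_by": ("in", "Empty Lot")},
}

def resolve(phrase_words, things=THINGS):
    """Return the names of things a noun phrase (a list of words) could refer to.

    Name words narrow the candidates; "in X" / "on X" keeps only candidates
    whose holder matches the noun phrase X under that relation.
    """
    for i, word in enumerate(phrase_words):
        # Split at the first relation word that has words on both sides.
        if word in ("in", "on") and 0 < i < len(phrase_words) - 1:
            heads = resolve(phrase_words[:i], things)
            holders = resolve(phrase_words[i + 1:], things)
            return [h for h in heads
                    if things[h]["held_by"][0] == word
                    and things[h]["held_by"][1] in holders]
    # No relation word: every word must be one of the thing's name words.
    return [name for name, t in things.items()
            if all(w in t["words"] for w in phrase_words)]
```

In this sketch `resolve(["tin", "can"])` matches both cans, while `resolve(["tin", "can", "in", "box"])` narrows to the blue one, and the phrase nests, so "can in box on garbage heap" also resolves to the blue tin can.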

But then, as discussed further down that thread, you get the problems with "put can in box on heap" (or even "put the can in the box," which really is unambiguous to ordinary readers). The parser grabs as much of the command as it can for the initial noun phrase, leaving nothing over for the preposition or second noun. So both those commands get processed as, effectively, "put can," and the parser asks what you want to put the can in.
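
A toy Python version of that behaviour, next to a backtracking alternative, might look like the following. It is a sketch under simplifying assumptions, not the Inform parser's real logic: the word sets are made up, a "noun phrase" is just a run of allowed words, and the backtracking version does not check its pieces against any world model.

```python
NAME_WORDS = {"can", "box", "heap"}
# Allowed inside noun phrases once the relation-based Understand lines are added.
RELATION_WORDS = {"in", "on"}
NOUN_PHRASE_WORDS = NAME_WORDS | RELATION_WORDS

def greedy_noun_phrase(words):
    """Consume the longest prefix whose words could all belong to one noun phrase."""
    n = 0
    while n < len(words) and words[n] in NOUN_PHRASE_WORDS:
        n += 1
    return words[:n], words[n:]

def parse_put_greedy(command):
    """Match PUT [noun] IN/ON [noun], taking the first noun phrase greedily."""
    verb, *rest = command.split()
    first, remainder = greedy_noun_phrase(rest)
    if len(remainder) >= 2 and remainder[0] in RELATION_WORDS:
        return (verb, first, remainder[0], remainder[1:])
    return (verb, first, None, None)   # second noun missing: the parser must ask

def parse_put_backtracking(command):
    """Same grammar, but try every split point instead of only the longest."""
    verb, *rest = command.split()
    return [(verb, rest[:i], rest[i], rest[i + 1:])
            for i in range(1, len(rest) - 1)
            if rest[i] in RELATION_WORDS]
```

For "put can in box", the greedy version swallows all three words into the first noun phrase and reports the second noun as missing, while the backtracking version finds the split at "in". For "put can in box on heap", the backtracking version returns two candidate splits, which is the point at which a world model or a disambiguation question would be needed.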


Posted: Fri Jul 06, 2018 4:19 pm

Joined: Mon Aug 28, 2017 12:07 pm
Posts: 61
matt w wrote:
The parser grabs as much of the command as it can for the initial noun phrase, leaving nothing left over for the preposition or second noun. So both those commands get processed as, effectively, "put can," and the parser asks what you want to put the can in.


OK, so basically it parses greedily rather than trying all possible nestings of noun phrases.

Here is something else I don't understand. I added a red tin can to my experiment.

Code:
The Empty Lot is a room.
A garbage heap is in the Empty Lot.
The box is on the garbage heap.
A blue tin can is in the box.
A green tin can is in the Empty Lot.
A red tin can is in the Empty Lot.


[transcript]
You can see a garbage heap (on which is a box (in which is a blue tin can)), a green tin can and a red tin can here.

>put tin can in box on garbage heap
Which do you mean, the blue tin can, the green tin can or the red tin can?

[/transcript]

I was surprised that the put command in the three-can example was actually parsed, whereas the two-can example failed.

_________________
"She is not refined. She is not unrefined. She keeps a parrot."

