intfiction.org

The Interactive Fiction Community Forum
All times are UTC - 6 hours [ DST ]




Posted: Wed Sep 14, 2011 8:05 pm

Joined: Mon Jun 09, 2008 8:58 pm
Posts: 679
Location: Seattle
I've started on the grand ambition of rewriting Inform's parser in Inform 7, meaning, the entire Parser.i6t template file. My goal in this is primarily for teaching and learning purposes: we get questions in this forum pretty regularly about just what, exactly, is going on inside that parser. While there are a handful of Inform 6 experts here who know this, I feel it would be in everyone's best interest if more people could diagnose parser-related problems with their works-in-progress.

My technical goal in this is to have the Inform 7 compiler generate Inform 6 code that resembles the original Parser.i6t as closely as possible. This is so I don't have to test anything. :) Seriously, a full regression test on a rewrite would be nearly impossible to construct, let alone verify. By ensuring the generated I6 corresponds line-by-line with the original, I also ensure that the I7 code isn't slower than the original I6. Or at least, not by much.

I'll further try to use the original I6 names where possible for variables, functions, etc., though I can only go so far. Inform names parameters and locals formulaically, like t_0, t_1, and so on, but the names will at least appear in the comments in the generated I6.

I have two examples already. Version 1 looks like I block-copied Parser.i6t from Appendix B into the IDE and hit compile. Except, it actually does compile. Version 1 is just a baseline to start from. In Version 2 I've modernized three functions -- ScopeWithin, PlaceInScope, and AddToScope -- with quite a few technical bits added in a volume at the beginning of the file for support. Version 2 also compiles and works, and indicates what later versions will likely look like. Comments at the beginning of the game file are mine, while most of the comments spread throughout are Graham's from Appendix B.

For the purposes of correctness I won't include any fixes or upgrades that the extensions site has. For now, this is a pure translation project, nothing more.

_________________
Blog at Gamasutra :: Programmer's Guide to Inform 7 :: Seattle I-F


Last edited by Ron Newcomb on Fri Sep 16, 2011 5:50 am, edited 1 time in total.

Posted: Wed Sep 14, 2011 11:04 pm

Joined: Sat Jan 23, 2010 4:56 pm
Posts: 2084
Do you think that a line-by-line translation will be more readable than the I6 original?

Looking at this, I see mixed improvements. "The parser's current word" is way better than "wn". "Do scope action and recurse on ... excluding ... under ..." is better than "DoScopeActionAndRecurse(..., ..., ...)", but only by the prepositions. Then you have a lot of lines that are essentially adding spaces and "the" here and there.

The overall logic is exactly the same, which is to say occult in the extreme. (I see you translated the line "wn = match_from", in PlaceInScope. A couple of days ago I mentioned that particular line, as the immediate cause in the infinite-loop-scope-bug thread. I said I had no idea what it's doing there. I still don't. Do you? Anybody?)

I realize that you've chosen a (sanely) limited goal here; the alternative is "write a new parser in pure I7." (I'm not touching that either, believe me.) I'm wondering what the specific benefits will be. ("Let's find out" is a perfectly good answer.)


Posted: Thu Sep 15, 2011 2:49 am

Joined: Mon Jun 09, 2008 8:58 pm
Posts: 679
Location: Seattle
zarf wrote:
Do you think that a line-by-line translation will be more readable than the I6 original?
Not as readable as a rewrite of course, but I think almost any function written in I7 is more readable than its I6 counterpart.

zarf wrote:
Looking at this, I see mixed improvements. "The parser's current word" is way better than "wn". "Do scope action and recurse on ... excluding ... under ..." is better than "DoScopeActionAndRecurse(..., ..., ...)", but only by the prepositions. Then you have a lot of lines that are essentially adding spaces and "the" here and there.
I had a good name for wn only because I already knew exactly what it was from my previous exposure to the code. I have less of an idea about the phrase and match_from, hence the too-literal names. Since the I6 names are preserved for variables via "translates into I6 as" and to-phrase naming, I can completely rename their I7 selves like I did with wn. It's just that I need to understand the code better before I can pick better names. You should have seen what I had before I hit upon those particular prepositions. Ugh!

zarf wrote:
The overall logic is exactly the same, which is to say occult in the extreme. (I see you translated the line "wn = match_from", in PlaceInScope. A couple of days ago I mentioned that particular line, as the immediate cause in the infinite-loop-scope-bug thread. I said I had no idea what it's doing there. I still don't. Do you? Anybody?)
I have a philosophy that if working code is quick and relatively bullet-proof, but its logic looks convoluted, sometimes it's just a question of what you're naming things. Sometimes the name of a variable is a poor match for the concept that it eventually ended up standing for. Since this project allows me to name many of the I7 constructs however I wish (parameters excluded), it's possible we find that the parser isn't as convoluted as it seems. It might even be elegant but we can't see it.

It's also possible to highlight certain patterns in the code, such as a long string of if-statements, which look like they'd be happy as a rulebook. The scoring especially. This can be brought out with I7 comments even if I can't express it directly code-wise. The line you bring up in particular I've seen multiple times, and the word "reset" comes to mind with it. I believe it's a safety mechanism in case other GPRs didn't reset wn like they're supposed to, or an off-by-one error happened. (Or maybe, since many matching routines only work on wn, it's using up wn to use one of those functions. If parsing is done, wn can be trashed.)

When constructing the Custom Library Messages extension, I slipped in the comment [reset to] in a now...is... assignment so I could tell when an assignment was meaningful or not. The same can apply here.

zarf wrote:
I realize that you've chosen a (sanely) limited goal here; the alternative is "write a new parser in pure I7." (I'm not touching that either, believe me.) I'm wondering what the specific benefits will be. ("Let's find out" is a perfectly good answer.)

I don't think a brand new I7 parser would gain much traction for the same reason that Glulx didn't immediately obsolete the Z-machine. So at the very least this project could be a stepping stone toward a better day: more people could join in with sensible opinions on the parser due to better readability, increasing pressure to standardize things like improved disambiguation, transparent scoring, etc. Besides, not everyone speaks I6, and that alone is a hurdle.

So I think there isn't much in the way of real technical benefits despite being a technical challenge. But I think the "soft" benefits of bringing in more community eyes are definitely worth it.

And I'll try to comment more. :)

Posted: Thu Sep 15, 2011 10:46 am

Joined: Fri Jul 16, 2010 2:09 pm
Posts: 1950
This is very cool! My idea for a parser rewrite was to keep it in I6 but to break the big routines into small, readable chunks with better naming. But this looks like the beginning of an even better solution. Thanks! (edit: I see that's what you've actually done, by putting chunks of I6 code into the I7 source.)

Sheesh, sometimes I think I must be the dumbest programmer in the world. Why didn't I think to use "to decide" phrases to set constants?

I have no shame in admitting that this is beyond me, though:

Code:
To decide which grammar token is the noun token: [0] (- NOUN_TOKEN -).
To decide which grammar token is the end of line token: [15] (- ENDIT_TOKEN -).


Are you defining "fake" values for the grammar token KoV? Is this a form of typecasting? I wouldn't have expected it to be legal to define a numeric value for a KoV while only giving its name as a phrase-name. I can see that it could be useful for 0, which is not a normally used KoV. (right?) But is it dangerous to specify another integer - couldn't there be collisions? I'm assuming you're doing it because there's a gap in values between topic and EOL, so perhaps it makes sense as a way to keep the numeric values in sync with I6 constants, but it looks arcane and scary.


Posted: Thu Sep 15, 2011 2:42 pm

Joined: Sat Jan 23, 2010 4:56 pm
Posts: 2084
It's just type-casting. Specific values have to be defined for grammar-token values that are referred to in I6 code. Ultimately, I guess, once all the I6 code is gone, those specific value definitions will be eliminated -- the code will be able to rely on "naturally-generated" numbers for each grammar-token value.


Posted: Thu Sep 15, 2011 2:43 pm

Joined: Sat Jan 23, 2010 4:56 pm
Posts: 2084
Quote:
Since this project allows me to name many of the I7 constructs however I wish (parameters excluded), it's possible we find that the parser isn't as convoluted as it seems. It might even be elegant but we can't see it.


Nope. :/


Posted: Thu Sep 15, 2011 2:56 pm

Joined: Fri Jul 16, 2010 2:09 pm
Posts: 1950
This project may inspire me to do that refactoring I wanted to do. But don't let that stop you, if that's something you'd like...


Posted: Thu Sep 15, 2011 4:28 pm

Joined: Mon Jun 09, 2008 8:58 pm
Posts: 679
Location: Seattle
capmikee wrote:
(edit: I see that's what you've actually done, by putting chunks of I6 code into the I7 source.)
Eventually those chunks will be commented out and the I7 version will sit beside them, so no I6 other than some glue code will be left.

capmikee wrote:
I have no shame in admitting that this is beyond me, though:
Code:
To decide which grammar token is the noun token:  (- NOUN_TOKEN -).
To decide which grammar token is the end of line token:  (- ENDIT_TOKEN -).
Are you defining "fake" values for the grammar token KoV? Is this a form of typecasting? I wouldn't have expected it to be legal to define a numeric value for a KoV while only giving its name as a phrase-name. I can see that it could be useful for 0, which is not a normally used KoV. (right?) But is it dangerous to specify another integer - couldn't there be collisions? I'm assuming you're doing it because there's a gap in values between topic and EOL, so perhaps it makes sense as a way to keep the numeric values in sync with I6 constants, but it looks arcane and scary.
Apparently it isn't beyond you because you understand it, and its potential problems, perfectly. Zarf's also right in that the actual number that the enums/constants/kovs/words stand for isn't important, so as long as every use of them is taken into account, I don't have to use the above gymnastics to force a particular name to compile to a particular number.

zarf wrote:
Nope. :/
O ye of little faith. :) A trick I've started using is exposing the same I6 variable to I7 multiple times under different names, and sometimes with different types. This is useful. A couple of parser variables actually (but only sometimes) hold a routine to call with indirect(), one I7 incarnation of which I have typed as "phrase object -> nothing" or "phrase nothing -> truth state" as appropriate. This helps.
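
As a rough illustration of that trick, here is an untested sketch in the I7 syntax of this era. "The parser's current word" is the name quoted earlier in the thread for wn; the other I7 names are made up for this example, and scope_token is only a plausible stand-in, since the thread does not say which parser variables actually get this treatment.

```inform7
[Sketch only. I7 names other than "the parser's current word" are invented for illustration.]
The parser's current word is a number that varies.
The parser's current word variable translates into I6 as "wn".

[Assumption: scope_token stands in for a parser global that sometimes holds a routine.]
[The same I6 global is exposed twice, once per kind.]
The scope token value is a number that varies.
The scope token value variable translates into I6 as "scope_token".

The scope token routine is a phrase nothing -> truth state that varies.
The scope token routine variable translates into I6 as "scope_token".
```

Both I7 names would compile to the same I6 global, so the choice of name at each use site documents which of its roles is meant there.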

(I just wish I7's "applied to" and its other new functional-programming features were implemented with I6's indirect(). That would have been tidy.)

Posted: Thu Sep 15, 2011 7:15 pm

Joined: Sat Jan 23, 2010 4:56 pm
Posts: 2084
Quote:
(I just wish I7's "applied to" and its other new functional-programming features were implemented with I6's indirect(). That would have been tidy.)


I'm pretty sure that indirect() was deprecated after Inform 5, and only hangs on through habit and indifference. (It's not mentioned in the DM4, for example.) Any line "indirect(x)" can be written as "x()", and that's got to be clearer I6 code.
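
A minimal, untested sketch of that equivalence, written as an I6 inclusion in I7 source. The routine names SayHello and CallBothWays and the phrase "call hello both ways" are hypothetical, chosen only for this example.

```inform7
Include (-
[ SayHello;
    print "Hello.^";
];
[ CallBothWays r;
    r = SayHello;    ! r now holds the routine's address
    indirect(r);     ! older style
    r();             ! the same call, written directly
];
-).

[Hypothetical glue phrase so the I6 routine can be reached from I7 code.]
To call hello both ways: (- CallBothWays(); -).
```

Under these assumptions both calls should run SayHello once each.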


Posted: Fri Sep 16, 2011 1:11 am

Joined: Mon Jun 09, 2008 8:58 pm
Posts: 679
Location: Seattle
zarf wrote:
I'm pretty sure that indirect() was deprecated after Inform 5, and only hangs on through habit and indifference. (It's not mentioned in the DM4, for example.) Any line "indirect(x)" can be written as "x()", and that's got to be clearer I6 code.

Oh that's interesting. The IDE still colorizes it like child(), sibling(), and parent(). Thank you for that info. An I7 rule is invoked the same exact way from the I6, so I might be able to simplify some stuff.

Right now I'm working out ways that I7 should access an I6 array, especially if it's sometimes a routine instead, as with add_to_scope.
