Cross Compile Data Integrity in a Virtual Machine

I recently mentioned on ifMUD that I wished we had a system that allowed cross-compile saved games. I think the reasons are obvious and simple, and such a system would be extremely beneficial to the parser-based IF community, especially if you’re building something large and plan on maintaining it and providing support to even a moderately sized audience.

So what I’d like is a detailed discussion on the mechanics involved and why this is “hard”. I highly doubt that it’s “impossible”, but I assume there are serious challenges. What are those challenges? What would it take to start thinking about something that enabled this feature?

David C.
textfyre.com

I don’t have time for a full answer this weekend, but here’s the short form:

It’s not impossible, but I think it does necessarily put the burden of consistency checking on the game author. That makes IF authorship way more of a hard programming problem than it is now (and it’s pretty serious now).

So it would, for most authors, be buggy.

I know how I’d do it, and I’d trust myself to make it work right if I built it into the game from day one. I wouldn’t try to retrofit it into a game; and if I were teaching someone else, it would be hard work for both of us.

I knew I’d get that as a first response, but what I’m looking for is to set aside the why not’s and talk about the technical details and then talk about the repercussions.

I know you’re at DragonCon, hence the short answer, but I’d hope there are a few other technically minded folk who can chime in…

I know some interpreters saved the game state by dumping the contents of (virtual) memory to disc, so any change to the layout of that part of the game file (e.g. adding a new object) would make it not line up. (Disclaimer: I don’t know if this technique is still used, or if the same problem applies to whatever replaced it.)

Just brainstorming… (so silly things are expected)

I suppose that a naive first step would be enumerating the kinds of changes that make a saved game incompatible with previous versions (and avoiding such changes :P). This is, of course, highly VM-dependent (I know a bit of T3VM).

Then design an automatic translation to cope with each one. If this is not possible, provide a hook for manual adaptation (the author burden zarf mentions, IIUC).

Is it possible to machine-detect such changes (from comparing codebases, or VM game images)? If not, maybe allow some kind of annotation.

Do multi-version conversions (N to N+d, with d>1) as multiple single steps chained together.
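
A minimal sketch of the chaining idea, in Python purely for illustration. The dictionary-shaped save, the `MIGRATIONS` table, and the `upgrade` function are all hypothetical names, not part of T3VM or any other existing VM.

```python
# Hypothetical: a save is a dict that records the game version that wrote it.
def v1_to_v2(save):
    save = dict(save)      # copy so the original save is left untouched
    save["version"] = 2    # a real step would also adapt the data here
    return save

def v2_to_v3(save):
    save = dict(save)
    save["version"] = 3
    return save

# One single-step converter per version bump, keyed by the version it accepts.
MIGRATIONS = {1: v1_to_v2, 2: v2_to_v3}

def upgrade(save, target_version):
    """Apply single-step converters in order until the save reaches target_version."""
    while save["version"] < target_version:
        step = MIGRATIONS.get(save["version"])
        if step is None:
            raise ValueError(f"no converter from version {save['version']}")
        save = step(save)
    return save

# A version 1 save loaded by a version 3 game runs both steps in sequence.
upgraded = upgrade({"version": 1, "data": {}}, 3)
```

The author only ever writes the N to N+1 step; a missing step is where the "sorry, can't restore this" fallback would go.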

Are you thinking of converting the saved games out-of-band (as “cross-compile” suggests to me) or within the VM when loading them?

Did you research/conclude something on your own that could be shared?

Some documentation for T3VM (which chose not to allow this kind of compatibility):
tads.org/t3doc/doc/techman/t3spec/save.htm

How I would do it is to write a JSON library and then manually store everything in a JSON file.

You could do this by storing each bit of data with a text key, but if you were careful, you could store lists of data in arrays instead. The important thing to do would be to make sure the order never changes. So for example you could store all the global variables simply as an array. To make sure the orders didn’t change you’d want to define everything in one place. I’d use a tool to compare the compiled I6 file to check that the order doesn’t get disrupted between releases.

You could probably do the same for object properties. Again they should be defined in one place. I would probably store objects by name or by a custom id.

Then you’re left with arrays, which probably can be stored as arrays.
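
Something like the following is what I have in mind, sketched in Python rather than I6 just to show the shape of the file. The global names, the `GLOBAL_ORDER` list, and the object ids are made up for the example; the one real rule is that the order list may only ever be appended to.

```python
import json

# Hypothetical fixed ordering, defined in one place; new globals go at the end.
GLOBAL_ORDER = ["score", "turns", "lamp_lit"]

def save_state(globals_by_name, objects_by_id):
    """Serialize globals as a positional array and objects keyed by a stable id."""
    return json.dumps({
        "globals": [globals_by_name[name] for name in GLOBAL_ORDER],
        "objects": objects_by_id,
    })

def load_state(text):
    """Rebuild the named globals and the object table from the JSON text."""
    data = json.loads(text)
    # zip stops at the shorter list, so an older save with fewer globals still
    # loads; globals it lacks are absent and the caller must supply defaults.
    globals_by_name = dict(zip(GLOBAL_ORDER, data["globals"]))
    return globals_by_name, data["objects"]

text = save_state(
    {"score": 10, "turns": 42, "lamp_lit": True},
    {"lamp": {"location": "player"}},
)
restored_globals, restored_objects = load_state(text)
```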

I can’t look at any more costumes tonight, so I’ll write a longer answer.

This is going to be a very generalized look at the problem. It’s not Inform-specific. (In fact, Inform is organized the opposite of this way, as we’ll see.)

Imagine you have an IF game in some abstract sense. It’s a bundle of information. It’s not organized in a very regular way – not like a database schema of regular tables. Some objects have common properties (“location”) but some have object-specific properties, there are a bunch of ad-hoc global variables, etc.

To save the game state in a portable way, we have to give every bit of information a name and then save the big heap of named data. When we read the data back in with a new version, hopefully most of the names haven’t changed – they still exist in the new version and they mean the same thing. For the ones that have changed, we supply an upgrade routine that massages the data. (Makes the appropriate tweaks as it’s read in.)

Simple example. Every object has a location, which is another object. This models a simple containment tree. (Skip over I7’s complexity of different containment/support/incorporation relations.) So at startup time, we might have

LAMP.LOCATION = LIVINGROOM
PLAYER.LOCATION = EASTOFHOUSE
LEAFLET.LOCATION = PLAYER
DIAMOND.LOCATION = NULL

(The diamond has not been created yet so it starts off-stage.)

We can easily throw some global variables into the heap:

GLOBAL.SCORE = 0
GLOBAL.MAXSCORE = 325

In version 2, you decide to add a treasure, the emerald, which starts in the kitchen. Save files from version 1 aren’t going to mention the emerald at all, of course. So you need a function (give it a conventional name like V1TO2) which sets all the emerald properties when a v1 save file is loaded. It could also increase the max-score global to 350.
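
Here is a rough sketch of that in Python (not any real IF language), treating the heap as a flat dictionary of named values. The key `EMERALD.LOCATION` and the function names are my own choices for the example.

```python
def new_game_v1():
    """The version 1 startup heap from the example above."""
    return {
        "LAMP.LOCATION": "LIVINGROOM",
        "PLAYER.LOCATION": "EASTOFHOUSE",
        "LEAFLET.LOCATION": "PLAYER",
        "DIAMOND.LOCATION": None,   # off-stage
        "GLOBAL.SCORE": 0,
        "GLOBAL.MAXSCORE": 325,
    }

def v1_to_2(heap):
    """Upgrade a heap read from a v1 save file to the v2 layout."""
    upgraded = dict(heap)
    # A v1 save never mentions the emerald, so give it its v2 starting value.
    upgraded["EMERALD.LOCATION"] = "KITCHEN"
    # The new treasure raises the maximum score.
    upgraded["GLOBAL.MAXSCORE"] = 350
    return upgraded

# A v1 player picked up the lamp and scored some points before saving.
saved = new_game_v1()
saved["LAMP.LOCATION"] = "PLAYER"
saved["GLOBAL.SCORE"] = 10
restored = v1_to_2(saved)
```

Everything the player did in v1 survives untouched; only the names that v2 added or redefined are written by the upgrade routine.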

If you decide to delete the diamond in v3, that might be trickier. If it vanishes from the player’s inventory (or from the trophy cabinet), the V2TO3 routine might have to appropriately decrease the score. But what if the player handed the diamond to the troll as a bridge toll? You could bring the troll back into play, but if the player is on the far side of the bridge, the game could wind up in an impossible state. There are various ways to work around this; you’ll have to pick one.

(One option is always for the V2TO3 routine to say “Sorry, this is too large a version jump, I can’t restore v2 files.” Current IF systems effectively always fall back on this option. We’re trying to be nicer, but in practice we might wind up with some version jumps that are “save-safe” and some that aren’t.)

It’s critical, though perhaps not obvious, that the author has to be involved in the process of naming objects (and properties, and globals). It’s no use the compiler auto-assigning names OBJECT0, OBJECT1, OBJECT2, etc. Because first, that makes the V1TO2 function hellishly obfuscated; and second, what if the author inserts a new object early in the game? It throws off all the other numbers; now the V1TO2 function has to be really long because it’s reshuffling every single data bit. No author wants to write that. (And the compiler can’t auto-generate it because, well, that’s the problem we’re trying to solve in the first place!)

Now presumably every object (and global, etc) already has a “source code name” that the author made up. (Slightly tricky in I7, but again, skip that for now!) So this naming problem is in a sense already solved. But this means that we’re exposing the source code names in a way which the author isn’t used to. In a typical IF program, if you decide to rename a global variable, you do your context-sensitive search-and-replace and it’s done. But in this system, you have to also add a line to a V1TO2 function. The fact that the global changed names gets embedded in the program, and it has to stay there forever (or until you decide to stop supporting v1 save files).
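
To make that concrete, here is a small Python sketch. Suppose (hypothetically) that the global SCORE was renamed to POINTS in v2; the rename table below is the "one extra line" that then has to live in the program for as long as v1 saves are supported.

```python
# Hypothetical rename made in version 2: old name on the left, new on the right.
V1_TO_2_RENAMES = {"GLOBAL.SCORE": "GLOBAL.POINTS"}

def v1_to_2(heap):
    """Return a copy of a v1 heap with every renamed key moved to its v2 name."""
    return {V1_TO_2_RENAMES.get(name, name): value for name, value in heap.items()}

restored = v1_to_2({"GLOBAL.SCORE": 15, "LAMP.LOCATION": "LIVINGROOM"})
```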

So complexity accumulates in unpleasant ways. One more thing to remember, one more potential bug.

Let’s go back to substantive data changes (as opposed to renaming things). I’ve skipped an important distinction: static properties (which never change over the course of the game) versus dynamic properties (which do). Similarly we distinguish global constants from global variables.

Our life is much simpler if we don’t store constants and static properties in the save file at all. They are fixed in the game file at startup, and the restore mechanism never touches them. So we can update them in v2 without any trouble at all! The player launches the v2 game: all the static properties have the correct v2 value. The player restores a v1 save file: the static properties are not touched, they still have the correct v2 values. Copacetic.
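
A sketch of that split, again in Python for illustration only. I am assuming a world where "location" is the only dynamic property; the names `DYNAMIC_PROPERTIES`, `save_game`, and `restore_game` are invented for the example.

```python
# Assumption for illustration: only "location" changes during play.
DYNAMIC_PROPERTIES = {"location"}

def save_game(objects):
    """Write out only the dynamic properties of each object."""
    return {
        name: {prop: value for prop, value in props.items()
               if prop in DYNAMIC_PROPERTIES}
        for name, props in objects.items()
    }

def restore_game(fresh_objects, saved):
    """Overlay saved dynamic values on the static data of the running version."""
    restored = {name: dict(props) for name, props in fresh_objects.items()}
    for name, props in saved.items():
        if name in restored:
            restored[name].update(props)
    return restored

# Version 1 world, after the player has picked up the lamp.
v1_world = {"LAMP": {"description": "A brass lantren.", "location": "PLAYER"}}
save_file = save_game(v1_world)

# Version 2 fixes the typo; at startup its lamp is back in the living room.
v2_world = {"LAMP": {"description": "A brass lantern.", "location": "LIVINGROOM"}}
restored = restore_game(v2_world, save_file)
```

The restored lamp has the v2 description and the v1 player's location, with no upgrade routine involved.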

You can go a long way with static properties. For example, in my Inform games, I always treat “description” as static; I (almost) never change an object’s description. If the object is mutable, I give it a description value like “[if in Kitchen]…[else]…[end if]”. That’s fine in the save system I’m describing. It’s code, but Inform already treats code as static.

So you can imagine changing an object’s description in v2 of a game. As long as you treat description as a static property, this will not require any V1TO2 work; it’s a guaranteed safe fix.

In fact you can imagine a large class of updates which only involve updates to static data. (Fixing textual typos, fixing logic errors within functions, changing constants, etc.) This is the domain within which Inform could have safe, reliable version bumps with no extra work by the author.

Unfortunately, Inform is in no way engineered for this sort of thing. Inform doesn’t have much notion of static properties. (You can say “The weight of a treasure is always 12,” but there’s no way to convince Inform that descriptions are fixed for each individual object throughout the game.)

I say “unfortunately”, but really it’s kind of a feature. If you did decide to change a description property halfway through the game, would you want an extra stumbling block? You’d want to just do it, write an apologetic comment to yourself, and move on. And this gets back to the complexity thing. This save scheme seems awesome until you realize that every time you add a damn property you have to think about its save-and-restore strategy. IF games are full of ad-hoc properties. The languages are designed to make that easy.

Inform, in particular, is downright efflorescent with saved state. Every time you write “[first time]…[only]” or “[one of]…[stopping]”, bam, that’s a new flag. Every time you write a rule “if examining foo for the third time”, bam, that’s a new counter. A relation is implicitly a property, unless it’s a many-to-many relation, in which case it’s an array. Grammar tables are stored as arrays; there’s no built-in way to alter them during play, but the system allows the possibility.

Having to think about save-and-restore problems for every one of these features is an almighty pain in the spinal cord. It’s nice to think, oh, I will only make safe changes this version – but one little “[first time]…[only]” sneaks in…

In real life, bug fixes are messy. Last month I had a silly HL bug – “PUT ME IN BEAKER” sometimes succeeded. The fix was, logically, a purely static change: I reduced the scope of that rule from “things” (which included the player) to a smaller class. But the fix would still require a V1TO2 rule because the player shouldn’t be in the beaker! To really be reliable, I’d have to check that case and move the player out. (And then there’s the case where you follow up by emptying the beaker, so the player winds up off-stage…)

Repairing game state is hard. It requires serious-ass debugging skills – it’s way harder than just finding the bug and fixing it. If you screw it up, well, your game is buggier than before – for players with v1 save files. And that’s when you say, screw this, I’ll just refuse to load v1 save files into v2.

So that makes perfect sense. It’s good to have that as a start. But I want to dig deeper.

Forget Inform and TADS and all current tools. If we were to attempt building a platform with this as a requirement, how would we go about that?

We would probably have to have some mechanism to track progress in a story and when a new version comes out, somehow recreate the same state from a save file.

So in Ver 1 the player (visited Living Room, visited Kitchen, picked up knife, dropped knife, visited Living Room, visited Foyer, visited Basement Hallway).

Version 2 we add a hallway between the kitchen and the living room.

Our new tool would allow the author to handle this by creating some sort of transitional help:

When restoring a version 1 saved game:
Insert [visited Hallway] before visited Kitchen from Living Room.

When restoring a version 2 saved game:
Replace gold sword with silver knife.

This floats alongside the things you (Zarf) were saying, but it’s an interesting idea.

David C.
www.textfyre.com

So we could probably implement a new save routine, but it would require tracking every action executed by the player. The Save routine would do the standard Quetzal save, but also save this action history. If the player attempts to restore a file into a new game file (cross versions), the Restore routine would drop the Quetzal data and re-run all of the actions along with any alterations from the new game file.

There are two issues with this (at first glance).

Random Numbers: We would have to manage the RNG so that game play worked exactly the same. I think it would be safe to have the RNG seeded and save the first N values into the save files. This can be reused in any subsequent Restores, regardless of version.

Performance: Executing potentially hundreds of turns could be costly, but I see this as a minor inconvenience. Also, a well-built interpreter could look for old save files and convert them on a background thread. Or we just ask the player to wait for “Upgrade to complete…”

David C.
www.textfyre.com

For the RNG: You could seed the RNG every turn, based on some sort of timer, and save that number to the file along with the actions.
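
Roughly like this, as a Python sketch. The toy game logic and every name here are hypothetical, and the seeds are passed in by hand where a real interpreter would take them from a timer.

```python
import random

def new_state():
    return {"troll_hp": 20, "waited": 0}

def apply_action(state, action, rng):
    """Hypothetical game logic: attacking the troll has a random outcome."""
    if action == "attack troll":
        state["troll_hp"] -= rng.randint(1, 6)
    elif action == "wait":
        state["waited"] += 1

def take_turn(state, log, action, seed):
    """Live play: record the seed alongside the action, then run the action."""
    log.append((seed, action))
    apply_action(state, action, random.Random(seed))

def replay(log):
    """Restore: rebuild the state by re-running each action with its saved seed."""
    state = new_state()
    for seed, action in log:
        apply_action(state, action, random.Random(seed))
    return state

state, log = new_state(), []
take_turn(state, log, "attack troll", seed=1001)
take_turn(state, log, "wait", seed=1002)
take_turn(state, log, "attack troll", seed=1003)
replayed = replay(log)   # ends up identical to state
```

Because each turn gets a fresh generator from its own seed, inserting or dropping a turn during an upgrade does not shift the random results of the turns around it.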

Just to add…I view this as something of a post-release feature. Not something to be used when building something or in alpha state.

The idea of saving an action history is interesting. (As opposed to a history of player commands, which has come up before but has serious flaws.)

I suspect that the action history runs into problems analogous to the state snapshot (which is what I was describing). They might be more tractable. If you had a clear model of checking versus executing safe actions, you would at least be assured that you’d wind up in a consistent state.

But this gets back to the problem of retrofitting this sort of thing onto a game. If you have to keep a strong code discipline to make cross-version save reliable, you’re screwed, because no author will bother being careful during development; it’s enough of a struggle to make the game work in normal play.

Well, there are different kinds of authors. The hobbyist author is unlikely to care about such things. But we’re increasingly seeing people do IF works for commercial release. These are the people who would be disciplined programmers, because they need to ship multiple bug-fix releases without affecting the published product.

I can’t contribute too much to this discussion but the above … don’t count on that commercial aspect leading to “disciplined programmers.” We have an entire software industry that, by and large, shows that to be an unworkable assumption. It doesn’t mean things don’t get done and we don’t have good products – but that’s generally not because of some uptick in how disciplined developers have become.

Then again, “disciplined” is a vague term so you have to define that a bit better. If the mechanism that you guys are talking about enforces a convention, then you sort of build the discipline in and make it harder to get around. (Not impossible, but harder. You actively have to work against the conventions.) This is essentially what good frameworks do: make doing the “right thing” easier than doing the “wrong thing.”

If we do make an effort to implement a cross-compile save process, it would be iterative and the “framework” aspects would likely come from feedback.

But I agree, eventually everything should have rails for newbies.

So to play around with this in real life, is there currently a way to trap every action and then replay them in I7?

You could put a new rule in the Turn Sequence rulebook to add the current action to a list. That would record it before any other rules interfered, and should only trap top-level actions (rather than “try…”).

To replay, I suppose you’d have to record the list somehow, but I don’t think stored actions can be written to files (because of the object pointers).

This looks promising.

I can store the [current action] and on replay, load those into a table and loop through them. This does require fiddling with the VM, but I have FyreVM, which I can fiddle with all I want. I can store the list of actions in the engine, then when save is called, create a blorb of both the quetzal data and my action list data.

On load, I open the blorb, try the quetzal data (or test it somehow), and if it’s not for the loaded game file, load the list of actions into an I7 table, run each action, handle errors nicely.
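
The load logic might look something like this sketch, written in Python rather than I7 or FyreVM code. A plain dictionary stands in for the blorb, and `game_id`, `load_snapshot`, and `perform` are placeholders I made up for whatever the real engine would provide.

```python
def replay_actions(actions, perform):
    """Run each recorded action; collect failures instead of aborting the restore."""
    failures = []
    for index, action in enumerate(actions):
        try:
            perform(action)
        except Exception as error:
            failures.append((index, action, str(error)))
    return failures

def restore(save_file, game_id, load_snapshot, perform):
    """Use the snapshot if it belongs to this game file, otherwise replay actions."""
    if save_file["game_id"] == game_id:
        load_snapshot(save_file["snapshot"])
        return "snapshot", []
    return "replay", replay_actions(save_file["actions"], perform)

# Toy stand-ins: the new version no longer has a sword to take.
performed = []

def perform(action):
    if action == "take sword":
        raise ValueError("no such object in this version")
    performed.append(action)

save_file = {"game_id": "v1", "snapshot": b"", "actions": ["look", "take sword", "go north"]}
mode, failures = restore(save_file, "v2", load_snapshot=lambda data: None, perform=perform)
```

The list of failures is what the game would use to tell the player which parts of the old session could not be carried over.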

A small I7 extension would be required, and this would only work if the VM implemented the action recording feature.

The extension would also have commands for versioning actions. I think a translation table would work.

Table of Action Translations:
old action          new action
taking yourself     looking
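
In Python terms (just to show the lookup, not how the I7 extension would actually do it), the table is a mapping applied to each recorded action before it is replayed. Actions are plain strings here, which is itself an assumption.

```python
# The single row from the table above: old action on the left, new on the right.
ACTION_TRANSLATIONS = {"taking yourself": "looking"}

def translate_actions(actions, table=ACTION_TRANSLATIONS):
    """Swap each old action for its replacement; leave unlisted actions alone."""
    return [table.get(action, action) for action in actions]

translated = translate_actions(["looking", "taking yourself", "going north"])
```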


There may be a need to turn actions into strings and back into actions, but that is probably possible. Needs more thought.

David C.
www.textfyre.com

You could also store commands instead of actions, sort of like the standard record/replay in Inform. I’ve got a system a bit like this in one of my WiPs: the player can choose a “checkpoint” to jump to on the first turn, which is internally executed by running through a long string of commands taken from my Skein.

Actions are safer.