Wednesday, April 17, 2019

What is ZIL anyway?

The Infocom ZIL code dump has kicked off a small whirlwind of news articles and blog posts. A lot of them are somewhat hazy on what ZIL is, and how it relates to MDL, Lisp, Z-code, Inform, and the rest of the Golden-Age IF ecosystem.
So I'm going to talk a lot about it! With examples. But let's go through in chronological order.
The first version of Zork was MDL Zork. This is what Tim Anderson, Marc Blank, Bruce Daniels, and Dave Lebling wrote as MIT hackers around 1977 to 1979. MDL, the MIT Design Language, was a Lisp-like functional language created at MIT. MDL ran on the PDP-10, so that's where this ur-Zork ran.
Zork was ported to Fortran by Bob Supnik in 1980, and then to C. These versions, generally known as "mainframe Zork" (or "Dungeon") circulated among DEC user groups, wasted years of mainframe user time, and -- along with "mainframe Colossal Cave" -- changed the lives of many. (Including me, age 9 or so.)
At the same time, those MIT hackers formed a company and set out to... well, to make business software, if you look at those early plans. But they figured they'd first make a quick buck by porting Zork to home computers. However, MDL programs couldn't possibly run on an Apple 2 or a TRS-80. So the Infocom folks sat down and designed the Z-machine.
I'm not going to go through this story in depth. For that, I recommend Jimmy Maher's overview at the Digital Antiquarian. Here's the quickie: Infocom would write a game in ZIL, or "Zork Implementation Language" -- a high-level language derived from MDL. A compiler called ZILCH would then compile the game into a binary format.
The binary format, Z-code, was a program for an imaginary computer called the Z-machine. (Today we would say "virtual machine".) Nobody intended to build a Z-machine. But they could write a program that emulated the Z-machine. This program, called ZIP, was compact enough to run on a 16-bit home computer. So Infocom could distribute ZIP and the Z-code file on a floppy disk (or cassette tape!) and thus sell a playable game.
ZIL is not a mystery. We haven't had a lot of ZIL code available before this week. But someone scanned an Infocom ZIL manual years ago. You can read that manual here. (It's dated 1989, and primarily written by Steve Meretzky -- "SEM".)
But what kind of language is ZIL?
Above I said it was "derived from MDL". This is no surprise; the Infocom people wanted to reuse as much of MDL Zork as possible in their new commercial product. The resemblance is obvious, and indeed some of ZIL Zork was nearly identical to MDL Zork. Here's a function from the combat implementation in MDL Zork:
<DEFINE VILLAIN-STRENGTH (VILLAIN
                      "AUX"  (OD <OSTRENGTH .VILLAIN>) WV)
    #DECL ((VILLAIN) OBJECT (WV) <OR FALSE VECTOR>
           (OD VALUE) FIX)
    <COND (<G=? .OD 0>
           <COND (<AND <==? .VILLAIN <SFIND-OBJ "THIEF">>
                       ,THIEF-ENGROSSED!-FLAG>
                  <SET OD <MIN .OD 2>>
                  <SETG THIEF-ENGROSSED!-FLAG <>>)>
           <COND (<AND <NOT <EMPTY? <PRSI>>>
                       <TRNN <PRSI> ,WEAPONBIT>
                       <SET WV <MEMQ .VILLAIN ,BEST-WEAPONS>>
                       <==? <2 .WV> <PRSI>>>
                  <SET OD <MAX 1 <- .OD <3 .WV>>>>)>)>
    .OD>
And here's the same code from the ZIL version:
<ROUTINE VILLAIN-STRENGTH (OO
                       "AUX" (VILLAIN <GET .OO ,V-VILLAIN>)
                       OD TMP)
     <SET OD <GETP .VILLAIN ,P?STRENGTH>>
     <COND (<NOT <L? .OD 0>>
            <COND (<AND <EQUAL? .VILLAIN ,THIEF> ,THIEF-ENGROSSED>
                   <COND (<G? .OD 2> <SET OD 2>)>
                   <SETG THIEF-ENGROSSED <>>)>
            <COND (<AND ,PRSI
                        <FSET? ,PRSI ,WEAPONBIT>
                        <EQUAL? <GET .OO ,V-BEST> ,PRSI>>
                   <SET TMP <- .OD <GET .OO ,V-BEST-ADV>>>
                   <COND (<L? .TMP 1> <SET TMP 1>)>
                   <SET OD .TMP>)>)>
     .OD>
Pretty similar, right? But under the surface, rather different things are going on.
MDL is a functional language in the Lisp vein. Pretty much everything boils down to linked lists. Executing a program means freely constructing and throwing away lists, so there must be a garbage collector behind the scenes. The MDL compiler does what it can to eliminate memory allocation; efficient MDL code might be compiled to static machine code. But if the compiler can't do that, you wind up allocating stuff on the heap.
But the Z-machine doesn't have a garbage collector. It has no primitive operations for constructing lists or allocating objects. There are no Z-machine instructions for finding the head and tail of a list. (The famous car and cdr primitives of Lisp, called 1 and REST in MDL. The Z-machine don't have 'em.)
(You may know that I wrote a small Lisp interpreter for the Z-machine. It was fun, but the Z-machine gave me no help at all! I had to build the Lisp heap and list data structures myself, out of primitive Z-machine byte arrays. Same with the garbage collector. It's all terribly inefficient and janky.)
So how does ZIL perform these operations? Answer: it doesn't! As far as I can tell, the ZIL Zork code doesn't use Lisp-style (linked) lists at all. The 1, NTH, and REST functions appear only rarely, and I believe they always apply to static strings or arrays. MAPF, a basic Lisp tool which transforms one list into another, doesn't appear at all.
In contrast, the MDL source is filled with 1, NTH, REST, MAPF, and other functional-language constructs.
So in one sense, ZIL is a completely different language from MDL. It's a C-like compiled language which operates entirely on fixed data structures. But in another sense, as you can see from the examples above, they're almost identical! How does this make sense?
My answer is that game logic is a fairly narrow sort of programming. The combat example sets some local variables and looks up some object properties (fixed data structures!). It compares numbers; it compares variables to objects. And there's a bunch of "if" statements with "and"s and "or"s. Familiar, right? There are dozens of ad-hoc game scripting languages with similar features. You don't use a language like that to solve graph-theory problems, much less build a compiler.
(The Ancient Terror puzzle in Enchanter does involve some graph theory, but this is implemented with some rather clunky nested loops. Not a Lisp-y solution at all.)
It's not that functional methodology is impossible in ZIL. The language clearly imports as much of MDL's functional model as it can. It's just that this isn't very much. ZIL has about as much of MDL as can be compiled into static (non-allocating) code. Which makes sense -- that's exactly what the ZILCH compiler did.
To bring the story up to the present... In 1993, Graham Nelson released Inform. This language (and compiler) had exactly the same goal as ZIL: to efficiently compile source code into Z-machine files. Graham didn't have access to any information about ZIL at that point, so he just made up his own language, which was mostly like C because that's what he was familiar with.
(Of course C itself was designed to run efficiently on fixed data structures. So Graham had an easier job, in some sense. Not to take anything away from his accomplishment!)
Inform (up through version 6) became the cornerstone of the Z-machine ecosystem. But the Z-machine had some hard limits; it had been designed for 16-bit microcomputers and could not easily be expanded. So by 1997 there was a clear need for a new virtual machine. This is where I come into the picture. I designed Glulx as a replacement VM.
Amusingly (or ironically, or inevitably), Glulx had one core goal: to efficiently compile Inform 6 code to a new virtual machine. Every line of I6 code had to work essentially the same as it had before. Glulx used 32-bit values instead of 16-bit values, and I reorganized the memory layout and the instruction table. But the core data structures looked very much like those of the Z-machine.
As a result, it should be possible to compile ZIL to Glulx! I don't think anybody's done it. But Glulx was shaped by I6, I6 was shaped by the Z-machine, and the Z-machine and ZIL were shaped by each other. The chain of influence extends all the way from Joel Berez's coffee table to mine.
Inform 6 was followed by Inform 7, a completely new language which compiles to Inform 6 source code. At least for now. Future versions may compile to a new abstraction. But that's a story for another time.
Footnote: my conclusions about ZIL have to be qualified: I could be wrong. I don't know ZIL or MDL, really. I have an MDL programming guide open as I write this...

EDIT-ADD:
Given the MDL and ZIL code examples above, I thought it was worth adding the Inform 6 equivalent. This is from Allen Garvin's hand-polished I6 port of the ZIL code.
[ VillainStrength oo villain od tmp ;
    villain = oo-->0;
    od = villain.strength;
    if (od >= 0) {
        if (villain == thief && Thief_engrossed) {
            if (od > 2) {
                od = 2;
            }
            Thief_engrossed = false;
        }
        if (second && second has weapon && oo-->1 == second) {
            tmp = od - oo-->2;
            if (tmp < 1) {
                tmp = 1;
            }
            od = tmp;
        }
    }
    return od;
];
If you check, this is line-by-line equivalent to the ZIL version above. But it's much more readable to modern eyes, isn't it?
To some degree, this is just because I6 is part of the C-derived family of languages. (I'm told that "Algol family" is a better term, but I've never touched Algol.) This includes C, C++, C#, Javascript, Perl, Swift... very different languages, but they all share the basic assumption that if (X) print(Y) is a sensible way to write code. A programmer these days might never have used any other kind of language.
I6 follows C very closely, in this example. The only quirks are:
  • the --> operator (for arrays, where C would have oo[0], oo[1], oo[2]);
  • the has operator (for testing object flags; in ZIL this was FSET?);
  • defining functions in square brackets, which is idiosyncratic.
Everything else is expressed in ways that look completely natural. Mind you, the lower-case text helps a lot. But we're very used to code where identifiers are text, operators are (mostly) punctuation, and one line of code is conventionally one abstract step of procedure.
In 1979, for people brought up on Lisp, MDL probably wasn't as strange. No doubt they would say that it's much simpler to put the operator at the beginning of every call (<SET OD .TMP> rather than od = tmp;). Test operators always end with ?, whereas in the I6 code you have to look for an if to figure out whether you're performing a test. The difference between period and comma as atom prefixes is no doubt a concise way to express something (which I haven't bothered to look up what it is). And so on.
I still think the lower-case code is easier on the eyes, but hey.

EDIT-ADD 2: Tim Anderson has kindly sent along a lot of details about MDL -- or Muddle, as I should be calling it. I include his comments verbatim below.
A few comments on Muddle, ZIL, and their friends:
Although the original PDP-10 Zork was written in Muddle, it was, unintentionally, much closer to ZIL (which didn't exist yet, obviously) than it had to be. There were a few reasons for this:
  • Lisp-ish languages were, by outsiders, thought to be inherently slow, so it was a point of pride to write code (and build a tool chain) that disproved this. (There was a benchmark done with some math-heavy computation where compiled Maclisp was faster than FORTRAN, on the same hardware.) A lot of things in the Zork code were there at least in part because of this.
  • The basic objects (rooms and so on) were not lists, they were vectors (or perhaps templates; the difference was slight)--each object was a block of contiguous memory, much like a C struct. Quicker access to the elements, smaller memory footprint, and--much closer to the data structures you'd find in C, or ZIL.
  • There was a lot of effort put into not provoking the garbage collector (No consing, please, this is Lisp!). The Muddle garbage collector, when it ran (there was a lot of innovation in garbage collection later, but this was 1977), just stopped processing for a while, made a clean, compact copy of the heap, replaced the old one with that, and let you continue. This could take several CPU seconds, which on a time-sharing system could turn into a really long long time--very bad for interactive game play. In addition: the machine being used had a 256K word address space, with 512K of physical memory, and was supporting 10, sometimes 20 users. Keeping the memory footprint down was a good way to keep your friends: as much as possible of the code was in read-only memory--allowing it to be shared between processes, and ensuring that there was a clean separation later on, in ZIL. And avoiding dynamic memory allocation reduced the really significant load produced by the garbage collector.
  • As Jesse mentions in the comment about macros, they provided full access to the Muddle interpreter at compile time, and were certainly used in the original Zork as well as later on. The only memory saving produced by compilation was the elimination of macro expansion at runtime (in Muddle, as in Lisp, a call to a macro would be evaluated twice: first to run the macro, which returned some freshly consed-up code, then to evaluate that; the compiler did the first step at compile time, then compiled the result). Muddle, like Lisp, had as its basic function EVAL, which ran the interpreter on a single piece of uncompiled code. Zork never called that--slow, prone to unexpected consing, and so on. So it didn't get in the way when moving to ZIL (or to FORTRAN); everything that was running in Zork was compiled, which limited the bad (inefficient, difficult to port) things that Muddle supported in the interpreter.
  • Each Muddle atom could have two values--a global, and a local (,FOO was syntactic sugar for <GVAL FOO>; .FOO, <LVAL FOO>). Locals, as with Lisp, had dynamic scope--in interpreted code, functions called from within an invocation of MYFUNC could access any locals bound within MYFUNC (assuming that no intervening function had another binding of them, of course). In compiled Muddle, though, local variables that needed to be accessed from outside the function itself had to be declared SPECIAL; any other locals ended up having, in effect, static scope within the body of the defining function. Zork used no specials: it made for faster, and, honestly, more understandable code. And also made it easier to port to statically-scoped languages.
  • The Muddle compiler acquired some intrinsics to allow direct (single machine instruction) access to the key parser output variables PRSA, PRSO, and PRSI. This was a different way of thinking about things, and again required a bit more discipline.
  • It was unnecessary to declare the type of anything in Muddle, but that would make the compiler's job much harder, and reduce performance. So pretty much every variable, and every field in the Zork objects, had a type declaration associated with it. Faster code, but also easier to port to a language without dynamic types.
  • MAPF (or its friend, MAPR--map First vs. map Rest) was probably used in Zork somewhere, but not in the most general form, where it constructed a new list; rather, it was an easy way to iterate over a list. In that form, it would be relatively easy, if it came to that (I don't think it did), for Zilch to do something sensible, or even to write a macro that would construct the iteration at compile time.
The virtual machine used for Cornerstone was quite a bit more powerful than the Z machine. There were a few reasons for that:
  • It was required. A nearly-relational database, with what passed for a GUI in those days, had need of a lot more functionality than a text adventure.
  • The minimal target hardware was something like a PC-AT (maybe an XT): hard drive, 640K of memory, etc. There was no need to run on an Apple II or a Commodore 64.
  • Infocom had a fair amount of experience with these things by then.
  • Not to be invidious, but the people building the tool chain for Cornerstone were better at that kind of development (years of experience, and it was their primary job--not true with ZIL, except for the people building the individual Z machine implementations) than those defining ZIL and the Z machine. They weren't as good at writing games, though, so it worked out.
(-- Tim Anderson)

12 comments:

  1. "Inform (up through version 6) became the cornerstone of the Z-machine ecosystem."

    When talking about the Z-machine, cornerstone feels like one of those words that's its own opposite...

    Side note: I believe Cornerstone's VM is much better suited to hosting functional MDL programs than the Z-machine.

    "As a result, it should be possible to compile ZIL to Glulx! I don't think anybody's done it."

    If anyone wants to try, it's a mere Exercise For The Reader to add a new implementation of the interfaces in Zilf.Emit: here's the issue and the interfaces.

    "I have an MDL programming guide open as I write this..."

    Plug: After years of flipping back and forth through those giant PDFs, I eventually transcribed the main MDL guide to make it searchable.

    ReplyDelete
    Replies
    1. I almost used "cornerstone" at least twice more in the post, but then decided to limit myself to one. :) Be glad I didn't start working in the words "suspect," "witness", "deadline", "border zone"...

      For years I had the impression that Cornerstone had a Z-code engine inside it. (For query parsing, not an entire database written in ZIL.) Only in this current round of discussion did I realize it was a distinct VM. Thanks for the reference link. I see it has heap allocation opcodes, which are a start.

      And thanks for the MDL docs as well.

      Delete
    2. If you write about the office-politics side of Infocom history, perhaps you can work in Bureaucracy, Infidel, the Lurking Horror...Suspended.

      Delete
  2. Just as a bit of historical accuracy, Zilch compiled the ZIL code into an intermediate Z-machine assembly code, which was then passed to a second program, Zap, to make the final Z-code story file. At least by the time I came on the modern IF scene in 1996, Inform compiled directly to Z-code.

    ReplyDelete
    Replies
    1. Right. Some of the Infocom source repos include .ZAP files as well as .ZIL files.

      Inform (1-6) always compiled directly to Z-code. The Inform compiler recognizes assembly opcodes as well as Inform code statements, so you can use the same tool for both tasks. You can even mix assembly and Inform in the same function if you want.

      Note that Inform's assembly language is not the same as Zap's! Inform uses the list of opcode names that came out of the InfoTaskForce reverse engineering work in 1990-ish. They didn't have access to ZIL/ZAP source so they made up their own names for everything.

      Delete
  3. The relationship between ZIL and MDL is a little more complicated than "MDL inspired ZIL", because there's a lot of what can only be described as MDL code in the game source. It's just limited to running at compile time.

    For example, maze-program.zil from Bureaucracy loads a text file from the source directory and uses it to generate tables that are compiled into the game. gamesound.zil from Sherlock defines new property definition parsers to enable the property syntaxes that are used in other files, like (SIZE 1 MASS 4).

    So the fixed-allocation, "ZIL" part of the language mostly only exists inside ROUTINE definitions. In e.g. DEFMAC, you can use all of MDL, you just have to return valid ZIL code if the macro is used in a routine. This could potentially be used to extend the language with things like runtime lists.

    Also, speaking of janky runtime allocators, see pmem.zil and pstack.zil in the V6 games.

    ReplyDelete
  4. "The difference between period and comma as atom prefixes is no doubt a concise way to express something (which I haven't bothered to look up what it is)"

    I'm trying to figure out ZIL/MDL myself these days, but it seems like "." is referring to variables outside of the definition and commas within it.

    Playing around with ZILF,
    <DEFINE SQ (X) <* .X .X>>
    <SQ 5>
    returns 25 as expected.
    likewise

    <SETG X 5>
    <DEFINE SQ2 () <* ,X ,X>>
    <SQ2>
    also returns 25.
    What doesn't seem to work at all is what makes sense to a Lisp programmer:
    <DEFINE SQ3 (X) <* X X>>

    ReplyDelete
    Replies
    1. Chapter 4 ("Values of Atoms") in the MDL manual describes the comma and dot variable prefixes in detail.

      Briefly: comma-prefixed ",X" is shorthand for , dot-prefixed ".X" is , and for good measure, apostrophe-prefixed "'X" is .

      On its own, "X" is just an atom, and evaluating it does nothing. It only becomes a variable when you use it with SETG/GVAL or SET/LVAL.

      Also note: MDL uses dynamic bindings for locals, so .X doesn't necessarily have to be declared in the same function where it's accessed. (Inside a ZIL routine, it does.)

      Delete
    2. I see blogger has eaten your formatting, so I will repeat: comma looks up a global variable, period looks up a local variable.

      I was trying to get at the fraught relationship between MDL and ZIL when I wrote "as much of MDL as can be compiled into static code". I appreciate the longer explanation.

      Delete
  5. It's been pretty cool to see all this ZIL! Especially digging into how some of the games work after spending too long staring at disassemblies! (For example: if you eat the fruit in H2G2, there is a small chance that Marvin from the future will choose a tool you *do* have. In contrast, https://github.com/historicalsource/hitchhikersguide/blob/master/heart.zil#L281 will choose a tool you definitely don't; question: why doesn't this infinite loop if you *do* have all the tools? I hoped the ZIL would answer this, but I still don't know.)

    I wish we had all of the source code, including the supporting libraries. It seems to me that the way they developed their games is that they had a full MDL interpreter on their workstation, and they could load in games, play, and debug them right there. They ought to have been able to modify a room or routine, for example, and reload that definition while the game is running. This kind of dynamic development is what a MDLr would have expected, and the original Zork would have definitely had been developed in such a read-eval-print loop.

    I'd guess ROUTINE, OBJECT, ROOM and friends are MDL macros that arrange for certain MDL objects to be created and attached to ATOMs, maybe also attached to certain lists of routines, objects, and rooms. Then the ZIL cross-compiler would be another MDL program an Imp could run that would slurp up all these created objects then do the translation to the much more restrictive Z-machine. In MDL, a procedure is a list of type SUBR, so it is possible for ROUTINEs to both be (1) functions that MDL can run directly and (2) the source code for the cross compiler, without needing to read the actual textual source code again!

    Some of the source code comments support the idea that they would reload only the parts of the source code an Imp would currently be working on.

    It's interesting to look at the macro TELL, since it's not shy about using MAPF. Or the definition MULTIFROB, which is not a routine, so wouldn't be touched by the cross-compiler.

    ReplyDelete
    Replies
    1. There were lots of ways to debug games, not necessarily involving running them in a ZIL "interpreter"--which became progressively more difficult as more features were added to ZIL and to the Z machine. The interpreted version was the least interesting to keep running.
      There was a development version of the Macintosh Z machine that developers and testers used after Infocom moved from the DEC-20 to a Mac environment (A/UX for developers, since there was a Muddle implementation on that; normal Macs for testing), with a very good debugger. Since this ran on the compiled code, one had a much better sense of what was really going on, and could ensure that limitations imposed by the Z machine were being honored.

      In Muddle, a SUBR was a built-in function (PRINT, EVAL); an FSUBR was a built-in that didn't get its arguments evaluated first (COND, PROG,...). DEFINE produced a FUNCTION; compiler output was an RSUBR. A FUNCTION would of course have its argument list attached, and an RSUBR would have a version of that. SUBRs and FSUBRs, IIRC, were just things, with no visible metadata.
      A subtle point in the game code is the use of Muddle's OBLISTs, which provided different namespaces. For example, you'll sometimes see just RETURN, and other times RETURN!-. The former most likely is referring to the ZIL routine called RETURN, while the latter is referring to the native Muddle version. These were most conveniently managed by Muddle's package system; ZIL itself was in a package, so could replace Muddle primitives with its own versions without destroying the interpreter.

      Delete
  6. I have a 1980 fanfold printout of c-adventure (the original name of the game).

    ReplyDelete