What is ZIL anyway?
Wednesday, April 17, 2019 (updated 2 days later)
Comments: 13 (latest June 15)
Tagged: zilf, zork, infocom, interactive fiction, zil, if
The Infocom ZIL code dump has kicked off a small whirlwind of news articles and blog posts. A lot of them are somewhat hazy on what ZIL is, and how it relates to MDL, Lisp, Z-code, Inform, and the rest of the Golden-Age IF ecosystem.
So I'm going to talk a lot about it! With examples. But let's go through in chronological order.
The first version of Zork was MDL Zork. This is what Tim Anderson, Marc Blank, Bruce Daniels, and Dave Lebling wrote as MIT hackers around 1977 to 1979. MDL, the MIT Design Language, was a Lisp-like functional language created at MIT. MDL ran on the PDP-10, so that's where this ur-Zork ran.
Zork was ported to Fortran by Bob Supnik in 1980, and then to C. These versions, generally known as "mainframe Zork" (or "Dungeon") circulated among DEC user groups, wasted years of mainframe user time, and -- along with "mainframe Colossal Cave" -- changed the lives of many. (Including me, age 9 or so.)
At the same time, those MIT hackers formed a company and set out to... well, to make business software, if you look at those early plans. But they figured they'd first make a quick buck by porting Zork to home computers. However, MDL programs couldn't possibly run on an Apple 2 or a TRS-80. So the Infocom folks sat down and designed the Z-machine.
I'm not going to go through this story in depth. For that, I recommend Jimmy Maher's overview at the Digital Antiquarian. Here's the quickie: Infocom would write a game in ZIL, or "Zork Implementation Language" -- a high-level language derived from MDL. A compiler called ZILCH would then compile the game into a binary format.
The binary format, Z-code, was a program for an imaginary computer called the Z-machine. (Today we would say "virtual machine".) Nobody intended to build a Z-machine. But they could write a program that emulated the Z-machine. This program, called ZIP, was compact enough to run on a 16-bit home computer. So Infocom could distribute ZIP and the Z-code file on a floppy disk (or cassette tape!) and thus sell a playable game.
ZIL is not a mystery. We haven't had a lot of ZIL code available before this week. But someone scanned an Infocom ZIL manual years ago. You can read that manual here. (It's dated 1989, and primarily written by Steve Meretzky -- "SEM".)
But what kind of language is ZIL?
Above I said it was "derived from MDL". This is no surprise; the Infocom people wanted to reuse as much of MDL Zork as possible in their new commercial product. The resemblance is obvious, and indeed some of ZIL Zork was nearly identical to MDL Zork. Here's a function from the combat implementation in MDL Zork:
<DEFINE VILLAIN-STRENGTH (VILLAIN
"AUX" (OD <OSTRENGTH .VILLAIN>) WV)
#DECL ((VILLAIN) OBJECT (WV) <OR FALSE VECTOR>
(OD VALUE) FIX)
<COND (<G=? .OD 0>
<COND (<AND <==? .VILLAIN <SFIND-OBJ "THIEF">>
,THIEF-ENGROSSED!-FLAG>
<SET OD <MIN .OD 2>>
<SETG THIEF-ENGROSSED!-FLAG <>>)>
<COND (<AND <NOT <EMPTY? <PRSI>>>
<TRNN <PRSI> ,WEAPONBIT>
<SET WV <MEMQ .VILLAIN ,BEST-WEAPONS>>
<==? <2 .WV> <PRSI>>>
<SET OD <MAX 1 <- .OD <3 .WV>>>>)>)>
.OD>
And here's the same code from the ZIL version:
<ROUTINE VILLAIN-STRENGTH (OO
"AUX" (VILLAIN <GET .OO ,V-VILLAIN>)
OD TMP)
<SET OD <GETP .VILLAIN ,P?STRENGTH>>
<COND (<NOT <L? .OD 0>>
<COND (<AND <EQUAL? .VILLAIN ,THIEF> ,THIEF-ENGROSSED>
<COND (<G? .OD 2> <SET OD 2>)>
<SETG THIEF-ENGROSSED <>>)>
<COND (<AND ,PRSI
<FSET? ,PRSI ,WEAPONBIT>
<EQUAL? <GET .OO ,V-BEST> ,PRSI>>
<SET TMP <- .OD <GET .OO ,V-BEST-ADV>>>
<COND (<L? .TMP 1> <SET TMP 1>)>
<SET OD .TMP>)>)>
.OD>
Pretty similar, right? But under the surface, rather different things are going on.
MDL is a functional language in the Lisp vein. Pretty much everything boils down to linked lists. Executing a program means freely constructing and throwing away lists, so there must be a garbage collector behind the scenes. The MDL compiler does what it can to eliminate memory allocation; efficient MDL code might be compiled to static machine code. But if the compiler can't do that, you wind up allocating stuff on the heap.
But the Z-machine doesn't have a garbage collector. It has no primitive operations for constructing lists or allocating objects. There are no Z-machine instructions for finding the head and tail of a list. (The famous car
and cdr
primitives of Lisp, called 1
and REST
in MDL. The Z-machine don't have 'em.)
(You may know that I wrote a small Lisp interpreter for the Z-machine. It was fun, but the Z-machine gave me no help at all! I had to build the Lisp heap and list data structures myself, out of primitive Z-machine byte arrays. Same with the garbage collector. It's all terribly inefficient and janky.)
So how does ZIL perform these operations? Answer: it doesn't! As far as I can tell, the ZIL Zork code doesn't use Lisp-style (linked) lists at all. The 1
, NTH
, and REST
functions appear only rarely, and I believe they always apply to static strings or arrays. MAPF
, a basic Lisp tool which transforms one list into another, doesn't appear at all.
In contrast, the MDL source is filled with 1
, NTH
, REST
, MAPF
, and other functional-language constructs.
So in one sense, ZIL is a completely different language from MDL. It's a C-like compiled language which operates entirely on fixed data structures. But in another sense, as you can see from the examples above, they're almost identical! How does this make sense?
My answer is that game logic is a fairly narrow sort of programming. The combat example sets some local variables and looks up some object properties (fixed data structures!). It compares numbers; it compares variables to objects. And there's a bunch of "if" statements with "and"s and "or"s. Familiar, right? There are dozens of ad-hoc game scripting languages with similar features. You don't use a language like that to solve graph-theory problems, much less build a compiler.
(The Ancient Terror puzzle in Enchanter does involve some graph theory, but this is implemented with some rather clunky nested loops. Not a Lisp-y solution at all.)
It's not that functional methodology is impossible in ZIL. The language clearly imports as much of MDL's functional model as it can. It's just that this isn't very much. ZIL has about as much of MDL as can be compiled into static (non-allocating) code. Which makes sense -- that's exactly what the ZILCH compiler did.
To bring the story up to the present... In 1993, Graham Nelson released Inform. This language (and compiler) had exactly the same goal as ZIL: to efficiently compile source code into Z-machine files. Graham didn't have access to any information about ZIL at that point, so he just made up his own language, which was mostly like C because that's what he was familiar with.
(Of course C itself was designed to run efficiently on fixed data structures. So Graham had an easier job, in some sense. Not to take anything away from his accomplishment!)
Inform (up through version 6) became the cornerstone of the Z-machine ecosystem. But the Z-machine had some hard limits; it had been designed for 16-bit microcomputers and could not easily be expanded. So by 1997 there was a clear need for a new virtual machine. This is where I come into the picture. I designed Glulx as a replacement VM.
Amusingly (or ironically, or inevitably), Glulx had one core goal: to efficiently compile Inform 6 code to a new virtual machine. Every line of I6 code had to work essentially the same as it had before. Glulx used 32-bit values instead of 16-bit values, and I reorganized the memory layout and the instruction table. But the core data structures looked very much like those of the Z-machine.
As a result, it should be possible to compile ZIL to Glulx! I don't think anybody's done it. But Glulx was shaped by I6, I6 was shaped by the Z-machine, and the Z-machine and ZIL were shaped by each other. The chain of influence extends all the way from Joel Berez's coffee table to mine.
Inform 6 was followed by Inform 7, a completely new language which compiles to Inform 6 source code. At least for now. Future versions may compile to a new abstraction. But that's a story for another time.
Footnote: my conclusions about ZIL have to be qualified: I could be wrong. I don't know ZIL or MDL, really. I have an MDL programming guide open as I write this...
EDIT-ADD:
Given the MDL and ZIL code examples above, I thought it was worth adding the Inform 6 equivalent. This is from Allen Garvin's hand-polished I6 port of the ZIL code.
[ VillainStrength oo villain od tmp ;
villain = oo-->0;
od = villain.strength;
if (od >= 0) {
if (villain == thief && Thief_engrossed) {
if (od > 2) {
od = 2;
}
Thief_engrossed = false;
}
if (second && second has weapon && oo-->1 == second) {
tmp = od - oo-->2;
if (tmp < 1) {
tmp = 1;
}
od = tmp;
}
}
return od;
];
If you check, this is line-by-line equivalent to the ZIL version above. But it's much more readable to modern eyes, isn't it?
To some degree, this is just because I6 is part of the C-derived family of languages. (I'm told that "Algol family" is a better term, but I've never touched Algol.) This includes C, C++, C#, Javascript, Perl, Swift... very different languages, but they all share the basic assumption that if (X) print(Y)
is a sensible way to write code. A programmer these days might never have used any other kind of language.
I6 follows C very closely, in this example. The only quirks are:
- the
-->
operator (for arrays, where C would haveoo[0]
,oo[1]
,oo[2]
); - the
has
operator (for testing object flags; in ZIL this wasFSET?
); - defining functions in square brackets, which is idiosyncratic.
Everything else is expressed in ways that look completely natural. Mind you, the lower-case text helps a lot. But we're very used to code where identifiers are text, operators are (mostly) punctuation, and one line of code is conventionally one abstract step of procedure.
In 1979, for people brought up on Lisp, MDL probably wasn't as strange. No doubt they would say that it's much simpler to put the operator at the beginning of every call (<SET OD .TMP>
rather than od = tmp;
). Test operators always end with ?
, whereas in the I6 code you have to look for an if
to figure out whether you're performing a test. The difference between period and comma as atom prefixes is no doubt a concise way to express something (which I haven't bothered to look up what it is). And so on.
I still think the lower-case code is easier on the eyes, but hey.
EDIT-ADD 2: Tim Anderson has kindly sent along a lot of details about MDL -- or Muddle, as I should be calling it. I include his comments verbatim below.
A few comments on Muddle, ZIL, and their friends:
Although the original PDP-10 Zork was written in Muddle, it was, unintentionally, much closer to ZIL (which didn't exist yet, obviously) than it had to be. There were a few reasons for this:
-
Lisp-ish languages were, by outsiders, thought to be inherently slow, so it was a point of pride to write code (and build a tool chain) that disproved this. (There was a benchmark done with some math-heavy computation where compiled Maclisp was faster than FORTRAN, on the same hardware.) A lot of things in the Zork code were there at least in part because of this.
-
The basic objects (rooms and so on) were not lists, they were vectors (or perhaps templates; the difference was slight)--each object was a block of contiguous memory, much like a C struct. Quicker access to the elements, smaller memory footprint, and--much closer to the data structures you'd find in C, or ZIL.
-
There was a lot of effort put into not provoking the garbage collector (No consing, please, this is Lisp!). The Muddle garbage collector, when it ran (there was a lot of innovation in garbage collection later, but this was 1977), just stopped processing for a while, made a clean, compact copy of the heap, replaced the old one with that, and let you continue. This could take several CPU seconds, which on a time-sharing system could turn into a really long long time--very bad for interactive game play. In addition: the machine being used had a 256K word address space, with 512K of physical memory, and was supporting 10, sometimes 20 users. Keeping the memory footprint down was a good way to keep your friends: as much as possible of the code was in read-only memory--allowing it to be shared between processes, and ensuring that there was a clean separation later on, in ZIL. And avoiding dynamic memory allocation reduced the really significant load produced by the garbage collector.
-
As Jesse mentions in the comment about macros, they provided full access to the Muddle interpreter at compile time, and were certainly used in the original Zork as well as later on. The only memory saving produced by compilation was the elimination of macro expansion at runtime (in Muddle, as in Lisp, a call to a macro would be evaluated twice: first to run the macro, which returned some freshly consed-up code, then to evaluate that; the compiler did the first step at compile time, then compiled the result). Muddle, like Lisp, had as its basic function EVAL, which ran the interpreter on a single piece of uncompiled code. Zork never called that--slow, prone to unexpected consing, and so on. So it didn't get in the way when moving to ZIL (or to FORTRAN); everything that was running in Zork was compiled, which limited the bad (inefficient, difficult to port) things that Muddle supported in the interpreter.
-
Each Muddle atom could have two values--a global, and a local (
,FOO
was syntactic sugar for<GVAL FOO>
;.FOO
,<LVAL FOO>
). Locals, as with Lisp, had dynamic scope--in interpreted code, functions called from within an invocation of MYFUNC could access any locals bound within MYFUNC (assuming that no intervening function had another binding of them, of course). In compiled Muddle, though, local variables that needed to be accessed from outside the function itself had to be declared SPECIAL; any other locals ended up having, in effect, static scope within the body of the defining function. Zork used no specials: it made for faster, and, honestly, more understandable code. And also made it easier to port to statically-scoped languages. -
The Muddle compiler acquired some intrinsics to allow direct (single machine instruction) access to the key parser output variables PRSA, PRSO, and PRSI. This was a different way of thinking about things, and again required a bit more discipline.
-
It was unnecessary to declare the type of anything in Muddle, but that would make the compiler's job much harder, and reduce performance. So pretty much every variable, and every field in the Zork objects, had a type declaration associated with it. Faster code, but also easier to port to a language without dynamic types.
-
MAPF (or its friend, MAPR--map First vs. map Rest) was probably used in Zork somewhere, but not in the most general form, where it constructed a new list; rather, it was an easy way to iterate over a list. In that form, it would be relatively easy, if it came to that (I don't think it did), for Zilch to do something sensible, or even to write a macro that would construct the iteration at compile time.
The virtual machine used for Cornerstone was quite a bit more powerful than the Z machine. There were a few reasons for that:
- It was required. A nearly-relational database, with what passed for a GUI in those days, had need of a lot more functionality than a text adventure.
- The minimal target hardware was something like a PC-AT (maybe an XT): hard drive, 640K of memory, etc. There was no need to run on an Apple II or a Commodore 64.
- Infocom had a fair amount of experience with these things by then.
- Not to be invidious, but the people building the tool chain for Cornerstone were better at that kind of development (years of experience, and it was their primary job--not true with ZIL, except for the people building the individual Z machine implementations) than those defining ZIL and the Z machine. They weren't as good at writing games, though, so it worked out.
(-- Tim Anderson)