Sunday, April 17, 2016

IF awards and how we think about them

We just got a new issue of SPAG. (The Society for the Promotion of Adventure Games, a long-historied zine of the IF community. It's old enough that it was originally "Society for the Preservation of Adventure Games" because we thought IF might die out or something. 1994, right?)

I want to respond to Ted Casaubon's article, "Safeguarding Your IF Voting From Animal Attack". The author looks at our IF voting traditions (IFComp and the XYZZY Awards) and puts them in context with last year's furor around the Hugos, the (much more famous) annual awards of the science fiction and fantasy community.

This is an excellent article overall. Ted's comparison is absolutely one that weighed on my mind last year, and still does today. The 2016 Hugo nominations were last month, and XYZZY nominations just started. Does the videogame world have a radical-angry faction analogous to the Sad/Rabid Puppies? Why yes. So it could happen here and we should worry about that. The article talks about that possibility and it does a good job.

However, I think the article skates over the heart of the issue. Let me quote from the concluding paragraph:

The voting systems described above are all intended to ensure that a minority bloc doesn’t thwart the will of the majority. But the reality is that a majority voting bloc could be just as harmful to the integrity of an IF award, if it was the result of a raid on the polls from outside the community. The only real way to prevent that would be to limit who gets to vote.

(-- Ted Casaubon, "Safeguarding Your IF Voting From Animal Attack", SPAG#63, 11 April 2016)

(Yeah, look at me footnoting.)

Here's the thing: the IFComp, the XYZZYs, and the Hugos are all popularity contests. That's fundamentally what they are. When you talk about ways "to prevent that" -- prevent the majority from winning your popularity contest -- you've made some deep conceptual mistake.

And yet, it's not a simple mistake. Ted is correct to note the 2012 incident in which an unguarded blog post flooded the XYZZY noms with votes for a single entry. That was a problem, and the admins dealt with it (correctly in my view) by disregarding those votes. So how does that make sense? Is that a case of ignoring the majority?

Ted's article describes this in terms of "bloc voting" -- which was also the common diagnosis of last year's Hugo problems. If you look back at Dan Fabulich's blog post, he also talks about "the voting block". But he also says:

The XYZZY Awards are normally decided by a close knit community of interactive fiction enthusiasts; more than a hundred votes is a good turnout for XYZZY. ... But this year, our votes completely overwhelmed the entire interactive fiction community.

(-- Dan Fabulich, "We Almost Flooded the XYZZY Ballot Box", 5 March 2012)

This is not a distinction about tactical voting, but about community self-definition -- "our votes" versus "community votes". And this is what I want to step back and consider.

The Hugo situation was not primarily a case of outsiders flooding the ballot box. The leaders in the Sad/Rabid Puppy groups are well-known SF authors and regulars at SF fan conventions. (The least-known, Ted "Rabid" Beale, is still a writer sufficiently established to join, run for president of, and then get thrown out of the SF Writers of America.)

It is true that Puppy campaigning must have brought in votes from people who would not otherwise have purchased a Hugo voting membership. It is also true that the counter-reaction brought in votes on the other side. For example, me! I cast a Hugo vote for the first time in probably fifteen years.

But am I an outsider? I hope nobody would say that. I'm a convention regular too; I've been going to East Coast regional cons (Balticon, Philcon, Arisia, Boskone) since high school. I've been to several Worldcons, and every Worldcon attendee can cast a Hugo ballot. I just haven't bothered very often. The Puppy situation pulled me back in.

So, while 2015 Hugo voting hit record levels, it's not obvious that much of it came from people who never read SF. My guess (although I have no statistics) is that most of the increase came from SF readers and folks enmeshed in fandom who suddenly cared a whole lot more about the Hugos than usual. That is a good thing. When we talk about a problem in the Hugo voting, we're not talking about that.

Nor are we talking about the fact that the avalanche was tipped off by a couple of racist, homophobic right-wing conspiracy theorists, plus a bunch of other conspiracy theorists who thought that the first guys were fine travelling companions, and then their toxic views gained currency across a stretch of the fandom landscape. That is a problem -- a big problem -- but it's not a voting problem per se.

No, the voting problem is a hole in the Hugo first-round (nomination) rules, which allowed a minority of the voters (30% if I recall the estimates) to completely control most of the voting categories. Which they did, and filled them with entries that the majority (70%) thought were junk. I don't mean people saying "enh, not my favorite story of the year"; I mean people saying "not in the top ten, not in the top twenty, not worth being mentioned on this ballot". Result: predictable collapse in the second round.

(Note: the particular form of the collapse followed from the Hugo rules, which allow the voters to select "no award in this category" as the winner. This is really just a detail, however. Despite a great effort from the Puppy supporters to say otherwise, the use of "no award" was a response to the problem described above, not a problem in itself. If it hadn't been that, it would have been something else.)

What was the hole in the rules? Bloc voting, or, more specifically, bloc nominating. I think this is a problem in most open-nomination voting systems. It's like this:
  • If 100 people with a range of opinions name their five favorite things, they'll all give different lists. But with a lot of overlap. The top thing will be listed maybe 20 times, the next maybe 15 times, the next maybe 12 times. I'd call it a normal distribution, except that's about single-axis variables, but you get the idea -- a bunch of small heaps piled up to make a larger, fuzzy heap with a peak.
  • If 30 people name five things, but they all agree to name the same list, then the top thing will be listed 30 times.
  • 30 beats 20. The 30% controls nominations. Kaboom, as described above.
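The arithmetic in that list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the counts 20, 15, and 12 come from the list above, but the remaining honest counts, the work names, and the assumption that no honest voter happens to name a slate work are all made up for the example.

```python
from collections import Counter

def top_nominees(tally, slots=5):
    """Return the `slots` most-nominated works, highest count first."""
    return [work for work, _ in tally.most_common(slots)]

def smallest_sweeping_bloc(honest_tally):
    """Smallest bloc whose slate out-polls every honestly nominated work.

    Assumes ties do not favor the bloc and that honest voters give the
    slate works no nominations of their own.
    """
    return max(honest_tally.values()) + 1

# Head of the "fuzzy heap" from independent voters. 20/15/12 are from the
# text; the rest are hypothetical, and the long tail is omitted.
honest = Counter({"A": 20, "B": 15, "C": 12, "D": 10, "E": 9, "F": 8})

# Hypothetical bloc: 30 voters who all name the same five works.
slate = ["P1", "P2", "P3", "P4", "P5"]
bloc = Counter({work: 30 for work in slate})

combined = honest + bloc
print(top_nominees(honest))            # what the ballot looks like without the bloc
print(top_nominees(combined))          # what it looks like with the bloc
print(smallest_sweeping_bloc(honest))  # bloc size needed to take every slot
```

Under these assumptions the bloc does not need anything close to a majority. It only needs to beat the single most popular honest nominee, because the honest votes are spread over many works.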

The implicit assumption of the open-nominations system is that the fuzzy top zone of the heap is acceptable to most of the voters. Your favorite thing may not be the top nominee, but some of your top five are probably in the list. And if everybody votes their own opinions, it's a pretty good system for doing that.

Obviously it's not perfect. If there's a completely bimodal split, the minority is probably hosed. That is, if 70% of the voters only like green things, and 30% of the voters only like purple things, purple gets shut out. You can say that's what majority voting means -- which is true, although perhaps less true in the nominations round. (Should one or two purple nominees show up in the top five?) But in any case this is an extreme edge case; not what you would expect from realistic, honest votes.

So are the open nominations of the XYZZYs (in progress now!) vulnerable to this sort of collective minority action? Sure. No question about it.

This is compounded by the notion of votes that really are coming from outside "the community". That's what happened in 2012. Choice Of Games has a larger internet following than the XYZZY community; they (inadvertently) swamped it. You could imagine some YouTube streamer or other net celebrity telling their followers "go to this web site and vote for this game!" Then there would be thousands or tens of thousands of XYZZY nominations for it, and what does that mean?

Ted's article (remember Ted's article? This post is a response to Ted's article, at least it was before it hit 1500 words and climbing):

The enthusiastic ChoiceScript supporters were seen as invaders by the IF community in the 2011 XYZZYs, but with Creatures Such as We taking second place in the 2014 Comp and Scarlet Sails taking 7th in 2015, they probably wouldn’t be considered such outsiders today.

But this is the wrong way to look at it. The same flood of blog-spawned votes for an Inform game would have been equally a problem. The point of the XYZZYs, if there is a point, is to discern what the IF community thinks is best in IF this year. And if "the IF community" is a circularly-defined thing of shifting and argumentative boundaries, it is still a thing. Or else the awards stop being interesting.

A couple of years ago I read an article about the Billboard hip-hop charts. Everything in it sounded familiar -- and this was before Gamergate and Puppies appeared on the scene. It wasn't about malicious or coordinated vote-rigging; it was about the inherent fuzziness of self-defined community.

Ideally, any effective genre chart—be it R&B, Latin, country, even alt-rock—doesn’t just track a particular strain of music, which can be marked by ever-changing boundaries and ultimately impossible to define. It’s meant to track an audience. This is a subtle but vital difference. If an R&B chart tries to cover whatever might be termed R&B music, you get into the subjective, slippery business of determining what, or who, is “black enough” for the chart.

(-- Chris Molanphy, "I Know You Got Soul: The Trouble With Billboard’s R&B/Hip-Hop Chart", Pitchfork, 14 April 2014)

If I may sum up: the R&B/hip-hop charts were interesting when they measured what the core hip-hop audience was listening to. In the 80s and 90s that meant sales at stores where the fans -- and artists! -- were buying; it meant playlists on hip-hop radio stations. That's how you knew where the genre was, what was new and hot and (perhaps) about to cross over to the mainstream.

But now it's Internet time. What's a music store? What's a radio station? What do you even measure? Well, you measure digital downloads; but that's everybody, not R&B fans. And so you get a chart which shows, not the top hip-hop songs, but the top songs which are hip-hop. It tells you nothing about the genre, only about how you label songs. "Crossover hits" become meaningless.

Our community awards are about what's hot in IF -- what we, the fans and (presumed) literate critics of IF, think is new and good. We do that by polling our community! And, yes, excluding everybody else from the poll. You can say the same of the Hugos: they're supposed to measure what sci-fi fandom, the widest-read and most discriminating nerds, say is best.

That's why the raw cry of "include more voters!" is a problem. Take that to its limit: you poll every gamer (or every reader). Then your awards go to the most popular game which can be called IF. Or the best-selling book which looks like SF/fantasy. But that's boring! Best-seller charts are easy to find. Robert Jordan and his literary successors sell in truckloads. You go to the Hugo lists to find out if those books are any good.

(Spoiler: Robert Jordan has been nominated for Best Novel once. Lois Bujold has been nominated ten times and won four times. No, that doesn't define quality, but at least it tells you what fandom likes.)

So we want to keep voting inside "the community". But we also want the community to be open to newcomers. Um...

(Hugo voting is limited to Worldcon members; anybody can become a voting Worldcon member for $50. Now you understand the conflicting imperatives between those two facts.)

The IF world has the great advantage of being small, informal, and not very important. The XYZZY and IFComp admins retain the right to exclude votes (or works) at their discretion. That works because we know Sam and Jason; they're open and flexible about their decisions; the discussions remain personal. The Hugos are more ponderous and (necessarily) more legalistic.

But, in both cases, one cannot determine right action through rigid rules. You have to know what's going on in the community. To define the community, even, you have to know what's going on. (Circular, like I said.)

Let me quote one more bit from Ted's article:

[...] last year’s comp saw rumblings of the fact, or perhaps coincidence, that every Twine game in the 2015 comp, without exception, received two 1/10 votes.

As I said in the linked thread, that sounds about right to me! There are people in the IF world who like parser games more than choice-based games; our awards should reflect that. It doesn't surprise me that a couple of those folks feel so strongly that they'd one-star every Twine game. Those votes are coming from inside the house.

If fifty voters were doing that, it would indicate a problem. Not because there's a hard-line limit (more than 25 votes is a bloc?) -- but because it doesn't reflect what I see and hear on the forums. There just isn't that much negativity. So I would want the admins to look into where votes were coming from; I would check out non-IF gamer sites for organized opposition.

In between two and fifty... judgement call. It's contextual. It's all contextual.

Of course, this is where the "conspiracy theorist" element rears up. If what you see in the community absolutely contradicts what I see -- say, if you believe that one publisher gives marching orders to the majority of Hugo voters -- then we will never come to terms about what is right action.

In the end, we're talking about three distinct-but-enmeshed problems:

  • A two-stage voting process with open nominations is mechanically vulnerable.

  • Defining the boundaries of your voter pool is both absolutely necessary and necessarily subjective.

  • Awards or no awards, there is a toxic subculture within both the gaming and sci-fi fan communities.

On the third problem, I have nothing smarter to say than any of the rest of us.

On the second, I try to participate in the process. I trust that the IF community can grow organically without losing itself. It's worked so far, and it's worked by communicating across boundaries.

As for the first... IFComp doesn't have open nominations in the sense that we're talking about. (But it has open submissions, and we can't dismiss the idea of a voting bloc pushing its own entrants.)

In the XYZZYs, the discretion of the IF award organizers should serve. We hope. One day it won't, but I think that will be when the IF field is too large for personal ties to hold it together -- and that will be a success in its own right.

On that subject, I should note that the Hugo rule change proposal is in progress but has yet to be adopted. For obvious reasons, the Hugo rules are hard to change. If the proposal is ratified this summer, it will be adopted next year.

Therefore, this year's Hugos may well be as much of a mess as last year's. Or not! Or a different mess! We hear that first-round voting ran at twice the volume of last year, but what do these new (or returning) voters want? If there are two teams -- to oversimplify -- which team are they on? Tune in on the 26th to find out, I guess.

Comments imported from Gameshelf

Dan Fabulich (Apr 17, 2016 at 4:06 AM):

1) I would love it if the rules of XYZZY and/or IFComp would change such that I could actually blog about those competitions without fear that we'd ruin the results.

Those [two anti-Twine] votes are coming from inside the house.

Even granting that the votes did come from inside the community, if there are two community voters who voted all the Twine games a 1/10, (which I don't know for sure, but Jason could verify,) it's unreasonable to think that those voters made a "good-faith effort to actually play as intended," as required by the rules.

Two 1s can have a surprisingly large effect on the final ranking, e.g. nerfing two 1s would push Birdland up from 4th place into 2nd place.

Andrew Plotkin (Apr 17, 2016 at 5:40 PM):

1) Such a change may not be possible. The IF comps are small and can be swamped by a larger amount of incoming traffic. That's inherent. (The Hugo change is not meant to cope with a situation like "4000 fans and 10000 outsiders all try to vote.")

2) Yes, the rules say "good-faith effort", but we can't police that or even provide a hard definition. The best the organizer can do is write privately to the person and say "Did you really try these games?" And maybe they did, really.

At least twice in IFComp's history, I've decided that I'd made a good-faith effort after reading the title screen and saying "hell with that". (Both cases many years ago.)

Lucian Smith (Apr 18, 2016 at 1:04 AM):

For what it's worth, the XYZZY votes in 2012 were not 'thrown out', per se. There was an obvious bimodal distribution in the votes, so instead of making the XYZZY's about which mode was larger, they were just split: in one of the modes, one game literally won all the awards, and in the other mode, there was a more 'traditional' distribution of winners. At the ceremony, we just announced the winner of everything from one mode, and then the suite of winners from the other mode.

Any obvious 'voting bloc' could be dealt with the same way in the future. Just announce what won from that bloc, announce what won from everyone else, and don't even bother saying which bloc was larger. Anyone who wants to argue about which one is the 'real winner' is welcome to argue, but that's beyond the scope of the award itself.

Sam Kabo Ashwell (Apr 18, 2016 at 2:10 AM):

I have a few other thoughts coming, but I should stress this: the 2012 ChoiceScript votes didn't outnumber the regular voter population. They were just very, very focused.

Let's say you have a four-game category in the second round, including one vote-bloc game. Three of the games are the sort of thing that the traditional voters expect to see, and they'll mostly vote for one of those. Let's say that those voters split about 50-30-20. The fourth game wins if the vote-bloc users beat that 50% game - that is, if they make up a little over 1/3 of total voters.

Even without voting blocs, this isn't uncommon. Some XYZZY winners have won with an outright majority; many don't. (And in the first round, of course, the threshold is considerably lower, because those established voters are split across dozens of games.)
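Sam's "a little over 1/3" figure can be checked with a short sketch. The function names and game labels here are hypothetical; the 50-30-20 split is the one from the comment above, and the model assumes every bloc voter picks the same game and no traditional voter does.

```python
from collections import Counter

def bloc_share_needed(leader_share):
    """Share of ALL voters a single-minded bloc must exceed to win.

    leader_share is the leading game's share among traditional voters.
    With T traditional voters and B bloc voters, the bloc wins when
    B > leader_share * T, i.e. when B / (T + B) exceeds this value.
    """
    return leader_share / (1 + leader_share)

def plurality_winner(traditional_votes, bloc_votes, bloc_game="bloc game"):
    """Return the game with the most votes once the bloc is added in."""
    tally = Counter(traditional_votes)
    tally[bloc_game] += bloc_votes
    return tally.most_common(1)[0][0]

# 100 traditional voters split 50-30-20, as in the comment.
traditional = {"game 1": 50, "game 2": 30, "game 3": 20}

print(bloc_share_needed(0.5))             # threshold as a fraction of all voters
print(plurality_winner(traditional, 49))  # bloc just short of the leader
print(plurality_winner(traditional, 51))  # bloc just past the leader
```

With 100 traditional voters, 51 bloc voters are enough, and 51 out of 151 is just over a third of the total.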

Andrew Plotkin (Apr 18, 2016 at 1:28 PM):

Thanks for these corrections (Lucian and Sam).

The relative size of the ChoiceScript-voting group is an important point, and yes, it alters my previous comment about what kind of rules changes might benefit us.

As to the bimodal thing -- changing the rules post-facto is always a problem. If we had been in the middle of a hostile culture clash, the organizers would certainly have been accused of throwing those votes away, under the polite fiction of inventing a new award and moving all the "bad" votes over to it. ("Winning with an asterisk," a phrase repeatedly thrown around last year's Hugos.)

It didn't get spun that way in 2012, or not much that I recall, because all the community leaders (starting with Dan F.) were in good accord and firmly said so. If a problem comes up in the future, I don't think it'll end that smoothly.

Douglas Knight (Apr 21, 2016 at 10:24 PM):

Here's a rule change: voters must register before the start of the competition. Then a blog post in the middle of the competition cannot influence it, except by making people think about joining next year. Maybe that's even less neighborly than asking Dan to wait until the end, but it's hardly impossible.

Andrew Plotkin (Apr 21, 2016 at 11:31 PM):

Since the publicity for the comps really only exists while they're running, this boils down to "Put a one-year delay on signing up to vote."

That's an interesting intermediate between "anybody can vote" and "voting is limited to a hand-picked jury". However, it's still a pretty extreme change from what we have now.

K.Intil (Apr 23, 2016 at 8:08 PM):

I agree for the most part with Ted's article but it's sad that in every competition/awards there's always a negative element. They are after all a celebration of the achievements of a community as a whole and in the true spirit of the community, fairness and honesty is implied. That there are safeguards against this kind of thing that are already being talked about or implemented makes you wonder sometimes.

Andrew Plotkin (Apr 23, 2016 at 8:22 PM):

Wonder what?

Sonata Green (Apr 29, 2016 at 8:11 PM):

I'd be interested in seeing a comp where the criterion to be a voter is to have ever entered any comp, including itself. (I think jams do something like this.)

