Last week a "Google AI ethics" article went round the merry-go-discourse. I won't bother linking except for this apropos comeback from Janelle Shane:
Stunning transcript proving that GPT-3 may be secretly a squirrel. GPT-3 wrote the text in green, completely unedited! (...transcript follows) -- @janellecshane, June 12
We're facing piles of critical questions about AI ethics. They do not include "Is Google oppressing sentient AIs?" Here's a starter list of real issues:
What's the difference between using an AI algorithm as part of your artistic process and using it as an artistic process in itself?
Using an AI image algorithm as a source of idea prompts? Tracing or redrawing pieces of the output in your own work? Using pieces of the output directly? Generating ranges of output and iterating the prompt in the direction you want? Generating ranges of output and using them as PCG backgrounds in a game? What will we count as legitimate and/or desirable artistic work here?
How much human supervision do we require on procgen output?
If the background imagery of a game (movie, whatever) shows AI-generated cityscapes, sooner or later something horrible will appear. If an AI is generating personalized emails, sooner or later it will send vile crap. Do we hold the artist/author responsible or just say "eh, it's AI, Jake"? Do we insist on a maximum "error rate"? What's the percentage?
(Do we hand the problem of preventing this off to another AI? "Generative adversarial network" in the literal sense!)
How do we think about ownership and attribution of the data that goes into AI training sets?
Is the output of an AI algorithm a derivative work of every work in the training set? Do the creators of those original works have a share in the rights to the output?
If an image processor sucks up a million Creative Commons "noncommercial use only" images for its training set, is the output of the net necessarily Creative Commons? What if it accidentally grabs a couple of proprietary images in the process? Is the whole training set then tainted?
(We're already deep into this problem. The past few years have seen a spurt of AI image tools with trained data sets. They're built into Photoshop, iOS and Android camera apps, AMD/NVIDIA upscaling features, and so on. What's the training data? Can we demand provenance? Is this going to turn into a copyright lawsuit morass?)
What does it mean if the most desirable artistic tools require gobs of cloud CPU? Will a few tech giants monopolize these resources?
Will we wind up with a "Google tax" on art because artists are forced to use Colab or what have you?
(This isn't new to AI, of course. Plenty of artists "have to" use a computer and specific hardware or software tools. The tech companies aren't shy about extracting rents. But AI could push that way farther.)
What about the environmental costs? Will artists get into an arms race of bigger and more resource-intensive AI tools? All computers use energy, but you really don't want a situation where whoever uses the most energy wins. (Bl*ckchain, cough cough.)
What does it mean when AIs are trained on data pulled from an Internet full of AI-generated data? Ad infinitum. Does this feedback loop lead us into cul-de-sacs?
What assumptions get locked in? It's easy to imagine a world where BIPOC people just disappear from cover art and other mass-market image pools. That's the simplest failure mode. AI algorithms are prone to incomprehensible associations. Who knows what bizarre biases could wind up locked into our creative universe?
How do we account for the particular vulnerabilities of AI algorithms? Can we protect against them once this stuff is in common use?
What if saboteurs seed the Internet with pools of images that are innocent to human eyes, but read as mis-tagged garbage to AI algorithms? Or vice versa: hate speech or repugnant images which AI algorithms pick up as "cute kittens". Could that get incorporated into training sets? Turn every AI tool into a Tay-in-waiting?
The meme-y AI art is all visual and text. But I'm particularly interested in how this plays out for audio -- specifically, for voice generation.
I love building messy, generative text structures. I also love good voice acting in a game. These ideas do not play together nicely. (I guess procgen text is a love that dare not speak its name?)
Text variation like this is trivial in Inform 7:
say "[One of]With a start, you[or]Suddenly, you[or]You blink in surprise and[at random] [one of]realize[or]notice[at random] that your [light-source] is dimming. In just [lifespan of light-source], your light will be gone.";
But if you're writing a fully voice-acted game, you don't even consider this sort of thing. Not even so simple an idea as contextual barks in a shooter game: "Get [him/her], [he/she]'s behind the [cover-object]!" It's not in scope. Which is a shame!
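For comparison, here's a minimal Python sketch of the same two ideas: random text variation along the lines of Inform 7's [one of]...[or]...[at random], and a contextual bark template filled from game state. The function names and sample values are invented for illustration, not taken from any real engine.

```python
import random

def one_of(*options):
    """Pick one alternative at random -- like Inform 7's [one of]...[at random]."""
    return random.choice(options)

def dimming_light(light_source, lifespan):
    """A varied version of the lamp-dimming message above."""
    opener = one_of("With a start, you", "Suddenly, you",
                    "You blink in surprise and")
    verb = one_of("realize", "notice")
    return (f"{opener} {verb} that your {light_source} is dimming. "
            f"In just {lifespan}, your light will be gone.")

def bark(obj_pronoun, subj_pronoun, cover_object):
    """A contextual shooter bark: pronouns and cover filled in per target."""
    return f"Get {obj_pronoun}, {subj_pronoun}'s behind the {cover_object}!"

print(dimming_light("lantern", "ten turns"))
print(bark("him", "he", "crate"))
```

Trivial in text. But every distinct combination these templates can produce is another line a voice actor would have to record -- which is exactly why fully voice-acted games don't do it.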
AI voice generation is an obvious path towards making this possible. It's also an obvious path to putting all the voice actors out of work.
How do we negotiate this? What does it mean to put an actor's unique performance into an infinitely extensible corpus of text? How do we pay people when "per line" is a meaningless measurement? How much sampling do we need for a good result? Do we need direct-recorded "cut scenes" for the really emotional bits? What about applying "moods" (angry, tired, defeated, scared) to specific lines to match the current state of the character? There are lots of possibilities here, and we have no idea how to work them out in a way that's fair to both designers and performers.