A machine that writes like Mary Dorothy George

Image: Mary Dorothy George, by Howard Coster, bromide print, © National Portrait Gallery, London, CC-BY-NC-ND 3.0

In recent years I’ve been researching histories of knowledge organisation. Although this has involved many strands of research (I’m currently slightly obsessed with the work of the art historian Judy Egerton), given my background in the history of the printed image, I’ve focused extensively on the work of the historian Mary Dorothy George, who – between 1930 and 1954 – created a monumental seven-volume piece of scholarship: 12,553 entries spread over nearly 7,000 pages that catalogued British satirical prints published between 1771 and 1832. George’s work on the Catalogue of Political and Personal Satires Preserved in the Department of Prints and Drawings in the British Museum effectively created a field of study, and has since its publication been a constant interlocutor between the historian and this remarkable era of graphic reproduction, first in print and most recently online. As a result, my research is considering what sort of interlocutor George has been, and the ways in which the late-Georgian satirical print was remade by being projected into the world through the remarkable labour and voice of Mary Dorothy George.

GPT-2

An experimental off-shoot of this research has involved turning George’s catalogue descriptions into automated writing, thus making a simulated version of this historical interlocutor. simGeorge as we’ll call them – with a nod to a wonderful piece by Shawn Graham – is a generalisable language model inflected by a corpus of catalogue entries written by Mary Dorothy George. The underlying model is OpenAI’s GPT-2 release, which is a neural network built on the Transformer architecture (rather than a recurrent neural network, or RNN) and trained on the text of around 8 million web pages reached via upvoted outbound links on the news aggregator site Reddit. GPT-2 is an ‘unsupervised’ language model, in that it is not refined by direct human evaluation (e.g. mark-up relating to sentence structure or good/bad sentence outputs), and it is designed to create automated text for many different tasks, from sentence completion and chat bot Q&A, to content summary and machine translation. As far as I understand it, this kind of language model ‘writes’ by picking a word, then another, then another, et cetera, with the prediction of each word based on the words the model has already ‘seen’. This is different to, say, a language model that simply pieces together existing bits of writing. Instead, the model writes like the writing it has seen, but not in the same way as the writing it has seen. In this regard, the GPT-2 approach performs particularly well in ‘zero-shot settings’: that is, the model can write a sentence it hasn’t seen before, and follow that sentence with a second sentence that – logically – it has never before seen follow the first sentence, because it has enough contextual information to ‘know’ what a first sentence should be and what a second sentence should be given the first. For more info on the model, see Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. ‘Language Models Are Unsupervised Multitask Learners’ (2019).
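The word-by-word loop described above can be illustrated with a deliberately tiny stand-in. This is not GPT-2: the toy below looks only at the single previous word, whereas GPT-2 takes account of the whole preceding passage. The two ‘catalogue-like’ sentences are invented for the example (they are not quotations from George), and the function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Two invented sentences in the manner of a print description
# (assumption: made up for this example, not taken from the catalogue).
TOY_CORPUS = (
    "fox stands on a cloud holding a document . "
    "north stands on a sack holding a sword ."
).split()


def build_table(words):
    """Count, for every word, which words have been seen to follow it."""
    table = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        table[current][following] += 1
    return table


def write(table, opener, length, seed=0):
    """Continue `opener` one word at a time, sampling each next word
    in proportion to how often it followed the previous word."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated
    output = opener.split()
    for _ in range(length):
        candidates = table.get(output[-1])
        if not candidates:  # the last word was never seen with a successor
            break
        words, counts = zip(*candidates.items())
        output.append(rng.choices(words, weights=counts, k=1)[0])
    return " ".join(output)


table = build_table(TOY_CORPUS)
print(write(table, "fox stands", length=8))
```

Because every step is a fresh choice among the words seen to follow the previous one, the toy can put Fox on North’s sack or hand North the document: sequences that resemble the two source sentences without necessarily copying either of them.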

By inflecting the model with another corpus, we give the model the “voice” of that corpus. For example, if “Fox, the central figure in the design,” is the opening section of a description, simGeorge continues with passages like:

stands in a theatrical attitude on a cloud-less and levitated document. His head is the head of Sheridan, with drooping hair, but also the features of North [et cetera]

or:

stands on the edge of a narrow channel surrounded by railings. He pierces the heart of North, who crouches beneath a sack, leaning backwards. The mouth of the sack is open [et cetera]

In contrast, the un-inflected model will – based on tests at https://talktotransformer.com/ – continue the same description opener with passages like:

will call a press conference shortly after midnight tonight, sources said. Fox and Robert Mueller are expected to appear on television together for the first time ever tonight [et cetera]

or:

will be offered for sale in February, while several other items will be offered for sale over the following months. The magazine has been in production since 2008 [et cetera]

And so simGeorge is neither just the underlying model nor a bunch of George’s writing cobbled together. Rather it is a (not ethically unproblematic) zombie resurrection of George, a means – to paraphrase Shawn Graham – of creating a space within which George could have written.

Making simGeorge

simGeorge was created using the Python package gpt-2-simple, written by Max Woolf. And it was implemented using a Colaboratory Notebook, also written by Max Woolf. You can find out more about both on Max’s post “How To Make Custom AI-Generated Text With GPT-2”. From my perspective, the advantage of using Colaboratory rather than a local setup was two-fold: it avoided the (high) likelihood of me running out of talent during setup, and it gave me easy access to no-cost GPU time, which made the training faster and less locally resource intensive.
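For readers who have not met gpt-2-simple, the sketch below outlines the kind of calls the package offers for fine-tuning and generation. It is an illustration under stated assumptions, not a record of the settings used for simGeorge: the model size, run name, step count, temperature and sample length are all placeholder values, and `make_sim_writer` is a hypothetical wrapper.

```python
# Placeholder settings for illustration only.
# Assumption: these are NOT the values used to make simGeorge.
SETTINGS = {
    "model_name": "124M",     # assumed: the smallest released GPT-2 model
    "run_name": "simwriter",  # hypothetical label for the fine-tuned checkpoint
    "steps": 500,             # assumed number of training steps
    "temperature": 0.7,       # assumed 'creativity' setting
    "length": 150,            # assumed number of tokens to generate per sample
}


def make_sim_writer(corpus_path, prefix, settings=SETTINGS):
    """Fine-tune GPT-2 on a plain-text corpus, then continue `prefix`."""
    # Imported here so the sketch can be loaded without TensorFlow installed.
    import gpt_2_simple as gpt2

    # Fetch the base model, then open a TensorFlow session to work in.
    gpt2.download_gpt2(model_name=settings["model_name"])
    sess = gpt2.start_tf_sess()

    # Inflect the base model with the corpus.
    gpt2.finetune(
        sess,
        dataset=corpus_path,
        model_name=settings["model_name"],
        steps=settings["steps"],
        run_name=settings["run_name"],
    )

    # Ask the fine-tuned model to continue the opening words.
    return gpt2.generate(
        sess,
        run_name=settings["run_name"],
        prefix=prefix,
        temperature=settings["temperature"],
        length=settings["length"],
        nsamples=2,
        return_as_list=True,
    )
```

Calling `make_sim_writer` with the path to a plain-text corpus and an opener such as “Fox, the central figure in the design,” would need gpt-2-simple installed and, realistically, a GPU; it should return a list of two continuations.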

The Notebook starts with basic steps – loading the gpt-2-simple package (which runs on TensorFlow, an open-source platform for machine learning), connecting to a GPU, downloading GPT-2. After that, the notebook allows you to select a corpus with which to train the model (in my case, `CurV-corpus-27Jan2019.txt` at http://doi.org/10.5281/zenodo.3245037, hereafter the “BMSatire Descriptions corpus”), to fine-tune the model on that corpus, and to generate text based on a series of parameters. Having read around these settings a bit, and experimented a little (but not too much: the GPU time may carry no direct monetary ‘cost’ but it does incur an ecological cost), I settled on an approach that created George-like outputs without overfitting to the BMSatire Descriptions corpus, and that was able to generate relatively coherent outputs at varying levels of ‘creativity’. Note that in GPT-2 text generation, ‘creativity’ is determined by ‘temperature’: in practice a value between zero and 1, where a value nearer to 1 means the generated text makes ‘more random completions’ (so is more ‘creative’), and a value nearer to zero will be ‘deterministic and repetitive’ (see https://github.com/openai/gpt-2/issues/27#issuecomment-467318216, though note that in my testing, anything below 0.3 was far too repetitive).
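In general terms, temperature works by rescaling the scores a model gives to each candidate next word before those scores are turned into probabilities. The sketch below shows that arithmetic on its own. The three candidate words and their scores are invented for the example, and `softmax_with_temperature` is a hypothetical helper, not a function from gpt-2-simple.

```python
import math


def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities after dividing by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be greater than zero")
    scaled = [score / temperature for score in scores]
    top = max(scaled)  # subtracting the maximum keeps exp() from overflowing
    exps = [math.exp(value - top) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Assumption: invented scores for three candidate next words.
candidates = ["stands", "sits", "levitates"]
scores = [2.0, 1.0, 0.1]

for temperature in (0.3, 0.7, 1.0):
    probabilities = softmax_with_temperature(scores, temperature)
    rounded = [round(p, 3) for p in probabilities]
    print(temperature, dict(zip(candidates, rounded)))
```

At the lower temperatures the highest-scoring word takes a larger share of the probability, so the same choice is made again and again; nearer to 1 the less likely words keep more of a chance of being picked.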

simGeorge as a writer

So what does simGeorge write? Here is an example. Remember, this text isn’t based on a real print, it is a text created from the inflected language model (note that where you see *TRANSCRIBED* and *BRACKETED* these are placeholders in the BMSatire Descriptions corpus for text that was not written by George, usually text transcribed from a print; for more on this see doi.org/10.5281/zenodo.3245037):

The Regent, very drunk, sups in the kitchen at the Pavilion, at a table covered with invitations to dinner. He leans back in his chair, pugnacious and insolent, his eye and mouth watering. He holds a glass and a wine-bottle, brandishes a wine-glass to his mouth, saying, *TRANSCRIBED* His chair is decorated with the Prince’s feathers, oak-leaves, and roses. On the wall behind *BRACKETED* is a picture, *TRANSCRIBED*, of Princess Charlotte drinking from a decanter; she leans over the counter holding a fan. She wears a loose high-waisted dress, with a towering feather, a medallion of a crowned head on a halo, and a coronet with a ducal coronet.

This is a plausible description of a print. It contains a character, it tells us what they are wearing, doing, and saying, and it tells us what is around them. The description is also a plausible George-like description. It begins by talking about an individual who we assume to be the central character of the print, it moves on to tell us what that individual is wearing, doing, and saying, and it concludes by describing the background of the scene. The description even describes a plausible late-Georgian satirical print that is recognisably in the genre of those that poked fun at England’s “royal brats”. The Prince Regent is the central character, as he so often was. And the Prince is not only seen drinking, but he is rude and he is surrounded by his regalia (‘feathers’), emblems of England (‘oak’, ‘roses’), and images of his family. In short, a few syntactic slips aside, this ‘description’ feels right. And this isn’t an isolated example. For instance, simGeorge also writes:

Wellington, dressed as a French army officer, stands with his hands folded in profile to the right, his sword resting on the ground. He declaims: *TRANSCRIBED* Beside him stand Napoleon’s discarded muskets and cavalry boots. He says: *TRANSCRIBED* Wellington says: *TRANSCRIBED* In the background a Russian military officer stands full face, bald head turned in profile to the right, smoking a long pipe. He is a fat and dishevelled fellow, with a melancholy frown. Behind him are mounted Russian soldiers; two stand at attention. An officer with a cane and by-barrel approaches with a drawn sabre.

This is a description of a plausible military scene. It is populated by military officers, objects, and action. It describes a print whose form, though imaginary, we can begin to reconstruct in our mind’s eye: Wellington central and grumpy, Napoleon escaped but having left behind some of his effects, numerous Russian soldiers at their heels.

Of course, the spell is broken by some slips: why are muskets and boots speaking? Can a figure be both full face and in profile? Is it possible to hold a cane, by-barrel and a sabre at the same time? And what is a by-barrel? Indeed, the overwhelming majority of descriptions produced by simGeorge get something wrong. Their writing contains many instances of logical confusion, such as the bishop who ‘draws his drawn sword’, or all the characters with too many hands (e.g. the Prince of Wales ‘who walks arm-in-arm’, puts ‘his arm round the waist’ of a courtesan, and ‘holds his hat’, or the Bishop of Durham with ‘hands clasped’ who is also ‘holding a card in his hand’ and has ‘a cup [..] in his right hand’). Similarly, in many simGeorge descriptions people are found both in profile to the left and to the right, are wearing too many coats, or are described as both standing and sitting. In some cases a description that starts with clarity and assertiveness trails off gradually into nonsense. And less often simGeorge’s descriptions confuse the gender of protagonists (“A maidservant stands by the open door *BRACKETED* adjusting his chair”), produce spatially improbable arrangements of fore and background action, or are lightly anachronistic (Francis Burdett standing for election in 1774).

In short, some problems aside, the results are readable enough often enough to introduce them into my reading of the original source material. This is not via an analytically rigorous overview of all the outputs (there are too many errors for that), but instead as a way into themes and trends, into omissions and constructions, a way back into George and her readings of the prints.

The problem with simGeorge

But the results are also a problem. I’ve written this post in the hope that it’ll help others with similar interests take a similar approach to automated text generation, not least as one of my challenges right now is how to read the outputs of simGeorge, how to grapple intellectually as a historian with fabricated catalogue entries in the style of Mary Dorothy George (big thanks here to Stephen McGregor for pointing me in the direction of various literature in the domain of computational creativity). Hopefully then some of you will use this post to make your own simWriter or simCurator. If you want, you can even create your own simGeorge from my settings and our dataset. But what I’m not giving you right now is all the outputs simGeorge has produced. This isn’t because I don’t like open licensing my research data (I do) but because the results are a problem.

One of the ambitions of my current research is to recover the motivations behind acts of cataloguing and the impact of these motivations on the fabric of knowledge (a motivation of Bowker and Star’s superb Sorting Things Out). As my colleague Andrew Salway and I argue in a forthcoming paper (which is nearly through peer review!), when legacy catalogue data like George’s work for the British Museum is moved from print to data, that data begins to become uncoupled from its circumstances of production. In the process that data can and does become jumbled into datasets put to generalised use, often to train and develop language models like GPT-2. It follows then, as Agostinho et al. have argued, that “the future that an algorithmic system can predict is limited by the historical data used to train that system”. And having looked at nearly 15,000 descriptions produced by simGeorge, it is clear that the input – the BMSatire Descriptions corpus – pushes the language model towards oppressive outcomes. simGeorge makes normative assumptions about what a ‘man’ is and what a ‘woman’ is: they are both white and English unless otherwise stated. simGeorge tends to arrange the speech acts of men before the speech acts of women. simGeorge loads short-hands for nationality with temporally-specific judgements of character. simGeorge is lightly fattist and ageist. And simGeorge is – occasionally – outright racist. And whilst these are features of the late-Georgian prints George was describing, their appearance in simGeorge cannot be accounted for by frequency effects in the prints alone, because they are manifestations of language choices produced by the historically specific circumstances in which George’s labour took place.

My work on and with simGeorge has taken place alongside a period of sustained reading around the history, politics, and practice of knowledge production and organisation (short bibliography: Ruha Benjamin, Safiya Umoja Noble, Catherine D’Ignazio and Lauren Klein, Geoffrey Bowker and Susan Leigh Star, Paul Gilroy, Candace Greene, Emma Perez, Roopika Risam, Hannah Turner, Elizabeth Yakel, Terry Cook and Joan Schwartz). All this has left me with a profound discomfort about sharing – as a dataset – the descriptions simGeorge has written. At the very least, it feels important that simGeorge’s outputs shouldn’t be uncoupled from the historical circumstances of their production. But I’m also troubled by the idea of publishing the data not only before redacting, marking-up or otherwise dealing with (per, for example, the archival model proposed by Chilcott, 2019, or the pedagogical approach used by Koritha Mitchell) the extreme manifestations of simGeorge’s “worldview” (e.g. racist language), but also before having investigated further its quieter, systemic, and pervasive oppressions. This approach feels right to me. But I’m always keen to know what others think, and to learn from examples of practice that I’ve missed.