In my previous post I mentioned that the first thing I’d do as I embarked on research using metadata for satirical prints catalogued by the British Museum would be to try to distinguish whether the data I had represented the prints or the cataloguers, the cataloguing, or the process of the data creation. This is an important first step for three reasons:
- I don’t want to be accused of overstanding my data when I use that data to explore/analyse the prints they represent;
- I want to know my data (and validating what it is about is a good way of doing so);
- The descriptive elements of the data were made by two people.
The last point requires a little background. The Catalogue of Political and Personal Satires in the British Museum (details at Wikipedia and the British Museum; hereafter BMSat) is an extraordinary work of scholarship. Published between 1870 and 1954, with a hiatus between 1884 and 1934, the BMSat includes descriptive metadata (that is, not just names and dates but transcriptions and descriptions) for British satirical prints published in the seventeenth, eighteenth, and nineteenth centuries. And although far from complete, this collection of prints – the core 10,000 of which was purchased in 1868 – is the national collection and has unrivalled coverage of single-sheet printed satirical output for the period in question. Subsequent additions have only bolstered its status.
Prints covering the period up to 1770 were catalogued by Frederic George Stevens over four volumes (published 1870-1883) and prints covering the period 1771-1832 were catalogued by Mary Dorothy George over seven volumes (published 1935-1954). Since the publication of these volumes George’s work in particular has formed the basis of most, if not all, work on British graphic satire pre-1832, and since the British Museum migrated the catalogue to their website in 2008-2012 her rich descriptions have driven traffic to both the most famous work of this ‘Golden Age’ of graphic satire and its lesser-known productions.
Although when we are confronted with these catalogue entries on screen (and as data) the descriptions are clearly the work of a person, this is easy to forget when search terms are entered and results displayed. But it is important to note that with a little poking around we find not insignificant discrepancies in the work of the two cataloguers:
- Volume: in the data returned from my British Museum SPARQL query (interface; query) 3072 satirical prints (3470 including duplicates where ‘For description see other impression’ is the common descriptive placeholder) cover the period of the Stevens catalogue (hereafter FGS) and 15183 are returned (20225 including duplicates) for the period the George catalogue (hereafter MDG) covers. If FGS then contains only 17% of the records MDG contains (counting duplicates; about 20% without them), FGS is also more economical in description: only 141727 word tokens (that is, 46 per print, excluding duplicates) are present for FGS compared to 1219305 (80 per print) for MDG.
- Language: pushing these texts into AntConc and poking them with concordance measures, ngram counts, and word lists reveals many consistencies. High in the word ranking list (once stop words are removed) are words that navigate us around the prints under examination: ‘hand’, ‘left’, ‘right’, ‘behind’, ‘holds’, ‘says’, ‘wears’, ‘profile’. Even so, discrepancies exist:
- ‘inscribed’ ranks 4th in MDG, ‘inscriptions’ 53rd in FGS (I’m aware that normalizing the words into their base forms – into lemmas – would improve matters here, but this is – I trust you appreciate – all preliminary work and thinking in public);
- ‘hand’ (used, for example, in ‘right hand side’) ranks 1st in MDG and 11th in FGS;
- ‘satire’ ranks 8th in FGS but as low as 74th in MDG, and though the latter rises to approximately 20th if combined with ‘satires’ (again analysis of lemmas needed…), George clearly had less urge to call what she was describing a ‘satire’ than did Stevens.
- Themes: turning away from structural words and to thematic content, the two datasets – as we might expect – represent different concerns in the underlying data. They do, after all, describe prints published up to 1770 and between 1771 and 1832 respectively (if you want to be reminded of how quickly both the subject and form of satire can change, pop over to the British Cartoon Archive…). And so the ‘dutch’ feature in FGS (20th), much less so in MDG (502nd); where ‘walpole’ is the top named individual in FGS (34th), ‘fox’ is for MDG (21st). Digging a little deeper, differences and similarities continue:
- 2grams for both sets emphasise royalty, nobility, the devil, and dogs, for FGS ‘the pope’ (104th) and ‘robert walpole’ (142nd), and for MDG ‘john bull’ (62nd) and ‘cocked hat’ (156th);
- FGS 3grams emphasise the South Sea Bubble (‘on the financial’ (28th), ‘the financial crisis’ (30th), ‘financial crisis in’ (43rd), ‘the south sea’ (67th)), MDG 3grams the royal family (‘the duke of’ (27th), ‘the prince of’ (84th)) and costume wearing (‘dressed as a’ (87th));
- and finally 4grams reiterate these themes, as well as in FGS giving prominence to structure (‘verses in two columns’ (16th)) and in MDG putting ‘the prince of wales’ (36th) back above ‘the duke of york’ (68th).
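For anyone wanting to reproduce this kind of counting outside AntConc, the per-print averages and ranked ngram lists above can be sketched in a few lines of Python. This is a rough sketch under assumptions, not the AntConc workflow itself: the tokeniser (lowercased runs of letters), the tiny stop list, and the function names are all hypothetical, and different tokenisation settings will shift the ranks somewhat.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; a real one would be far longer.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "and", "to", "is", "his", "her"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters (an assumed tokeniser)."""
    return re.findall(r"[a-z]+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ranked(descriptions, n=1, drop_stop_words=False):
    """Count ngrams one description at a time, so no ngram spans two prints,
    then return (rank, ngram, frequency) tuples with rank 1 the most frequent."""
    counts = Counter()
    for text in descriptions:
        tokens = tokenize(text)
        if drop_stop_words:
            tokens = [t for t in tokens if t not in STOP_WORDS]
        counts.update(ngrams(tokens, n))
    return [(rank, gram, freq)
            for rank, (gram, freq) in enumerate(counts.most_common(), start=1)]


def tokens_per_print(total_tokens, prints):
    """Average number of word tokens per catalogue record, rounded."""
    return round(total_tokens / prints)


# Figures quoted in the Volume bullet (records counted without duplicates).
print(tokens_per_print(141727, 3072))    # 46 (FGS)
print(tokens_per_print(1219305, 15183))  # 80 (MDG)
```

Passing a list of description strings to `ranked(descriptions, n=2)` gives a 2gram list of the sort quoted above. Merging ‘satire’ with ‘satires’ would need a lemmatiser on top, which this sketch leaves out.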
Some (very) preliminary conclusions about the BMSat data that *may* pertain to the prints they describe arise from this work. In no particular order these are:
- there are many descriptive phrases on dress;
- there is a consistent and pronounced royal bias;
- there is a decline in clear, repeatable mentions of noted people (or at least a single public figure in Walpole). This is replaced by the rise of John Bull and – presumably – a variety of people, names, and sobriquets;
- ‘young woman’ is a prominent category that appears to grow over time;
- for FGS, there was either a collecting bias around the South Sea Bubble, a cataloguing bias towards describing sections of prints that pertain to the South Sea Bubble, or a genuine explosion of print making on the South Sea Bubble;
- there are many prints that contain people in costume;
- with particular regard to the 4gram analysis (FGS: FGS4gramtop250.txt; MDG: MDG4gramtop250.txt), the month of publication (for context, the description of each print in BMSat usually contains a date, with day/month included where present) is prominent in MDG but not FGS, suggesting that in the later period prints were more ephemeral and functioned in more time-sensitive marketplaces. Of course, there is also a legislative angle to this: first the 1734 Engraving Copyright Act and later the 1777 Print Copyright Act required the publication of a publisher’s name and address and a publication date (unless I’m mistaken…) to protect a design from plagiarism. Either way, laws are social instruments, and the 1734 and 1777 acts can be seen as fail-safes as much as rules Georgian Londoners strictly policed and adhered to. In short, the placing of marks of an ephemeral nature on prints may have had legal origins, but this custom could have wider and more interesting social and cultural consequences;
- with the emphasis on the South Sea Bubble, Walpole, kings, dukes, and parliament we might conclude that these political prints were in the majority. But this isn’t true, for the overwhelming majority of satirical prints that were published, especially after around 1770, contained social satire. So we must proceed with caution, for bean counting of the kind I am undertaking – somewhat inevitably – foregrounds stock words and phrases, thus emphasising the people, places, events, and controversies that are described rather than the variety present in the prints themselves.
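The point about months of publication in the 4gram lists lends itself to a quick check. The sketch below assumes the ngrams have already been loaded as plain strings, one per list item (it makes no assumption about the layout of the top-250 text files), and that months are written out in full. The function name and the sample ngrams are invented for illustration.

```python
MONTHS = {
    "january", "february", "march", "april", "may", "june",
    "july", "august", "september", "october", "november", "december",
}


def month_share(grams):
    """Fraction of ngrams containing a full English month name as a token.

    Caveat: 'may' and 'march' are also ordinary words, so this overcounts
    unless such hits are checked by eye.
    """
    if not grams:
        return 0.0
    hits = [g for g in grams if MONTHS & set(g.lower().split())]
    return len(hits) / len(grams)


# Invented examples, not taken from the BMSat data.
sample = [
    "published in june by",
    "the prince of wales",
    "verses in two columns",
    "th of november by",
]
print(month_share(sample))  # 0.5
```

Running the same function over the FGS and MDG lists and comparing the two shares would put a number on how much more prominent month names are in the later catalogue.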
So do I hear the prints or the cataloguers when I work with the data? The obvious answer is that I hear both. The reasonable answer, having prodded around a bit with the data, is that I’m hearing not so much of the latter that working with the former is impossible. In short, I am confident that the cataloguers and their views do not overly impose themselves in a deleterious manner on the descriptions of the prints, much as – as we have seen – their choice of language inevitably does.
In the next post, I’ll follow up on a quirk mentioned above that has emerged from entering the corpus at the level of data (with, of course, good knowledge from my previous research of what samples of that data contain): that is gender. For the prominence of ‘young woman’ in the ngram sets is the tip of a rather interesting iceberg that allows us to dig a little deeper into the relationship between the description of Georgian designs in BMSat and Georgian satirical designs themselves.