Held on a large mass storage device in the corner of my office are (among other things) 49,456 books, broken down into PDFs, OCR fragments, JP2 files, and reams of metadata. Most – if not all – of this information derives from books published in the nineteenth century. The OCR, unlike some, is pretty good: it even picks up the odd umlaut. The problem is that none of the data looks like PDFs, OCR fragments et cetera; rather, it is filed as countless .dat files in series readable only to machines. Over the next few weeks I will be working with researchers and skilled technicians to restructure the data so that it is readable to people, and so that power users – those, for example, who use R (i.e. not me!) – can interrogate this data and find out what patterns and mysteries are held within.
Of course, me being me, I took one look at the manifest, hit CTRL+F, and typed ‘Cruikshank’. After a little playing around with Excel macros, 29 records came back: one Isaac Cruikshank entry for his work with George Woodward on the marvelous 1796 Eccentric Excursions (this an 1807 edition), the rest related to George Cruikshank – from his early radical work with William Hone circa 1819-1820, to his later illustrative pieces for William Harrison Ainsworth (1805-1882) and for The Ingoldsby Legends. I’m not sure quite what I plan to do with these books yet, but once we’ve restructured the data I might start by comparing the Cruikshank plates between editions, to see what, if anything, I can find out about the re-use of plates and the modularity of these texts.
As all this should make clear, my day-to-day is now rather different to what it has been for the last six months. I’m now a Digital Curator at The British Library, and so, alongside my posting on the BL’s Digital Scholarship blog, this blog will change to reflect this (exciting!) new arrangement.