All posts by jwbaker

James Baker is Director of Digital Humanities at the University of Southampton. James is a Software Sustainability Institute Fellow, a Fellow of the Royal Historical Society, and holds degrees from the University of Southampton and latterly the University of Kent, where in 2010 he completed his doctoral research on the late-Georgian artist-engraver Isaac Cruikshank. James works at the intersection of history, cultural heritage, and digital technologies. He is currently working on a history of knowledge organisation in twentieth-century Britain. In 2021, he began a major new Arts and Humanities Research Council funded project, 'Beyond Notability: Re-evaluating Women's Work in Archaeology, History and Heritage, 1870 – 1950'. Previous externally funded research projects have focused on legacy descriptions of art objects ('Legacies of Catalogue Descriptions and Curatorial Voice: Opportunities for Digital Scholarship', Arts and Humanities Research Council), the preservation of intangible cultural heritage ('Coptic Culture Conservation Collective', British Council, and 'Heritage Repertoires for inclusive and sustainable development', British Academy), the born-digital archival record ('Digital Forensics in the Historical Humanities', European Commission), and decolonial futures for museum collections ('Making African Connections: Decolonial Futures for Colonial Collections', Arts and Humanities Research Council). Prior to joining Southampton, James held positions as Senior Lecturer in Digital History and Archives at the University of Sussex and Director of the Sussex Humanities Lab, Digital Curator at the British Library, and Postdoctoral Fellow with the Paul Mellon Centre for Studies in British Art.
He is a member of the Arts and Humanities Research Council Peer Review College, a convenor of the Institute of Historical Research Digital History seminar, a member of The Programming Historian Editorial Board and a Director of ProgHist Ltd (Company Number 12192946), and an International Advisory Board Member of British Art Studies.

Digital History and being afraid of being insufficiently digital

This blog is cross-posted from the Institute of Historical Research Digital History seminar blog

The A Big Data History of Music project uses metadata about sheet music publication to explore music history. The data the project uses comes from MARC records converted into tabular form with MARCedit. Inconsistencies in the data – inevitable with catalogue records created by people over long periods of time – were resolved with OpenRefine, the data ported back into tabular form (and – for the intrepid – RDF/XML), and graphs built (for the most part) in Excel; graphs that show steep declines in score publication in Venice at times of plague (1576/7) and steep rises – smoothed against overall publication trends – of scores whose titles reference Scotland during the 1790s-1810s peak of the English invention of 'Scottish' identity. The use of bibliographic data in the Big Data History of Music project confirms existing suspicions, challenges established interpretations, and opens up fresh lines of historical enquiry. It is a project to be celebrated.
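The kind of inconsistency resolution OpenRefine performs can be illustrated by its 'fingerprint' key-collision clustering method: values that differ only in case, punctuation, or word order collapse to the same key and can then be merged. A minimal sketch in Python (the publisher-name variants below are invented for illustration, not drawn from the project's data):

```python
import string

def fingerprint(value: str) -> str:
    """OpenRefine-style fingerprint key: lowercase, strip punctuation,
    then sort and de-duplicate the remaining tokens."""
    value = value.lower().strip()
    value = value.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(value.split()))
    return " ".join(tokens)

# Invented examples of the variant spellings found in catalogue records
variants = ["Gardano, Angelo", "angelo GARDANO", "Gardano Angelo."]

# All three variants collapse to a single cluster key
keys = {fingerprint(v) for v in variants}
```

The point is not that a historian must write this herself – OpenRefine's GUI wraps exactly this sort of programmatic function – but that the operation underneath is simple and inspectable.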

We might say that MARCedit and OpenRefine are hardly the most sophisticated of research tools. Both are tools that manipulate data through the use of Graphical User Interfaces (GUIs), visual interpretations of programmatic functions that a humanist could – given time – construct herself. We might say that tabulated data is hardly the most sophisticated of research data formats. It struggles to express multiple values in a given field (for example, multiple creators of a work) or the hierarchical relationship between fields (for example, a creator of a work and an editor of a work). And we might say that Excel is hardly the most sophisticated of research environments. Built around graphical input, it encourages a range of practices that are not machine readable (you can’t do a ctrl+f for bold text or for cells filled in yellow), suffers over time from compatibility issues that make visualisations from data tricky to reproduce, and struggles to handle massive datasets.
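The tabular limitation is easy to see with a toy record (the bibliographic details below are illustrative, not from the project's data): a flat table must cram multiple creators into one delimited cell, whereas a hierarchical structure keeps each agent and each role separately addressable.

```python
import csv, io

# Flat, tabular form: multiple creators forced into one delimited cell
flat = io.StringIO()
writer = csv.writer(flat)
writer.writerow(["title", "creators", "editor"])
writer.writerow(["Musica Transalpina",
                 "Marenzio, Luca; Palestrina, G.P.",
                 "Yonge, Nicholas"])

# The same record expressed hierarchically: multiple values and the
# creator/editor relationship are explicit rather than packed into strings
record = {
    "title": "Musica Transalpina",
    "agents": [
        {"name": "Marenzio, Luca", "role": "creator"},
        {"name": "Palestrina, G.P.", "role": "creator"},
        {"name": "Yonge, Nicholas", "role": "editor"},
    ],
}

# Querying the hierarchical form needs no string-splitting conventions
creators = [a["name"] for a in record["agents"] if a["role"] == "creator"]
```

Whether that extra expressiveness is worth the overhead of RDF/XML or similar formats is, of course, exactly the trade-off the project had to weigh.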

And so we perhaps have a mismatch. As historians, we celebrate findings that could – potentially – change the course of historiographical debate. As digital historians tapped into research software engineering and computational science, we wonder about the suitability, interoperability, and sustainability of the decisions made.

It is easy to get sucked into the latter perspective. And as research projects grow, issues of suitability, interoperability, and sustainability must be thrust front and centre. But as we teased out during the Q&A that followed Stephen Rose's excellent talk on the A Big Data History of Music project, we must not be afraid of being insufficiently digital. We must not be afraid of using GUI tools that may not be there tomorrow to get the job done, of using data formats that suit our local and community needs to express our findings, and of using software environments that are not the epitome of best practice to interpret our work. For we are historians first and foremost, and for all that projects such as The Programming Historian (for which, I should add, I have written) do wonderful work getting historians from GUI to Python, from .xls to .xml, and from Excel to R, those more sophisticated tools and formats must not impose themselves on our work at the expense of gaining deeper understanding of historical phenomena.

The Big Data History of Music project is a shining example of why digital history must not be afraid of being insufficiently digital. We look forward to seeing more projects pass through the Digital History seminar in coming months that embrace this spirit of getting stuff done, of making digital tools, data, and methods work towards enhancing our collective understanding of the past rather than the other way round.