Digital History and being afraid of being insufficiently digital

This post is cross-posted from the Institute of Historical Research Digital History seminar blog

The ‘A Big Data History of Music’ project uses metadata about sheet music publication to explore music history. The project’s data comes from MARC records converted into tabular form with MARCedit. Inconsistencies in the data – inevitable with catalogue records created by people over long periods of time – were resolved with OpenRefine, the data ported back into tabular form (and – for the intrepid – RDF/XML), and graphs built (for the most part) in Excel. Those graphs show steep declines in score publication in Venice at times of plague (1576/7) and steep rises – smoothed against overall publication trends – of scores whose titles reference Scotland during the 1790s-1810s peak of the English invention of ‘Scottish’ identity. The use of bibliographic data in the project confirms existing suspicions, challenges established interpretations, and opens up fresh lines of historical enquiry. It is a project to be celebrated.
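One plausible reading of ‘smoothed against overall publication trends’ is that the yearly count of matching titles is expressed as a share of everything published that year, so that a general boom in publishing does not masquerade as a boom in ‘Scottish’ scores. The sketch below illustrates that idea only. The project did this work in Excel and its exact method is not described here; the function name `share_by_year`, the (year, title) row layout, and the sample rows are all invented for illustration.

```python
from collections import Counter


def share_by_year(records, keyword):
    """Return {year: fraction of that year's titles containing keyword}.

    `records` is an iterable of (year, title) pairs. This layout is an
    assumption for the sketch, not the project's actual export format.
    """
    totals = Counter()   # all titles published per year
    matches = Counter()  # titles per year that mention the keyword
    needle = keyword.lower()
    for year, title in records:
        totals[year] += 1
        if needle in title.lower():
            matches[year] += 1
    return {year: matches[year] / totals[year] for year in sorted(totals)}


# Invented sample rows, for illustration only (not project data).
sample = [
    (1790, "A Scottish Air"),
    (1790, "Sonata in C"),
    (1791, "Scottish Songs"),
    (1791, "The Scottish Minstrel"),
    (1791, "Six Duets"),
    (1791, "Minuet"),
]

shares = share_by_year(sample, "scottish")
```

In this toy data the raw count of matching titles doubles between the two years, but so does total output, so the share stays level. That is the kind of distinction a normalised graph is meant to make visible.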

We might say that MARCedit and OpenRefine are hardly the most sophisticated of research tools. Both are tools that manipulate data through the use of Graphical User Interfaces (GUIs), visual interpretations of programmatic functions that a humanist could – given time – construct herself. We might say that tabulated data is hardly the most sophisticated of research data formats. It struggles to express multiple values in a given field (for example, multiple creators of a work) or the hierarchical relationship between fields (for example, a creator of a work and an editor of a work). And we might say that Excel is hardly the most sophisticated of research environments. Built around graphical input, it encourages a range of practices that are not machine readable (you can’t do a ctrl+f for bold text or for cells filled in yellow), suffers over time from compatibility issues that make visualisations from data tricky to reproduce, and struggles to handle massive datasets.
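The point about multiple values in a field can be made concrete with a small sketch. It assumes a hypothetical flat export in which several creators have been squeezed into one cell, separated by semicolons. The column names, the separator, and the sample row are invented for illustration and are not taken from the project’s data.

```python
import csv
import io

# Hypothetical flat export: two creators share a single cell.
flat = io.StringIO(
    "title,creators\n"
    'Example Songbook,"Person A; Person B"\n'
)
rows = list(csv.DictReader(flat))


def split_creators(cell, sep=";"):
    """Split a multi-valued cell into a list of trimmed names."""
    return [name.strip() for name in cell.split(sep) if name.strip()]


# A nested structure holds each creator as a separate value.
nested = [
    {"title": row["title"], "creators": split_creators(row["creators"])}
    for row in rows
]
```

Splitting the cell recovers the two names, but nothing in the table says which person composed the work and which edited it. That relationship is exactly what a flat format struggles to carry, and it would need an extra column or a richer format to express.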

And so we perhaps have a mismatch. As historians, we celebrate findings that could – potentially – change the course of historiographical debate. As digital historians tapped into research software engineering and computational science, we wonder about the suitability, interoperability, and sustainability of the decisions made.

It is easy to get sucked into the latter perspective. And as research projects grow, issues of suitability, interoperability, and sustainability must be thrust front and centre. But as we teased out during the Q&A that followed Stephen Rose’s excellent talk on the ‘A Big Data History of Music’ project, we must not be afraid of being insufficiently digital. We must not be afraid of using GUI tools that may not be there tomorrow to get the job done, of using data formats that suit our local and community needs to express our findings, and of using software environments that are not the epitome of best practice to interpret our work. For we are historians first and foremost. Projects such as The Programming Historian (for which, I should add, I have written) do wonderful work getting historians from GUI to Python, from .xls to .xml, and from Excel to R, but those more sophisticated tools and formats must not impose themselves on our work at the expense of gaining deeper understanding of historical phenomena.

The Big Data History of Music project is a shining example of why digital history must not be afraid of being insufficiently digital. We look forward to seeing more projects pass through the Digital History seminar in coming months that embrace this spirit of getting stuff done, of making digital tools, data, and methods work towards enhancing our collective understanding of the past rather than the other way round.