So, contemporary historians, here is the scenario. You are interested in some aspect of life since the 1980s. You have all the usual sources: personal papers, newspapers, official/corporate archives, pictures, books, radio, music, television shows, et cetera. If you are looking at life after 1996, after the boom in the public web, you can also add web archives into the mix. But one of those media types – personal papers – is in decline. Not that people aren’t writing things of importance, but the use of paper as a medium for drafting those things is in decline. In many cases people aren’t using pen and ink or a typewriter, but are sitting at a personal computer – as I am right now – and drafting those things in a word processor. Alongside those digital personal papers are all sorts of things people save to their personal computers: pictures, books, radio, music, television; you get the picture.
So, to research life since the 1980s, collections of things held on personal computers (that is, everything from PCs to laptops to tablets to smartphones), let us call them personal digital archives, are in scope. And yet – as every contemporary historian knows – privacy is an issue here. Sure, you can study things that appear in public, but personal things, private things, are often off limits as a result of data protection and the like. And you can guarantee those hard drives are going to have some juicy personal, private details. So, with a heavy heart, you write history without them. Great history. But perhaps not the history you would have liked to write.
What if there was something you could use? What if you could understand the hours someone spent using a personal computer, how they arranged their files, their patterns for editing documents, their choice of software, their downloading habits? And what if you could do all of that without needing to see the personal documents themselves? (Of course, filenames are important, often very personal, and possibly subject to data protection, but they certainly aren’t as personal as the documents themselves.)
In short, what would you be able to glean from the sort of metadata available to download in .csv format here? (For the sake of clarity, this is metadata for my USB stick, captured using BitCurator for demonstration purposes.) Imagine that this data schema was used to represent metadata for a hard drive packed with personal papers, newspapers, official/corporate archives, pictures, books, radio, music, and television shows; just like your hard disk, no doubt. What would you be able to do with it? Could you imagine it as a research object in its own right?
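To make that question a little more concrete, here is a minimal sketch in Python of one thing you might try: counting how many files were last modified in each hour of the day, as a rough proxy for when someone was at their computer. Treat it as an illustration rather than a method. The column names `filename` and `mtime`, the ISO-style timestamps, and the three sample rows are all assumptions invented for the example; they are not taken from the actual .csv export, so you would need to adjust them to whatever the real file contains.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def activity_by_hour(csv_file, time_column="mtime"):
    """Count rows per hour of day, using a timestamp column.

    `time_column` is a hypothetical column name; change it to match
    the header of the metadata export you are working with.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        stamp = (row.get(time_column) or "").strip()
        if not stamp:
            continue  # skip rows with no timestamp
        try:
            moment = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip timestamps that are not in ISO format
        counts[moment.hour] += 1
    return counts


# Invented sample rows, standing in for a real metadata export.
SAMPLE = """filename,mtime
notes/draft1.doc,2013-05-02T09:15:00
notes/draft2.doc,2013-05-02T09:48:00
music/track.mp3,2013-05-03T22:05:00
"""

hours = activity_by_hour(io.StringIO(SAMPLE))
for hour in sorted(hours):
    print(f"{hour:02d}:00  {hours[hour]} file(s)")
```

To run it against a real export, you would swap `io.StringIO(SAMPLE)` for an opened file, for example `open("metadata.csv", newline="")`, where `metadata.csv` is again a placeholder name. Note that nothing here opens a single document: the script only ever sees filenames and timestamps.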