source analysis | cradledincaricature

This post was originally published at the Software Sustainability Institute blog.

In 2017 the House of Lords Science and Technology Committee opened an inquiry into forensic science. The inquiry is still open and has four areas of focus: the forensic science research landscape, the use of forensic science in the Criminal Justice System, standards and regulation, and digital forensics. Alongside this, the inquiry seeks to examine a range of crosscutting issues:

what new research programmes are needed in forensic science; the level of understanding within the criminal justice system and what routes are available to improve understanding; the performance of the market for forensic services in the UK; and the detection, recovery, integrity, storage and interpretation of evidence from digital devices and networks.

On 23 July 2018 the Committee launched its call for evidence. After it received 103 items of written evidence, oral evidence sessions were organised, and between October 2018 and January 2019 members of the Committee heard from lawyers, law enforcement professionals, scientists, members of the judiciary, Home Office officials, regulators, and research funders – among others – on the state of forensic science.

No historians, literary scholars, media archaeologists, or archivists gave evidence at these sessions. This is not the fault of the Committee, for none – as far as I can tell, and I’m happy to be corrected! – submitted written evidence. I certainly missed the call. But given that one aspect of the inquiry concerns digital forensics, the inquiry is relevant to those disciplines: to contemporary historians and literary scholars who – often in collaboration with archivists – study society and culture from the late 1980s onwards and are beginning to examine primary sources preserved using digital forensics; to media archaeologists who are using digital forensics to recover, replay, and revive archaic software systems; and to archivists who have – for some time – been using digital forensic techniques to process, understand, and secure our shared heritage, a heritage that is increasingly born digital, contained in documents or file containers made on digital devices and intended for use on or with digital devices (Thorsten Ries, ‘The Rationale of the Born-Digital Dossier Génétique: Digital Forensics and the Writing Process: With Examples from the Thomas Kling Archive’, Digital Scholarship in the Humanities, 2017, doi: 10.1093/llc/fqx049; Matthew G Kirschenbaum, Track Changes: A Literary History of Word Processing (Cambridge, Massachusetts: Harvard University Press, 2016); Victoria Sloyan, ‘Born-Digital Archives at the Wellcome Library: Appraisal and Sensitivity Review of Two Hard Drives’, Archives and Records 37, no. 1 (2016): 20–36, doi: 10.1080/23257962.2016.1144504).

This two-part blog describes six themes of the oral evidence sessions that are relevant to my home discipline, History. I focus on the evidence given pertaining to digital forensics: a branch of forensic science concerned with the recovery and investigation of material found in digital devices. In so doing, I make the case for historians to forge closer links with the disciplinary and professional areas that did respond to the inquiry’s call for evidence. I suggest that by making the case for the value of our perspective on digital forensics, historians can usefully contribute to future policy work in this field.

In this first blog I discuss the three most common themes of the oral evidence sessions: the volume of born digital material, the variety of born digital material, and the systems that produce born digital materials.

Volume

A central theme of the oral evidence sessions was the scale of the digital artefacts produced both by contemporary society and the criminal justice system. A significant majority of all evidence used in criminal inquiries is now digital. This, as noted by David Ormerod (Chair in Criminal Law, UCL), means that scale is a common paradigm through which the police, the accused, lawyers, judges, witnesses, victims, jurors, journalists, and record keepers interact with the criminal justice system:

The risk is that the investigators and subsequently the trial then drown in the data. There was a case recently where 53 terabytes of data were seized in a trading standards case — it was not a high-end, Serious Fraud Office prosecution, but a trading standards case. (6 November 2018, 16)

Such volume of evidence has, according to Chief Crown Prosecutor Adrian Foster, knock-on effects on how these actors deal with evidence:

The average mobile phone is a 32-gigabyte device which, if it is well used, will probably have around 4 million pages of data on it. I think that the new Apple phone is around 256 gigabytes while the new Samsung is four times that. There is an ever-increasing amount of data on mobile phones, so an understanding of what can be achieved by a download from a phone is crucial. The Metropolitan Police recently had two cases, one of which was a rape case involving mobile phones and facebook. It took 630 hours of police time to look at the phones and facebook accounts of the three complainants under judicial order to make sure that everything had been checked. There was also a straightforward rape case involving a complainant who had met the defendant on Tinder. Both of them had a mobile phone and again it took 150 hours of officer time to go through them. (30 October 2018, 4)

Historians, therefore, will have greater volumes of primary source material to wade through. This is not a new observation. The web archives community has been banging this particular drum for some time (Jane Winters, ‘Breaking in to the Mainstream: Demonstrating the Value of Internet (and Web) Histories’, Internet Histories 1, no. 1 (2017), doi: 10.1080/24701475.2017.1305713; Ian Milligan, ‘Finding Community in the Ruins of GeoCities: Distantly Reading a Web Archive’, Bulletin of IEEE Technical Committee on Digital Libraries 11, no. 2 (2015)). But given the sheer volume at play, historians will also need to pay greater attention to archival labour – the traditional filter between then and now – and how appraisal is changing in light of the necessity to make decisions about permanent value in an age of abundance (see, for example, the special edition of Archives and Records on ‘Born Digital Description’, edited by Jenny Bunn and Sarah Higgins, and published in 2016). Historians will also need to pay attention to the changing behaviours of other professionals. For example, in the case of the criminal justice system, the crime historian will want to know why ‘150 hours of officer time’ were needed to correctly process two mobile phones in a ‘straightforward’ case, and whether in turn volume is creating challenges both for finding evidence and for maintaining evidential integrity.
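To get a rough feel for the scale described in the evidence, the quoted figures can be extrapolated. What follows is a back-of-envelope sketch, not a forensic estimate: it takes the witness's figure of around 4 million pages on a 32-gigabyte phone at face value, and it assumes (my assumptions, not the inquiry's) decimal units and that page counts scale linearly with storage capacity. The function name `estimated_pages` is purely illustrative.

```python
# Back-of-envelope scaling of figures quoted in the oral evidence.
# Assumptions (not from the evidence): decimal units (1 GB = 10**9 bytes)
# and a page count that grows linearly with storage capacity.

GB = 10**9
TB = 10**12

PHONE_BYTES = 32 * GB      # "a 32-gigabyte device"
PHONE_PAGES = 4_000_000    # "around 4 million pages of data"

# Implied average size of one "page" of data, in bytes.
bytes_per_page = PHONE_BYTES / PHONE_PAGES


def estimated_pages(capacity_bytes: int) -> int:
    """Estimate the number of pages held by a store of the given size."""
    return round(capacity_bytes / bytes_per_page)


for label, capacity in [("256 GB phone", 256 * GB), ("53 TB seizure", 53 * TB)]:
    print(f"{label}: roughly {estimated_pages(capacity):,} pages")
```

On these assumptions a single 256-gigabyte phone would hold tens of millions of pages, and the 53-terabyte trading standards seizure several billion, which is one way of seeing why appraisal and selection become unavoidable.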

Variety

After volume, variety was the second most common theme of the oral evidence sessions that discussed digital forensics. Historians are well used to dealing with a variety of primary sources – letters and banners, novels and badges, account books and photographs, email and web pages – but the digital evidence described with respect to recent criminal investigations points to our digital society creating not only new categories of evidence for the historian to conceptualise, but also historically specific categories of evidence.

According to Sarah Morris (Digital Forensics Unit, Cranfield University):

The range of devices that we are asked to analyse [using digital forensics] is also growing. It can be anything from a computer to a games console to a washing machine to a car. There is this wealth of devices, each with different challenges and each with different types of information we can get and different meanings. (9 October 2018, 16)

In a later session, David Ormerod added to that list:

the messages from the complainant’s and defendant’s mobile phones; the dashcam footage from public and private vehicles going by; the CCTV evidence; the armband/FitBit-wearing defendant who can demonstrate that his heart rate was not escalated at the time of the robbery, and so on. (6 November 2018, 16)

What the evidence sessions make clear is that the born digital evidence historians will soon have at their disposal – and in some cases already do – will not necessarily map to skeuomorphic digital representations of older physical forms: letters as emails, memos as Word documents, account books as spreadsheets, books as ebooks. One particularly important category of ‘new’ primary source is the data stored by computational and networked devices in the course of our interactions with them: logs held by washing machines, friend lists kept on games consoles, pulse records collected by wearable activity trackers. Our expertise in adapting to and synthesising a variety of primary sources makes historians ideally placed to conceptualise the evidential character of these data and the potential ethical implications of using them (more on which next time).

Systems

But the job of the historian working with archives secured by digital forensics is not just about dealing with more and different. It is also about authenticity, about knowing if the primary sources we are working with are what they purport to be. This isn’t a new problem. Core components of source analysis – palaeography, object inspection, contextualisation – are, in part, used by historians to test the authenticity of primary sources. What changes with born digital materials is what we need to know to do this sense checking: the imperative moves from historians knowing how a book works to how a .docx file works, from how a filing system works to how an operating system works. According to Sarah Morris:

For example, there is a standard definition for things like a recycle bin on a computer. It is where a user deletes files but how it works underneath can vary depending on whether you are looking at a Mac or a Windows operating system. That is an understanding of an evidence base where the definition is standard but underneath we have different meanings which could result in different user activity. There is not a great understanding of the difference between their function and what the artefacts underneath represent. (9 October 2018, 6)

What this suggests is that knowing what a born digital primary source is can depend on the operating system on which it was produced and used. In turn, knowledge of those systems is required to ascertain the authenticity of a primary source. Helpfully, these systems do produce metadata – data about data – that can give us confidence about when a born digital primary source was made, when it was last modified, and by whom. Sarah Morris again:

With regard to the evidence base, when people try to manipulate artefacts or obscure their data, digital forensics are now starting to look more at context. It would therefore not be just, there is a file containing a potentially indecent image, but can we tell if the user downloaded it and where they may have got it from? That context makes it very difficult to start obscuring or manipulating the data because you can see if there are breaks or anomalies in that chain. (9 October 2018, 6)
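For readers who have never looked at file metadata directly, here is a minimal Python sketch of what an operating system will report about a file. It is an illustration only, not a forensic procedure: the file name `letter.txt`, the helper `describe_file`, and the backdated timestamp are all invented for the example. It also shows two things relevant to the points above: the same metadata field can mean different things on different systems, and a 'last modified' time can be rewritten by any user, which is why context beyond a single timestamp matters.

```python
import os
import tempfile
from datetime import datetime, timezone


def describe_file(path):
    """Return a few pieces of file system metadata for one file."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
        # The same field means different things on different systems:
        # on Unix-like systems st_ctime is the last metadata change, while
        # on Windows it has historically reported the creation time.
        "ctime_utc": datetime.fromtimestamp(info.st_ctime, tz=timezone.utc),
        "ctime_meaning": "creation time" if os.name == "nt" else "last metadata change",
    }


with tempfile.TemporaryDirectory() as folder:
    # A hypothetical born digital 'letter', created only for this example.
    path = os.path.join(folder, "letter.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("Dear colleague")

    record = describe_file(path)

    # Backdate the access and modification times to 1 January 2000 (UTC).
    # No special privileges are needed to do this to your own file.
    os.utime(path, (946684800, 946684800))
    backdated = describe_file(path)

print("As written: ", record["modified_utc"].isoformat())
print("Backdated:  ", backdated["modified_utc"].isoformat())
print("st_ctime on this system means:", record["ctime_meaning"])
```

The file's contents are identical before and after the call to `os.utime`; only the story the metadata tells has changed.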

Metadata, in short, is our friend. The problem is that, in order to interpret it, we often need to know how a particular system records when a digital file was downloaded or last modified, and by whom. For example, depending on the system used to create it, a date-time stamp on a file may not be trustworthy, particularly if the device has travelled between time zones. Despite this relationship between systems knowledge and file authentication, the evidence given at the sessions suggests that digital forensic processing is moving towards the application of generalisable methods that obscure the peculiarities of systems. According to Jan Collie from Discovery Forensics Ltd:

I was speaking to some [police officers] only last Wednesday. What I am seeing in the field is that regular police officers are trying to be digital forensic analysts because they are being given these rather whizzy magic tools that do everything, and a regular police officer, as good as he may be, is not a digital forensic analyst. They are pushing some buttons, getting some output and, quite frequently, it is being looked over by the officer in charge of the case, who has no more training in this, and probably less, than him. They will jump to conclusions about what that means because they are being pressured to do so, and they do not have the resources or the training to be able to make the right inferences from those results. (27 November 2018, Session 1, 3)

This has clear implications for the historian. Our ability to feel confident in the authority of a primary source will require knowledge of the ‘whizzy magic tools’ used by those who mediate between us and the production of the digital file: the police officer, the archivist, the digital forensic analyst. And in order to do that, we’ll need to gain expertise in the operating systems on which our primary sources were produced and used, a task certainly different to what we are used to, but far from outside the scope of our training.
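The earlier point about date-time stamps can be made concrete with a short sketch. Suppose a system records only local clock time, with no offset from UTC. The stamp below and the two candidate offsets are hypothetical values chosen for illustration (roughly British Summer Time and US Eastern Daylight Time); the helper `as_utc` is likewise my own. The same recorded digits then correspond to two different moments, on two different calendar days.

```python
from datetime import datetime, timedelta, timezone

# A hypothetical stamp recorded as local clock time with no UTC offset.
naive_stamp = datetime(2018, 10, 9, 23, 30)


def as_utc(local_stamp, utc_offset_hours):
    """Interpret a naive local stamp under an assumed UTC offset."""
    zone = timezone(timedelta(hours=utc_offset_hours))
    return local_stamp.replace(tzinfo=zone).astimezone(timezone.utc)


# Two plausible readings of the same stamp (offsets are assumptions).
if_london = as_utc(naive_stamp, 1)      # device clock on UTC+1
if_new_york = as_utc(naive_stamp, -4)   # device clock on UTC-4

gap = if_new_york - if_london

print("Read as UTC+1:", if_london.isoformat())
print("Read as UTC-4:", if_new_york.isoformat())
print("Difference between the two readings:", gap)
```

Without knowing how the originating system stored the time, and where the device's clock thought it was, the historian cannot tell which reading is right, and a tool that silently picks one hides that uncertainty.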

This evidence suggests that the volume and variety of born digital materials, as well as the systems used to make them, create both challenges and opportunities for the historian. In Part Two of this blog series, I will discuss three less common, but no less important, themes of the forensic science inquiry oral evidence sessions: the temporal fluidity of systems that produce born digital materials, gaps in the record, and ethical digital forensics. It is here that the historian’s toolkit has, perhaps, the most to offer other professions engaged in digital forensic science.


Digital Forensics in the House of Lords: six themes relevant to historians (Part One)
