Over the next few months I’ll be working on a dataset of metadata for journal articles categorised as ‘History’ from the last 50 years (or so). Having conducted some tests over the summer (including some work with Paleontology articles), I now aim to ascertain what trends emerge when we read a large corpus of this kind from a distance, to use Franco Moretti’s parlance.
My plan is to extract from the article titles a set of the most frequently occurring words (perhaps 50) over the total period, and then to construct bimodal networks showing how often those words occur in each journal, decade by decade. I will then compare these networks with networks derived from decadal lists of the most frequently occurring words.
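A minimal sketch of this plan in Python, under some loudly stated assumptions: each record is a hypothetical `(journal, year, title)` tuple, the stop-word list is a tiny placeholder, and the function names (`tokenize`, `top_words`, `decade_edges`) are invented for illustration. The sample records are made up and do not come from the dataset.

```python
import re
from collections import Counter, defaultdict

# Placeholder stop-word list (an assumption; a real run would need a fuller one).
STOPWORDS = {"the", "of", "and", "in", "a", "an", "to", "for", "on"}


def tokenize(title):
    """Lowercase a title and return its alphabetic words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOPWORDS]


def top_words(records, n=50):
    """Return the n most frequent title words across all records."""
    counts = Counter()
    for _journal, _year, title in records:
        counts.update(tokenize(title))
    return [word for word, _ in counts.most_common(n)]


def decade_edges(records, vocabulary):
    """Build one weighted bimodal edge list per decade.

    Each edge links a journal to a word; the weight is the number of
    times that word appears in that journal's titles in that decade.
    """
    vocab = set(vocabulary)
    edges = defaultdict(Counter)
    for journal, year, title in records:
        decade = year // 10 * 10  # e.g. 1998 -> 1990
        for word in tokenize(title):
            if word in vocab:
                edges[decade][(journal, word)] += 1
    return edges


# Invented sample records, for illustration only.
records = [
    ("Journal A", 1975, "The Politics of Trade in Early Modern England"),
    ("Journal A", 1998, "Gender and Culture in Victorian England"),
    ("Journal B", 1994, "Gender, Culture and Memory"),
]

vocabulary = top_words(records, n=3)
edges = decade_edges(records, vocabulary)

for decade in sorted(edges):
    for (journal, word), weight in edges[decade].items():
        print(decade, journal, word, weight)
```

Each decade's edge list could then be loaded into a network tool as a two-mode (journal and word) graph. For the comparison networks, calling `top_words` on only the records from a single decade would give that decade's own word list.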
From this I expect to see the rise of cultural history and gender studies. If these obvious trends emerge I’ll be happy that my data can be of some use, and so from there I hope to dig a little deeper: for example, to say something about how, if at all, journal titles have changed as a result of needing to be discoverable using web searches.
As I write, OpenRefine is munching through the RDFs. Once that is complete I’ll periodically post updates on my progress here and on the Digital Scholarship blog.
In the meantime, if you want to dig through the data yourself, it can be found in the Shared Open Research Resources tab on the right. Though not hosted publicly by my employer, the data is shared under CC0. So in short you can do with it whatever you want.