Those who follow my Twitter feed may have noticed that I’ve been grappling with business directories and georeferencing recently. Today I finished mapping the Kent’s Directory for the Year 1794 and Post Office Annual Directory 1808, and thought I’d share my results.
First, a little context. My research at present is focused on trying to understand the networks of businesses which were auxiliary to the satirical print trade. These include the trade in physical materials such as paper, copper, and paint, as well as service industries such as engraving, printing, printselling, and stationery. One way to better understand all this is to mine London business directories – Julian Hoppit, for example, does so with great success in his Risk and Failure in English Business 1700-1800 (1987). Unlike Julian, however, I am not interested in all the data these directories contain, and as I wanted to quickly mine these rich resources – rather than manually browsing them – I set about finding digitised versions.
A digitised version of Kent’s Directory was found on a family history website. As the site was devoid of information regarding its digitisation, I was initially skeptical about using it. However, having cross-referenced a sample against physical directories held at the IHR, it seemed accurate, so I chose to proceed. Kent’s, like most directories of its sort, lists information in a standard format – Surname Forename, Type of Business, Street Number, Street Name. Having stripped the data from the webpages I set about selecting what I wanted to keep. To do this I first converted the data into a CSV (comma-separated values) file. For the majority of records in Kent’s, commas separated the categories I wanted, namely – Name of Proprietor, Type of Business, Street Number, Address. A typical record, then, looked like this: Chettle Sarah, Oil & Colour Shop, 45, Long acre. As I am only interested in information relating to businesses auxiliary to the satirical print trade, I sorted the database by the second value – Type of Business – and extracted to an edited database those business types which contained keywords of interest – ‘Colour’, ‘Copper’, ‘Engraver’, ‘Paper’, ‘Plate’, ‘Print’, ‘Rag’, ‘Stationer’. At this point a little discretion was used to omit records I didn’t want – such as ‘Tinplate-workers’ or ‘Calico-printers’. A large number of records, however, did not fit the standard format, with commas appearing in the proprietor name (eg Sage, Rawdon, & Atkinson), types of business (eg Carver, Gilder & Print-seller) or address (eg Curtain-road, Shoreditch). Through some careful searching and browsing of the remaining information, further records were copied to the edited database and commas incompatible with the CSV format removed. I then had a clean database of 340 records representative of the businesses I wanted to analyse.
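For anyone wanting to repeat this kind of filtering in code rather than in a spreadsheet, the steps described above might be sketched roughly as follows. This is not the method I used, only a minimal illustration in Python: the function name `triage`, the exclusion list, and the two "Example" rows are all invented for the sketch, and only the Chettle record comes from the directory. Records that do not split into exactly four fields are set aside for manual checking rather than guessed at.

```python
# Keywords of interest, matched case-insensitively against Type of Business.
KEYWORDS = ("colour", "copper", "engraver", "paper", "plate", "print", "rag", "stationer")
# Illustrative exclusions (assumption: simple substring tests are enough here).
EXCLUDE = ("tinplate", "calico")


def triage(lines):
    """Split raw directory lines into keyword matches and irregular records."""
    matches, irregular = [], []
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 4:
            # Extra or missing commas: leave for manual review.
            irregular.append(line)
            continue
        name, business, number, street = fields
        trade = business.lower()
        wanted = any(keyword in trade for keyword in KEYWORDS)
        unwanted = any(term in trade for term in EXCLUDE)
        if wanted and not unwanted:
            matches.append(
                {"name": name, "business": business, "number": number, "street": street}
            )
    return matches, irregular


sample = [
    "Chettle Sarah, Oil & Colour Shop, 45, Long acre",         # record quoted above
    "Example A, Tinplate-worker, 1, Example street",             # invented row
    "Example B, Carver, Gilder & Print-seller, 2, Example street",  # invented row
]
matches, irregular = triage(sample)
```

Substring matching is crude (a short keyword like ‘rag’ can turn up inside unrelated words), so the output still needs the same eyeballing that the manual approach did.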
The Post Office Annual Directory 1808 had not been digitised in this manner, but a scanned and OCRd copy was available at Historical Directories. I am aware of problems regarding the ratio of words transcribed accurately by OCR and therefore proceeded with caution (for reference, a recent and typically superb post by Ted Underwood covers OCR – and many more issues – regarding text mining). Having conducted some initial searches and consulted the website’s excellent technical FAQs, I was confident I could get something meaningful from keyword searching. Using broadly the same keywords as with Kent’s (though with more variations to take account of the OCR – so I searched for ‘Colour’, ‘Colourman’, and ‘Colourmen’, as well as ‘Stationer’ and ‘Stationers’), I manually – yes manually – typed out the information into a CSV file. This was time-consuming, but – 391 records later – was worth it.
Alongside this work, I had begun playing with Google Fusion Tables (see posts here and here). GFT has the ability to georeference tabulated data – such as CSV files – from basic address information. An excellent tutorial on GFT can be found here. In order to make my georeferenced data meaningful, I decided to cluster my Type of Business into the more generic categories ‘Colour’, ‘Copper’ (sellers and plate makers), ‘Engraver’, ‘Misc’, ‘Paper’ (rag collectors, makers, and wholesalers), ‘Stationer’, ‘Printer’ (on copper, mezzotint, or similar, but not books), and ‘Printseller’. Where two of these types were present, the first used in the specific name became the generic business type – so an ‘Engraver and Printer’ became an ‘Engraver’. This didn’t of course capture all the nuances of the data – such as printsellers who were also printers, and stationers who were also rag merchants – but I figured that these details could be looked into further at a later date, when I came to analyse the data closely. Each generic type was then given a visual marker (a visualisation of the markers GFT recognises can be seen here). Finally I merged the location information into one column, adding at the end ‘London’ to ease the georeferencing.
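The ‘first type named wins’ rule and the merged location column can be sketched in a few lines of Python. Again this is an illustration under assumptions rather than a record of what I did: the keyword-to-category table, the override for copper-plate printers (which I filed under ‘Printer’ rather than ‘Copper’), and the function names `generic_type` and `full_address` are all invented for the example.

```python
# Assumed special cases that the "first keyword wins" rule would get wrong.
OVERRIDES = {
    "copper-plate printer": "Printer",
    "copperplate printer": "Printer",
}

# Assumed mapping from keyword to generic category.
KEYWORD_TO_TYPE = {
    "colour": "Colour",
    "copper": "Copper",
    "engrav": "Engraver",
    "paper": "Paper",
    "rag": "Paper",
    "stationer": "Stationer",
    "printseller": "Printseller",
    "print-seller": "Printseller",
    "print seller": "Printseller",
    "print": "Printer",
}


def generic_type(business):
    """Return the generic category for a specific Type of Business."""
    trade = business.lower()
    for phrase, category in OVERRIDES.items():
        if phrase in trade:
            return category
    # Earliest keyword in the string wins; on a tie the longer keyword wins,
    # so "print-seller" beats the shorter "print" at the same position.
    hits = [
        (trade.find(keyword), -len(keyword), category)
        for keyword, category in KEYWORD_TO_TYPE.items()
        if keyword in trade
    ]
    return min(hits)[2] if hits else "Misc"


def full_address(number, street):
    """Merge street number and name into one column, with 'London' appended."""
    return f"{number} {street}, London"


print(generic_type("Engraver and Printer"))
print(full_address("45", "Long acre"))
```

A rule like this loses exactly the nuances mentioned above: a stationer who was also a rag merchant simply becomes a ‘Stationer’.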
I was now ready to georeference. As I said, GFT can generate georeferenced points from address information – which I had – but two factors made this less simple than it might be for modern data. First, London had changed, and second, the way London was described had changed. Thankfully I had a number of ways of weeding out georeferencing errors. The most obvious was just looking at the georeferenced map – for example, if a data point was outside of the boundaries of late-Georgian London (such as in Canada…), the georeferencing had failed. In addition to this GFT handily highlights in yellow those cells in a ‘location’ column which it could not georeference. Once more then I had some manual work to do, though with some Googling and cross-referencing of place names with Locating London’s Past I was able to pin down (broadly speaking) where businesses had been with an acceptable margin of error.
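The ‘is this point even in London?’ check can also be automated once the coordinates are exported. The sketch below is hypothetical: it does not use GFT itself, the bounding box is a rough guess at the extent of late-Georgian London rather than a researched boundary, and the coordinates are approximate values included only to show the idea.

```python
# Rough, assumed bounding box (decimal degrees) around late-Georgian London.
LONDON_BOX = {"south": 51.45, "north": 51.57, "west": -0.25, "east": 0.05}


def outside_box(lat, lon, box=LONDON_BOX):
    """Return True if a point falls outside the bounding box."""
    inside = (
        box["south"] <= lat <= box["north"]
        and box["west"] <= lon <= box["east"]
    )
    return not inside


# (label, latitude, longitude); coordinates are approximate and illustrative.
points = [
    ("45 Long acre, London", 51.513, -0.124),    # central London
    ("hypothetical failed match", 42.98, -81.25),  # roughly London, Ontario
]

# Labels of points that need checking by hand.
suspect = [label for label, lat, lon in points if outside_box(lat, lon)]
```

This only catches gross failures; a point placed on the wrong street within London would still pass, which is where the cross-referencing with Locating London’s Past comes in.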
The result of all this is two maps, snapshots of which are displayed below. At this stage I should note that there are still a few errors in the dataset. I have also removed the stationers from both maps, simply because their volume obscures some more interesting looking trends (I probably shall come back to stationers in a later post).
To be honest, I don’t know exactly what to make of this yet. There are some established trends noticeable in the maps: most obviously the expansion of London to the north, south, and east. More interesting for me is the spread of the ‘Oil and Colourmen’ to the new north-west of London, especially to Tottenham Court Road, and the general growth of ‘Engravers’ as a type of trade. It is also interesting to note the emergence of businessmen describing their trade as ‘Copper-plate printer’ (small blue ‘p’) in the 1808 Directory. This may – and I need to do more work on this – be a quirk in the naming of businesses between the rival directories, but if not it potentially signifies either an increase in gravitas for the copper-plate printing trade or the existence of a marketplace strong enough to support tradesmen whose sole business was copper-plate printing.
Either way, the maps make – I believe – interesting reading, though there is much data cleaning and context setting to do before I can start using them to make any big claims. I also need to work out how to import this all into Google Earth and place the data onto a historic map (suggestions welcome).
I’ve prefixed this post with ‘Week 1’. This is to recognise the fact that today marks the end of the first week of my Postdoctoral Fellowship with the Paul Mellon Centre for Studies in British Art. I won’t make any rash promises about writing a weekly research update over the coming 6 months, but I do intend to blog regularly along the way. I look forward to your comments on my adventures.