Digital Humanities Australasia 2018

A hands-on data exploration & challenge to become a derived data-set author on the British Library’s open data-set platform (132)

Mahendra Mahey 1
  1. British Library, London, United Kingdom
  • Do you want to understand some of the challenges of working with cultural heritage data in a large national library such as the British Library?
  • Do you want to explore and get some 'hands-on' experience of working with the British Library’s digital collections and data?
  • Do you want to leave a ‘legacy’ of being a data-set author/creator/curator on the British Library’s data-set platform?
  • Do you have some digital literacy in using familiar data exploration tools such as Microsoft Excel (see 'GUIDANCE FOR THIS WORKSHOP' below)?

If the answer is 'Yes' to any of these, then this workshop could be for you!

Mahendra Mahey, manager of British Library Labs (BL Labs), will examine some of the BL’s digital collections and data and discuss the challenges he has faced in making the BL's cultural heritage data available openly or onsite at the British Library.

Mahendra will invite delegates to explore data-sets at their leisure and will set a challenge for those who are interested and skilled in exploring data, finding patterns in it and grouping it. They could become authors/creators of derived data-sets, based on pre-existing digital collections/data provided on the day or already available on the British Library’s open data-set platform.

The workshop will conclude with reflections from the delegates and, possibly, a look at a number of derived data-sets generated by participants on the day that could potentially be published on the British Library’s open data-set platform. If selected, these new derived data-sets will be attributed with the creators' / authors' details and each will have its own citable Digital Object Identifier (DOI). These new data-sets would then be available for reuse by any researcher in the world.

GUIDANCE FOR THIS WORKSHOP


We strongly recommend that you come to this workshop with an appropriate device, such as a laptop, pre-installed with tools suited to analysing different kinds of data-sets; e.g. Microsoft Excel may work with smaller data-sets such as metadata (see other data exploration tools below). If you don't have one and would still like to attend, please ask to 'pair up' with someone who has already signed up and is willing to share.

Other data exploration tools include: Notepad++ (e.g. for viewing text and XML); OpenRefine (e.g. for cleaning data); Tableau Public (e.g. for visualising data); Google Fusion Tables (e.g. for visualising geo-spatial data); spaCy (e.g. for text and data mining); RStudio (an open-source development environment for the R statistical language); MATLAB (a data analysis tool); and NLTK (e.g. for natural language processing).

Please note that this workshop is NOT about training you to use any of these tools; it is about using tools you may already be familiar with to explore and find patterns in our data.

Data types you may be examining in this workshop could include: .ZIP, .PDF, .TXT, .CSV, .TSV, .XLS, .XLSX, RDF, .nt, XML (TEI, ALTO and bespoke), .JSON, .JPG, .JPEG, .TIFF and .WARC.

Please ensure you are able to read these files on your device before the workshop if you are interested in exploring them during our session.
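
If you happen to have Python on your device, one possible way to check a few of the text-based formats beforehand is sketched below. This is only an illustration and not part of the workshop materials: the function name `probe_file` is hypothetical, the files are assumed to be UTF-8 encoded, and only the Python standard library is used. Formats such as .PDF, .XLSX, .TIFF and .WARC are not covered and would need their own tools.

```python
import csv
import json
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path


def probe_file(path):
    """Try to open one file and return a short summary string.

    Hypothetical helper: it covers only a handful of the listed types
    and assumes UTF-8 text. An exception means the file could not be read.
    """
    path = Path(path)
    suffix = path.suffix.lower()

    if suffix in (".csv", ".tsv"):
        # Tab-separated files differ from CSV only in the delimiter.
        delimiter = "\t" if suffix == ".tsv" else ","
        with path.open(newline="", encoding="utf-8") as handle:
            row_count = sum(1 for _ in csv.reader(handle, delimiter=delimiter))
        return f"{row_count} rows"

    if suffix == ".json":
        with path.open(encoding="utf-8") as handle:
            data = json.load(handle)
        return f"JSON {type(data).__name__}"

    if suffix == ".xml":
        # Parsing succeeds only if the XML is well-formed.
        root = ET.parse(str(path)).getroot()
        return f"XML root <{root.tag}>"

    if suffix == ".zip":
        with zipfile.ZipFile(path) as archive:
            return f"{len(archive.namelist())} archive members"

    if suffix == ".txt":
        text = path.read_text(encoding="utf-8")
        return f"{len(text.splitlines())} lines"

    return "not checked by this sketch"
```

Calling `probe_file` on a file of your own (for example a metadata spreadsheet exported as .CSV) returns a one-line summary if the file opens cleanly, and raises an error otherwise.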

Slides for session:

URL for specific data:

Mahendra Mahey tweets at @BL_Labs & @mahendra_mahey