The Digital.Bodleian project ran from November 2012 to October 2012, and is now in the process of being transitioned into a service. It has created a unified interface for accessing and interacting with the Bodleian’s archive of digitized content.
Our primary output is the new Digital.Bodleian site (currently in soft launch and undergoing performance testing), which will provide unified access to hundreds of thousands of digitized images from the Bodleian’s collections, together with metadata describing them.
Another outcome for us has been a significant increase in the sustainability of our digitized collections. With this project we are fundamentally moving away from a siloed, collection-based approach to access. We have tidied up our legacy metadata, extracted data from our closed systems, and stored it in a more open and usable manner. In doing so we have moved our images from a storage system with no immediate access for staff or end users to one in which our content can be accessed immediately. We have made most of the images available under a UK Creative Commons ‘Attribution-NonCommercial-ShareAlike 3.0’ Licence (CC-BY-NC-SA), except where the rights are held by other individuals or institutions, and made our terms and conditions easier to read. We have also ensured that the project is properly documented, and that our metadata has been transformed into a single standard schema which we can use to drive discovery.
Each resource on Digital.Bodleian will be described using our custom extension of Dublin Core, but we will also retain the original untransformed metadata (for example, many of our legacy digitized collections make use of a MODS file in a METS wrapper). This original metadata will be stored together with the JPEG 2000 images of the digitized item in a uniquely identified [via UUID] package on our asset management system. These packages will be assigned persistent URLs and be accessible via a RESTful API. We will also make this data available through other APIs in order to facilitate contribution to Europeana and to other subject-specific collections. The Digital.Bodleian site will deliver the images using Armadillo Systems’ iNQUIRE, which uses Seadragon Ajax [combined with a tile server] to deliver them as tiles, in much the same way as Google Maps.
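To give a feel for what tiled delivery involves, here is a minimal sketch of how a Deep Zoom-style tile pyramid can be sized for one image. It is an illustration only, not a description of how iNQUIRE or our tile server is configured: the function name `tile_pyramid`, the 256-pixel tile size and the absence of tile overlap are all assumptions made for the example.

```python
import math


def tile_pyramid(width, height, tile_size=256):
    """Return (level, columns, rows) for each level of a Deep Zoom-style pyramid.

    Assumptions for illustration: square tiles of `tile_size` pixels and no
    overlap between neighbouring tiles. Level 0 is the smallest (1x1 pixel)
    rendition; the highest level is the full-resolution image.
    """
    max_level = math.ceil(math.log2(max(width, height)))
    levels = []
    for level in range(max_level + 1):
        # Each step down from the top level halves the image dimensions.
        scale = 2 ** (max_level - level)
        level_width = math.ceil(width / scale)
        level_height = math.ceil(height / scale)
        columns = math.ceil(level_width / tile_size)
        rows = math.ceil(level_height / tile_size)
        levels.append((level, columns, rows))
    return levels


if __name__ == "__main__":
    # Hypothetical 4096 x 3000 pixel scan.
    for level, columns, rows in tile_pyramid(4096, 3000):
        print(f"level {level:2d}: {columns} x {rows} tiles")
```

The viewer only requests the tiles that cover the visible area at the current zoom level, which is why a very large image can be explored without downloading it in full.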
The main lessons learned have been described in another blog post (see General lessons learned) which covers staffing and recruitment issues, metadata, bottlenecks, what to prioritize, performance issues, file formats, the separation of front and back ends, the use of open source and off-the-shelf software, playing to our strengths, and the importance of sufficient good-quality documentation.
We have also encountered the need to ensure consistent metadata quality and to use and extend existing standards. This project has allowed us to review twenty years of creating metadata for digitized collections and to learn some important lessons about standards and granularity. For the future, it would be a good idea to use validation rules to restrict what people can enter into metadata elements, particularly with regard to languages and dates, and to make more use of authority-controlled and cooperatively catalogued resources. This will change the way we create metadata for digitized collections going forward, and we are already feeding the lessons learned from this project into the creation of a new digitisation workflow.
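As a rough sketch of the kind of validation rule we have in mind for languages and dates, the example below checks a record against a controlled list of language codes and a restricted date format. Everything in it is hypothetical: the field names `language` and `date`, the short allow-list of three-letter codes, and the choice of `YYYY`, `YYYY-MM` or `YYYY-MM-DD` as the accepted date forms are assumptions for illustration, not our actual schema.

```python
import re
from datetime import date

# Hypothetical allow-list; a real one would be drawn from an authority file
# such as the full set of ISO 639-2 codes.
ALLOWED_LANGUAGES = {"eng", "lat", "fre", "ger", "ara"}

# Accept YYYY, YYYY-MM or YYYY-MM-DD only (an assumption for this sketch).
DATE_PATTERN = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def validate_record(record):
    """Return a list of error messages; an empty list means the record passed."""
    errors = []

    language = record.get("language", "")
    if language not in ALLOWED_LANGUAGES:
        errors.append(f"language: {language!r} is not an allowed code")

    value = record.get("date", "")
    if not DATE_PATTERN.fullmatch(value):
        errors.append(f"date: {value!r} is not YYYY, YYYY-MM or YYYY-MM-DD")
    else:
        # Pad missing month/day with 1 so partial dates can be range-checked.
        parts = [int(part) for part in value.split("-")]
        year, month, day = (parts + [1, 1])[:3]
        try:
            date(year, month, day)
        except ValueError:
            errors.append(f"date: {value!r} is not a real calendar date")

    return errors


if __name__ == "__main__":
    print(validate_record({"language": "eng", "date": "1450"}))
    print(validate_record({"language": "English", "date": "c. 1450"}))
```

Free-text values such as "English" or "c. 1450" are exactly what such rules would reject at the point of entry, rather than leaving them to be cleaned up years later.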
One of the fundamental next steps for us will be to survey our legacy projects and see which ones we can decommission as a result of this project. Some will have enough added value (either in the form of additional metadata, or the functionality of the site) that they will remain a distinct resource even if all of their data is included in Digital.Bodleian; others will be withdrawn.
System redundancy and robustness have been among our biggest challenges. The image server that we have set up for Digital.Bodleian should become a micro-service for all of our collections and interfaces, but we need to ensure its robustness. The internal standards and methods developed for this project will form part of our working practice in digitization and the delivery of digitized resources in new projects.
Opening up these collections will allow us to engage with users in new ways and on a larger scale than ever before. The Armadillo iNQUIRE interface that we have implemented allows users to create both public and private notes and to tag images, building a folksonomy in the process. We will be monitoring this activity closely to see how much interaction takes place. Providing standard ways of harvesting our data will also open up new possibilities for collaboration and contribution.