
BDLSS Blog

Bodleian Digital Library Systems and Services

As part of our second advisory board meeting, we discussed plans for user testing. Dr. Monica Bulger, who is in charge of user testing for the project, recommended a combination of two standard methods: an online survey and individual think-aloud interviews. Both would target current users, although a suggestion was also made to recruit people more broadly, possibly by standing in front of the Bodleian, to attract new users to the website.

For the online survey, users would be recruited from the top five Bodleian sites, starting with the Digital Bodleian (digital.bodleian.ox.ac.uk) and Oxford Digital Library (odl.ox.ac.uk) domains. When users first land on one of these pages, they would see a pop-up inviting them to try our new system and respond to a brief online survey. We would then record their activities through the server logs; when they have finished, they would click a link to complete a survey of 5-10 questions. The survey would ideally include demographic questions such as gender, age, frequency of use, Oxford affiliation and role, and purpose of use. The following list represents the types of access and experience questions we would include:

  • What, if anything, did you find frustrating in your experience today?
  • Was there anything you wanted to access, but couldn’t? If so, why?
  • What features did you like?

The online survey would run for a two-week period with a conservative target of 500 respondents; however, given the results of past Bodleian surveys, we could expect 2,000-3,000 responses.
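The plan above relies on recording user activities through the server logs. A minimal Python sketch of how those logs might be grouped per visitor for analysis is below. It assumes Apache's common/combined log format, which the post does not specify, and uses the client IP address as a crude stand-in for a session identifier; a real study would more likely tie log entries to a survey or session ID.

```python
import re
from collections import defaultdict

# Assumed format: Apache common/combined log lines, e.g.
# 192.0.2.1 - - [10/Feb/2012:10:00:00 +0000] "GET /search?q=x HTTP/1.1" 200 512
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+'
)

def paths_by_client(lines):
    """Group requested paths by client IP, preserving log order."""
    activity = defaultdict(list)
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        activity[match.group("ip")].append(match.group("path"))
    return dict(activity)

sample = [
    '192.0.2.1 - - [10/Feb/2012:10:00:00 +0000] "GET /search?q=ballads HTTP/1.1" 200 512',
    '192.0.2.1 - - [10/Feb/2012:10:00:05 +0000] "GET /image/123 HTTP/1.1" 200 2048',
]
print(paths_by_client(sample))
```

Grouping by IP keeps the sketch self-contained, but it will merge users behind a shared proxy and split users whose address changes.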

As a possible complement to the online survey, we would invite a small number of users (fewer than 10) to take part in a think-aloud interview. These users would be recruited from the same websites listed above and, if affiliated with Oxford, would be asked whether we could do a site visit. Sessions would be scheduled for 20 minutes, with the option to extend to an hour. Participants would be asked to talk us through their use of the site and prompted to describe their experience, particularly areas of frustration or confusion.

Together these techniques would provide valuable feedback on user expectations and experience: the think-aloud protocol would provide more in-depth information, while the online survey would sample a broad range of users.

User Testing Plan for Digital.Bodleian

(The details of the overview below will be fleshed out in a series of more detailed blog posts over the coming weeks.)

The user-facing side of digital.bodleian is iNQUIRE, a product being developed by Armadillo Systems in consultation with the Bodleian. iNQUIRE’s front end is an Ajax-style JavaScript web application which enables free-text searching, faceted browsing, and Deep Zoom display of images. It also enables users to add images to their own collections (which can be shared with others), annotate images with private notes, and add public tags which can be seen by others.
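For readers unfamiliar with Deep Zoom: the viewer requests an image as a pyramid of tiles, halving the resolution at each level down to a single pixel, so only the tiles in view at the current zoom need to be fetched. A small Python sketch of that arithmetic, following the usual Deep Zoom (DZI) level convention, is below. The 256-pixel tile size is an assumption; the post does not say how iNQUIRE or the Seajax component is configured.

```python
import math

def deep_zoom_levels(width, height, tile_size=256):
    """Return (level, level_width, level_height, tile_count) for each level
    of a Deep Zoom pyramid, from 1x1 up to full resolution."""
    # The highest level is full size; each level below halves both sides.
    max_level = math.ceil(math.log2(max(width, height)))
    levels = []
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)
        w = math.ceil(width / scale)
        h = math.ceil(height / scale)
        tiles = math.ceil(w / tile_size) * math.ceil(h / tile_size)
        levels.append((level, w, h, tiles))
    return levels

# A 4000x3000 scan yields 13 levels; the full-resolution level needs 192 tiles.
for row in deep_zoom_levels(4000, 3000)[-3:]:
    print(row)
```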

The advantage iNQUIRE has for the Bodleian over similar products is that it will sit on top of our image repository/digital asset management system, indexing metadata and displaying images directly, rather than storing the images or metadata in a closed or application-specific system.

While the iNQUIRE front end is a commercial product developed by Armadillo, the back end makes use of open-source technologies for search, indexing, and image display, and of Microsoft’s SQL Server for application-specific data.

Server Components:

Apache Solr 3.5

(for search/indexing/faceting.)
BDLSS already has in-house experience with Solr, so we expect to be able to make use of the digital.bodleian Solr index in other applications. Within Solr we are using a DC-based schema for searching and indexing, and we are writing data handlers to ingest metadata from our legacy image collections into this schema.
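Because the index is plain Solr, other applications could query it over HTTP with standard parameters. A minimal Python sketch of a faceted search request is below, using Solr's `q`, `fq`, `facet` and `facet.field` parameters. The host, core path and DC-style field names (`dc_subject`, `dc_type`) are hypothetical, since the post does not publish the schema.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint (single-core Solr 3.x layout).
SOLR_SELECT = "http://localhost:8983/solr/select"

def facet_query_url(text, facet_fields, filters=(), rows=20):
    """Build a Solr select URL that searches `text`, returns facet counts
    for `facet_fields`, and narrows results with filter queries."""
    params = [("q", text), ("rows", rows), ("wt", "json"), ("facet", "true")]
    params += [("facet.field", field) for field in facet_fields]  # one per facet
    params += [("fq", f) for f in filters]  # filters do not affect scoring
    return SOLR_SELECT + "?" + urlencode(params)

print(facet_query_url("ballads", ["dc_subject", "dc_type"], ["dc_type:Image"]))
```

Using filter queries (`fq`) for facet selections, rather than adding them to `q`, is the usual Solr idiom because filters are cached separately from the main query.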

N.B. while we are using a relatively sparse DC-based schema to drive the faceting and searching within Solr, the original source metadata, which may be richer, will still be linked from within the web interface and will be searched in full by the index.
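To illustrate the kind of cut-down record involved, here is a minimal Python sketch that maps a few MODS elements to DC-style fields. It uses ElementTree rather than the XSLTs the project describes, and the crosswalk (which MODS element feeds which field) is a hypothetical example, not the project's actual mapping. Anything outside the crosswalk is dropped, which is why the original record still needs to be linked from the interface.

```python
import xml.etree.ElementTree as ET

MODS = "{http://www.loc.gov/mods/v3}"

# Hypothetical, deliberately small crosswalk: DC field -> MODS element path.
CROSSWALK = {
    "title": f"{MODS}titleInfo/{MODS}title",
    "creator": f"{MODS}name/{MODS}namePart",
    "date": f"{MODS}originInfo/{MODS}dateIssued",
    "subject": f"{MODS}subject/{MODS}topic",
}

def mods_to_dc(mods_xml):
    """Return a dict of DC-style fields (each a list of strings) from a MODS record."""
    root = ET.fromstring(mods_xml)
    record = {}
    for dc_field, path in CROSSWALK.items():
        values = [el.text.strip() for el in root.findall(path)
                  if el.text and el.text.strip()]
        if values:  # omit fields with no data rather than storing empty lists
            record[dc_field] = values
    return record

sample = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>A new ballad</title></titleInfo>
  <originInfo><dateIssued>1680</dateIssued></originInfo>
  <subject><topic>Ballads</topic><topic>Broadsides</topic></subject>
</mods>"""
print(mods_to_dc(sample))
```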

Djatoka/Tomcat

(for serving up the jpeg2000 images to the Seajax DeepZoom component.)
Existing image collections are being converted (using Kakadu) from their current formats, largely a mixture of Group 4 compressed bitonal TIFFs and uncompressed 24-bit colour TIFFs, to lossless jpeg2000 files. Djatoka then serves these jpeg2000 images to the iNQUIRE web interface.
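Djatoka serves regions of a jpeg2000 file through an OpenURL-style resolver, so a viewer can ask for just the part of the image, at just the resolution level, it needs. A minimal Python sketch of building such a request is below. The parameter names follow Djatoka's documented `getRegion` service as we understand it; the resolver host and image identifier are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical Djatoka deployment on Tomcat.
RESOLVER = "http://localhost:8080/adore-djatoka/resolver"

def djatoka_region_url(image_id, level, region=None, fmt="image/jpeg"):
    """Build a getRegion request for `image_id` at resolution `level`.
    `region`, if given, is (y, x, height, width) in Djatoka's order."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_id": image_id,  # identifier or URL of the jpeg2000 file
        "svc_id": "info:lanl-repo/svc/getRegion",
        "svc_val_fmt": "info:ofi/fmt:kev:mtx:jpeg2000",
        "svc.format": fmt,   # output format delivered to the browser
        "svc.level": level,  # resolution level decoded from the jpeg2000
    }
    if region is not None:
        y, x, h, w = region
        params["svc.region"] = f"{y},{x},{h},{w}"
    return RESOLVER + "?" + urlencode(params)

print(djatoka_region_url("http://example.org/objects/1234/image.jp2", 3, (0, 0, 256, 256)))
```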

MS SQL Server

iNQUIRE makes use of MS SQL Server to store data generated in the iNQUIRE interface such as user collections, tags, and comments, and also for the management and administration of the interface.

N.B. Bibliographic metadata is retained on our systems in open formats. Solr indexes the metadata and iNQUIRE provides links back to this bibliographic metadata. MS SQL is used only for iNQUIRE application-specific content.
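The separation described above can be illustrated with a minimal Python sketch, using the standard-library `sqlite3` module as a stand-in for MS SQL Server. The table and function names are hypothetical, not iNQUIRE's real schema. The point is what the application database does not hold: no titles, creators or other bibliographic metadata, only the persistent URL of each object.

```python
import sqlite3

# Hypothetical application table: user-generated data keyed by persistent URL.
SCHEMA = """
CREATE TABLE user_tag (
    user_id    TEXT NOT NULL,
    object_url TEXT NOT NULL,  -- persistent URL of the object in the DAMS
    tag        TEXT NOT NULL
);
"""

def add_tag(conn, user_id, object_url, tag):
    """Record a public tag that a user has added to an object."""
    conn.execute(
        "INSERT INTO user_tag (user_id, object_url, tag) VALUES (?, ?, ?)",
        (user_id, object_url, tag),
    )

def tags_for(conn, object_url):
    """Return the distinct tags on an object, alphabetically."""
    rows = conn.execute(
        "SELECT DISTINCT tag FROM user_tag WHERE object_url = ? ORDER BY tag",
        (object_url,),
    )
    return [tag for (tag,) in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
obj = "http://example.org/objects/1234"
add_tag(conn, "alice", obj, "maps")
add_tag(conn, "bob", obj, "atlas")
add_tag(conn, "bob", obj, "maps")
print(tags_for(conn, obj))
```

Because the object is referenced only by its persistent URL, the descriptive metadata can be re-indexed or corrected in the asset management system without touching the application database.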

Workflow:

  • We are moving our images from the tape-based University HFS system to a digital asset management system [developed in-house, but somewhat Fedora-like: http://en.wikipedia.org/wiki/Fedora_(software)].
  • We are creating UUIDs and persistent URLs for each object. For each object we store an RDF manifest (for an early example, see here), a lossless jpeg2000, a copy of the original metadata (typically as XML; for a METS/MODS-based example, see here), and a cut-down, standardised DC version of the metadata which we can use for OAI-PMH and other services. Our asset management system provides a RESTful interface to this content, and the same persistent URLs (for images and metadata) that are being fed into iNQUIRE will also be used by other applications.
  • For each collection we are creating XSLTs to transform the existing metadata to DC, and creating Solr data import handlers to index either the derivative DC metadata or, where possible, the original metadata. The original metadata is stored with the object and linked from iNQUIRE. In some cases we expect to enrich the metadata we have, or to make use of existing catalogue resources, as part of the metadata conversion and ingest process.
  • Once the metadata is indexed in Solr, and the images provided from our asset management system via the Djatoka/Tomcat server to the iNQUIRE deep zoom component, the content will go live in the iNQUIRE interface.
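The identifier step in the workflow above can be sketched in a few lines of Python: mint a random (version 4) UUID per object and derive stable URLs for each stored component. The base URL and the component path names are hypothetical; the post does not give the real persistent-URL pattern.

```python
import uuid

# Hypothetical base for persistent URLs.
BASE = "http://example.org/objects"

def new_object_urls(base=BASE):
    """Mint a UUID for a new object and derive URLs for its components:
    RDF manifest, lossless jpeg2000, original metadata, and DC record."""
    object_id = str(uuid.uuid4())
    root = f"{base}/{object_id}"
    return {
        "uuid": object_id,
        "object": root,
        "manifest": f"{root}/manifest.rdf",
        "image": f"{root}/image.jp2",
        "original_metadata": f"{root}/metadata.xml",
        "dc": f"{root}/dc.xml",
    }

print(new_object_urls())
```

Deriving every component URL from the one UUID means iNQUIRE and any other application can reach the image and both metadata versions from a single stored identifier.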

Please see the attached scan for a hand-drawn sketch [from an internal Bodleian meeting] of the digital.bodleian architecture.

Sketch of Digital.Bodleian Architecture

Progress:

Over the following weeks we will make blogposts describing progress for each of the stages in our workflow, and outlining some of the technical challenges faced and solutions adopted.


Alexander Huber, BDLSS Metadata Services Co-ordinator, attended the recent EuroCRIS task group meeting on CERIF held in Bath. Our first foray into using CERIF will be as part of the JISC-funded DaMaRO project: we will use CERIF to expose metadata from the research data catalogue being developed by DaMaRO.


This nine-month post will be primarily dedicated to the Digital.Bodleian project, a JISC-funded project that aims to aggregate and open up the Bodleian Library’s substantial collection of digital assets, both through a graphical user interface and through open linked data.

You will be a member of the team within Bodleian Digital Library Systems and Services (BDLSS) that works with the applications and content in the DAMS. You will work closely with the Imaging Systems Developer and the DAMS Developers to help support and organise the contents of the DAMS.

You will work with the imaging systems developer, metadata specialist, and other technical staff to gather together images and metadata from a range of existing Bodleian systems, standardise the images and metadata, and ingest this content into new storage, and the resource discovery system running on top of this storage. You may also be asked to perform other technically similar tasks within existing workflows.

You will need to be familiar with Windows and Unix file systems, at least at the level of file copying/moving/renaming on a large scale, and with the running of simple shell scripts and Windows batch files. You should also be familiar with databases and with XML, as well as possessing a working knowledge of existing library catalogue resources, bibliographic metadata and metadata standards. You will have good qualifications and/or experience working with metadata on a large scale and in an IT-related environment.

Only applications received before midday on Monday 27 February 2012 can be considered. Interviews will be held on Tuesday 6 March. You will be required to upload a supporting statement as part of your online application.

For more information and to apply click here


One of the first decisions we have had to make regarding the installation of the iNQUIRE interface for digital.bodleian has been about logins – when and how.

In order to take advantage of some of the Web 2.0 features of the site (commenting, tagging, and building your own collections), users will have to log in. Our first decision about this was fairly clear-cut: we didn’t want users to have to log in until they engaged with these features. It was extremely important to us that the general public be able to come to the site and browse and search the collections anonymously. Nothing irks me more than following a link and being confronted immediately with a login prompt, or having to create an account before even knowing whether I want one. I usually don’t bother.
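The "log in only when you need to" policy amounts to guarding the write features while leaving search and browse open. A minimal Python sketch of that gate is below; the function and exception names are hypothetical and this is not iNQUIRE's code, which the post does not describe.

```python
from functools import wraps

class LoginRequired(Exception):
    """Raised when an anonymous user tries a feature that needs an account."""

def requires_login(action):
    """Wrap an action so it runs only for a logged-in user (user is not None)."""
    @wraps(action)
    def wrapper(user, *args, **kwargs):
        if user is None:
            raise LoginRequired(action.__name__)
        return action(user, *args, **kwargs)
    return wrapper

def search(user, query):
    # Searching and browsing stay open to anonymous users (user may be None).
    return f"results for {query!r}"

@requires_login
def tag_object(user, object_id, tag):
    # Tagging, commenting and collection-building need an account.
    return f"{user} tagged {object_id} with {tag!r}"

print(search(None, "maps"))
print(tag_object("alice", "obj1", "maps"))
```

In a web application the `LoginRequired` exception would typically be turned into a redirect to the login page at that moment, so the prompt appears only when the user reaches for a feature that needs it.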

The next thing we had to decide was what means we would give users for logging in and creating accounts. Ideally, for Oxford users we would use our own local authentication system, but for now this is Oxford-only and so not ideal. iNQUIRE comes out of the box with OpenID built in, so we have decided to go with that. One of the benefits will hopefully be for our outreach: if you’ve logged in with your Google profile, it should be fairly seamless to tell your Google+ circles about something. I do have to admit some reservations, though, about making it so easy for people to share their information and behaviours with commercial companies, so I am pleased that OpenID also offers the option of creating credentials with OpenID itself.
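For the curious, an OpenID 2.0 login starts by redirecting the browser to the user's OpenID provider with a `checkid_setup` request. A minimal Python sketch of building that redirect is below, using the parameter names from the OpenID Authentication 2.0 specification and its "identifier select" mode, which lets the provider choose the identity. The provider endpoint and return URLs are hypothetical, and discovery, association and verification of the provider's response are all omitted.

```python
from urllib.parse import urlencode

OPENID_NS = "http://specs.openid.net/auth/2.0"
IDENTIFIER_SELECT = "http://specs.openid.net/auth/2.0/identifier_select"

def openid_checkid_url(op_endpoint, return_to, realm):
    """Build the redirect URL for an OpenID 2.0 checkid_setup request."""
    params = {
        "openid.ns": OPENID_NS,
        "openid.mode": "checkid_setup",  # provider may interact with the user
        "openid.claimed_id": IDENTIFIER_SELECT,
        "openid.identity": IDENTIFIER_SELECT,
        "openid.return_to": return_to,   # where the provider sends the user back
        "openid.realm": realm,           # the site the user is asked to trust
    }
    sep = "&" if "?" in op_endpoint else "?"
    return op_endpoint + sep + urlencode(params)

print(openid_checkid_url("https://op.example.org/server",
                         "https://app.example.org/return",
                         "https://app.example.org/"))
```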

I would be interested to know other people’s experiences with this – have you used OpenID before for your collection? Do you administer credentials yourself?

The new Oxford University Research Archive (ORA) interface can be found at http://ora.ox.ac.uk/

Blueprint, the magazine for staff at the University of Oxford, also features an article on ORA called ‘Get your grey matter noticed’ in the January 2012 issue – http://www.ox.ac.uk/document.rm?id=2265

On Mon 16 Jan, the IBBA Project is holding a Semantic Specification Workshop at Merton College. The purpose of this workshop is to engage stakeholders in a discussion about cataloguing. The Specification will serve both as a reference for collaboration between project partners and as the basis for further development, perhaps with other partners and contributors in future. More information can be found on Twitter via #ibbaworkshop and @BodleianBallads.

Bodleian Libraries
DAMS Software Engineer – 2 posts (DaMaRO / SoftwareStore Projects and Broadside Ballads Archive Project)
Bodleian Digital Library Systems & Services, Osney Mead, Oxford
Grade 7: £29,099 – £35,788 per annum
Fixed-term contract for 14 months and 11 months

The Bodleian Libraries Digital Asset Management System (DAMS) aims to provide a common digital object management and storage infrastructure to underpin the activities of the Libraries in the digital domain. The DAMS architecture aims to provide a long term storage and digital preservation environment that can adapt to changes in technology and tools whilst providing an abstracted set of interfaces which can be used for application development.

The SoftwareStore Project will involve two months of development work, with the Bodleian Libraries and Isis Innovation, to develop a ‘first release’ of a tool called SoftwareStore. This project is funded by an EPSRC Pathways to Impact Grant. SoftwareStore will provide storage for and dissemination of software code developed by Oxford researchers and which exploits new technologies to support increased impact. You will also work with another developer to integrate SoftwareStore with a tool called SearchResearch which will provide a registry and search tool focusing on Oxford University researchers and their research activities. Both services will be freely available for internal and external users – for example, internal departments, Research Services, Press Office, and external entities such as funders, commercial concerns or policy makers.

The JISC-funded DaMaRO Project focuses on enhancing and extending Oxford’s existing pilot research data management infrastructure to enable and encourage the wider re-use and repurposing of research data. It will enable Oxford to comply with policies from research funders that mandate that research data must be deposited in an appropriate repository and made available for public use, interrogation, and scrutiny, for a certain period of time after the end of a project. The research data infrastructure environment developed by DaMaRO will support the full research data lifecycle from planning to re-use.

The Integrating Broadside Ballads Archives Project aims to improve public access to and understanding of the rich musical, literary, visual and bibliographical traditions embodied by the broadside ballad. In partnership with the Early Modern Center in the Department of English at the University of California, Santa Barbara, and the Vaughan Williams Memorial Library of the English Folk Dance and Song Society, the project will further cooperation in the description and understanding of broadside ballads among researchers and institutions holding ballads in their collections.

You will provide sole technical support for the project(s) you are supporting, utilizing development tools, databases and web-based solutions. There will also be a requirement to liaise appropriately with technical colleagues from other units both within and outside the University.

You will be able to develop and maintain complex software solutions. You will require analytical and communication skills to determine technical and functional requirements. You will have an aptitude for developing user interfaces, particularly in a web environment. You will work well in a team of non-technical and technical colleagues to achieve common goals. You will be creative and responsive to the needs of the curators and researchers who will use the systems that you develop.

Applications for these vacancies are to be made online. Applicants may apply for one or both of these posts; please specify in your application which post(s) you would like to be considered for. To apply for these roles and for further details, including a job description and selection criteria, please click on the links below:

https://www.recruit.ox.ac.uk/pls/hrisliverecruit/erq_jobspec_version_4.jobspec?p_id=101844

https://www.recruit.ox.ac.uk/pls/hrisliverecruit/erq_jobspec_version_4.jobspec?p_id=101850

Only applications received before 12pm on Thursday 26 January 2012 can be considered. Interviews will be held on Monday 6 February 2012.  You will be required to upload a supporting statement as part of your online application.

Aims, Objectives and Outputs of the Digital.Bodleian Project

As described in the previous Digital.Bodleian blog post, the Bodleian Library, like many cultural heritage institutions, has a range of digital resources available online, but these resources have been created over a decade or more, with content stored in discrete ‘silos’, each with its own metadata format and user interface, and with no common search interface enabling users to discover content or navigate across collections.

The Digital.Bodleian project aims to unify and open up our collections by:

  • Bringing together our discrete collections under a single user interface which supports fast user-friendly viewing of high resolution images.
  • Standardising the metadata for each collection to facilitate faceted browsing and searching across collections.
  • Converting all of our images in a variety of formats to jpeg2000 and migrating them to a robust scalable storage infrastructure.
  • Providing an OAI-PMH target and the ability to machine-read and harvest these collections.
  • Allowing users to tag and annotate images, and to group content into their own virtual collections which can be shared with other users.
  • Allowing users to export metadata and images.
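The conversion aim in the list above (bringing images in a variety of formats to jpeg2000) is the kind of step that is usually scripted. A minimal Python sketch of building a Kakadu `kdu_compress` command for lossless output is below; `Creversible=yes` selects Kakadu's reversible (lossless) wavelet, and we believe that, without a `-rate` limit, the full code-stream is kept. The paths are hypothetical, and other encoding options (resolution levels, progression order, tiling) that a Djatoka deployment might want are deliberately left out. Kakadu's own TIFF reader may not accept compressed input such as Group 4 bitonal TIFFs, so those files may need decompressing first; that step is not shown.

```python
import subprocess
from pathlib import Path

def kdu_lossless_command(tiff_path, out_dir):
    """Build (but do not run) a kdu_compress command that writes a
    lossless jpeg2000 next to `out_dir`, named after the source TIFF."""
    src = Path(tiff_path)
    dest = Path(out_dir) / (src.stem + ".jp2")
    return ["kdu_compress", "-i", str(src), "-o", str(dest), "Creversible=yes"]

def convert(tiff_path, out_dir):
    """Run the conversion; requires Kakadu's kdu_compress on the PATH."""
    subprocess.run(kdu_lossless_command(tiff_path, out_dir), check=True)

print(kdu_lossless_command("/data/ms_123/page_001.tif", "/dams/jp2"))
```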

All of these tasks are to be carried out using standards-compliant file formats and methods and with a view to future expansion, scalability and robustness.

The four main outputs from the project will be:

  • A single user interface to our digital collections, based on the iNQUIRE system produced by our project partners, Armadillo Systems.
  • A new file store containing lossless jpeg2000 master copies of our legacy image collections.
  • An OAI-PMH target to enable harvesting and machine-readable access to content.
  • A set of standardised metadata describing the digital objects ingested into the Digital.Bodleian framework.
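The OAI-PMH target listed above lets other services harvest records page by page. A minimal Python sketch of the harvester side is below, following the OAI-PMH 2.0 flow-control rules: the first `ListRecords` request names a `metadataPrefix`, each follow-up request carries only the `resumptionToken` from the previous response, and an absent or empty token means the list is complete. The base URL is hypothetical, and fetching the pages over HTTP is left out.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request; a resumptionToken is an exclusive argument."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)

def next_token(response_xml):
    """Return the resumptionToken from a ListRecords response, or None
    when the list is complete (token absent or empty)."""
    root = ET.fromstring(response_xml)
    token = root.find(f"{OAI}ListRecords/{OAI}resumptionToken")
    if token is None or not (token.text or "").strip():
        return None
    return token.text.strip()

print(list_records_url("http://example.org/oai"))
```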

In addition we expect to fully document the processes we used and the lessons learned during the development and testing process.

Benefits of Digital.Bodleian

The successful completion of the Digital.Bodleian project will make many hundreds of thousands of digital images and their accompanying metadata available to academic and non-academic users at Oxford and in the wider sector. For most of these images, this will be the first time they have been made available in a user interface that supports cross-collection searching, user-driven collection building and annotation, and fast high-resolution image viewing. In addition, much of the content made available in Digital.Bodleian will be brought out of our private archival file store for the first time.

The Digital.Bodleian site will be a major scholarly resource, but also a source for images of strong historical and intrinsic interest for those who aren’t academic specialists.

The project also offers benefits for the University of Oxford and the Bodleian Library, as much-needed reorganisation of storage, conversion to a common file format, and normalisation of metadata will be performed as required during the ingest of our legacy collections into Digital.Bodleian.

Risks for Digital.Bodleian

Table of Risks for Digital.Bodleian Project


As part of the project, the Digital.Bodleian team and Armadillo will carry out extensive user testing and scalability testing to ensure that the browser and machine-readable interfaces to the Digital.Bodleian system perform as intended under expected loads, and continue to perform as more content is added and if usage levels increase.

IPR

All images that will be made available via Digital.Bodleian are, unless otherwise indicated, copyright the Bodleian Libraries and are free for private use and teaching provided the source is acknowledged. A full copyright statement and a permissions form for all other uses, including publication, are available here:

http://www.bodleian.ox.ac.uk/bodley/services/copy/imaging_services/copyright

Higher resolution images, for publication and other uses, are available via the Bodleian Libraries Imaging Services:

http://www.bodleian.ox.ac.uk/bodley/services/copy/imaging_services

Part of the Digital.Bodleian project will involve producing an inventory of the digital content currently made available on the web, and stored in our archive. We expect that some of the content we make available may be under non-Bodleian copyright, and where this content is made available via Digital.Bodleian the image and metadata rights will reflect this.

Project Team Relationships

The Digital.Bodleian Project will be directed by Dr Christine Madsen and managed by Dr Matthew McGrattan. Additional technical assistance will be provided by Neil Jefferies as technical consultant, Alexander Huber as metadata specialist, and Renhart Gittens as developer. Further assistance will be provided by Michael Popham as digital library specialist, and by Sally Rumsey in digital collections development. The project will also recruit one full-time member of staff for the duration of the project.

Project Director: Christine Madsen
Christine Madsen has many years of experience delivering technical and digitisation projects in libraries. Before joining the Bodleian she ran the Open Collections Program at Harvard University, a large-scale digitisation programme intended to open Harvard’s special collections to the world. At UC San Diego, she worked on several metadata mapping projects (including for ARTStor), specialising in metadata for visual images. She has a D.Phil from the University of Oxford in Information, Communications and the Social Sciences and an MLIS from San Jose State University.

Project Manager: Matthew McGrattan
Matthew McGrattan has been involved with the internet industry since the early 1990s. Since 1998 he has worked in academic IT (including training and teaching, digital imaging, and database and web development) for various Oxford colleges, the Bodleian Library, and the Museum of the History of Science. He is currently responsible for the infrastructure used for the production and storage of digital image holdings created within the Bodleian Library, and for the Luna image delivery platform. He has an MA (Hons) from the University of Glasgow, and B.Phil and D.Phil degrees in Philosophy from the University of Oxford.

Projected Timeline and Workplan

The work packages involved in the Digital.Bodleian project are as follows:

Work-package 1: Project Planning and Management (November 2011-July 2012)
Description: Finalisation of Project plan; liaison with JISC and other participants in the wider Discovery programme; liaison with project partners; documented planning and reporting; legal documentation
Deliverables: All reports and formal agreements required by JISC.
Evaluation: Acceptance by JISC Programme Manager.
Responsibility: Project Manager
Work-package 2: Inventory and Assessment (December 2011-January 2012)
Description: Assessment and full inventory of existing metadata and digital objects, with a timeline for retrieving both.
Deliverables: Full inventory of files, including the metadata formats and record formats represented.
Evaluation: Project Director.
Responsibility: Project Manager, Metadata Specialist
Work-package 3: Metadata Mapping and Ingest (January-April 2012)
Description: Based on the outcomes of work-package 2, final decisions will be made about mapping and ingest of records.
Deliverables: A large object store containing descriptive and technical metadata for each object, with all metadata adhering to our RDF object model.
Evaluation: Project Director, Technical Consultant
Responsibility: Technical Developer, Metadata Specialist
Work-package 4: Installation and testing of Web Interface [BETA] (March-May 2012)
Description: Initial installation of the Armadillo Systems web interface for searching and browsing. Testing will include focus groups, interviews and consultation with user experience professionals
Deliverables: Beta version and soft launch of digital.bodleian site; results of user testing.
Evaluation: All project stakeholders.
Responsibility: Technical Developer, Technical Consultant
Work-package 5: Re-launch of Web Interface (May-June 2012)
Description: Feedback from the user testing will be incorporated into iterative design and functionality changes in the user interface. The new site will be re-launched.
Deliverables: Finished web site.
Evaluation: All project stakeholders.
Responsibility: Technical Developer, Technical Consultant
Work-package 6: Documentation (May-July 2012)
Description: Providing access to the outputs of the project, including (but not limited to) information about the project, the final web site, and harvesting capabilities (OAI-PMH, RDF).
Deliverables: In addition to clearly and easily accessible information about the collections and data we make available, we will present our experiences in this project as a case study, emphasising the benefits and lessons learned from our decisions.
Evaluation: Acceptance by Project Advisory Board, JISC Programme Manager, and stakeholders.
Responsibility: Project Director, Project Manager
Work-package 7: Dissemination (July 2012)
Description: Partnering with Bodleian’s communications team to develop and deploy a dissemination campaign including press releases.
Deliverables: Announcements to appropriate online lists, social media channels, presentations at appropriate academic and library conferences, reports and information about the project.
Evaluation: Acceptance by Project Advisory Board, JISC Programme Manager, and stakeholders.
Responsibility: Project Director, Project Manager

We expect several of these work-packages to run concurrently, and outputs from each will be documented in blog posts on this blog, along with regular updates with decisions made, lessons learned, and any technical documentation created.

Budget

Digital.Bodleian Budget Chart


The Bodleian Libraries’ collections are extraordinary and significant, both from a scholarly point of view and as material with a historic and aesthetic richness that holds value for non-academic users. Each year the Libraries serve more than 65,000 readers, over 40% of them from beyond the University, while their critically acclaimed exhibitions attract almost 100,000 visitors annually. In an effort to open portions of our collections to a wide variety of users around the world for learning, teaching and research, the Bodleian Libraries have been digitising library content for nearly twenty years. The result is over 200,000 freely available digital objects, with at least another 1.5 million images awaiting release.

Like many academic libraries, though, we have placed our freely available digital collections online in project-driven websites, with content stored in discrete ‘silos’, each with its own metadata format and user interface, and with no common search interface enabling users to discover content or navigate across collections. Some of our collections are linked from portal pages such as http://digital.bodleian.ox.ac.uk/, but each collection remains, with a few exceptions, isolated and difficult to search. In addition, only a few collections offer a machine-readable interface, or any way to link their data with similar data in other Bodleian collections or with collections at other institutions.

The Digital.Bodleian project aims to solve these problems by:

  • Bringing together our discrete collections under a single user interface which supports fast user-friendly viewing of high resolution images.
  • Standardising the metadata for each collection to facilitate faceted browsing and searching across collections.
  • Converting all of our images in a variety of formats to jpeg2000 and migrating them to a robust scalable storage infrastructure.
  • Providing an OAI-PMH target and the ability to machine-read and harvest these collections.
  • Allowing users to tag and annotate images, and to group content into their own virtual collections which can be shared with other users.
  • Allowing users to export metadata and images.

All of these tasks are to be carried out using standards-compliant file formats and methods and with a view to future expansion, scalability and robustness.

The Digital.Bodleian project has been funded by the JISC as part of the Resource Discovery programme, and began in November 2011, with the project to finish by the end of July 2012.
