The White House Blog

You’ve Got to See It Before You Can Read It! Making Ancient Texts and Images Available on the Web

William Noel

William Noel is being honored as a Champion of Change for the vision he has demonstrated and for his commitment to open science.

I was one of those people who found their passion early. When I was about six, my Dad gave me a book called The Nursery History of England. It started with “Little Men and Big Beasts,” and it ended with Queen Victoria. Not that I spent much time at the end of the book. I got bored when it reached 1381, the date of the Peasants’ Revolt. I just read the beginning, with pictures of Alfred the Great and King Canute and King Harold and the Battle of Hastings, over and over again. I was hooked on medieval history.

My expensive English education only confirmed my life path. My history teacher was sublime; my science teachers were terrible. Physics was taught by a man who fenced for England, biology by a man who jogged across the United States, and chemistry by the UK hockey coach. But they couldn’t teach the sciences, and if I had any latent interest in science, it totally died.

Fast forward 25 years, and I was the curator of a wonderful collection of illuminated medieval manuscripts at The Walters Art Museum in Baltimore. I knew nothing about the digital revolution that was about to explode, and neither did the museum. And then a private collector left on my desk an old book called The Archimedes Palimpsest. Its important texts, which included unique works by Archimedes, the ancient Greek mathematician, were erased in the thirteenth century, and the collector charged me with making them legible. This developed into a worldwide project that involved multispectral imaging in Baltimore and X-ray fluorescence imaging at the Stanford Linear Accelerator Center. It also involved the work of scholars of Greek texts throughout the world. It was a cool project, we discovered neat things, and I finally realized that science, ancient and modern, was actually incredibly cool.

The importance of the project in this context, however, is that the owner of the book insisted that we publish the raw data as a set of flat files on the Internet, for anyone to use however they liked, and for free. I thought this was a nutty idea. How were people actually going to read the book? Would they have to open each of the files in turn? Why were we not building an interface for people to conveniently view the book? Surely that was what was needed...

As so often in this project, I was wrong. The point is that anyone could build an interface; anyone could do exactly what they wanted with this data. They could ingest it into their own institutional repositories, they could further process the images, and they could create interfaces to read the text. And that is exactly what has happened. The dataset is now replicated in libraries around the world, and the images are being enjoyed by all sorts of people in all sorts of different contexts. The project is over, but the data and its manipulation live on in an open environment.

This experience fundamentally transformed me as a curator of rare materials. With a wonderful crew of people, and with funding from the National Endowment for the Humanities (NEH), I started to digitize the illuminated manuscripts under my care and present them on the web in the same form as the Archimedes data. As a result, images from these manuscripts are now the easiest images of medieval manuscripts to find on the web: just try a Google image search! The traditional audience for these materials is grateful, and entirely new audiences have been reached.

The great problem in my field is that so few repositories of ancient books make digital images of their material available in truly useful ways. The data needs to be free, it needs to be published at the resolution at which it is captured, and it needs to be presented outside any fancy interfaces so that others can ingest it and use it as they like with the least possible “friction.” The web of medieval manuscripts in the future isn’t going to be built by institutions; it’s going to be built by users who will present the data as they want to present it, to answer the questions that they want to ask. The institutions need only provide the data, but they do have to provide it! I now direct The Schoenberg Institute for Manuscript Studies at The University of Pennsylvania, which is dedicated in part to making this happen.

I want to use this opportunity to talk about another fascinating dataset created by the same team that created the Archimedes Palimpsest data. Like the Archimedes manuscript, this one, too, is a palimpsest, with the important text scraped off. The erased text was written in the ninth century, and it is by far the fullest witness to a Syriac translation of Galen’s On Simple Drugs by Sergius of Res ‘Ayna. There could well be other texts in the manuscript that have yet to be identified. A group of Syriac scholars is working on this, but the text was much more thoroughly erased than the Archimedes text, and what is needed is a campaign by people who can process the raw data to create legible images for these scholars. So, if you are an image processor and feel up to the Indiana Jones Challenge, have a go at it and send me the results. I’ll put you in touch with the right people! Here is the dataset.

William Noel is the Director of the Schoenberg Institute for Manuscript Studies, and the Director of the Special Collections Center at the University of Pennsylvania.