Chasing Chaucer

There is an aura of mystery surrounding The Canterbury Tales, Geoffrey Chaucer’s collection of poetry from the 14th century.

By Lesley Porter Apr 10, 2015

The original manuscript is long lost, and the remaining versions pre-dating 1500 are all slightly different. However, English professor Peter Robinson is bringing some clarity to the Middle English masterpiece with his transcription project.

Robinson and his collaborators took on an enormous task: transcribe the 88 remaining versions—some 30,000 pages of text—into an online database. Line by line, the manuscripts were entered in their original Middle English, a painstaking process. Initially, it took about half an hour to transcribe a page, but "by the time you include all the checks, it's going to get closer to an hour," Robinson said. "That means 30,000 pages, 30,000 hours, which is 15 years of work, roughly."
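Robinson's arithmetic can be checked in a few lines. The sketch below is only a back-of-the-envelope illustration: the figure of 2,000 working hours per year (roughly 40 hours a week for 50 weeks) is an assumption made here, not a number from the project, and the function name is invented for the example.

```python
# Back-of-the-envelope check of the workload quoted above.
PAGES_TOTAL = 30_000         # page count given in the article
HOURS_PER_PAGE = 1           # Robinson's estimate once all checks are included
HOURS_PER_WORK_YEAR = 2_000  # ASSUMPTION: ~40 hours/week for 50 weeks


def person_years(pages, hours_per_page=HOURS_PER_PAGE,
                 hours_per_year=HOURS_PER_WORK_YEAR):
    """Convert a page count into full-time working years for one person."""
    return pages * hours_per_page / hours_per_year


total_years = person_years(PAGES_TOTAL)
print(total_years)  # 15.0
```

Under that assumption, 30,000 hours is 15 years of full-time work for a single transcriber, which matches the rough figure Robinson gives.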

There are about 12,000 pages to go. Once the transcription is complete, Robinson hopes to determine which of the manuscripts is closest to the original by analyzing the differences between versions, some of which are so subtle one might miss them at first glance.

"What we need to do is to figure out how they are related, which one descended from which, which belong to the same family, and then from that, arrive at some kind of understanding of how the texts developed," he said. This can be difficult, he explained, because the spellings the scribes used were not standardized; regional dialects accounted for much of the variation as well. "In different parts of the country, people used forms of English different from each other, both in terms of pronunciation and spelling."

Robinson used a line from The Nun's Priest's Tale as an example. Most modern versions render it as "And no wine drank she, either white or red," while the Middle English text reads "No wyn drank she, neither whit ne reed." However, the line may vary from one version to another: wyn may become wyne or wynne, whit may turn into white or whyte, and so on, depending on the medieval scribe who wrote it.

With so many variations, Robinson uses a powerful computer program to keep track of even the subtlest differences among the manuscript texts. The transcribed pages are run through the program, which records the differences in spelling and grammar. The approach is similar to the way evolutionary biologists chart a family of organisms and record their characteristics.
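The idea of recording differences between versions can be illustrated with a very small sketch. This is not Robinson's program: the witness labels A, B and C are hypothetical, their readings are assembled from the spelling variants mentioned above rather than taken from real manuscripts, and the word-by-word comparison assumes every version has the same number of words, which real manuscripts do not guarantee.

```python
from itertools import combinations

# HYPOTHETICAL witnesses: the labels are invented, and the readings are built
# from the variants quoted in the article, not from actual manuscripts.
witnesses = {
    "A": "No wyn drank she neither whit ne reed",
    "B": "No wyne drank she neither white ne reed",
    "C": "No wynne drank she neither whyte ne reed",
}


def collate(witnesses):
    """Return {word position: {spelling: [witness labels]}} where witnesses disagree."""
    tokens = {label: text.lower().split() for label, text in witnesses.items()}
    lengths = {len(words) for words in tokens.values()}
    if len(lengths) != 1:
        raise ValueError("this sketch assumes every witness has the same word count")
    variants = {}
    for pos in range(lengths.pop()):
        readings = {}
        for label, words in tokens.items():
            readings.setdefault(words[pos], []).append(label)
        if len(readings) > 1:  # keep only positions with more than one spelling
            variants[pos] = readings
    return variants


def distance(a, b):
    """Count the word positions at which two witnesses differ."""
    return sum(x != y for x, y in zip(a.lower().split(), b.lower().split()))


variants = collate(witnesses)
# Pairwise difference counts, the kind of table a family tree could be built from.
distances = {
    (p, q): distance(witnesses[p], witnesses[q])
    for p, q in combinations(sorted(witnesses), 2)
}

for pos, readings in variants.items():
    print(pos, readings)
print(distances)
```

A table of pairwise differences like this is the sort of raw material from which relationships between versions might be inferred, much as biologists compare traits across organisms. A real collation would also have to handle added, missing and reordered words.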

"It's really quite groundbreaking to figure out how to take manuscripts and put them inside the system, and get useful results from it," he said. "There's an enormous amount of information here about how people spoke and wrote English in 1390 to 1500. And that's not really been explored yet."

Aside from exploring the structure of late medieval English, Robinson also hopes the project will make Chaucer more accessible by opening it up to anyone who would like to help with the transcription. "It's crowdsourcing, essentially," he said, adding that a high level of academic control will be kept over the project to maintain the quality of the work. "We'll end up with a lot more people owning it if we expand the number of people working on it."

That sense of accessibility is important, he said, acknowledging that reading Middle English can be challenging. But if it is presented in an engaging way for a digitally savvy audience (such as in a smartphone app with the translation shown on screen), more people may come to appreciate Chaucer.

"It's amazing how many people I've met who said, 'I read The Canterbury Tales when I was in school, and I hated it,'" he said with a laugh. "If you have to read it on the page, it's not very interesting, but if you hear it, and you can see the translation and understand what's going on, it becomes so much more alive."