CEA 2015, Indianapolis, IN
March 28, 2015
Novels in Context: A TEI Database for Teachers, Students, and Scholars
In Fall 2012, I was teaching my “survey” of World Literature (1500-1800… I know!), a class that I typically teach once an academic year. I use the Norton Anthology, Volumes C and D, because they have a good variety of the materials I like to include, accessible headnotes and introductions, and pretty good footnotes for some of the less familiar and more global literary representatives. Because I also teach, as I expect many of us here do, a variety of other classes with little consistency from year to year, I have limited, if any, time to make changes to my syllabi, particularly when they have been working well, as this class had been. In Spring of 2012, Norton released the 3rd edition of their anthology, a new edition that included new materials as well as, mind-bogglingly, different translations of some of the pieces that were retained. While I understand the need to offer current materials that represent the best practices in college literature surveys, the demands of the job make this kind of planned obsolescence, to use Kathleen Fitzpatrick’s useful term, especially frustrating. What if we could do better for faculty and, in the process, open the pedagogical experience up to incorporate something of the historical materiality of the texts we’re working with? What if we could also engage students in tightly organized projects that are of real scholarly use?
Imagine a classroom where harried faculty no longer had to plan lectures, class activities, and assignments around textbook materials that may change or even disappear, and that many students never end up purchasing at all; or where we no longer had to scrounge around the web for free, quality resources that students may or may not print out and bring to class. Imagine going to a reputable site where you can select from a range of primary source materials that include facsimile page images, relevant and recent headnotes, and even reading questions, and then easily publish that selection to a personal website, or save it as a single PDF that can itself be posted to a course site or made available for the cost of copying in the campus bookstore. Or imagine how a project that students and faculty build together might transform an upper-level class. These scenarios are what I imagine for the project I want to share with you today.
Novels in Context: A TEI Database for Teachers, Students, and Scholars (NiC) is a project that links my interests in the digital humanities and eighteenth-century studies. While the scenarios I described earlier are based on a future iteration of the project, today I hope to show you where I’m starting and talk a little about what I want to end up with. I was lucky enough to receive a sabbatical grant to begin work on Novels in Context last term, and that gave me time to research a variety of open-source database applications that work with XML files; it also gave me time to learn the basics of XQuery, the query language used to search and manipulate the XML files in the database. I spent time thinking about how the documents would be marked up, and about the costs and benefits of choosing either interpretive or descriptive markup. I went to the Library of Congress and got in touch with other special collections libraries to test the process by which I could acquire page images of early sources for the database, and investigated how to link those images into the database for web-accessible display. I wasn’t able to do more than scratch the surface of the project as a whole, but the sabbatical did give me the time and space to start, and to figure out where the project could go.
So, essentially, Novels in Context is a collaborative digital scholarly project with pedagogical significance that seeks to provide an agile, web-accessible alternative to 1.) the costly and proprietary Eighteenth-Century Collections Online, 2.) the single print anthology on the subject, and 3.) the free but scattered and often unreliable resources available on the Internet. The project will provide a free and open electronic subset of primary source materials focused on the history of the novel in English that is highly curated, extensible, fully indexed, searchable, and accessible. Here, you can see that the app currently offers a simple user interface that contextualizes the project, provides a full-text search, and lists the materials currently in the database. The files in the database right now are TEI-formatted XML documents; I’ll show you what they look like in raw form a bit later and talk about my markup choices. When you click into a document, you’ll see an excerpt of the full text, selected for inclusion because of its significance, along with page images of that excerpt (if it is an excerpt: this essay by Johnson is included whole, because it’s brief). Below that is some information about the material object and its various online iterations, plus links to the ESTC and the 18th Century Book Tracker.
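In the project itself, search is handled inside eXist and queried with XQuery; the code below is not that. As a rough illustration of what a full-text search over TEI documents does, here is a minimal Python sketch using only the standard library. The two documents, their file names, and the search() helper are hypothetical, and a real XML database would consult a prebuilt full-text index rather than scanning every paragraph.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents, in ElementTree's {uri}tag form.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Two invented, heavily abbreviated TEI documents (hypothetical sample data).
DOCS = {
    "doc-a.xml": """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
        <p>This novel warns of the dangers of reading.</p>
        <p>A romance of giants and enchanted castles.</p>
    </body></text></TEI>""",
    "doc-b.xml": """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
        <p>Novels, unlike the old Romance, copy real life.</p>
    </body></text></TEI>""",
}


def search(docs, term):
    """Return (document name, paragraph text) for every <p> containing term.

    Matching is case-insensitive and on whole words only, so "novel"
    does not match "Novels". Hypothetical helper for illustration.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    hits = []
    for name, xml_text in sorted(docs.items()):
        root = ET.fromstring(xml_text)
        for p in root.iter(TEI + "p"):
            # itertext() also collects text inside nested tags such as <persName>;
            # split/join collapses the whitespace left by line breaks in the markup.
            text = " ".join("".join(p.itertext()).split())
            if pattern.search(text):
                hits.append((name, text))
    return hits


if __name__ == "__main__":
    for name, text in search(DOCS, "romance"):
        print(f"{name}: {text}")
```

Running it prints the two paragraphs that mention “romance”, one from each document. Whole-word matching is a choice made for the sketch; a production index might also apply stemming, so that a search for “novel” would find “Novels” as well.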
This is all pretty basic and easy enough to do, though for someone unfamiliar with XQuery there was a bit of a learning curve. I built the app on eXist, an open-source XML database, and first set the project up as a local installation on my laptop, where it runs on Apache; I used the sample database of Shakespeare’s works as a model. After I became more confident, I posted all the code online to GitHub, a site that makes it easy to keep track of versions, collaborate, and publish code. I met with one of the developers of eXist for coffee one day, and he explained quite a bit of the process to me, which helped greatly (shout out to Joe Wicentowski!); the eXist online developer and user forums are also essential resources for a beginner. Anyone here today can install eXist, download my app from GitHub, and install it on their own server. I wanted to do this because I think it’s important that literature out of copyright be freely available in the most usable, useful form, and anything I can do to enhance our access is in the realm of good. I also wanted to make it available this way because I hope to work with students in IT this summer to further refine and develop the project in ways I’ll describe later.
Now I’ll talk for a little bit about the XML markup and why I chose to do it this way, as opposed to some other way. By the way: how many of you are familiar with the terms I’m using–particularly XML and TEI?
If not many, explain. If lots, skip. Probably, there’ll be few familiar with these terms?
XML stands for eXtensible Markup Language. It looks a lot like HTML, in which text is “tagged” or “marked up” so that your browser can interpret it. The difference is that XML is extensible: you can add to it and create new tags and markup depending on your needs, and you can create your own schemas that your application interprets however you ask it to. Businesses can use XML to keep track of inventory and display that inventory dynamically on the web, but in my experience it’s most often used by scholars to make texts machine-readable. The Women Writers Project, for instance, uses XML; there are many other examples, and a quick Google search will turn up more. XML is often used to describe manuscript materials, and there are ways to use it to make other media, like musical scores or videos, machine-readable, too. The idea is that, in the absence of the thing itself, an XML document can achieve a kind of descriptive exactitude that makes the absent object visible. Depending on the tool you use to “read” or “display” the text, you can get quite detailed. TEI refers to the Text Encoding Initiative, a group of scholars and coders who have created a standard for describing things like manuscripts, early printed matter, and so on. So TEI is the standard my XML follows: it specifies a shared set of tags and syntax that scholars doing this sort of work use, and you can imagine why that would be necessary.
If we look at this XML file, you can see what I’ve done. I’ve got a header here that describes the publication history of the object: not only the particular text I’m working with, but also the page images and the text’s other available electronic iterations. My header also contains information about this electronic version of the text: who created it, who did what part of the work, and so on. I think the pedagogical applications of this are clear and straightforward, but often overlooked; the header is meant to locate the object in time and space, giving it a local habitation and a name, as it were. Too often, students turn without awareness of context to “the web” writ large, knowing little about the specific thing they’re looking at, much less its provenance or who edited it. This contributes to, and even embodies, the rootlessness of our students’ reading and research skills. While anthologies can help craft a narrative by putting texts in chronological or thematic order, it is not a given that our students grasp that narrative clearly. If students were to participate, in collaboration with a faculty mentor, in creating such XML documents, they would have to spend time putting the text back into its historical and material context, and they would begin to understand the relationship between the unmoored “web version” they’re so used to reading, a contemporary student edition such as an Oxford edition, and the historical printings or even manuscripts that those editions draw on. This version becomes a point on a line, as clearly defined as the context requires.
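To make the header concrete, here is a minimal Python sketch (standard library only, not the project’s own XQuery code) that pulls a few of those details out of a TEI file. The sample header is invented and heavily abbreviated, though the element names it uses (teiHeader, fileDesc, titleStmt, respStmt, sourceDesc) are standard TEI; the summarize_header() helper and its output fields are hypothetical.

```python
import xml.etree.ElementTree as ET

# Prefix map so ElementTree paths can write "tei:" for the TEI P5 namespace.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented, heavily abbreviated TEI file (hypothetical sample data).
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A Sample Text</title>
        <author>Anonymous</author>
        <respStmt>
          <resp>transcription and markup</resp>
          <name>A. Student</name>
        </respStmt>
        <respStmt>
          <resp>review</resp>
          <name>A. Faculty Mentor</name>
        </respStmt>
      </titleStmt>
      <publicationStmt><p>Draft, not yet published.</p></publicationStmt>
      <sourceDesc>
        <bibl><title>A Sample Text</title>, <date when="1750">1750</date></bibl>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>...</p></body></text>
</TEI>"""


def summarize_header(xml_text):
    """Collect title, author, credits, and source date from a TEI header."""
    root = ET.fromstring(xml_text)
    stmt = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt", NS)
    # Each respStmt pairs a responsibility with the person who carried it out.
    credits = [
        (r.findtext("tei:resp", namespaces=NS), r.findtext("tei:name", namespaces=NS))
        for r in stmt.findall("tei:respStmt", NS)
    ]
    # The date of the printed source lives in sourceDesc, not in titleStmt.
    source_date = root.find(".//tei:sourceDesc//tei:date", NS)
    return {
        "title": stmt.findtext("tei:title", namespaces=NS),
        "author": stmt.findtext("tei:author", namespaces=NS),
        "credits": credits,
        "source_date": None if source_date is None else source_date.get("when"),
    }


if __name__ == "__main__":
    print(summarize_header(SAMPLE))
```

The respStmt entries are the natural place to credit each contributor by name, including students, so that who did what part of the work stays attached to the file itself.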
I won’t go into detail about many of the other aspects of the XML here, but I do want to describe how my thinking about conceptual markup, or annotation, evolved. When the project began, I wanted to identify specific places in the text where a key theme or topic was evident: for instance, a reference to formal realism, to exemplary characters for imitation, to the dangers of reading, or to the relationship between novels and the romance tradition. However, I quickly realized that a single reader could easily locate multiple topics or themes in one span of text, which would make standardizing the database very challenging, and inconsistent markup makes for problematic data. A full-text keyword search of the database would ultimately be just as effective. Finally, I also worried about imposing a reading on the text that students should really discover for themselves. This led me to the way I currently conceptualize the markup, which is more descriptive and clarifying: defining important and unfamiliar words, identifying allusions, clarifying references, or tagging people and dates to create the basis of a network that may become useful for the application later.
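Descriptive tags of this kind are also easy to work with programmatically. The Python sketch below (standard library only; the sample passage, its key values, and both helper functions are invented for illustration) reads TEI persName and date tags and records which people are mentioned in the same paragraph, the kind of raw material a network of people and dates could later be built from. Treating the paragraph as the unit of co-occurrence is an assumption of the sketch, not a decision of the project.

```python
import itertools
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace used by TEI P5 documents, in ElementTree's {uri}tag form.
TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented sample passage; persName/@key and date/@when are standard TEI
# attributes, but these particular key values are hypothetical.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <p>Compare <persName key="richardson">Mr. Richardson</persName> with
     <persName key="fielding">Mr. Fielding</persName>.</p>
  <p>Of <persName key="fielding">Fielding</persName> alone, more in
     <date when="1751">the following year</date>.</p>
</body></text></TEI>"""


def co_mentions(xml_text):
    """Count pairs of people (by @key) mentioned in the same <p>."""
    root = ET.fromstring(xml_text)
    edges = Counter()
    for p in root.iter(TEI + "p"):
        # A set, so repeated mentions within one paragraph count once;
        # persName elements without a @key are skipped.
        keys = sorted({n.get("key") for n in p.iter(TEI + "persName") if n.get("key")})
        for pair in itertools.combinations(keys, 2):
            edges[pair] += 1
    return edges


def tagged_dates(xml_text):
    """Return the normalized @when value of every <date>, in document order."""
    root = ET.fromstring(xml_text)
    return [d.get("when") for d in root.iter(TEI + "date")]


if __name__ == "__main__":
    print(co_mentions(SAMPLE))   # Counter({('fielding', 'richardson'): 1})
    print(tagged_dates(SAMPLE))  # ['1751']
```

The second paragraph mentions only one person, so it adds no pair. The @when attribute is what lets a vague phrase in the text (“the following year”) still be placed on a timeline.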
I began thinking of the data less as a collection of topics and more as a collection of things. And because of this, I realized that student contributions could very easily be of real use to the project. Many digital pedagogical projects have short life spans, or are significant purely in the doing of them. Student labor in the digital humanities is often elided, sometimes for very good reasons: the contributions are exercises in pursuit of a specific pedagogical goal, and often that goal is removed from the larger scholarly conversation. We encourage students to see themselves as part of an ongoing conversation, but we don’t expect them, somewhat myopically, actually to contribute some new understanding of the history of the novel, for instance, or even a new interpretation of this particular text. Sometimes student work is less than polished, and its research could be better informed, and that’s okay; it’s the nature of the exercise to be a learning experience. However, what if students could contribute meaningfully? Students can consult library and Internet materials to identify an allusion, or the birth and death dates of a person referenced, and they can identify the structural parts of a text (title, subtitle, page, paragraph, verse paragraph, and so on). Students can go to a special collections library and take 400 dpi, margin-clear photographs (often on their phones!) of the pages of a first edition, and if it’s not the first edition, they can identify which edition it is. They can work with each other and with their faculty to draft headnotes or reading questions. And what if their XML documents of specific texts or excerpts, headnotes, or reading questions could be submitted to an editorial board that vetted them for accuracy, style, and completeness, to be reused by you, or you, or you, the next time you’re teaching your class? A clear style sheet would be useful here, of course, but the idea is not far-fetched.
Indeed, I think it has real implications for pedagogy, scholarship, and scholarly publication alike.
Which leads me to the final portion of my presentation, on future directions. Over the summer, I’m hoping to get a grant to work with three IT students to flesh the project out. I’d like to create a user registration process through which individuals can submit XML documents plus page images, for instance. This will require a clearly defined workflow so that an editorial board can decide which documents to publish, which to reject, and which to publish with revision. I also want to provide a way to display the variety of headnotes and reading questions, perhaps organized by general theme or topic, and make them available, together with the documents themselves, for export as a coursepack or anthology. Because the project subscribes to an open culture license, these materials will be freely reproducible and distributable. For a nominal fee, faculty could have them bound and made available for purchase in the campus bookstore, and/or distributed as a PDF.
So, these are my next steps; however, I’m in the process of submitting the grant application right now, and I’m learning a lot there, too. In particular, I’m realizing that the IT students don’t have a clear awareness of the digital humanities as a growing field of opportunity, of a free and open source ethos, or of important concepts like version control. Coursework in our IT departments focuses on very different subjects; this will be a challenge for the project, but also a real opportunity.
Ultimately, my goal is to make the program and platform itself available for public reuse, with any kind of content. With faculty buy-in, it can be a real alternative to pricey and frequently revised print anthologies by Norton or Bedford. Students and faculty can work together to contribute to a growing, scholarly, and free collection of primary resources that will be a real asset to students and teachers of this important development in literary history. Finally, by actively engaging students from Information Technology, I hope that this project will provide a model for future interdisciplinary collaboration, mentorship, and even publication.
We face a significant challenge, as teachers and scholars, in making distant material relevant for students raised in an environment of standardized, workforce-oriented learning. Part of our work as teachers consists in our attempts to combat this sense of irrelevance and standardization by adopting habits of active, project-based learning, many of which involve students in our own research agendas. The primary source materials in Novels in Context are not only useful for students of eighteenth-century letters; their presentation also offers a window into the material history of novel reading and publishing. By building the resource in keeping with the Free and Open Source ethos and by incorporating student-authored markup among the scholarly contributions to the database, I hope the project will become a public site that makes vivid the active production of knowledge, both historically and in the digital realm. As such, I intend it to offer an interrogation of current modes of scholarly publication as well as of textbook, anthology, and coursepack production. Current scholarly publication is emphatically not open to the public; it is secured behind institutional paywalls, and typically exclusionary in both content and form. Similarly, the costs and methods of textbook production are, by many accounts, more burdensome than enabling for student learning. The future of publishing, the work of learning, and the demands of public discourse are changing, and as teachers and scholars, part of our charge is to ensure that these changes benefit our students’ intellectual, ethical, and civic growth.