Category Archives: dh

How-to guide for basic collaborating in GitHub


The summer is upon us, and I have had the great good fortune of a grant to be able to work with three IT students from Marymount on the Novels in Context project. We are all learning about how to collaborate using GitHub for version control, and I found myself in the exciting position of having to teach others what I am also teaching myself. So, I wanted to give a very clear step-by-step, which I hope others find useful. I’ll update the document with information on pull requests when that happens. For the time being, here’s the how-to!

To theme or not to theme?

After the November EC/ASECS conference and the Thanksgiving holiday–as well as working on other projects, like my Kinski paper–I’m now back to work on the database. While I have a basic working model, I still need to put it online, and there are several other issues I’m trying to work through before I need more help from people who know more than I. I am currently working on adding basic references and personography details, and rethinking the thematic markup. Each time I think I have a thematic schema, something emerges to undermine, complicate, or destabilize it. Any thematic assessment is necessarily interpretive, and I am not sure how elaborate my interpretive additions should be. As Jerome McGann and Dino Buzzetti have aptly noted, “Markup…is essentially ambivalent” and it “should not be thought of as introducing—as being able to introduce—a fixed and stable layer to the text. To approach textuality in this way is to approach it in illusion.” This is especially evident when attempting a thematic markup, but it is also an issue when identifying what to note in, for instance, a personography. The questions that I have, or rather, that I think students will get meaning out of without overly biasing or directing their engagements with the texts, will emerge as more or less evident given how I structure and mark up the materials. Thematic markup has the real potential to take agency away from the student reader, rather than helping the student reader place the excerpt in a larger material context, which, despite our efforts at informational literacy, still eludes many. Additionally, the fulltext search feature is generally sufficient for keyword thematic analysis. For the time being, therefore, I have chosen not to incorporate thematic analysis in the XML, but I do want to explore issues of personography, bibliography, allusion, reference, and structure.

Some of the next steps are to determine what needs to be removed from the header information (that is, what I can put in a separate file or elsewhere, so as not to duplicate it in each file); identify key names/dates/places/references and consider how they may best be marked up; and adapt the XQL to account for those changes that affect display. Then, a big project I hope to have some help tackling is upgrading my server, installing eXist-db (and other necessary packages), and installing a live working copy!

DH2015 Poster Proposal

Novels in Context: A TEI Database of Primary Resources for Teachers, Students, and Scholars
DH 2015 | Poster Proposal

Tonya Howe
Marymount University
thowe@marymount.edu | cerosia.org

In this poster presentation, I hope to share the (evolving) product of a recent grant to create a free- and open-source database of curated, excerpted, annotated primary source materials useful for the study and teaching of the 18th century novel in English. The Novels in Context (NiC) project combines my interests in DH with my scholarship in eighteenth-century studies while creating a pedagogical opportunity for student-faculty research and publication. The goal of the NiC project is, most immediately, to provide an agile and accessible alternative to 1.) the costly and proprietary Eighteenth-Century Collections Online, 2.) the single print anthology on the subject, and 3.) the free but scattered and often unreliable resources available on the Internet.

NiC is built on TEI-formatted XML in an eXist database, an open-source native XML platform. To ensure the project remains free and ethically unencumbered by proprietary concerns, all the texts are transcribed and marked up individually and page images secured through library collections holding first editions. The XML is both structural and critical, including key themes and topics–this is primarily in anticipation of the time when NiC can incorporate MALLET topic modeling and other modes of visualization. The project will incorporate an editorial board to publish user contributions more broadly, including student-faculty collaborations. In the long term, I hope to create a portable platform that instructors can use to generate coursepacks of quality, reliable material (headnotes, reading questions, and so on) that could potentially stand in for the ubiquitous, proprietary, and costly textbook, engaging students in a broader conversation about intellectual property and the public good.

NiC is currently in the working prototype phase, and a copy of the current build is available for technical collaboration on GitHub (https://github.com/tonyahowe/NiC); I am also eager to find collaborators among the world of coders, scholars, and teachers of literature.

Iteration and calling new functions

This past week I think I’ve had a couple of breakthroughs in terms of understanding how XQuery works. The rubicon was when Christian Moser from the eXist forum explained iterative variables to me. I know, of course, what iterative variables are, but I didn’t know exactly how they worked in XQL–in my head, creating a variable $n and not tying it to a TEI element n is strange. But there’s no reason for that–what we call something in XQL has nothing but convention to do with what we call something in XML. So, there was that, and also the fact–which had not occurred to me–that you can define variables in for expressions as well as let expressions. Also, the iterative variable can be bound to the position in the iterative cycle. I don’t know if this makes any sense, but it helped me tremendously.
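To make those two points concrete, here is a minimal sketch (not code from the project) of a for expression that binds both an iterative variable and its position, next to a let expression. The three words are made-up sample data, and the variable names are arbitrary.

```xquery
xquery version "3.1";

(: $word is bound to each item in turn; "at $pos" binds its 1-based position. :)
for $word at $pos in ("first", "second", "third")
(: A let clause binds a name to a value once per iteration. :)
let $label := concat($pos, ": ", $word)
return
    <p>{$label}</p>
```

Run on its own, this should return three <p> elements, one per word, each prefixed with its position in the cycle.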

For instance, here’s how I made the TEI statement of responsibility <respStmt> display, accounting for more than one editor. Let’s say we have two people collaborating on a document:

<respStmt>
<resp>Transcription and correction</resp>
<name>Elizabeth Ricketts</name>
</respStmt>
<respStmt>
<resp>Correction, editorial commentary, and markup</resp>
<name>Tonya Howe</name>
</respStmt>

We want to render them on separate lines, connecting the <resp> and the <name> with " by " for readability. Here are two things I tried:

{
    for $respStmt in $header
    let $respCount := count($header//tei:respStmt)
    return
        if ($respCount > 1) then
            <p>{string-join(($titleStmt/tei:respStmt/tei:resp, $titleStmt/tei:respStmt/tei:name), ' by ')}</p>
        else
            concat($titleStmt/tei:respStmt/tei:resp, ' by', $titleStmt/tei:respStmt/tei:name)
}

and
{
    for $resp in $resps
    return
        <p>{string-join(($titleStmt/tei:respStmt/tei:resp, $titleStmt/tei:respStmt/tei:name), ' by ')}</p>
}

These bits gave me the following results, respectively:

Transcription and correction by Correction, editorial commentary, and markup by Elizabeth Ricketts by Tonya Howe

and
Transcription and correction by Correction, editorial commentary, and markup by Elizabeth Ricketts by Tonya Howe
Transcription and correction by Correction, editorial commentary, and markup by Elizabeth Ricketts by Tonya Howe

Here’s what I ended up with, through the magic of iteration.

{
    for $n in $resps
    return
        <p>{concat($n//$resps/tei:resp, ' by ', $n//$resps/tei:name)}</p>
}

Transcription and correction by Elizabeth Ricketts
Correction, editorial commentary, and markup by Tonya Howe

It was the $n in front of the //$resps/tei:resp that made no sense to me–but you can think of it as the iterative variable and its position in the cycle of respStmts. Pretty cool, huh? I thought so.
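For comparison, here is a minimal, self-contained sketch of the same loop with the paths written relative to the iterative variable alone. It assumes that $resps holds the tei:respStmt elements (the post does not show how $resps is bound), and the inline $titleStmt is sample data copied from the example above. Under the standard XQuery path rules, a step such as $n//$resps evaluates the whole $resps sequence again rather than narrowing it to $n, so $n/tei:resp may be the more portable spelling; this has not been checked against the eXist version used in the project.

```xquery
xquery version "3.1";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Sample data, inlined so the query runs without a database. :)
let $titleStmt :=
    <titleStmt xmlns="http://www.tei-c.org/ns/1.0">
        <respStmt>
            <resp>Transcription and correction</resp>
            <name>Elizabeth Ricketts</name>
        </respStmt>
        <respStmt>
            <resp>Correction, editorial commentary, and markup</resp>
            <name>Tonya Howe</name>
        </respStmt>
    </titleStmt>
(: Assumption: $resps is the sequence of respStmt elements. :)
let $resps := $titleStmt/tei:respStmt
(: $n is one respStmt per iteration, so child steps start from it directly. :)
for $n in $resps
return
    <p>{concat($n/tei:resp, ' by ', $n/tei:name)}</p>
```

Each iteration sees exactly one <resp> and one <name>, which is what concat() needs: it takes single values, not sequences.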

And then, today, that breakthrough led me to another: I had been struggling to get the source information in <imprint> to display properly, especially since I want there to be multiple sources, including but not requiring a first edition, an online edition, and any other edition used for the project. The online imprint should display a live link. Here’s the catch–I’m using <date type="firstEd"> or <date type="accessed"> to identify dates associated with print versus online, and <extent type="physical"> or <extent type="onlineLink"> to identify the extent of the sources. This seems to be the most consistent in terms of TEI P5 guidelines. So, I wanted to test something (I went with date, but could have gone with extent) to see whether it was a web source, and if so, display different information. Here’s what I came up with, in tei2html.xql:

<h3><b>Sources:</b></h3>
{
    for $n in $imprints
    return
        <li>
            {$n//$imprints/tei:pubPlace}: {$n//$imprints/tei:publisher}. {$n//$imprints/tei:date}.
            {
                if ($n//$imprints/tei:date/@type = "accessed") then
                    <a href="{$n//$imprints/tei:extent}">{$n//$imprints/tei:extent}</a>
                else
                    $n//$imprints/tei:extent
            }
            {$n//$imprints/tei:note}
        </li>
}

And it did exactly what I wanted it to do!

Then, from this whole collection of moments, I could fix my image display problem. I had already gotten the <pb facs="image.png"> elements to display inline with the text using a typeswitch in the main XQL display functions page that called a little function in the same page each time it met with a <pb> element. Like this, in the tei2html.xql:

declare function tei2:tei2html($nodes as node()*) {
    for $node in $nodes
    return
        typeswitch ($node)
            case element(tei:pb) return
                tei2:pageImages($node)
            (: remaining cases and the required default clause are omitted from this excerpt :)
};

declare function tei2:pageImages($pb as element(tei:pb)) {
    let $facsPage := $pb/@facs
    for $pb in "work"
    return
        <img src="../images/{$facsPage}"/>
};

But this seemed clunky to me in the page display as a whole–in addition to being an inelegant solution generally. I wanted to put the page images in a sidebar, but translating that interdependent function to a standalone function and then calling it with <div data-template="tei2:pageImages"/> kept returning errors. tei2:pageImages can’t be called as is with <div data-template="tei2:pageImages"/> in the sidebar location, for reasons having to do, I think, with what information is being called in the node and the work as a whole. There are probably other reasons, too; suffice to say, it didn’t work. Using what I learned here and above, though, I came up with this, and put it not in tei2html.xql, but in app.xql:

declare function app:pageImages($node as node(), $model as map(*)) {
    for $pb in $model("work")//tei:pb
    let $facsPage := $pb/@facs
    return
        <p><a href="../images/{$facsPage}"><img src="../images/{$facsPage}" width="100%"/></a></p>
};

Success!

[Screenshot: Page images are now contained nicely in the sidebar!]

As that worked like such a charm, I wanted to try to move the source information elsewhere–possibly the sidebar, maybe a footer, who knows. For the moment, I went with sidebar, since I was on a roll. Here’s how I adapted it from above, putting it instead into app.xql, which is then called through templating:

declare function app:sources($node as node(), $model as map(*)) {
    for $n in $model("work")//tei:imprint
    return
        <li>
            {$n//tei:pubPlace}: {$n//tei:publisher}. {$n//tei:date}.
            {
                if ($n//tei:date/@type = "accessed") then
                    <a href="{$n//tei:extent}">{$n//tei:extent}</a>
                else
                    $n//tei:extent
            }
            {$n//tei:note}
        </li>
};

As you can see, I was also able to streamline some of the code, making it more compact. I’m doing my happy dance right now, though you can’t see it.

Updating the search and display features

I’ve been pretty busy this last week on the database, though I haven’t committed a lot of it to this blog–I have, however, been updating the project on GitHub, which has been a lot easier to use now that I have a project that needs version control in this specific way. Most of my time has been spent taking things out, putting things in, tweaking the xql, reverting to the original, and starting over, in between posting to and reading around on the eXist forum, as I learn how to untangle the samples.

The most important things to note are 1.) I have pre-ordered the eXist O’Reilly book by Adam Retter and Erik Siegel–it should ship in November, if I’m lucky! 2.) I got the search query results to return and display correctly, and I’m learning how those features work. 3.) I put some page images into the Mary Hays file, and got them to display both in browse and in query return modes–that’s pretty exciting for me! It’s not an elegant solution, and I have a lot of work to do, but there’s a brute force version of it there now.

[Screenshot: Narrow search results for ‘character’]
[Screenshot: Navigate to matched search results inside the Hays file]
[Screenshot: Sample page images inserted into the Hays file]

Next up are some pretty big issues, I think–definitely more conceptual in nature. How should I display the page images, for instance? I want to keep the elements inline with the xml, but it may be more elegant to display them in a sidebar or a footer, so as not to interrupt the flow of reading. The page images should be thumbnailed, with full-sized images behind them. The big question here is how to automate this, which is likely also determined by how the images are stored. Another conceptual issue that’s arisen has to do with incorporating images as primary interests in the database–though this may be something to consider in a later iteration. Currently, the textual transcription and markup is the primary focus, with the page images functioning as secondary visual resources. But what if I were to incorporate paintings, or photographs of material objects? Even if I don’t want to do this here, it may be something that others would want, so considering it now is probably a good idea. I’ll also need to begin marking up major topics in each xml file, which will also mean making sure the structure overall is solid and that I have a good topic schema. Users should be able to at least see a list of topics marked up as such in each file. Ultimately I’ll want to incorporate that as a search feature, but I don’t want to have to redo everything when I eventually incorporate the MALLET module. More details in the readme!
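One possible shape for the thumbnail idea, as a rough sketch only: it assumes that a smaller copy of every page image is stored under the same filename in a thumbs/ subfolder of images/. That storage layout, the function name local:thumbnail, and the sample <pb> data are all hypothetical; nothing here generates the thumbnails themselves.

```xquery
xquery version "3.1";
declare namespace tei = "http://www.tei-c.org/ns/1.0";

(: Hypothetical helper: link a small image to its full-sized original.
   Assumes ../images/thumbs/ holds a thumbnail with the same filename. :)
declare function local:thumbnail($pb as element(tei:pb)) as element(p) {
    let $file := string($pb/@facs)
    return
        <p><a href="../images/{$file}"><img src="../images/thumbs/{$file}" alt="Page image {$file}"/></a></p>
};

(: Made-up sample document, inlined so the query is self-contained. :)
let $work :=
    <body xmlns="http://www.tei-c.org/ns/1.0">
        <pb facs="page1.png"/>
        <p>Sample text.</p>
        <pb facs="page2.png"/>
    </body>
for $pb in $work//tei:pb
return local:thumbnail($pb)
```

Because the <pb> elements stay inline in the XML and only the display function changes, the same markup could feed a sidebar, a footer, or an inline view.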