Text Encoding Fundamentals and their Application
Advanced Encoding Exercise
Once you've completed the basic exercises and mastered the
essentials of text encoding, you need a more substantial project
to work on. For most of this workshop, you will be focusing on
transcribing a set of actual documents, taken either from publicly
available collections or from materials of your own. By the end of
the workshop, you should have completed the encoding of several
sample documents, transformed them with a simple stylesheet, and
created a simple TEI customization to help you represent their
structure.
If you have brought materials of your own to work with, by all
means use them. If you haven't, then choose one of the following
publicly available collections to work with (or another that suits
you better). In each case, you should select from the collection a
small set of documents to work with, preferably short ones so that
you have a manageable task!
Periodicals
- Harper's Bazaar, from the Making of America website (or any other of the many periodicals available there)
Document Analysis
The first stage of any encoding project is document analysis, and for this we will take some time for thought and discussion, using a form that provides questions to guide your analysis.
Document Skeleton
Once you have done an initial analysis of your chosen documents and know what they contain, the next step is to create a simple encoded skeleton showing the basic structure of the document and the essential components of the encoding. This process allows you to model the document in a preliminary way before you actually start typing much of the content.
To produce your skeleton, we suggest the following:
- Open a new copy of either TEIworkshop/documents/template_extensive.xml or TEIworkshop/documents/template_basic.xml. If you don't already have these files, they are available here.
- Ignore the TEI header for now; inside the text element, insert a basic document structure that matches the structure of your sample document. Include elements for the front matter, body, and back matter of the text, if these are present. Then inside each one, insert elements representing the subdivisions of the text.
- Once you have the basic organizational structure of the text encoded in this way, choose a representative sample section and insert the elements that you will need to represent the lower-level structures of the document. Pay attention to things like paragraphs, quotations, images, lists, headings, and other features. Transcribe the first few words of each feature: this will help you validate your document.
- Once you have a sample section roughly sketched in this way, validate your document. Fix any errors; you may need to insert additional elements to make the document valid.
- Look at the rest of your sample and see whether there are other sections that include features you haven't yet covered; encode these as well.
- Once you have a valid skeleton for your document, begin transcribing the content in full. This process will probably reveal additional, local features that you may wish to encode, such as names, dates, spelling modernization, foreign-language words, and so forth. Save and validate often.
- Once you have completed one document sample, look over your other sample documents and see whether there are any that include very different features; if so, encode another sample.
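As an illustration of the steps above, here is a minimal sketch of what a skeleton might look like for a short periodical piece. It does not reproduce the workshop templates: the div type values, the placeholder header, and the choice of features (a quotation, an image, a list) are all assumptions made for the example, and your own document will call for different ones.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Placeholder header: just enough to keep the file valid for now -->
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample skeleton (placeholder title)</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished workshop exercise.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Source details to be added later.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <!-- Front matter, only if the source has any -->
    <front>
      <titlePage>
        <docTitle>
          <titlePart>First words of the title</titlePart>
        </docTitle>
      </titlePage>
    </front>
    <body>
      <!-- A representative section, sketched in detail -->
      <div type="article">
        <head>First words of the heading</head>
        <p>First words of the opening paragraph ...</p>
        <quote>First words of a quoted passage ...</quote>
        <figure>
          <figDesc>Brief description of the image</figDesc>
        </figure>
        <list>
          <item>First words of the first item ...</item>
          <item>First words of the second item ...</item>
        </list>
      </div>
      <!-- Remaining sections, outlined only -->
      <div type="article">
        <head>First words of the next heading</head>
        <p>First words ...</p>
      </div>
    </body>
    <!-- Back matter, only if the source has any -->
    <back>
      <div type="advertisements">
        <p>First words ...</p>
      </div>
    </back>
  </text>
</TEI>
```

Because every element already contains a few words of content, the file can be validated at this stage, before the full transcription begins.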
Sample TEI header
Once we have gone over the details of the TEI header, the next step is for you to add a detailed TEI header to each of your sample documents. First think about the kinds of information it will be natural to include in your header, based on the choices you expressed in your document analysis and the needs revealed by your document skeletons. Do you need information about foreign languages? Do you have complex bibliographic information to represent? Do you need to record details of editorial and transcription practice? Then:
- Start with the file description and fill in the essential details about your electronic publication and your source.
- Next complete the profile description, paying particular attention to any essential documentation of languages and handwriting.
- Next, if you have time, take a look at the encoding description and consider whether the nature of your collection and approach warrants providing detailed information about your editorial and transcriptional practices, using the editorialDecl element. You may also want to think about providing topic keywords for your document (again, depending on the functional goals you identified in your document analysis); these are recorded in the textClass element of the profile description.
- Finally, fill in at least one entry in the revision description, just to get in the habit.
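The four parts of the header described above might be sketched as follows. This is only an illustration, not a model header: every name, date, language, keyword, and statement of practice is a placeholder invented for the example, and the handNotes element is relevant only if your source is handwritten.

```xml
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <!-- File description: the electronic publication and its source -->
  <fileDesc>
    <titleStmt>
      <title>Placeholder title: an electronic transcription</title>
      <respStmt>
        <resp>Transcribed and encoded by</resp>
        <name xml:id="enc1">Workshop participant (placeholder)</name>
      </respStmt>
    </titleStmt>
    <publicationStmt>
      <p>Unpublished workshop exercise.</p>
    </publicationStmt>
    <sourceDesc>
      <bibl>Placeholder citation for the source document.</bibl>
    </sourceDesc>
  </fileDesc>
  <!-- Encoding description: editorial and transcriptional practice -->
  <encodingDesc>
    <editorialDecl>
      <normalization>
        <p>Placeholder: original spelling has been retained.</p>
      </normalization>
    </editorialDecl>
  </encodingDesc>
  <!-- Profile description: languages, handwriting, topic keywords -->
  <profileDesc>
    <langUsage>
      <language ident="en">English</language>
      <language ident="fr">French (occasional phrases)</language>
    </langUsage>
    <handNotes>
      <handNote xml:id="h1" scope="sole">Placeholder: a single hand throughout.</handNote>
    </handNotes>
    <textClass>
      <keywords>
        <term>placeholder keyword</term>
      </keywords>
    </textClass>
  </profileDesc>
  <!-- Revision description: one entry per significant change -->
  <revisionDesc>
    <change when="2024-01-01" who="#enc1">Created skeleton and header (placeholder date).</change>
  </revisionDesc>
</teiHeader>
```

The fileDesc is the only required part of the header; the other three sections can be added or expanded as your document analysis suggests.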
Schema Customization
The TEI schema customization exercise has its own instructions. We will be going over these in detail during class, and then giving you an opportunity to create a simple customized TEI schema that can be used to validate and constrain your sample documents.
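To give a rough idea of what such a customization looks like, here is a minimal sketch of a TEI ODD file. It is not the file used in the exercise, and the class instructions take precedence: the schema name, the selection of modules, and the permitted values for the type attribute are all assumptions chosen for illustration.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample workshop customization (placeholder)</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished workshop exercise.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Written from scratch for this exercise.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- "workshop_sample" is a hypothetical schema name -->
      <schemaSpec ident="workshop_sample" start="TEI">
        <!-- Include only the modules the sample documents need -->
        <moduleRef key="tei"/>
        <moduleRef key="core"/>
        <moduleRef key="header"/>
        <moduleRef key="textstructure"/>
        <!-- From the figures module, take just two elements -->
        <moduleRef key="figures" include="figure figDesc"/>
        <!-- Constrain div: require a type and limit its values -->
        <elementSpec ident="div" module="textstructure" mode="change">
          <attList>
            <attDef ident="type" mode="change" usage="req">
              <valList type="closed" mode="replace">
                <valItem ident="article">
                  <desc>A single article in a periodical issue</desc>
                </valItem>
                <valItem ident="advertisements">
                  <desc>A section of advertisements</desc>
                </valItem>
              </valList>
            </attDef>
          </attList>
        </elementSpec>
      </schemaSpec>
    </body>
  </text>
</TEI>
```

A schema generated from a file like this would reject any div without a type attribute, or with a value outside the closed list, which is one way a customization can constrain your sample documents.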
Stylesheets
In a short, intensive course like this one there is not time to do justice to stylesheets and other publication tools; our treatment of XSLT and CSS is intended just to show you the kinds of things these tools can do. For this exercise, we will provide a few simple XSLT stylesheets that do some very basic things. We'll go over how to invoke them and how to make simple modifications, so that you can experiment on your own.
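For a sense of scale, a very basic stylesheet of this kind might look like the sketch below. It is not one of the stylesheets provided in the workshop; the choice of elements handled and the HTML they map to are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <xsl:output method="html" encoding="UTF-8" indent="yes"/>

  <!-- Build the HTML page; take the page title from the TEI header -->
  <xsl:template match="/">
    <html>
      <head>
        <title>
          <xsl:value-of select="//tei:titleStmt/tei:title[1]"/>
        </title>
      </head>
      <body>
        <!-- Process only the text, skipping the header -->
        <xsl:apply-templates select="//tei:text"/>
      </body>
    </html>
  </xsl:template>

  <!-- Each TEI division becomes an HTML div -->
  <xsl:template match="tei:div">
    <div>
      <xsl:apply-templates/>
    </div>
  </xsl:template>

  <!-- Headings -->
  <xsl:template match="tei:head">
    <h2>
      <xsl:apply-templates/>
    </h2>
  </xsl:template>

  <!-- Paragraphs -->
  <xsl:template match="tei:p">
    <p>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

  <!-- Foreign-language words are shown in italics -->
  <xsl:template match="tei:foreign">
    <i>
      <xsl:apply-templates/>
    </i>
  </xsl:template>

</xsl:stylesheet>
```

Elements without a template of their own fall through to XSLT's built-in rules, which simply output their text, so a simple modification usually means adding or changing one template at a time and re-running the transformation.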