Sample TEI Documents
This is a small collection of sample TEI files gathered from a variety of real-world TEI projects. They represent a highly varied set of documents and encoding approaches. They may be useful both as a source of ideas for encoding specific features and as illustrations of how particular elements are used. When using these examples as models, bear in mind:
- The encoding demonstrated here comes from specific projects with their own distinct motivations, which may not match your own. The level of detail and the choice of elements reflect each project's priorities and interests. Visit the project sites and read their documentation to find out more.
- The authors of these documents are normal human beings who make mistakes, so be forgiving, and don't treat the encoding you see here as normative.
In Vitro Samples
These samples were encoded for training purposes by Melanie Chernyk, to demonstrate various features of TEI.
- Sample personography of Tudor monarchs: shows names, birth and death dates, floruit dates, religious faith, and causes of death, and demonstrates how to record levels of certainty.
- Sample placeography of places mentioned in Isabella Whitney's "Last Will and Testament of Isabella Whitney": demonstrates alternate spellings of place names, geographic coordinates, and different categories of places.
- Selections from Palgrave's Golden Treasury: demonstrates front and back matter, nested <div> elements, indexes, editorial notes, and corrigenda, all with links into the text using <ref> and <ptr>.
- Sample manuscript page of a poem by John Keats: demonstrates additions, deletions, damage, doodles, gaps, supplied text, and the use of <sic> and <corr> to correct spelling errors. An image of the original page is also available.
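Encoding a misspelling with both <sic> and <corr> means that a single transcription can yield either a diplomatic reading (what the manuscript says) or a corrected one. The following is a minimal Python sketch of that idea using only the standard library. It assumes the pair is wrapped in a TEI <choice> element, which may not match how the Keats sample itself is encoded. The line of verse is invented for the example, and the reading() function is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Invented line for illustration; not taken from the Keats sample.
SAMPLE = (
    '<l xmlns="http://www.tei-c.org/ns/1.0">the '
    "<choice><sic>sweat</sic><corr>sweet</corr></choice> air of "
    "<choice><sic>Mai</sic><corr>May</corr></choice></l>"
)


def reading(elem, prefer):
    """Flatten elem to plain text, keeping only the `prefer` child ("sic" or "corr") of each <choice>."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == TEI + "choice":
            picked = child.find(TEI + prefer)
            if picked is not None:
                parts.append(reading(picked, prefer))
        else:
            parts.append(reading(child, prefer))
        # Text that follows a child element is stored on the child's tail.
        parts.append(child.tail or "")
    return "".join(parts)


line = ET.fromstring(SAMPLE)
print(reading(line, "sic"))   # the sweat air of Mai
print(reading(line, "corr"))  # the sweet air of May
```

The same approach extends to other paired alternatives that TEI groups under <choice>, such as <orig>/<reg> or <abbr>/<expan>.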
Proust, À la recherche du temps perdu (TEI Tite)
This sample includes the first tome of the first edition of Proust's À la recherche du temps perdu (Paris: Gallimard, 1919, under the imprint of the Éditions de la Nouvelle Revue Française). It contains Du Côté de Chez Swann, Part I - "Combray" and Part II - "Un Amour de Swann." This sample is encoded with TEI Tite, a very simple and constrained TEI schema used for data capture by vendors. This sample was generously contributed by Jeff Drouin at the University of Illinois.
Women Writers Project
The Northeastern University Women Writers Project is a digital collection of early modern women's writing in English, specializing in detailed structural and content markup. Because no page images are provided, the markup also captures a fairly detailed representation of the appearance of the source text. The sample files included here represent several different genres, plus a taxonomy file containing a genre classification. The WWP uses a TEI customization that adds several WWP-specific elements (including <vuji>, <mw>, and <mcr>). These files also illustrate the use of XInclude, which is used to pull the taxonomy file (among other things) into the TEI header.
- Ann Bacon, An Apology or Answer in Defence of the Church of England, 1564: demonstrates basic prose encoding, marginal notes, abbreviations and old-style typography, illegibility, and non-Unicode characters.
- Elizabeth Bath, Poems on Several Occasions, 1806: demonstrates encoding of poetry.
- Hannah Cowley, Albina, A Tragedy, 1813: demonstrates encoding of drama.
- Taxonomy file: a simple example of a local taxonomy used to classify texts by genre. The contents of this file are pulled into the <classDecl> of each document's TEI header using XInclude.
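The XInclude mechanism described above can be tried out with Python's standard-library xml.etree.ElementInclude module. This is a minimal sketch, not the WWP's own processing. The file name taxonomy.xml, the genre categories, and the memory_loader helper are all hypothetical. The header is also stripped down far below what a valid TEI document requires. To keep the example self-contained, the included file is served from a dictionary rather than read from disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}

# Hypothetical in-memory "files": a shared genre taxonomy (assumed name and content).
FILES = {
    "taxonomy.xml": f"""<taxonomy xmlns="{TEI}" xml:id="genre">
  <category xml:id="prose"><catDesc>Prose</catDesc></category>
  <category xml:id="poetry"><catDesc>Poetry</catDesc></category>
  <category xml:id="drama"><catDesc>Drama</catDesc></category>
</taxonomy>""",
}

# Stripped-down document (not valid TEI) whose <classDecl> pulls in the taxonomy.
DOC = f"""<TEI xmlns="{TEI}" xmlns:xi="http://www.w3.org/2001/XInclude">
  <teiHeader>
    <encodingDesc>
      <classDecl>
        <xi:include href="taxonomy.xml"/>
      </classDecl>
    </encodingDesc>
  </teiHeader>
  <text><body><p>...</p></body></text>
</TEI>"""


def memory_loader(href, parse, encoding=None):
    """Hypothetical loader: resolve an xi:include href from FILES instead of the filesystem."""
    if parse == "xml":
        return ET.fromstring(FILES[href])
    return FILES[href]


root = ET.fromstring(DOC)
# Replace each xi:include element with the root element of the file it points to.
ElementInclude.include(root, loader=memory_loader)

labels = [
    c.findtext("tei:catDesc", namespaces=NS)
    for c in root.iterfind(".//tei:classDecl/tei:taxonomy/tei:category", NS)
]
print(labels)  # ['Prose', 'Poetry', 'Drama']
```

Keeping the taxonomy in one file and including it means every document's header stays in sync when the classification changes.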
The Swinburne Archive
The Swinburne Archive is a digital collection, or virtual archive, devoted to the life and work of Victorian poet Algernon Charles Swinburne. The files included here give a fairly comprehensive picture of how XML can be used to encode a large and complex document. They also include a number of generated files that are used for various kinds of visualization.
- The source XML: the six volumes of Swinburne's 1904 collected Poems (Volume 1, Volume 2, Volume 3, Volume 4, Volume 5, Volume 6)
- The Swinburne Chronology. On the Swinburne Archive website, this chronology can be viewed either as a Simile timeline or as a more conventional HTML page. Both views are generated from this same TEI/XML file, using a different XSLT stylesheet for each.
- A set of XML files with more detailed markup of individual words, used to produce visualizations. These poems all belong to Volume 3 (listed above). Their encoding was used to produce two visualizations on the Swinburne Archive website.
- A zip file containing all of the Swinburne XML samples
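The Swinburne Chronology illustrates a single-source approach, in which one TEI file feeds several presentations. The Archive does this with XSLT. Python's standard library has no XSLT processor, so the following sketch swaps in plain Python functions to show the same idea. It assumes the chronology is a TEI <listEvent> of <event when="..."> elements with <label> children, which may differ from the Archive's actual encoding. The JSON shape is illustrative and is not the Simile Timeline input format. All function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from html import escape

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Assumed encoding for a chronology; the real Swinburne Chronology may be structured differently.
CHRONOLOGY = """<listEvent xmlns="http://www.tei-c.org/ns/1.0">
  <event when="1837-04-05"><label>Swinburne is born in London</label></event>
  <event when="1909-04-10"><label>Swinburne dies at Putney</label></event>
</listEvent>"""


def read_events(xml_text):
    """Extract (date, label) pairs from tei:event children, in document order."""
    root = ET.fromstring(xml_text)
    return [
        (e.get("when"), e.findtext("tei:label", namespaces=NS))
        for e in root.iterfind("tei:event", NS)
    ]


def to_timeline_json(events):
    """View 1: a JSON event feed for a timeline widget (illustrative shape, not Simile's format)."""
    return json.dumps({"events": [{"start": d, "title": t} for d, t in events]})


def to_html_list(events):
    """View 2: a plain HTML list for a conventional web page."""
    items = "".join(f"<li><b>{escape(d)}</b> {escape(t)}</li>" for d, t in events)
    return f"<ul>{items}</ul>"


events = read_events(CHRONOLOGY)
print(to_timeline_json(events))
print(to_html_list(events))
```

Because both views read from the same parsed events, a correction made once in the TEI file shows up in every output.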
Chicago Foreign Language Press Survey
The Newberry Library has been digitizing the Chicago Foreign Language Press Survey, a project of the Works Progress Administration (WPA) of Illinois that ran from 1936 to 1941. The Survey selected and translated articles published between 1861 and 1938 in Chicago newspapers, covering twenty-two linguistic and ethnic groups in the city.
ECCO, Charles Macklin's King Henry VII: or the popish impostor
These samples demonstrate the evolution of TEI data from an early SGML version, through a P5 XML version, to a final version to which morphosyntactic markup has been added automatically using Phil Burns' MorphAdorner tool.
- SGML version created by the Text Creation Partnership: demonstrates basic structural markup, including textual divisions, cast lists, dramatic speeches, and page breaks.
- TEI P5 version produced by Brian Pytlik Zillig: very similar to the first example, but expressed as XML.
- A linguistically annotated version, created with Phil Burns' MorphAdorner. In this version, every word is annotated with part-of-speech markup. Note that this version uses @reg and @part attributes on the <w> element that are not valid under P5.