The Association for Documentary Editing

Committee on Electronic Standards


A Discussion of Data Formats

Approved: October 2002

Once an accurate text has been established and prepared, one of the most important decisions the editor faces is which text format to use for the edition. The format selected will, in many cases, determine the level of access the editor can provide to the texts.

Editions vary widely in format and capability, and each editor must determine how much value to add to the texts. Considerations of staff time and expertise, funding, and the capabilities of publishers or service providers must be taken into account when selecting a format, as must larger concerns about the longevity of the edition and its usability into the future. Print and microform editions have been designed, in both content and technical respects, to stand as reference works that will outlast more ephemeral works. The challenge for editors of electronic texts is to come as close as possible to that standard.

Editors ought to weigh the pros and cons of the available formats carefully before selecting one. An editor who selects a format that may not have the longevity of others should plan to create an "archival" version of the data that could be used to convert or upgrade the edition if the selected software becomes obsolete.

At this point the ADE distinguishes between two types of electronic editions: one that provides electronic access to the edited texts but little else (Level 1), and another in which text encoding offers capabilities beyond what we can offer in traditional print and microform editions (Level 2). While the ADE prefers the creation of Level 2 electronic editions, it recognizes that making Level 1 editions of high-quality content available is certainly preferable to not creating an electronic edition at all.

Level 1 Electronic Editions

At its most basic, a Level 1 electronic edition might simply be a PDF version of an existing print volume. Attention to textual quality and the ability to search the volume online would alone meet the ADE's minimum standards, though the experience of using this volume would not be appreciably different from reading the original printed version. The major benefit of this kind of electronic edition is increased access to the documents.

Another example of a Level 1 edition might be one that uses a commercial software package to allow users to make their own notes, highlights and comments in the text, without altering the texts themselves.

Another example of a Level 1 edition might be an HTML translation of a set of documents, using hyperlinks to attach individual documents to a table of contents, and annotation to the document texts. Such an edition can offer a manually hyperlinked index, and can mimic the format and readability of a microform or print edition. Again, provided that textual standards were maintained and described in introductory material, such an edition would meet the ADE's minimum standards.

The advantage of creating Level 1 editions is that the technical requirements and knowledge needed to do so are minimal. Converting word-processing files to PDF format or writing simple HTML can be readily learned by most editors. We can use the same kinds of typography used in print editions to render complex texts (for example, the use of brackets to demarcate editorial insertions, arrows or other characters to denote insertions) and provide keys to understanding abbreviations just as we do in print editions. Most users have browsers capable of reading such editions, and there is a fairly good chance that as long as the editions are maintained, and upgraded when major software revisions are announced, they will have a long shelf life.

The drawbacks of these editions lie not so much in what they do, which is not very different from what we do in print editions, but in how little more they can give the reader, given the capabilities of electronic text. Because PDF and HTML are formats that chiefly display text but do not have the capacity to analyze or describe it, they cannot easily yield the kinds of in-depth textual analysis that other formats can.

Searching in an HTML edition is one example. Using regular web browsing software to search the texts of editions will quickly return pages that contain the text in question. But "hits" from such a search will be literal, that is, they locate matches only when the text exactly matches the query. Users cannot differentiate between documents where the word is used by the editors, in annotation or headnote material, and documents where it is used by the document's author. Likewise, complex queries, such as a search for all documents that mention a given word within a date span, sorted in date order rather than by the number of "hits," are difficult to construct, because the date of the document is not coded differently from any other number in the text.
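The limitation can be illustrated with a minimal Python sketch. The page names and page texts below are invented for illustration and do not come from any real edition; the function simply mimics the kind of exact-match search a browser performs over display-only text.

```python
# Hypothetical page text from a display-only (Level 1) edition. Nothing in the
# data marks which words belong to the editors and which to the author, or
# which numbers are dates.
PAGES = {
    "doc1.html": "Headnote: The editors discuss the treaty here. "
                 "Letter of 4 March 1801: I have read your letter.",
    "doc2.html": "Letter of 12 June 1799: The treaty was signed by 1801 men.",
}


def literal_search(pages, query):
    """Return the names of pages whose text contains the query string exactly."""
    return sorted(name for name, text in pages.items() if query in text)


# "treaty" matches both pages, although in doc1.html only the editors use it.
treaty_hits = literal_search(PAGES, "treaty")

# "1801" matches both pages, although only doc1.html is dated 1801.
year_hits = literal_search(PAGES, "1801")

# A change of capitalization is enough to lose every match.
capitalized_hits = literal_search(PAGES, "Treaty")
```

Because the text carries no description of itself, the search has no way to restrict a query to the author's words or to treat "1801" as a date rather than as an ordinary number.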

In some cases, commercially available proprietary software allows additional features which, if critical to the edition's purpose, may outweigh the limitations of the data format. Each editor needs to weigh the pros and cons in relation to the edition he or she is creating and the ways in which it is to be distributed and preserved.

Level 2 Electronic Editions

A Level 2 electronic edition is one in which the electronic text has been enriched by the addition of descriptive mark-up. Such editions allow users to take full advantage of the power of electronic text by enabling advanced searches, textual analysis, and sophisticated display abilities. In order to construct these editions, editors must use a set of mark-up rules, known as a DTD (Document Type Definition), which is readable by an increasing number of software programs. Marked-up documents capture information about more than just the display features of the documents (such as underlining, italics, or larger or smaller text); they also describe the reasons why data is displayed differently (for example, the fact that one word is italicized because it is a title, while another is in italics for emphasis). Mark-up also enables the editor to describe the structure of the document, which not only enables readers to specify the parts of the document they want to search (such as the document text instead of the editorial surround), but also enables the editor to display documents in a consistent and flexible manner. Finally, mark-up enables the editor to create and link descriptive material to the text that will enhance searches and make the documents richer. For example, an editor can provide the correct spelling of a misspelled word in the mark-up, and while this data may not appear on screen, it will be used (if desired) when searching the edition. Similarly, names of people, organizations, places and other proper nouns can be fully identified in the mark-up, even if the document refers to them in cryptic fashion.
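A minimal Python sketch, using only the standard library, may make these capabilities concrete. The element and attribute names in the sample (doc, headnote, text, sic, corr, title, emph, name, key) are invented for illustration and are not taken from the MEP or TEI DTDs; the sample documents are likewise hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical descriptive mark-up: the tag names are assumptions, not a real DTD.
SAMPLE = """
<edition>
  <doc id="d2" date="1801-03-04">
    <headnote>The editors note that the treaty is not mentioned.</headnote>
    <text>I <sic corr="received">recieved</sic> the <title>Gazette</title>
      from <name key="Jefferson, Thomas">Mr. J.</name> and was <emph>most</emph> pleased.</text>
  </doc>
  <doc id="d1" date="1799-06-12">
    <headnote>Written from Paris.</headnote>
    <text>The treaty was received with 1801 signatures.</text>
  </doc>
</edition>
"""


def searchable_string(elem):
    """Flatten an element to text, appending any corrected spelling ('corr')
    or full name ('key') so that both forms can be found by a search."""
    literal = (elem.text or "") + "".join(searchable_string(child) for child in elem)
    extra = elem.get("corr") or elem.get("key")
    if extra:
        literal = f"{literal} {extra}"
    return literal + (elem.tail or "")


def search(root, word, start, end):
    """Return ids of documents whose author text (not the headnote) contains
    the word and whose coded date lies within [start, end], oldest first.
    Matching is a simple case-insensitive substring test."""
    hits = []
    for doc in root.iter("doc"):
        doc_date = date.fromisoformat(doc.get("date"))
        if not (start <= doc_date <= end):
            continue
        body = searchable_string(doc.find("text")).lower()
        if word.lower() in body:
            hits.append((doc_date, doc.get("id")))
    return [doc_id for _, doc_id in sorted(hits)]


ROOT = ET.fromstring(SAMPLE)
whole_span = (date(1799, 1, 1), date(1801, 12, 31))

# The misspelled "recieved" in d2 is found through its corrected form.
received_hits = search(ROOT, "received", *whole_span)

# "treaty" in d2 appears only in the editors' headnote, so d2 is excluded.
treaty_hits = search(ROOT, "treaty", *whole_span)

# "Mr. J." is found under the full name supplied in the mark-up.
jefferson_hits = search(ROOT, "Jefferson", *whole_span)

# Titles can be listed separately from words italicized for emphasis.
titles = [t.text for t in ROOT.iter("title")]
```

In this sketch the date filter works because each document's date is coded as an attribute, so the "1801" in the text of d1 is never mistaken for a date. A real project would rely on the elements defined by its chosen DTD rather than on these invented names.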

The level of mark-up will vary from project to project, based on the editor's goals. A literary edition of poetry will certainly wish to maintain line breaks, while a historical edition may be uninterested even in where the pages of the original document break. One edition may mark up each word according to its part of speech, while others might focus on providing a subject index to the documents. In all cases when using mark-up, the editor must include a guide to what has been done and why.

Editors creating Level 2 electronic editions need to determine which DTD they will use and include that information in their electronic edition. The ADE recommends use of the Model Editions Partnership DTD when applicable, because it has been designed specifically for historical editions. Available for both XML and SGML, the MEP DTD is drawn from the Text Encoding Initiative's larger DTD, which should meet the needs of editors who cannot, because of the nature of their text, use the MEP DTD.

The drawback of creating Level 2 electronic editions is that they require additional staff training or the hiring of consultants to help turn the editor's vision into a workable mark-up scheme. Specialized software is required for creating the marked-up texts, although word processing programs have begun offering SGML and XML tools. Getting a handle on the differing kinds of mark-up that can be used and how to implement them can take some time.

Once a project's electronic texts are created and marked up, search engines are needed in order to make full use of the capabilities of the edition. Some software products, such as DynaWeb and Panorama, are specifically designed for displaying SGML text, but these also require advanced training in order to create fluid and user-friendly web sites.

The ADE recommends the creation of Level 2 electronic editions, because it believes that they will be of greater longevity and usefulness.