Some notes on CTS

Neven Jovanović, Zagreb
http://orcid.org/0000-0002-9119-399X
Address of this page: http://solr.ffzg.hr/dokuwiki/doku.php/z:notes-on-cts

Use cases

How do we refer to passages in letters in these collections in a 'machine-actionable' way? (machine actionable = something we click on or an external program calls up via an API)

Existing persistent identifiers? URN:NBN (e.g. http://urn.fi/URN:ISBN:952-10-0093-7; Croatia is only developing them)? DOI (http://dx.doi.org/10.1093/past/142.1.94; how do I get one)? The Handle System?

Questions: What to do with historical editions (or with their digital facsimiles)? How to go deeper and address certain parts of a digital object?

CTS

CTS (Canonical Text Services) is a scheme for computational access to digital texts, their manifestations, and their parts. In the words of its creators, Neel Smith and Christopher Blackwell,1) CTS is a service “for identifying texts and for retrieving fragments of texts by canonical reference expressed as CTS URNs”; it is at the same time “a framework for scholarly reference to the unique cultural phenomena that humanists study.”2)

When we need to refer to a passage in a digital (or digitised) document, we have three problems:

  1. the network address (URL) of the document and passage may change (e.g. if the document is stored in a database that is updated, if the system is upgraded, or if the database is relocated elsewhere)
  2. the passage being referred to has to be delimited somehow (e.g. 'the third paragraph', or 'the second sentence in the third paragraph', or 'the first mention of that name in the third paragraph')
  3. the reference may be to the text in general ('Montaigne's letter XIV'), or to a specific edition or translation of it ('letter XIV from The Essays of Montaigne translated by Charles Cotton, edited by William Hazlitt, London: Templeman, 1842').

CTS URNs refer to a passage of text in terms of two hierarchies. The first hierarchy identifies a text in a model similar to the conceptual model of the Functional Requirements for Bibliographic Records (FRBR).

CTS URNs organize works in text groups. Text groups have no direct parallel in FRBR, and do not have a predefined semantic range. Instead, they associate works, according to traditional citation practice, in groups with various meanings. The text group may reflect authorship (e.g., a work entitled The Adventures of Huckleberry Finn might belong to a group named “Mark Twain”), or may represent some other kind of corpus (e.g., a work numbered 1 belonging to a group named “Federalist Papers”). Within a text group, a CTS URN’s work is a conceptual entity, like the FRBR work: it is an abstract idea of the content expressed in all versions of a work, in the original language or in translation. The work may optionally be identified with increasing specificity as versions (translation or edition), or exemplars (individual physical copies). The CTS URN’s version corresponds to the “expression” in the FRBR model, while exemplars correspond to “items” in FRBR parlance.

The second hierarchy in a CTS URN refers to a passage expressed in a logical citation scheme. While the nature of this hierarchy depends on the specific work referred to by a CTS URN, many texts will fall into one of a few common citation schemes. Prose works might be cited by chapter and section, or book, chapter and section, for example, or poems might be cited by line, stanza and line, or book and line.

Syntax of a CTS URN

URNs always begin with the string urn: followed by a protocol identifier. We use the identifier cts for our protocol.

Colons separate the top-level elements of a CTS URN: any use of a colon as a data value must therefore be escaped. The top-level elements are:

  1. urn name space (required: always cts)
  2. cts namespace (required: a value that can be resolved to a unique URI)
  3. work identifier (required: a value registered in the designated registry)
  4. passage reference (optional)
  5. subreference (optional)

The general structure of a CTS URN is therefore

urn:cts:CTSNAMESPACE:WORK:PASSAGE:SUBREFERENCE?
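The top-level structure can be illustrated with a minimal Python sketch. The names CtsUrn and parse_cts_urn are hypothetical and not part of any CTS library; the sketch only handles the namespace, the work identifier and the optional passage reference, and leaves the subreference out.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    """Top-level elements of a CTS URN (hypothetical helper type)."""
    namespace: str
    work: str
    passage: Optional[str]


def parse_cts_urn(urn: str) -> CtsUrn:
    """Split a CTS URN on colons into its top-level elements.

    Assumption: no escaped colons occur inside the data values.
    """
    parts = urn.split(":")
    # Expect urn:cts:NAMESPACE:WORK with an optional :PASSAGE at the end.
    if len(parts) not in (4, 5) or parts[0] != "urn" or parts[1] != "cts":
        raise ValueError(f"not a CTS URN: {urn!r}")
    namespace, work = parts[2], parts[3]
    if not namespace or not work:
        raise ValueError(f"namespace and work identifier are required: {urn!r}")
    # An absent or empty fifth element means the URN cites the whole text.
    passage = parts[4] if len(parts) == 5 and parts[4] else None
    return CtsUrn(namespace, work, passage)
```

Calling parse_cts_urn("urn:cts:greekLit:tlg0007.tlg015") yields a tuple whose passage field is None, since that URN carries no passage reference.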

Periods separate second-level hierarchical components of the work identifier and passage reference.
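Because the passage reference is hierarchical, a shorter reference can be read as containing every longer reference that starts with the same components. A small Python sketch of that check follows; the function names are hypothetical, and ranges of passages are not handled.

```python
from typing import Tuple


def passage_levels(ref: str) -> Tuple[str, ...]:
    """Split a passage reference such as '11.1' into its period-separated levels."""
    levels = tuple(ref.split("."))
    if any(not level for level in levels):
        raise ValueError(f"empty component in passage reference: {ref!r}")
    return levels


def contains(ancestor: str, descendant: str) -> bool:
    """Return True if `descendant` lies within (or equals) `ancestor`.

    Components are compared as whole strings, so '1' does not contain '11.1'.
    """
    a = passage_levels(ancestor)
    d = passage_levels(descendant)
    return len(a) <= len(d) and d[:len(a)] == a
```

With these definitions, contains("11", "11.1") is true, while contains("1", "11.1") and contains("11.1", "11") are false.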

A mockup CTS URN for the third passage in the letter of F. T. Andreis (andreis02), written on 1570-02-02 (epistula15700202), in the Croatiae auctores Latini (CroALa) collection (croala-lat01):

urn:cts:croala:andreis02.epistula15700202.croala-lat01:3

Note: in an extension of CTS inspired by the Perseus Digital Library, the work description consists of a (human-readable) textgroup siglum (andreis02), a text siglum (epistula15700202) and an edition siglum (croala-lat01).
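The three sigla can be recovered by splitting the work identifier on periods. The following Python sketch assumes the order textgroup, work, version, exemplar described earlier; the function name and the dictionary keys are hypothetical.

```python
from typing import Dict

# Order of components in the work identifier, most general first.
COMPONENT_NAMES = ("textgroup", "work", "version", "exemplar")


def split_work_identifier(work_id: str) -> Dict[str, str]:
    """Map the period-separated parts of a work identifier to named components."""
    parts = work_id.split(".")
    if len(parts) > len(COMPONENT_NAMES) or any(not part for part in parts):
        raise ValueError(f"malformed work identifier: {work_id!r}")
    # zip stops at the shorter sequence, so missing trailing levels are simply omitted.
    return dict(zip(COMPONENT_NAMES, parts))
```

For the mockup above, split_work_identifier("andreis02.epistula15700202.croala-lat01") maps "textgroup" to "andreis02", "work" to "epistula15700202" and "version" to "croala-lat01".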

Note: a work can exist in multiple repositories (croala, xx, yy), in multiple languages (lat, hrv, eng, ita), in multiple editions (01, 02, 03).
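If edition sigla are read as repository, language code and edition number (as in croala-lat01), they can be taken apart mechanically. This Python sketch assumes that pattern, which is a naming convention inferred from the examples on this page and not something the CTS URN syntax requires; the function name is hypothetical.

```python
import re
from typing import Dict

# Assumed pattern: REPOSITORY-LLLNN, e.g. croala-lat01 or perseus-eng1.
SIGLUM_PATTERN = re.compile(
    r"(?P<repository>[A-Za-z]+)-(?P<language>[a-z]{3})(?P<number>\d+)"
)


def parse_edition_siglum(siglum: str) -> Dict[str, str]:
    """Split an edition siglum into repository, language and edition number."""
    match = SIGLUM_PATTERN.fullmatch(siglum)
    if match is None:
        raise ValueError(f"siglum does not follow the assumed pattern: {siglum!r}")
    return match.groupdict()
```

The edition number is kept as a string so that leading zeros (01, 02, 03) survive a round trip.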

Examples of CTS URNs used for the Perseus Digital Library:

http://data.perseus.org/citations/urn:cts:greekLit:tlg0007.tlg015.perseus-eng1:11.1 = chapter 11, paragraph 1 of the English translation (perseus-eng1) of Alcibiades (tlg015) by Plutarch (tlg0007), a work of Greek literature (greekLit)

http://data.perseus.org/texts/urn:cts:greekLit:tlg0007.tlg015.perseus-eng1 = a reference to the whole version (text) of this translation (FRBR expression level, following the mapping between CTS and FRBR given above)

http://data.perseus.org/texts/urn:cts:greekLit:tlg0007.tlg015 = a reference to Plutarch's Alcibiades as a work (FRBR work level)
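The three examples differ only in how much of the URN is kept, so a more general reference can be derived from a more specific one by dropping trailing components. A Python sketch of that truncation follows; the function name and the level names are hypothetical, and nothing here checks whether the resulting URN resolves on any server.

```python
# Number of period-separated components kept in the work identifier per level.
LEVEL_DEPTH = {"textgroup": 1, "work": 2, "version": 3}


def urn_at_level(urn: str, level: str) -> str:
    """Drop the passage reference and truncate the work identifier to `level`."""
    parts = urn.split(":")
    if len(parts) < 4 or parts[:2] != ["urn", "cts"]:
        raise ValueError(f"not a CTS URN: {urn!r}")
    components = parts[3].split(".")
    depth = LEVEL_DEPTH[level]
    if len(components) < depth:
        raise ValueError(f"{urn!r} is not specific enough for level {level!r}")
    # Keep urn, cts and the namespace, then the truncated work identifier.
    return ":".join(parts[:3] + [".".join(components[:depth])])
```

Applied to urn:cts:greekLit:tlg0007.tlg015.perseus-eng1:11.1, the levels "version" and "work" give the URNs used in the second and third example above.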

Implementations of CTS as a protocol for accessing digital texts

Tree model

Directed graph

Relational model

Initiatives and research projects using CTS


1) Neel Smith, College of the Holy Cross, Worcester MA: http://shot.holycross.edu/nsmith/; Christopher Blackwell, Furman University, Greenville, South Carolina: http://www.furman.edu/academics/classics/about/pages/facultyandstaff.aspx.
2) “The CITE architecture”: http://www.homermultitext.org/hmt-doc/cite/index.html, accessed March 16, 2015
z/notes-on-cts.txt · Last modified: 2015/03/23 15:15 by njovanovic
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 3.0 Unported