CTS catalogue of the CroALa collection

What is CTS (Canonical Text Services)? See http://www.homermultitext.org/hmt-doc/cite/texts/ctsoverview.html

Why do we want to adapt CroALa for CTS? So that we can cite parts of the texts in the collection in a uniform, machine-processable and (to some extent) standardised way.

How do we achieve this? We need a CTS server (which we do not have yet), texts suitable for CTS (we have enough of those), and a CTS catalogue.

Here is what the latter looks like.

Sample entries from the CTS catalogue

Collection description

 <collection id="CroALa" isdefault="yes">
    <title xml:lang="lat">Croatiae auctores Latini - bibliotheca electronica</title>
    <title xml:lang="hrv">Digitalna biblioteka hrvatskih latinista</title>
    <title xml:lang="eng">A Digital Collection of Croatian Authors Writing in Latin</title>
    <creator xmlns="http://purl.org/dc/elements/1.1/">Croatica et Tyrolensia</creator>
    <coverage xmlns="http://purl.org/dc/elements/1.1/" xml:lang="hrv">Primarni i sekundarni izvori za istraživanje hrvatskog latinizma</coverage>
    <description xmlns="http://purl.org/dc/elements/1.1/" xml:lang="hrv">Primarni i sekundarni izvori za istraživanje hrvatskog latinizma</description>
    <coverage xmlns="http://purl.org/dc/elements/1.1/" xml:lang="lat">Fontes primarii et secundarii Latinitati Croaticae investigandae</coverage>
    <description xmlns="http://purl.org/dc/elements/1.1/" xml:lang="lat">Fontes primarii et secundarii Latinitati Croaticae investigandae</description>
    <coverage xmlns="http://purl.org/dc/elements/1.1/" xml:lang="eng">Primary and secondary sources
      for the study of Croatian Latin</coverage>
    <description xmlns="http://purl.org/dc/elements/1.1/" xml:lang="eng">Primary and secondary
      sources for the study of Croatian Latin</description>
    <rights xmlns="http://purl.org/dc/elements/1.1/" xml:lang="hrv">Prava: Creative Commons licenca - imenovanje 4.0 međunarodna</rights>
    <rights xmlns="http://purl.org/dc/elements/1.1/" xml:lang="eng">This work is licensed under a Creative Commons Attribution 4.0 International License.</rights>
  </collection>

A single entry: author (= textgroup) Marulić, work (= work) the letter to Juraj Šižgorić. This work has only one citation level: right after the first div we reach p and the individual words. (The address in docname is provisional and does not work yet.)

<textgroup projid="croala:calmarul01">
    <groupname xml:lang="hrv">Marulić, Marko</groupname>
    <work projid="croala:marul-mar-epist-1477" xml:lang="lat">
      <title xml:lang="lat">Ad Georgium Sisgoreum epistula (1477)</title>
      <title xml:lang="hrv">Pismo Jurju Šižgoriću (1477)</title>
      <edition projid="croala:croala-lat1">
        <label xml:lang="lat">Ad Georgium Sisgoreum epistula (1477), versio electronica (Zagreb 2009 TEI XML)</label>
        <description xml:lang="lat">Ad Georgium Sisgoreum epistula (1477), versio electronica. Zagreb, Digitalizacija hrvatskih latinista, lipnja 2009.</description>
        <online docname="/db/repository/croala/calmarul01/marul-mar-epist-1477/calmarul01.marul-mar-epist-1477.croala-lat1.xml">
          <validate schema="tei_all.xsd"/>
          <namespaceMapping abbreviation="tei" nsURI="http://www.tei-c.org/ns/1.0"/>
          <citationMapping>
            <citation label="Epistula" xpath="/tei:div[@xml:id='?']" scope="/tei:TEI/tei:text/tei:body">
            </citation>
          </citationMapping>
        </online>
      </edition>
    </work>
  </textgroup>
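
The projid values in an entry like this are the parts from which a CTS URN for the text can be assembled. Below is a minimal Python sketch, assuming the general CTS URN layout urn:cts:namespace:textgroup.work.version:passage. The function name cts_urn and the passage value ep1 are hypothetical and are not taken from the CroALa files.

```python
def cts_urn(textgroup: str, work: str, edition: str, passage: str) -> str:
    """Combine catalogue projid values into a CTS URN string."""
    # Each projid has the form "namespace:identifier".
    namespace, group_id = textgroup.split(":", 1)
    work_id = work.split(":", 1)[1]
    edition_id = edition.split(":", 1)[1]
    return f"urn:cts:{namespace}:{group_id}.{work_id}.{edition_id}:{passage}"


marulic = cts_urn(
    "croala:calmarul01",
    "croala:marul-mar-epist-1477",
    "croala:croala-lat1",
    "ep1",  # hypothetical xml:id of the letter's div
)
print(marulic)
```

The textgroup.work.edition part of the result follows the same pattern as the file name given in docname.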

A single entry: textgroup Ilija Crijević, the letter to Marin Bunić. This work has two citation levels: the letter as a whole and its two sections.

<textgroup projid="croala:calcrije02">
    <groupname xml:lang="hrv">Crijević, Ilija</groupname>
    <work projid="croala:crijev-i-epist-1504" xml:lang="lat">
      <title xml:lang="lat">Epistula ad Marinum Bonum (1504/1512)</title>
      <title xml:lang="hrv">Pismo Marinu Buniću (1504/1512)</title>
      <edition projid="croala:croala-lat1">
        <label xml:lang="lat">Epistula ad Marinum Bonum (1504/1512), versio electronica, (Zagreb 2012 TEI XML)</label>
        <description xml:lang="lat">Epistula ad Marinum Bonum (1504/1512), versio electronica. Zagreb, Digitalizacija hrvatskih latinista, lipnja 2012.</description>
        <online docname="/db/repository/croala/calcrije02/crijev-i-epist-1504/calcrije02.crijev-i-epist-1504.croala-lat1.xml">
          <validate schema="tei_all.xsd"/>
          <namespaceMapping abbreviation="tei" nsURI="http://www.tei-c.org/ns/1.0"/>
          <citationMapping>
            <citation label="Epistula" xpath="/tei:div[@xml:id='?']" scope="/tei:TEI/tei:text/tei:body">
              <citation label="Sectio" xpath="/tei:div[@n='?']" scope="/tei:TEI/tei:text/tei:body/tei:div[@xml:id='?']">
              </citation>
            </citation>
          </citationMapping>
        </online>
      </edition>
    </work>
  </textgroup>
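
To illustrate how the nested citation elements are meant to work, here is a small Python sketch that turns a citation into an XPath. It assumes that each '?' in scope and xpath stands for one component of the citation, outermost level first. The function name resolve_xpath and the values ep1 and 2 are hypothetical.

```python
def resolve_xpath(levels, values):
    """Build an XPath for a citation by filling '?' placeholders.

    levels: (scope, xpath) pairs copied from the nested citation
            elements, outermost level first.
    values: one citation value per level being addressed.
    """
    if not 1 <= len(values) <= len(levels):
        raise ValueError("citation has too few or too many components")
    scope, xpath = levels[len(values) - 1]
    parts = (scope + xpath).split("?")
    if len(parts) - 1 != len(values):
        raise ValueError("placeholder count does not match the citation")
    # Interleave the fixed parts of the template with the citation values.
    return "".join(p + v for p, v in zip(parts, values)) + parts[-1]


crijevic_levels = [
    ("/tei:TEI/tei:text/tei:body", "/tei:div[@xml:id='?']"),
    ("/tei:TEI/tei:text/tei:body/tei:div[@xml:id='?']", "/tei:div[@n='?']"),
]

print(resolve_xpath(crijevic_levels, ["ep1"]))
print(resolve_xpath(crijevic_levels, ["ep1", "2"]))
```

With one value the sketch addresses the whole letter; with two values it addresses a section inside it.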

Sample description of an epic 1 (without data)

<textgroup projid="croala:calbenes01">
  <!-- Croatian name of author -->
  <groupname xml:lang="hrv">Beneša, Damjan</groupname>
  <!-- a list of his works in CroALa -->
  <work projid="croala:benesa-d-dmc" xml:lang="lat">
    <!-- add here Latin (and Croatian, if possible) title of work -->
    <title xml:lang="lat"/>
    <!-- add here description of digital edition -->
    <edition projid="croala:croala-lat1">
      <label/>
      <description/>
      <!-- path to document, add after croala/ -->
      <online docname="/db/repository/croala/">
        <validate schema="tei_all.xsd"/>
        <namespaceMapping abbreviation="tei" nsURI="http://www.tei-c.org/ns/1.0"/>
        <!-- how to get to words of document; express as XPaths -->
        <citationMapping>
          <citation/>
        </citationMapping>
      </online>
    </edition>
  </work>
</textgroup>

Sample description of an epic 2 (with data)

We take the data from the XML file in the croala-epica collection.

We use auxiliary XPath queries to reach the individual words.

a) The parent element of every text() node

Paste this expression into the oXygen XPath 2.0 search field:

distinct-values(//text//*[text()]/name())

“Return the names of all kinds of elements below the TEI text element (i.e. excluding the teiHeader) that have text() nodes as children.”

For the document benesa-d-dmc.xml, the XPath returns the following 14 values:

body
div
head
l
emph
hi
ref
seg
note
p
add
mentioned
num
closer

Of these, the potentially citable elements are div, head, l, note, p and closer. The other elements occur below l. For the six citable elements we have to define XPaths. They can be found programmatically with an XQuery. In Beneša's document, the paths are:

TEI/text/body/div
TEI/text/body/div/div
TEI/text/body/div/div/head
TEI/text/body/div/div/l
TEI/text/body/div/div/l/note
TEI/text/body/div/div/l/note/p
TEI/text/body/div/div/closer
  • The “scope” (the path encompassing the work as a whole) would be TEI/text/body/div.
  • For individual cantos (books), the path would be TEI/text/body/div/div.
  • All other nodes are descendants of cantos, so they are listed inside the citation element for cantos.
  • For the titles of cantos, the path is TEI/text/body/div/div/head.
  • Individual verses are cited by TEI/text/body/div/div/l.
  • Notes are cited by TEI/text/body/div/div/l/note (the path TEI/text/body/div/div/l/note/p can probably be left out; this should be checked against the text).
  • Closers of cantos are reachable by TEI/text/body/div/div/closer.
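
The list of paths above can also be produced with a short script. The following Python sketch stands in for the XQuery mentioned earlier. It runs on an invented miniature document shaped like the structure just described, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# The six citable element names identified above.
CITABLE = {"div", "head", "l", "note", "p", "closer"}

# Invented miniature document; not taken from benesa-d-dmc.xml.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <div><div n="1"><head>Liber I</head>
    <l n="1">Primus <hi>versus</hi><note><p>Nota</p></note></l>
    <closer>Finis</closer></div></div>
</body></text></TEI>"""


def local_name(element):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return element.tag.rsplit("}", 1)[-1]


def citable_paths(element, prefix=""):
    """Collect distinct paths to citable elements, in document order."""
    name = local_name(element)
    path = f"{prefix}/{name}" if prefix else name
    found = [path] if name in CITABLE else []
    for child in element:
        for child_path in citable_paths(child, path):
            if child_path not in found:
                found.append(child_path)
    return found


for path in citable_paths(ET.fromstring(SAMPLE)):
    print(path)
```

Elements such as hi are walked through but not reported, because they are not in the CITABLE set.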

This results in the following CTS citationMapping element:

<citationMapping>
  <citation label="Book" xpath="/tei:div[@n='?']" scope="/tei:TEI/tei:text/tei:body/tei:div">
    <citation label="Title" xpath="/tei:head[@n='?']" scope="/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='?']"/>
    <citation label="Line" xpath="/tei:l[@n='?']" scope="/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='?']">
      <citation label="Note" xpath="/tei:note[@n='?']" scope="/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='?']/tei:l[@n='?']"/>
    </citation>
    <citation label="Closer" xpath="/tei:closer[@n='?']" scope="/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='?']"/>
  </citation>
</citationMapping>

For this mapping to work, each of the listed elements must have an @n attribute. This is often not the case, and any missing attributes should be noted. They can be added with an XSL transformation.
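
As an alternative to XSLT, missing @n attributes can be added with a few lines of Python. This is a sketch only. It assumes that the position among same-named siblings is an acceptable value for @n, and it does not check for clashes with @n values that already exist. The function name add_missing_n and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI)  # serialise TEI as the default namespace

# Invented miniature document; only the first l carries @n.
SAMPLE = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body><div><div n="1">'
    "<head>Liber I</head><l n=\"1\">a</l><l>b</l><l>c</l>"
    "<closer>Finis</closer></div></div></body></text></TEI>"
)


def add_missing_n(root, names=("div", "head", "l", "note", "closer")):
    """Set @n on listed elements that lack it; return how many were added."""
    added = 0
    for parent in root.iter():
        counters = {}  # running position per element name under this parent
        for child in parent:
            name = child.tag.rsplit("}", 1)[-1]
            if name not in names:
                continue
            counters[name] = counters.get(name, 0) + 1
            if "n" not in child.attrib:
                child.set("n", str(counters[name]))
                added += 1
    return added


root = ET.fromstring(SAMPLE)
print(add_missing_n(root), "attributes added")
print(ET.tostring(root, encoding="unicode"))
```

Existing @n values are left untouched, so the result should still be reviewed by hand before it is used for citation.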

Pay particular attention to note elements: they belong to l (lines), which in turn belong to cantos, so the citation 2, 14, note 5 means note 5 on verse 14 of book 2.

b) The child elements of the lowest div element

To return names of all elements which are children of the lowest div, we use the following XPath:

distinct-values(//TEI/text//div[parent::*:div]/*/name())

In Beneša's document, the following names are returned:

head
milestone
l
closer

We know that milestone is an empty element (it cannot contain text), so we only need the paths for the other three elements.
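
The same check can be sketched in Python. Note one assumption that differs from the XPath above: here a “lowest” div is taken to be a div with no div children, whereas the XPath selects every div whose parent is a div. The sample document and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Invented miniature document; not taken from benesa-d-dmc.xml.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><body>
  <div><div n="1"><head>Liber I</head><milestone unit="page" n="1"/>
    <l n="1">a</l><l n="2">b</l><closer>Finis</closer></div></div>
</body></text></TEI>"""


def lowest_div_children(root):
    """Distinct names of the children of every div that contains no div."""
    names = []
    for div in root.iter(f"{TEI_NS}div"):
        if div.find(f"{TEI_NS}div") is not None:
            continue  # has a nested div, so it is not a lowest div
        for child in div:
            name = child.tag.rsplit("}", 1)[-1]
            if name not in names:
                names.append(name)
    return names


print(lowest_div_children(ET.fromstring(SAMPLE)))
```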

z/croala-cts.txt · Last modified: 2015/08/11 13:37 by njovanovic
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Share Alike 3.0 Unported