Working with Local CapiTainS XML Files

Introduction

The class MyCapytain.resources.texts.local.capitains.cts.CapitainsCtsText requires your file to follow the CapiTainS guidelines.
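As a rough illustration of what the guidelines ask for: as we understand them, a compliant TEI file declares its citation scheme in a refsDecl element with n="CTS", holding one cRefPattern per citation level. The sketch below uses only the standard library to list the levels a file declares. The sample document and the helper name citation_levels are hypothetical, and this check is not a substitute for reading the guidelines themselves.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily shortened TEI header, for illustration only
SAMPLE = r"""<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <encodingDesc>
      <refsDecl n="CTS">
        <cRefPattern n="line" matchPattern="(\w+)\.(\w+)\.(\w+)"
          replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='$1']/tei:div[@n='$2']/tei:l[@n='$3'])"/>
        <cRefPattern n="poem" matchPattern="(\w+)\.(\w+)"
          replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='$1']/tei:div[@n='$2'])"/>
        <cRefPattern n="book" matchPattern="(\w+)"
          replacementPattern="#xpath(/tei:TEI/tei:text/tei:body/tei:div/tei:div[@n='$1'])"/>
      </refsDecl>
    </encodingDesc>
  </teiHeader>
</TEI>"""


def citation_levels(xml_string):
    """Return the names of the citation levels declared in refsDecl[@n='CTS']."""
    root = ET.fromstring(xml_string)
    patterns = root.findall(".//tei:refsDecl[@n='CTS']/tei:cRefPattern", TEI_NS)
    return [pattern.get("n") for pattern in patterns]


levels = citation_levels(SAMPLE)
print(len(levels), levels)
```

An empty list from a check like this suggests the file declares no CTS citation scheme, in which case the class would have no references to resolve.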

Example

#  We import the correct classes from the local module
from MyCapytain.resources.texts.local.capitains.cts import CapitainsCtsText
from MyCapytain.common.constants import Mimetypes, XPATH_NAMESPACES
from lxml.etree import tostring

# We open a file
with open("./tests/testing_data/examples/text.martial.xml") as f:
    # We instantiate a text object, passing the open file as the resource argument
    text = CapitainsCtsText(resource=f)

# Text objects have a citation property.
# len(text.citation) gives the depth of the citation scheme;
# for this sample, that is 3 (Book, Poem, Line).
for ref in text.getReffs(level=len(text.citation)):
    # We retrieve a passage object for each reference that we find.
    # The reference can be passed in several forms, including a list of strings.
    # The simple parameter gives us a lightweight object
    # that contains only the targeted node.
    psg = text.getTextualNode(subreference=ref, simple=True)
    # We print the passage as plain text, leaving out its <note> nodes
    print("\t".join([ref, psg.export(Mimetypes.PLAINTEXT, exclude=["tei:note"])]))

"""
This prints something like the following:

    1.pr.1	Spero me secutum in libellis meis tale temperamen-
    1.pr.2	tum, ut de illis queri non possit quisquis de se bene
    1.pr.3	senserit, cum salva infimarum quoque personarum re-
    1.pr.4	verentia ludant; quae adeo antiquis auctoribus defuit, ut
    1.pr.5	nominibus non tantum veris abusi sint, sed et magnis.
    1.pr.6	Mihi fama vilius constet et probetur in me novissimum

"""

# What you are interested in may be a little more complex,
# for example getting a specific passage by its reference, as TEI XML.

# We open another file, this time one of Cicero's texts
with open("./tests/testing_data/examples/text.cicero.xml") as f:
    # We instantiate a text object, passing the open file as the resource argument
    text = CapitainsCtsText(resource=f)
    # We are specifically interested in the portion 28-30.
    # Note that we won't use 28-30, as the cross-passage reference won't work properly;
    # we request 28-29 instead.
    p28_29 = text.getTextualNode("28-29")

    # We want to be able to work with the XML,
    # for example to send it to a third-party API for lemmatization
    xml = p28_29.export(Mimetypes.XML.Std)
    print("XML of 28-29")
    print(xml)
    print("------------")

    # What we really want, though, is to strip the notes from the XML,
    # so we export to an lxml object
    document = p28_29.export(Mimetypes.PYTHON.ETREE)
    # We remove every <note> element
    for element in document.xpath("//tei:note", namespaces=XPATH_NAMESPACES):
        element.getparent().remove(element)
    # And we print the result using lxml's tostring
    print("Clean XML of 28-29")
    print(tostring(document, encoding=str))
    print("------------")
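
One caveat with the note removal above: in the ElementTree model that lxml follows, the text directly after an element's closing tag is stored as that element's tail, so removing the element from its parent also removes that text. For notes placed in the middle of a sentence this can silently drop words. The sketch below shows one possible way to keep the tail. The helper name remove_keeping_tail and the sample sentence are hypothetical. It uses the standard library so that it runs on its own, and it relies only on calls that lxml elements also provide (iter, remove, text, tail).

```python
import xml.etree.ElementTree as ET

TEI_NOTE = "{http://www.tei-c.org/ns/1.0}note"

# Hypothetical sample: a note sitting in the middle of a sentence
SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    "Quo usque<note>editorial remark</note> tandem abutere</p>"
)


def remove_keeping_tail(root, tag):
    """Remove every element named `tag` under `root`, keeping the text that follows it."""
    for parent in list(root.iter()):
        previous = None  # last sibling kept so far
        for child in list(parent):
            if child.tag == tag:
                tail = child.tail or ""
                if previous is None:
                    # No earlier sibling: the tail joins the parent's own text
                    parent.text = (parent.text or "") + tail
                else:
                    # Otherwise it joins the tail of the previous kept sibling
                    previous.tail = (previous.tail or "") + tail
                parent.remove(child)
            else:
                previous = child
    return root


# Plain removal drops " tandem abutere" along with the note
naive = ET.fromstring(SAMPLE)
naive.remove(naive[0])
print("".join(naive.itertext()))    # Quo usque

# Tail-preserving removal keeps the sentence whole
cleaned = remove_keeping_tail(ET.fromstring(SAMPLE), TEI_NOTE)
print("".join(cleaned.itertext()))  # Quo usque tandem abutere
```

Whether the tail should be kept depends on the encoding of your file: if notes always sit at the end of a line or paragraph, the plain removal may be sufficient.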