MyCapytain’s Main Objects Explained

Exportable Parent Classes

Description

MyCapytain.common.constants.Exportable

The Exportable class is used throughout the library. It provides a common, standardized, API-like way to find out which formats an object can be exported to and to perform that export. Any exportable object should have an EXPORT_TO constant and, if it provides export types of its own, implement an __export__(output, **kwargs) method.

Example

The following code block is a simple example of how to implement Exportable and what its responsibilities are. Exportable typically loops over all the parent classes of the current class until it finds one whose export system matches the requested type.

from MyCapytain.common.constants import Exportable, Mimetypes


class Sentence(Exportable):
    """ This class represents a Sentence

    :param content: Content of the sentence
    """
    # EXPORT_TO is the list of Mimetypes the object can be exported to
    EXPORT_TO = [
        Mimetypes.PLAINTEXT, Mimetypes.XML.Std
    ]
    DEFAULT_EXPORT = Mimetypes.PLAINTEXT

    def __init__(self, content):
        self.content = content

    def __export__(self, output=None, **kwargs):
        """ Export the collection item in the Mimetype required.

        :param output: Mimetype to export to (Uses MyCapytain.common.constants.Mimetypes)
        :type output: str
        :return: Object using a different representation
        """
        if output == Mimetypes.PLAINTEXT:
            return self.content
        elif output == Mimetypes.XML.Std:
            return "<sentence>{}</sentence>".format(self.content)


class TEISentence(Sentence):
    """ This class represents a Sentence and adds further accepted export outputs

    :param content: Content of the sentence
    """
    EXPORT_TO = [
        Mimetypes.JSON.Std
    ]

    def __export__(self, output=None, **kwargs):
        """ Export the collection item in the Mimetype required.

        :param output: Mimetype to export to (Uses MyCapytain.common.constants.Mimetypes)
        :type output: str
        :return: Object using a different representation
        """
        if output == Mimetypes.JSON.Std:
            return {"http://www.tei-c.org/ns/1.0/sentence": self.content}
        elif output == Mimetypes.XML.Std:
            return "<sentence xmlns=\"http://www.tei-c.org/ns/1.0\">{}</sentence>".format(self.content)


s = Sentence("I love Martial's Epigrammatas")
print(s.export(Mimetypes.PLAINTEXT))
# I love Martial's Epigrammatas
print(s.export())  # Defaults to PLAINTEXT
# I love Martial's Epigrammatas
print(s.export(Mimetypes.XML.Std))
# <sentence>I love Martial's Epigrammatas</sentence>

tei = TEISentence("I love Martial's Epigrammatas")
print(tei.export(Mimetypes.PLAINTEXT))
# I love Martial's Epigrammatas
print(tei.export())  # Defaults to PLAINTEXT
# I love Martial's Epigrammatas
print(tei.export(Mimetypes.JSON.Std))
# {'http://www.tei-c.org/ns/1.0/sentence': "I love Martial's Epigrammatas"}
print(tei.export(Mimetypes.XML.Std))  # Has been rewritten by TEISentence
# <sentence xmlns="http://www.tei-c.org/ns/1.0">I love Martial's Epigrammatas</sentence>
try:
    print(tei.export(Mimetypes.XML.RDF))
except NotImplementedError as error:
    print(error)
# Raises the error and prints "Mimetype application/rdf+xml has not been implemented for this resource"
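
The lookup through parent classes described above can be sketched without MyCapytain installed. The block below is only an illustration under stated assumptions: ExportableSketch, JSONSentence and the plain mimetype strings are hypothetical stand-ins, and the loop over the method resolution order is one possible way to implement the behaviour, not necessarily how the library does it.

```python
class ExportableSketch:
    """Hypothetical stand-in for Exportable (not the library's code)."""
    EXPORT_TO = []
    DEFAULT_EXPORT = None

    def __export__(self, output=None, **kwargs):
        return None

    def export(self, output=None, **kwargs):
        output = output or self.DEFAULT_EXPORT
        # Walk the method resolution order: the class itself, then its parents
        for cls in type(self).__mro__:
            # Only look at the EXPORT_TO declared directly on that class
            if output in cls.__dict__.get("EXPORT_TO", []):
                return cls.__export__(self, output, **kwargs)
        raise NotImplementedError(
            "Mimetype {} has not been implemented for this resource".format(output)
        )


class Sentence(ExportableSketch):
    EXPORT_TO = ["text/plain", "text/xml"]
    DEFAULT_EXPORT = "text/plain"

    def __init__(self, content):
        self.content = content

    def __export__(self, output=None, **kwargs):
        if output == "text/plain":
            return self.content
        elif output == "text/xml":
            return "<sentence>{}</sentence>".format(self.content)


class JSONSentence(Sentence):
    # Only the new format is declared: the others are found on the parent
    EXPORT_TO = ["application/json"]

    def __export__(self, output=None, **kwargs):
        if output == "application/json":
            return {"sentence": self.content}


sentence = JSONSentence("I love Martial's Epigrammatas")
print(sentence.export())                    # served by Sentence (default)
print(sentence.export("application/json"))  # served by JSONSentence
```

In this sketch a child class only declares the formats it adds, and every other format falls through to the first parent that declares it.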

Retrievers

MyCapytain.retrievers.prototypes.API

Description

Retrievers are classes that help build requests to an API and return standardized responses from it. There is no single perfect prototype. The only requirement for a Retriever is that its query functions return strings only. Parsing the response is not the retriever's role: most of the time, it merely facilitates communication with a remote API.

Recommendations

For textual APIs, it is recommended to implement the following requests:

  • getTextualNode(textId[str], subreference[str], prevnext[bool], metadata[bool])
  • getMetadata(objectId[str], **kwargs)
  • getSiblings(textId[str], subreference[str])
  • getReffs(textId[str], subreference[str], depth[int])
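
As a rough illustration of the recommendations above, the following sketch implements the four requests on a made-up retriever. Everything in it is an assumption: QueryStringRetriever, the example.org endpoint and the parameter names (request, id, ref, depth) are hypothetical, and instead of performing an HTTP call the methods simply return the URL they would request, which keeps the block self-contained while respecting the "strings only" rule.

```python
from urllib.parse import urlencode


class QueryStringRetriever:
    """Hypothetical retriever: builds requests and returns strings only."""

    def __init__(self, endpoint):
        self.endpoint = endpoint

    def query(self, **params):
        # Drop unset parameters. A real retriever would send the request here
        # and return the response body as a string, without parsing it.
        params = {key: value for key, value in params.items() if value is not None}
        return "{}?{}".format(self.endpoint, urlencode(params))

    def getTextualNode(self, textId, subreference=None, prevnext=False, metadata=False):
        return self.query(request="GetTextualNode", id=textId, ref=subreference)

    def getMetadata(self, objectId=None, **kwargs):
        return self.query(request="GetMetadata", id=objectId, **kwargs)

    def getSiblings(self, textId, subreference):
        return self.query(request="GetSiblings", id=textId, ref=subreference)

    def getReffs(self, textId, subreference=None, depth=None):
        return self.query(request="GetReffs", id=textId, ref=subreference, depth=depth)


retriever = QueryStringRetriever("http://example.org/api")
print(retriever.getReffs("some-text", depth=2))
```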

Example of implementation : CTS 5

MyCapytain.retrievers.cts5.CTS

from MyCapytain.retrievers.cts5 import CTS

# We set up a retriever which communicates with an API available in Leipzig
retriever = CTS("http://cts.dh.uni-leipzig.de/api/cts/")
# We request a passage : the retriever returns the raw response as a string
passage = retriever.getPassage("urn:cts:latinLit:phi1294.phi002.perseus-lat2:1.1")
# passage is now equal to the string content of http://cts.dh.uni-leipzig.de/api/cts/?request=GetPassage&urn=urn:cts:latinLit:phi1294.phi002.perseus-lat2:1.1
print(passage)

"""
<GetPassage><request><requestName>GetPassage</requestName><requestUrn>urn:cts:latinLit:phi1294.phi002.perseus-lat2:1.1</requestUrn></request>
<reply><urn>urn:cts:latinLit:phi1294.phi002.perseus-lat2:1.1</urn><passage><TEI>
<text n="urn:cts:latinLit:phi1294.phi002.perseus-lat2" xml:id="stoa0045.stoa0"><body>
<div type="edition" n="urn:cts:latinLit:phi1294.phi002.perseus-lat2" xml:lang="lat">
<div type="textpart" subtype="book" n="1"><div type="textpart" subtype="poem" n="1">
<head>I</head>
<l n="1">Hic est quem legis ille, quem requiris, </l>
<l n="2">Toto notus in orbe Martialis </l>
<l n="3">Argutis epigrammaton libellis: <pb/></l>
<l n="4">Cui, lector studiose, quod dedisti </l>
<l n="5">Viventi decus atque sentienti, </l>
<l n="6">Rari post cineres habent poetae. </l>
</div></div></div></body></text></TEI></passage></reply>
"""

Text and Passages

Description

Hierarchy

The general idea behind the Text and Passage classes is that they inherit from a chain of text-bearing objects, each one adding features on top of the previous one. The chain is as follows:

  • TextualElement is an object which can bear Metadata and Collection information. It has a .text property and is exportable.
  • TextualNode inherits from NodeId and, unlike TextualElement, is part of a graph of CitableObject. It bears information about its siblings, parents and children.
  • TextualGraph is partly interactive : you can query for children nodes and get descendant references of the object.
  • InteractiveTextualNode is completely interactive. You can browse the graph by accessing the .next property, for example : it should then return an InteractiveTextualNode as well.
  • CTSNode adds two unique methods as well as a urn property.
  • From CTSNode derive CitableText and Passage, which represent a complete text and a portion of a text respectively. The main difference is that CitableText has no parents and no siblings.

Objectives

Text and Passage objects have been built around InteractiveTextualNode, which fulfills the main purpose of MyCapytain : being able to interact with citable, in-graph texts that are retrieved through web APIs or local files. Any implementation should make sure that the whole set of navigation tools is covered. Those are :

Tree Identifiers (Returns str Identifiers) | Tree Navigations (Returns InteractiveTextualNode or children class) | Retrieval Methods | Other
prevId | prev | .getTextualNode(subreference) | id : TextualNode Identifier [str]
nextId | next | .getReffs(subreference[optional]) | metadata : Metadata information [Metadata]
siblingsId [tuple[str]] | siblings [tuple[InteractiveTextualNode]] | | about : Collection Information [Collection]
parentId | parent | | citation : Citation Information [Citation]
childIds [list[str]] | children [list[InteractiveTextualNode]] | | text : String Representation of the text without annotation
firstId | first | | .export()
lastId | last | |
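
To make the navigation properties above more concrete, here is a minimal in-memory sketch. SketchNode is a hypothetical class written for this illustration only, not a MyCapytain object, and it simplifies one point on purpose: its prev and next stay within the same parent, whereas a real implementation may move on to the neighbouring parent's children.

```python
class SketchNode:
    """Hypothetical node exposing identifier and navigation properties."""

    def __init__(self, id, text="", parent=None):
        self.id = id
        self.text = text
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def _sibling(self, offset):
        # Simplification: siblings are looked up under the same parent only
        if self.parent is None:
            return None
        siblings = self.parent.children
        index = siblings.index(self) + offset
        return siblings[index] if 0 <= index < len(siblings) else None

    @staticmethod
    def _id(node):
        return node.id if node is not None else None

    # Tree navigation: these return nodes (or None)
    prev = property(lambda self: self._sibling(-1))
    next = property(lambda self: self._sibling(+1))
    first = property(lambda self: self.children[0] if self.children else None)
    last = property(lambda self: self.children[-1] if self.children else None)

    # Tree identifiers: these return strings (or None)
    prevId = property(lambda self: self._id(self.prev))
    nextId = property(lambda self: self._id(self.next))
    parentId = property(lambda self: self._id(self.parent))
    firstId = property(lambda self: self._id(self.first))
    lastId = property(lambda self: self._id(self.last))
    childIds = property(lambda self: [child.id for child in self.children])
    siblingsId = property(lambda self: (self.prevId, self.nextId))


poem = SketchNode("4.39")
lines = [SketchNode("4.39.{}".format(n), parent=poem) for n in (1, 2, 3)]
print(lines[1].siblingsId)
print(poem.childIds)
```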

The encodings module

The encodings module contains special implementations : they technically do not support interactive methods but provide generic parsing and export methods for specific types of content such as TEI XML objects, with other formats such as JSON, CSV or treebank objects possible in the future.

The TEIResource, for example, requires the object to be set up with a resource parameter that will be further parsed using lxml. From there, it provides exports such as plain text, TEI XML, nested dictionaries or even an lxml etree interface.
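
The idea of a resource that is parsed once and then exported in several representations can be sketched as follows. This is not the TEIResource class: TEISketch and its output names are hypothetical, and the standard library's xml.etree.ElementTree is used here in place of lxml so that the block runs without extra dependencies.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


class TEISketch:
    """Hypothetical resource-backed object: parse once, export many ways."""

    def __init__(self, resource):
        # The resource is an XML string which gets parsed at set-up time
        self.resource = ET.fromstring(resource)

    def export(self, output="text/plain"):
        if output == "text/plain":
            # Concatenate all text nodes and normalize whitespace
            return " ".join("".join(self.resource.itertext()).split())
        elif output == "text/xml":
            return ET.tostring(self.resource, encoding="unicode")
        elif output == "python/etree":
            return self.resource
        raise NotImplementedError(
            "Mimetype {} has not been implemented for this resource".format(output)
        )


xml = (
    '<div xmlns="http://www.tei-c.org/ns/1.0">'
    '<l n="1">Hic est quem legis ille, quem requiris,</l> '
    '<l n="2">Toto notus in orbe Martialis</l>'
    '</div>'
)
node = TEISketch(resource=xml)
print(node.export("text/plain"))
# The etree export can then be queried, here for the line elements
print(len(node.export("python/etree").findall("tei:l", TEI_NS)))
```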

Implementation example : HTTP API Passage work

from MyCapytain.retrievers.cts5 import CTS
from MyCapytain.resources.texts.api.cts import Text

# We set up a retriever which communicates with an API available in Leipzig
retriever = CTS("http://cts.dh.uni-leipzig.de/api/cts/")

# Given that we have other examples that show how to work with texts,
# we will focus here on the graph functionality of text implementations.
# We are going to retrieve a text passage and then retrieve all its siblings in different ways.
# The main point is to find all children of the same parent.
# The use case could be the following : someone wants to retrieve the full text around a citation
# to enhance the display a little.

# We will work with the line 7 of poem 39 of book 4 of Martial's Epigrammata
# The text is urn:cts:latinLit:phi1294.phi002.perseus-lat2
text = Text(retriever=retriever, urn="urn:cts:latinLit:phi1294.phi002.perseus-lat2")

# We retrieve the passage
target = text.getTextualNode(subreference="4.39.7")
print(target.text)
"""
Nec quae Callaico linuntur auro,
"""

# The parent way :
# - get to the parent,
# - retrieve each node,
# - print only the one which are not target

parent = target.parent
for node in parent.children:
    if node.id != target.id:
        print("{}\t{}".format(node.id, node.text))
    else:
        print("------Original Node-----------")

"""
4.39.1	Argenti genus omne comparasti,
4.39.2	Et solus veteres Myronos artes,
4.39.3	Solus Praxitelus manum Scopaeque,
4.39.4	Solus Phidiaci toreuma caeli,
4.39.5	Solus Mentoreos habes labores.
4.39.6	Nec desunt tibi vera Gratiana,
------Original Node-----------
4.39.8	Nec mensis anaglypta de paternis.
4.39.9	Argentum tamen inter omne miror
4.39.10	Quare non habeas, Charine, purum.
"""

print("\n\nSecond Method\n\n")

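# We are going to do it another way this time :
# - get the previous nodes until the parent changes
# - get the next nodes until the parent changes

# Use the target itself rather than the leftover loop variable of the first method
parentId = target.parentId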
# Deal with the previous ones
current = target.prev
while current.parentId == parentId:
    print("{}\t{}".format(current.id, current.text))
    current = current.prev

print("------Original Node-----------")

# Deal with the next ones
current = target.next
while current.parentId == parentId:
    print("{}\t{}".format(current.id, current.text))
    current = current.next
"""
4.39.6	Nec desunt tibi vera Gratiana,
4.39.5	Solus Mentoreos habes labores.
4.39.4	Solus Phidiaci toreuma caeli,
4.39.3	Solus Praxitelus manum Scopaeque,
4.39.2	Et solus veteres Myronos artes,
4.39.1	Argenti genus omne comparasti,
------Original Node-----------
4.39.8	Nec mensis anaglypta de paternis.
4.39.9	Argentum tamen inter omne miror
4.39.10	Quare non habeas, Charine, purum.

"""

Other Example

See MyCapytain.local

Collection

Description

Collections are the metadata container objects in MyCapytain. Unlike other objects, they never contain textual content such as Texts and Passages; instead, they help you browse the catalog of an API's collection and identify, manually or automatically, the texts that are of interest to you.

The main points you should be aware of are :

  • Collections are children of Exportable. As of 2.0.0, any collection can be exported to JSON DTS.
  • Collections are built on a hierarchy. They have children and descendants.
  • Collections have an identifier and a title (the main name of what the collection represents : for an author, her name; for a book, its title; for a specific edition, a volume label; etc.)
  • A Collection can tell the machine whether it represents a readable object : if it is readable, its identifier can be used to query for passages or references on the same API.

Main Properties

  • Collection().id : Identifier of the object
  • Collection().title : Title of the object
  • Collection().readable : If True, means that the Collection().id can be used in GetReffs or GetTextualNode queries
  • Collection().members : Direct children of the object
  • Collection().descendants : Direct and indirect children of the object
  • Collection().readableDescendants : Descendants that have .readable as True
  • Collection().export() : Export Method
  • Collection().metadata : Metadata object that contains flat, descriptive, localized information about the object.
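
The relationship between members, descendants and readableDescendants can be illustrated with a small stand-alone sketch. SketchCollection and the urn:example identifiers are made up for this illustration and do not reproduce MyCapytain's Collection class. The sketch only shows one plausible way the properties relate to each other, including the dictionary-style access by identifier.

```python
class SketchCollection:
    """Hypothetical collection: a metadata node in a hierarchy, without text."""

    def __init__(self, id, title, readable=False, parent=None):
        self.id = id
        self.title = title
        self.readable = readable
        self.parent = parent
        self.members = []  # direct children only
        if parent is not None:
            parent.members.append(self)

    @property
    def descendants(self):
        # Direct and indirect children, collected depth first
        found = []
        for member in self.members:
            found.append(member)
            found.extend(member.descendants)
        return found

    @property
    def readableDescendants(self):
        return [item for item in self.descendants if item.readable]

    def __getitem__(self, key):
        # Dictionary-like access to any descendant through its identifier
        for item in self.descendants:
            if item.id == key:
                return item
        raise KeyError(key)


inventory = SketchCollection("urn:example:inventory", "Inventory")
author = SketchCollection("urn:example:author", "An Author", parent=inventory)
work = SketchCollection("urn:example:author.work", "A Work", parent=author)
edition = SketchCollection(
    "urn:example:author.work.edition", "A Work, an Edition",
    readable=True, parent=work
)

print([item.title for item in inventory.descendants])
print([item.id for item in inventory.readableDescendants])
print(inventory["urn:example:author.work"].title)
```

Only the edition is readable in this sketch, so it is the only identifier that could be passed on to passage or reference queries.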

Implementation : CTS Collections

Note

For a recap on what Textgroup means or any CTS jargon, go to http://capitains.github.io/pages/vocabulary

CTS Collections are divided into 4 kinds : TextInventory, TextGroup, Work, Text. Their specificity is that the hierarchy of these objects is predefined and always follows the same order. They implement a special export (MyCapytain.common.constants.Mimetypes.XML.CTS) which exports to the XML Text Inventory format that one would get by making a GetCapabilities request.

CapiTainS CTS Collections implement a parents property which holds the list of parents, ordered from the closest to the most distant : Text.parents = [Work(), TextGroup(), TextInventory()].

Their final implementation can parse resources passed through the resource= named argument.

Diagram of collections prototypes

Example

from MyCapytain.retrievers.cts5 import CTS
from MyCapytain.resources.collections.cts import TextInventory, Work
from MyCapytain.common.constants import Mimetypes
from pprint import pprint

"""
In order to have a real life example,
we are gonna query for data in the Leipzig CTS API
We are gonna query for metadata about Seneca who
is represented by urn:cts:latinLit:stoa0255

To retrieve data, we are gonna make a GetMetadata query
to the CTS Retriever.
"""
retriever = CTS("http://cts.dh.uni-leipzig.de/api/cts/")
# We store the response (Pure XML String)
response = retriever.getMetadata(objectId="urn:cts:latinLit:stoa0255")

"""
From here, we actually have the necessary data, we can now
play with collections. TextInventory is the main collection type that is needed to
parse the whole response.
"""
inventory = TextInventory(resource=response)
# What we are gonna do is print the title of each descendant :
for descendant in inventory.descendants:
    # Metadatum resolve any non-existing language ("eng", "lat") to a default one
    # Putting default is just making that clear
    print(descendant.title["default"])

"""
You should see in there things such as
-   "Seneca, Lucius Annaeus" (The TextGroup or main object)
-   "de Ira" (The Work object)
-   "de Ira, Moral essays Vol 2" (The Edition specific Title)

We can now see other functions, such as the export to JSON DTS.
Collections have a unique feature built in : they allow for
accessing an item using its key as if it were a dictionary :
The identifier of De Ira is urn:cts:latinLit:stoa0255.stoa010
"""
deIra = inventory["urn:cts:latinLit:stoa0255.stoa010"]
assert isinstance(deIra, Work)
pprint(deIra.export(output=Mimetypes.JSON.DTS.Std))
# you should see a DTS representation of the work

"""
What we might want to do is to browse metadata about Seneca's De Ira
Remember that CTS Collections have a parents attribute !
"""
for descAsc in deIra.descendants + [deIra] + deIra.parents:
    # We filter out the TextInventory, which has an empty Metadata value
    if not isinstance(descAsc, TextInventory):
        print(
            descAsc.metadata.export(output=Mimetypes.JSON.Std)
        )
"""
And of course, we can simply export deIra to CTS XML format
"""
print(deIra.export(Mimetypes.XML.CTS))

Resolvers

Description

Resolvers were introduced in 2.0.0b0 as a solution for building tools around Text Services APIs, where you can seamlessly swap one resolver for another without changing your code, join multiple resolvers together, etc. The principle behind resolvers is to provide native Python objects through API-like methods, which are restricted to four simple commands :

  • getTextualNode(textId[str], subreference[str], prevnext[bool], metadata[bool]) -> Passage
  • getMetadata(objectId[str], **kwargs) -> Collection
  • getSiblings(textId[str], subreference[str]) -> tuple([str, str])
  • getReffs(textId[str], subreference[str], depth[int]) -> list([str])

These functions always return objects derived from the major classes, i.e. Passage and Collection for the first two, and simple collections of strings for the other two. Resolvers fill the gap between these base objects and the original retriever objects, which were designed to return plain strings from remote or local APIs.

The base functions are represented in the prototype, and only getMetadata may be expanded in terms of arguments, depending on what filtering can be offered. However, an additional filter does not necessarily have any effect with other resolvers.
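
The division of labour between a retriever and a resolver can be sketched as below. All names are hypothetical (StringRetriever, SketchPassage, SketchResolver), the XML layout is invented for the illustration, and only getTextualNode is shown. The point is simply that the retriever hands back a string while the resolver hands back a Python object, so the calling code never parses anything itself.

```python
import xml.etree.ElementTree as ET


class StringRetriever:
    """Hypothetical retriever: returns raw strings, as retrievers do."""

    def getTextualNode(self, textId, subreference=None, prevnext=False, metadata=False):
        # A canned response stands in for a remote or local API call
        return '<passage id="{}" ref="{}">Nec quae Callaico linuntur auro,</passage>'.format(
            textId, subreference
        )


class SketchPassage:
    """Hypothetical parsed object returned to the caller."""

    def __init__(self, id, reference, text):
        self.id = id
        self.reference = reference
        self.text = text


class SketchResolver:
    """Hypothetical resolver: same method name, but returns Python objects."""

    def __init__(self, retriever):
        self.retriever = retriever

    def getTextualNode(self, textId, subreference=None, prevnext=False, metadata=False):
        raw = self.retriever.getTextualNode(textId, subreference, prevnext, metadata)
        xml = ET.fromstring(raw)  # parsing happens once, inside the resolver
        return SketchPassage(xml.get("id"), xml.get("ref"), xml.text)


resolver = SketchResolver(StringRetriever())
passage = resolver.getTextualNode("urn:cts:latinLit:phi1294.phi002.perseus-lat2", "4.39.7")
print(passage.reference, passage.text)
```

Because the caller only depends on the resolver's methods, another resolver with the same interface (for instance one reading local files) could be swapped in without touching the calling code.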

Historical Perspective

The original incentive to build resolvers was the situation with retrievers in the context of the Nautilus API and the Nemo UI : Nemo took a retriever as an object, which means that, based on the prototype, Nemo was retrieving string objects. That made sense as long as Nemo was running against a remote HTTP API, because it was actually receiving string objects which were not even (pre-)processed by the Retriever object. But once Nautilus (a fully native Python CTS API) was developed, we ended up in a situation where Nemo was parsing strings that Nautilus had exported from Python etree objects, which Nautilus had itself obtained by parsing strings.

Diagram of operations before resolvers : there is duplication of processing

By introducing Resolvers, we avoid this double parsing in every situation : MyCapytain now provides a default class for querying texts, no matter what kind of transaction lies behind the Python object. At the same time, Resolvers provide a unified system to retrieve texts independently of the retriever's standard type (CTS, DTS, proprietary, etc.).

Diagram of operations with resolvers : duplicated steps have been removed

Prototype

class MyCapytain.resolvers.prototypes.Resolver

Resolver provides a native Python API which returns Python objects.

Initiation of resolvers depends on the implementation of the prototype.

getMetadata(objectId=None, **filters)

Request metadata about a text or a collection

Parameters:
  • objectId (str) – Object Identifier to filter on
  • filters (dict) – Kwargs parameters.
Returns:

Collection

getReffs(textId, level=1, subreference=None)

Retrieve the references available under a textual node

Parameters:
  • textId (str) – Text Identifier
  • level (int) – Depth for retrieval
  • subreference (str) – Passage Reference
Returns:

List of references

Return type:

[str]

getSiblings(textId, subreference)

Retrieve the siblings of a textual node

Parameters:
  • textId (str) – Text Identifier
  • subreference (str) – Passage Reference
Returns:

Tuple of references

Return type:

(str, str)

getTextualNode(textId, subreference=None, prevnext=False, metadata=False)

Retrieve a text node from the API

Parameters:
  • textId (str) – Text Identifier
  • subreference (str) – Passage Reference
  • prevnext (boolean) – Retrieve graph representing previous and next passage
  • metadata (boolean) – Retrieve metadata about the passage and the text
Returns:

Passage

Return type:

Passage

Example

from MyCapytain.resolvers.cts.api import HttpCTSResolver
from MyCapytain.retrievers.cts5 import CTS
from MyCapytain.common.constants import Mimetypes, NS

# We set up a resolver which communicates with an API available in Leipzig
resolver = HttpCTSResolver(CTS("http://cts.dh.uni-leipzig.de/api/cts/"))
# We require a passage : passage is now a Passage object
# This is an entry from the Smith Myth Dictionary
# The inner methods will resolve to the URI http://cts.dh.uni-leipzig.de/api/cts/?request=GetPassage&urn=urn:cts:pdlrefwk:viaf88890045.003.perseus-eng1:A.abaeus_1
# And parse it into interactive objects
passage = resolver.getTextualNode("urn:cts:pdlrefwk:viaf88890045.003.perseus-eng1", "A.abaeus_1")
# We need an export as plaintext
print(passage.export(
    output=Mimetypes.PLAINTEXT
))
"""
    Abaeus ( Ἀβαῖος ), a surname of Apollo
     derived from the town of Abae in Phocis, where the god had a rich temple. (Hesych. s. v.
     Ἄβαι ; Hdt. 8.33 ; Paus. 10.35.1 , &c.) [ L.S ]
"""
# We want to find bibliographic information in the passage of this dictionary
# We need an export as LXML ETREE object to perform XPath
print(
    passage.export(
        output=Mimetypes.PYTHON.ETREE
    # smart_strings=False makes lxml return plain strings for text() results
    ).xpath(".//tei:bibl/text()", namespaces=NS, smart_strings=False)
)
# ["Hdt. 8.33", "Paus. 10.35.1"]