The smallest unit of tokenization in the OGR is the Segment (`seg`). The Segments are not intended to provide a phonetic transcription of the text but instead a consistent and phonemically-informed interpretation of the spelling, which in these early texts is broadly phonemic. The segmental annotation respects the following principles:
More extensive documentation of the segmental annotation will be found in published scientific presentations of the corpus.
Segment-level annotation is contained in three core tags:

- `seg_plus` and `seg_minus`: two comma-separated lists of alphabetically-ordered phonological features, which are specified as "+" or "-" respectively for the Segment. The more features included on these lists, the more specified the segment.
- `seg_phoneme`: a user-friendly symbol denoting a particular matrix of phonological features. IPA symbols are generally used with their standard value. Underspecified segments are represented by non-IPA symbols or capitals.

It is recommended to use the `seg_plus` and `seg_minus` tags when searching for classes of sounds rather than attempting to list possible `seg_phoneme` values, e.g.

- `seg_minus=/.*cons.*/`
- `onc="C" _i_ seg_plus=/.*nas.*/`
- `word _r_ seg_minus=/.*son.*/`
- `seg_plus=/.*CORONAL.*/ _=_ seg_minus=/.*cont,.*son,.*strident.*/ & word _r_ #1`

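
The multi-feature pattern in the last query relies on the feature lists being alphabetically ordered: `cont` comes before `son`, which comes before `strident`. A minimal Python sketch of that logic follows. It is not part of the corpus tooling: the helper names `build_pattern` and `matches` and the two sample `seg_minus` values are invented for illustration, and whole-value matching is assumed, which is why each pattern begins and ends with `.*`.

```python
import re


def build_pattern(features):
    """Build a regex for a comma-separated, alphabetically ordered feature
    list that contains every feature in `features`."""
    # Assumption: plain string sorting mirrors the corpus ordering. This is
    # only safe here because all three names are lowercase; how the corpus
    # orders uppercase names (e.g. CORONAL) is not stated in this section.
    ordered = sorted(features)
    return ".*" + ",.*".join(re.escape(f) for f in ordered) + ".*"


def matches(value, pattern):
    """Return True if the pattern matches the whole annotation value."""
    return re.fullmatch(pattern, value) is not None


# Hypothetical seg_minus values, invented for illustration only.
stop_minus = "cont,lat,nas,son,strident,voice"
fricative_minus = "lat,nas,son,voice"

pattern = build_pattern({"strident", "cont", "son"})
print(pattern)                            # .*cont,.*son,.*strident.*
print(matches(stop_minus, pattern))       # True
print(matches(fricative_minus, pattern))  # False
```

Writing the features in any other order (for example `son` before `cont`) would fail to match, since the pattern follows the order in which the names appear in the tag value.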
In the TXM version of the corpus, Segment-level annotation is available through the word-level `phon` tag, which concatenates all the `seg_phoneme` values in the word as a single string. The features in `seg_plus` and `seg_minus` are not available.
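
As a rough illustration of what this means in practice, the Python sketch below builds a `phon`-style string and searches it by listing symbols, since features cannot be queried. Everything in it is an assumption made for illustration: the function name `phon_string`, the sample segmentations, the absence of a separator between symbols, and the set of nasal symbols, which is one reading of the consonant table below.

```python
import re


def phon_string(seg_phonemes):
    """Concatenate per-segment symbols into one word-level string.
    Assumes no separator is inserted between symbols."""
    return "".join(seg_phonemes)


# Hypothetical segmentations, invented for illustration only.
word_a = phon_string(["b", "o", "n"])
word_b = phon_string(["ʧ", "a", "n", "t"])

# Without seg_plus/seg_minus, a sound class has to be spelled out as a
# list of symbols. NASALS is an assumed reading of the consonant table.
NASALS = "Nnɲm"
ends_in_nasal = re.compile(f"[{NASALS}]$")

print(word_a, bool(ends_in_nasal.search(word_a)))  # bon True
print(word_b, bool(ends_in_nasal.search(word_b)))  # ʧant False
```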
Symbol | IPA | cons | son | nas | LABIAL | round | DORSAL | high | low | back | atr | voice |
---|---|---|---|---|---|---|---|---|---|---|---|---|
V | - | + | ||||||||||
U | - | + | - | + | + | + | + | - | + | + | ||
O | - | + | - | + | + | + | - | + | + | |||
u | u,w | - | + | - | + | + | + | + | - | + | + | + |
o | o,u | - | + | - | + | + | + | - | + | + | + | |
ɔ | ɔ | - | + | - | + | + | + | - | - | + | - | + |
y | y | - | + | - | + | + | + | + | - | - | + | + |
i | i,j | - | + | - | + | + | - | - | + | + | ||
Æ | - | + | - | + | - | - | + | |||||
E | - | + | - | + | - | - | - | + | ||||
e | e | - | + | - | + | - | - | - | + | + | ||
ɛ | ɛ | - | + | - | + | - | - | - | - | + | ||
ə | ə | - | + | - | + | + | ||||||
A | - | + | - | + | - | + | - | + | ||||
a | a | - | + | - | + | - | + | - | - | + | ||
æ | æ | - | + | - | + | - | + | - | + | + |
Symbol | IPA | cons | son | cont | strident | lat | nas | LABIAL | CORONAL | ant | dist | DORSAL | back | LARYNGEAL | voice |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
C | C | + | |||||||||||||
h | h | + | - | - | - | + | - | ||||||||
Z | + | - | - | - | + | ||||||||||
D | + | - | - | - | - | + | + | ||||||||
T | + | - | - | - | - | - | + | + | |||||||
t | t | + | - | - | - | - | - | + | + | - | |||||
d | d | + | - | - | - | - | - | + | + | + | |||||
Ç | + | - | - | + | - | - | + | + | - | ||||||
ʦ | ʦ | + | - | - | + | - | - | + | + | - | - | ||||
ʣ | ʣ | + | - | - | + | - | - | + | + | - | + | ||||
Č | + | - | - | + | - | - | + | - | + | ||||||
ʧ | ʧ | + | - | - | + | - | - | + | - | + | - | ||||
ʤ | ʤ | + | - | - | + | - | - | + | - | + | + | ||||
S | + | - | + | + | - | - | + | + | - | ||||||
s | s | + | - | + | + | - | - | + | + | - | - | ||||
z | z | + | - | + | + | - | - | + | + | - | + | ||||
ð | ð,θ | + | - | + | - | - | - | + | + | + | |||||
G | + | - | - | - | - | + | + | ||||||||
K | + | - | - | - | - | - | + | + | |||||||
k | k | + | - | - | - | - | - | + | + | - | |||||
g | g | + | - | - | - | - | - | + | + | + | |||||
ɣ | ɣ | + | - | + | - | - | - | + | + | + | |||||
Ċ | + | - | - | - | - | + | |||||||||
ċ | + | - | - | - | - | + | - | ||||||||
ġ | + | - | - | - | - | + | + | ||||||||
B | + | - | - | - | + | ||||||||||
P | + | - | - | - | - | - | + | ||||||||
p | p | + | - | - | - | - | - | + | - | ||||||
b | b | + | - | - | - | - | - | + | + | ||||||
F | + | - | + | + | - | - | + | ||||||||
f | f | + | - | + | + | - | - | + | - | ||||||
v | v | + | - | + | + | - | - | + | + | ||||||
ß | ß | + | - | + | - | - | - | + | |||||||
N | + | + | - | - | + | + | |||||||||
n | n | + | + | - | - | + | + | + | + | ||||||
ɲ | ɲ | + | + | - | - | + | + | - | + | ||||||
m | m | + | + | - | - | + | + | + | |||||||
L | + | + | - | + | - | + | |||||||||
l | l | + | + | - | + | - | + | + | + | ||||||
ʎ | ʎ | + | + | - | + | - | + | - | + | ||||||
r | r | + | + | + | - | - | + | + |
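
The relationship between underspecified symbols (capitals) and fully specified ones can be pictured as subsumption between feature matrices: a capital such as `T` carries fewer features than `t` or `d` and so covers both. The Python sketch below shows one way to express this. The partial matrices are an assumed, simplified reading of the rows for `T`, `t` and `d`, and the names `subsumes` and `to_tags` are hypothetical helpers, not part of the corpus.

```python
from typing import Dict, Tuple

Matrix = Dict[str, str]  # feature name -> "+" or "-"


def subsumes(general: Matrix, specific: Matrix) -> bool:
    """True if every feature value of `general` also holds in `specific`."""
    return all(specific.get(f) == v for f, v in general.items())


def to_tags(matrix: Matrix) -> Tuple[str, str]:
    """Render a matrix as (seg_plus, seg_minus) style strings.
    Assumption: Python's default sort, which puts uppercase names first."""
    plus = ",".join(sorted(f for f, v in matrix.items() if v == "+"))
    minus = ",".join(sorted(f for f, v in matrix.items() if v == "-"))
    return plus, minus


# Partial matrices, assumed for illustration; not the full table rows.
T = {"cons": "+", "son": "-", "cont": "-", "CORONAL": "+", "ant": "+"}
t = {**T, "voice": "-"}
d = {**T, "voice": "+"}

print(subsumes(T, t), subsumes(T, d))  # True True
print(subsumes(t, d))                  # False
print(to_tags(t))                      # ('CORONAL,ant,cons', 'cont,son,voice')
```

Under this view, searching on a small set of features returns every segment whose matrix contains them, whether it is annotated with a capital or with a fully specified symbol.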