The smallest unit of tokenization in the OGR is the Segment (`seg`). Segments are not intended to provide a phonetic transcription of the text. Instead, they provide a consistent, phonemically informed interpretation of the spelling, which in these early texts is broadly phonemic. The segmental annotation respects the following principles:
More extensive documentation of the segmental annotation will be found in published scientific presentations of the corpus.
Segment-level annotation is contained in four core tags:

- `seg_plus` and `seg_minus`: two comma-separated, alphabetically ordered lists of the phonological features specified as "+" or "-", respectively, for the Segment. The more features these lists contain, the more fully specified the Segment.
- `seg_phoneme`: a character denoting a particular matrix of phonological features. IPA characters are generally used with their standard value; underspecified segments are represented by non-IPA characters or capitals.
- `seg_matches`: a string of all the characters that match the Segment's matrix of phonological features. For example, the palatal nasal stop "ɲ" also matches the underspecified characters "N" (nasal consonant) and "C" (consonant), so its `seg_matches` string is `CNɲ`. In practical terms, this means that the query `seg_matches=/.*N.*/` will match all nasal consonants.

When searching for classes of sounds, use either the `seg_plus` and `seg_minus` tags or the `seg_matches` tag. Each pair below retrieves the same class in both ways:

- vowels: `seg_minus=/.*cons.*/` or `seg_matches=/.*V.*/`
- nasals inside an `onc="C"` span: `onc="C" _i_ seg_plus=/.*nas.*/` or `onc="C" _i_ seg_matches=/.*N.*/`
- word-final obstruents: `word _r_ seg_minus=/.*son.*/` or `word _r_ seg_matches=/.*Q.*/`
- word-final coronal stops (t, d): `seg_plus=/.*CORONAL.*/ _=_ seg_minus=/.*cont,.*son,.*strident.*/ & word _r_ #1` or `word _r_ seg_matches=/.*T.*/`

Because the feature lists are alphabetically ordered, a single regular expression can require several features at once if it lists them in that same order, as in `/.*cont,.*son,.*strident.*/`.
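To make the two query styles concrete, the following Python sketch emulates the regular-expression tests on a few hand-written Segment records. The records (their feature lists and `seg_matches` strings) are hypothetical, the helpers `full_match` and `select` are invented for illustration, and two assumptions are made: a query regex must match the whole annotation value (which is why the patterns begin and end with `.*`), and feature lists are ordered case-insensitively. The structural operators (`_i_`, `_r_`, `_=_`) are not emulated.

```python
import re

# Hypothetical Segment records, written in the format described above.
# Illustrative only: not copied from the corpus. Case-insensitive
# alphabetical ordering of the feature lists is an assumption.
SEGMENTS = [
    {"seg_phoneme": "t",
     "seg_plus": "ant,cons,CORONAL",
     "seg_minus": "cont,lat,nas,son,strident,voice",
     "seg_matches": "CDQTZt"},
    {"seg_phoneme": "s",
     "seg_plus": "ant,cons,cont,CORONAL,strident",
     "seg_minus": "lat,nas,son,voice",
     "seg_matches": "CQSZs"},
    {"seg_phoneme": "a",
     "seg_plus": "DORSAL,low,son,voice",
     "seg_minus": "cons,high,nas",
     "seg_matches": "AVa"},
]


def full_match(pattern: str, value: str) -> bool:
    """Assume the regex must cover the whole value (hence .* at each end)."""
    return re.fullmatch(pattern, value) is not None


def select(key: str, pattern: str) -> list[str]:
    """Return the seg_phoneme of every record whose `key` value matches."""
    return [s["seg_phoneme"] for s in SEGMENTS if full_match(pattern, s[key])]


# Vowels: the [-cons] feature vs. the underspecified symbol V.
vowels_by_features = select("seg_minus", r".*cons.*")
vowels_by_symbol = select("seg_matches", r".*V.*")

# Obstruents: [-son] vs. the symbol Q (word-final restriction not emulated).
obstruents_by_features = select("seg_minus", r".*son.*")
obstruents_by_symbol = select("seg_matches", r".*Q.*")

# Coronal stops: several minus-features in one regex, in alphabetical order,
# so that each .* can skip over the features listed between them.
coronal_stops_by_features = [
    s["seg_phoneme"] for s in SEGMENTS
    if full_match(r".*CORONAL.*", s["seg_plus"])
    and full_match(r".*cont,.*son,.*strident.*", s["seg_minus"])
]
coronal_stops_by_symbol = select("seg_matches", r".*T.*")

# The same features listed out of order match nothing.
out_of_order = select("seg_minus", r".*strident,.*cont.*")

print(vowels_by_features, vowels_by_symbol)                # ['a'] ['a']
print(obstruents_by_features, obstruents_by_symbol)        # ['t', 's'] ['t', 's']
print(coronal_stops_by_features, coronal_stops_by_symbol)  # ['t'] ['t']
print(out_of_order)                                        # []
```

The last query shows why the ordering matters: a multi-feature pattern only succeeds when its features appear in the same order as in the annotation.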
In the TXM version of the corpus, Segment-level annotation is available through the word-level `phon` tag, which concatenates all the `seg_phoneme` values in the word into a single string. The `seg_plus`, `seg_minus` and `seg_matches` tags are not available.
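Since `seg_matches` is absent from the TXM version, one possible workaround is to expand a class symbol into its member characters and search `phon` for any of them. The Python sketch below assumes that `phon` is exactly the concatenation of the word's `seg_phoneme` values; the helper names and the segment sequences are hypothetical. The nasal members are read off the N, n, ɲ and m rows of the consonant table below, and the capital N is included because an underspecified nasal would itself appear as "N" in `phon`.

```python
import re

# Members of the nasal class, taken from the N, n, ɲ and m table rows.
NASALS = "Nnɲm"


def phon_from_segments(seg_phonemes: list[str]) -> str:
    """Concatenate a word's seg_phoneme values into its phon string."""
    return "".join(seg_phonemes)


def contains_class(phon: str, members: str) -> bool:
    """True if phon contains at least one character from members."""
    pattern = ".*[" + re.escape(members) + "].*"
    return re.fullmatch(pattern, phon) is not None


# Hypothetical segment sequences, for illustration only.
words = [["p", "a", "n"], ["s", "a", "l"], ["t", "e", "ɲ", "t"]]
for segs in words:
    phon = phon_from_segments(segs)
    print(phon, contains_class(phon, NASALS))
# pan True
# sal False
# teɲt True
```

The trade-off is that the character class must be kept in sync with the feature tables by hand, whereas `seg_matches` encodes class membership directly in the annotation.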
Symbol | Type | cons | son | nas | LABIAL | round | DORSAL | high | low | back | atr | voice |
---|---|---|---|---|---|---|---|---|---|---|---|---|
V | vowel | - | + | |||||||||
U | u,y | - | + | - | + | + | + | + | - | + | + | |
O | back | - | + | - | + | + | + | - | + | + | ||
u | u,w | - | + | - | + | + | + | + | - | + | + | + |
o | o,u | - | + | - | + | + | + | - | + | + | + | |
ɔ | ɔ | - | + | - | + | + | + | - | - | + | - | + |
y | y | - | + | - | + | + | + | + | - | - | + | + |
i | i,j | - | + | - | + | + | - | - | + | + | ||
Æ | front (not i) | - | + | - | + | - | - | + | ||||
E | e,ɛ¹ | - | + | - | + | - | - | - | + | |||
e | e | - | + | - | + | - | - | - | + | + | ||
ɛ | ɛ | - | + | - | + | - | - | - | - | + | ||
ə | ə | - | + | - | + | + | ||||||
A | a,æ | - | + | - | + | - | + | - | + | |||
a | a | - | + | - | + | - | + | - | - | + | ||
æ | æ | - | + | - | + | - | + | - | + | + |

Symbol | Type | cons | son | cont | strident | lat | nas | LABIAL | CORONAL | ant | dist | DORSAL | back | LARYNGEAL | voice |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
C | consonant | + | |||||||||||||
Q² | obstruent | + | - | - | - | ||||||||||
h | h | + | - | - | - | + | - | ||||||||
Z | coronal | + | - | - | - | + | |||||||||
D | t,d,ð,θ | + | - | - | - | - | + | + | |||||||
T | t,d | + | - | - | - | - | - | + | + | ||||||
t | t | + | - | - | - | - | - | + | + | - | |||||
d | d | + | - | - | - | - | - | + | + | + | |||||
Ç | ʦ,ʣ | + | - | - | + | - | - | + | + | - | |||||
ʦ | ʦ | + | - | - | + | - | - | + | + | - | - | ||||
ʣ | ʣ | + | - | - | + | - | - | + | + | - | + | ||||
Č | ʧ,ʤ | + | - | - | + | - | - | + | - | + | |||||
ʧ | ʧ | + | - | - | + | - | - | + | - | + | - | ||||
ʤ | ʤ | + | - | - | + | - | - | + | - | + | + | ||||
S | s,z | + | - | + | + | - | - | + | + | - | |||||
s | s | + | - | + | + | - | - | + | + | - | - | ||||
z | z | + | - | + | + | - | - | + | + | - | + | ||||
ð | ð,θ | + | - | + | - | - | - | + | + | + | |||||
G | velar obstruent | + | - | - | - | - | + | + | |||||||
K | k,g | + | - | - | - | - | - | + | + | ||||||
k | k | + | - | - | - | - | - | + | + | - | |||||
g | g | + | - | - | - | - | - | + | + | + | |||||
ɣ | ɣ | + | - | + | - | - | - | + | + | + | |||||
Ċ | palatalized velar | + | - | - | - | - | + | ||||||||
ċ | k,c,ʨ | + | - | - | - | - | + | - | |||||||
ġ | g,ɟ,ʥ | + | - | - | - | - | + | + | |||||||
B | labial obstruent | + | - | - | - | + | |||||||||
P | p,b | + | - | - | - | - | - | + | |||||||
p | p | + | - | - | - | - | - | + | - | ||||||
b | b | + | - | - | - | - | - | + | + | ||||||
F | f,v | + | - | + | + | - | - | + | |||||||
f | f | + | - | + | + | - | - | + | - | ||||||
v | v | + | - | + | + | - | - | + | + | ||||||
ß | ß | + | - | + | - | - | - | + | |||||||
N | nasal | + | + | - | - | + | + | ||||||||
n | n | + | + | - | - | + | + | + | + | ||||||
ɲ | ɲ | + | + | - | - | + | + | - | + | ||||||
m | m | + | + | - | - | + | + | + | |||||||
L | lateral | + | + | - | + | - | + | ||||||||
l | l | + | + | - | + | - | + | + | + | ||||||
ʎ | ʎ | + | + | - | + | - | + | - | + | ||||||
r | r | + | + | + | - | - | + | + |
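The relation between a Segment and the symbols in its `seg_matches` string can be sketched as feature subsumption: a symbol matches if every feature it specifies has the same value in the Segment. This reading of "match", the function names and the feature matrices below are assumptions for illustration; the matrices are deliberately simplified and are not taken from the tables above. Sorting the matched symbols by code point happens to reproduce the documented string `CNɲ`, but the corpus's actual ordering rule is not stated, and the case-insensitive ordering of `seg_plus`/`seg_minus` is likewise assumed.

```python
import re

# Simplified, hypothetical feature matrices (a few features each);
# not copied from the tables above.
MATRICES = {
    "C": {"cons": "+"},
    "N": {"cons": "+", "son": "+", "nas": "+"},
    "L": {"cons": "+", "son": "+", "lat": "+"},
    "n": {"cons": "+", "son": "+", "nas": "+", "CORONAL": "+", "ant": "+"},
    "ɲ": {"cons": "+", "son": "+", "nas": "+", "ant": "-"},
}


def subsumes(symbol_spec: dict[str, str], segment_spec: dict[str, str]) -> bool:
    """True if every feature value specified for the symbol is also
    specified, with the same value, for the segment."""
    return all(segment_spec.get(feat) == val for feat, val in symbol_spec.items())


def seg_matches(phoneme: str) -> str:
    """All symbols whose matrix subsumes the phoneme's, sorted by code point."""
    spec = MATRICES[phoneme]
    hits = [sym for sym, sym_spec in MATRICES.items() if subsumes(sym_spec, spec)]
    return "".join(sorted(hits))


def seg_lists(phoneme: str) -> tuple[str, str]:
    """Build seg_plus and seg_minus strings (case-insensitive order assumed)."""
    spec = MATRICES[phoneme]
    plus = sorted((f for f, v in spec.items() if v == "+"), key=str.lower)
    minus = sorted((f for f, v in spec.items() if v == "-"), key=str.lower)
    return ",".join(plus), ",".join(minus)


print(seg_matches("ɲ"))                                # CNɲ
print(seg_lists("n"))                                  # ('ant,cons,CORONAL,nas,son', '')
print(bool(re.fullmatch(r".*N.*", seg_matches("n"))))  # True
```

Under this reading, an underspecified symbol such as N matches every Segment that agrees with it on the few features it does specify, which is why a query on `seg_matches` can stand in for a query on several features at once.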