Presentation

Introduction

The Old Gallo-Romance (OGR) corpus contains richly annotated versions of all the Gallo-Romance texts preserved in manuscripts copied before c1130. Since most of the texts are well known and excellent corpora of Old French (notably the Base de français médiéval at bfm.ens-lyon.fr) are already available, it’s reasonable to ask: why do we need this corpus too?

1. Unique features of the OGR corpus

The OGR differs from existing corpora, and indeed from published editions, in four important ways.

1.1 Unified early Gallo-Romance

The OGR corpus contains both Old French and Old Occitan texts. The textual record for the period in question is extremely sparse, the texts do not clearly belong to a single literary tradition, and no two of them are written in the same Gallo-Romance variety. More importantly, a number of the texts from this period show effects of contact between northern and southern Gallo-Romance varieties, notably the Passion of Clermont, or originate from regions such as Poitou near the linguistic border between the northern and southern Gallo-Romance areas.

In short, the textual record for this early period does not neatly reflect a clear linguistic division between the two areas, and as such a unified Gallo-Romance corpus provides a more representative overview of the available data.

1.2 Manuscript oriented

The OGR corpus focuses on manuscripts, not texts. The only manuscripts included are those copied before c1130, which provides a clear terminus ad quem for any observed linguistic development, since any possible modernization by later scribes is excluded. Furthermore, the OGR includes a diplomatic edition of the text verified against photographs of the manuscript. The original word division and manuscript abbreviations are recorded and corrections to the text are avoided, except in the most obvious of cases, and even here they are clearly marked.

Some of this information is recoverable in critical editions — although typically not the original word division — but is absent from most electronic corpora, which eliminate the critical apparatus of the print edition and provide only the editor’s normalized base text.

1.3 Fully lemmatized and morphologically annotated

The lemmatization and morphosyntactic annotation in the OGR corpus have been manually verified for every text. There is no shortage of detailed philological discussion of these texts (print critical editions typically include a fully glossed list of forms and often a translation), so manual verification is informed by the philological tradition.

1.4 Phonologically and metrically annotated

Each text in the corpus is phonologically transcribed. The transcriptions are parsed into syllables, and in the verse texts, each syllable is assigned a metrical position within the line of verse, or is marked as elided.

While more speculative and informed by my own research interests, this annotation facilitates research into the morphophonology of the texts.
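
To make the description above concrete, here is a minimal Python sketch of how a syllable-level metrical annotation of this kind could be represented. It is an illustration only and does not reproduce the OGR's actual data format or tools: the Syllable class, its field names and the assign_positions function are hypothetical, and the example syllables are invented placeholders rather than forms from any corpus text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Syllable:
    """One syllable of a transcribed verse line (hypothetical record)."""
    text: str                       # phonological transcription of the syllable
    elided: bool = False            # True if the syllable does not count metrically
    position: Optional[int] = None  # metrical position in the line, None if elided


def assign_positions(line: List[Syllable]) -> List[Syllable]:
    """Number the non-elided syllables of a verse line from 1 upwards."""
    position = 0
    for syllable in line:
        if syllable.elided:
            # Elided syllables are kept in the record but occupy no position.
            syllable.position = None
        else:
            position += 1
            syllable.position = position
    return line


# Invented placeholder syllables, not taken from any corpus text.
line = [
    Syllable("ta"),
    Syllable("te", elided=True),
    Syllable("a"),
    Syllable("mi"),
]
positions = [s.position for s in assign_positions(line)]
print(positions)  # [1, None, 2, 3]
```

Keeping elided syllables in the record, rather than deleting them, means that the phonological transcription and the metrical count can be queried independently.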

2. Research facilitated by the OGR corpus

The OGR corpus is designed primarily to facilitate research into the phonology, morphology, and morphosyntax of early Gallo-Romance, in addition to the study of versification. However, it is hoped that the lemmatization and morphosyntactic annotation will increase the accessibility of these often difficult texts for researchers in all areas who wish to explore early Gallo-Romance data in more detail.

The OGR corpus does not include treebank annotation and I do not envisage adding it in the near future. However, most of the northern Gallo-Romance texts were manually parsed during the SRCMF project and are available as part of the Profiterole treebank (https://universaldependencies.org/treebanks/fro_profiterole/index.html).

Two of my papers use the OGR: Rainsford (2022), which gives a detailed presentation of the corpus, and Rainsford (2024), which examines sandhi phenomena, enclisis and proclisis in these old Gallo-Romance texts.

3. Why c1130?

The cut-off date for inclusion in the OGR corpus is fixed at c1130, roughly the date at which the Hildesheim manuscript of the Life of Saint Alexis was copied, but before the copying of the Oxford manuscript of the Song of Roland. The date itself is arbitrary but it allows the OGR to include only those texts which are traditionally considered to be the “early monuments” of Gallo-Romance. These texts are united by their individuality. Except for the Life of Saint Alexis, each text is preserved only in a single manuscript. Each is written in a different variety of Gallo-Romance, and they have little in common with each other in terms of their poetic form. Many are chance survivors, but even those which are not cannot easily be connected to the literary traditions that develop in the twelfth century. In short, the OGR is a collection of early, “oddball” texts.

With this in mind, there is a deliberate omission from the corpus. In the southern Gallo-Romance area, legal documents began to be written in the vernacular, at least in part, towards the end of the eleventh century (see Brunel 1926). These documents are linguistically very interesting but it makes little sense to include only those copied before 1130 and exclude the rest, since these are not oddball survivors but the first examples of an emerging textual tradition.

History

The OGR began life during my time as a postdoc on the SRCMF project at the ENS de Lyon, where I was able to collaborate with Alexei Lavrentiev and Céline Guillot-Barbance to produce new editions of the Serments and Eulalie texts. Subsequently, Christiane Marchello-Nizia invited me to collaborate on the creation of a multi-facet edition of the Life of Saint Alexis (Rainsford and Marchello-Nizia 2024). All of these editions are now also available in the Base de français médiéval. The style of the editions in the OGR, and in some cases the stylesheets themselves, draw heavily on Alexei Lavrentiev’s work for the BFM. I’m grateful to the BFM team (Céline Guillot-Barbance and Alexei Lavrentiev) and to the developers of the TXM software (Serge Heiden and Mathieu Decorde) for their support, inspiration and encouragement over the years.

As a British Academy Post-Doctoral Fellow at the University of Oxford, I continued to develop the tools used to build the metrically and syntactically annotated corpus developed during my doctoral thesis. These tools were used to produce an initial version of the Boeci text, annotated both metrically and syntactically in joint work with Olga Scrivner, who also introduced me to the ANNIS software.

The phonological transcriptions were developed for a paper published in 2020 on the syllable structure of Early Old French, and the dataset, which also includes a transcription of all the forms in the Song of Roland, was published in the TROLLing repository.

A preview version of the corpus (v0.1.3) was released on ogr-corpus.org in June 2021. Version 0.4 of the corpus was presented at the Congrès mondial de linguistique française in Orleans in July 2022, and the paper was published in the conference proceedings (Rainsford 2022). The first complete version of the corpus was released in January 2025.

Version history

18 August 2025: Version 1.0.1 released and website updated.
- Corrected bug which annotated clitics as stressed in Old French.
2 January 2025: Version 1.0 released and website updated.
14 June 2022: Website updated to corpus version 0.4.
18 March 2022: Version 0.4 released on GitHub, six further texts added.
7 December 2021: Version 0.3 released. All formats fully implemented (TXM binary, relANNIS, PAULA-XML, TEI-P5). License changed to CC BY-NC-SA 4.0.
4 August 2021: Preview version 0.2 for TXM 0.8.1 and ANNIS 3.6 released. ANNIS portal online.
8 June 2021: Preview version 0.1.4 for TXM 0.8.1 released.
1 June 2021: Website online, preview version 0.1.3 for TXM released.

Sources and bibliography

Sources

The source texts are all in the public domain in the EU.
The normalized transcriptions, part-of-speech annotation and lemmatization in the following texts were adapted from the Base de français médiéval: Serments, Eulalie, Alexis, Passion, SLeger. The re-use and re-distribution of this data in the OGR is in accordance with the provisions of the ETALAB licence.
The transcription of Alba is based on Frank and Hartmann (1997). The transcription of ChansLas is based on Bischoff (1984). The transcription of PassAugsb is based on Berschin et al. (1981). These transcriptions have not been checked against manuscript images.
The resolution of the Tironian notes and the reconstruction of missing text in Jonas is based on De Poerck (1955).
A previous annotated version of the Boeci text was created in collaboration with Olga Scrivner (Rainsford and Scrivner 2014).
All other transcriptions and annotations are my own original work, established on the basis of manuscript images and finalized in consultation with the published material listed in the bibliography below.

Bibliography

Avalle, D’Arco Silvio. 1962. Cultura e lingua francese delle origini nella ‘Passion’. Milan: Ricciardi. (Passion)
Avalle, D’Arco Silvio. 1965. Protostoria delle lingue romanze: dal sec. VII ai giuramenti di Strasburgo e con particolare riguardo al territorio gallo-romanzo. Turin: Giappichelli. (Serments)
Avalle, D’Arco Silvio. 1967. Monumenti prefranciani: il sermone di Valenciennes e il Sant Lethgier. Turin: Giappichelli. (Jonas, SLeger)
Avalle, D’Arco Silvio, and Raffaello Monterosso, eds. 1965. Sponsus: Dramma delle vergini prudenti e delle vergini stolte. Milan: Ricciardi. (Spons, PrDieu, PrVierge2, PrVierge3)
Berschin, Helmut, Walter Berschin, and Rolf Schmidt. 1981. ‹Augsburger Passionslied›. Ein neuer romanischer Text des X. Jahrhunderts. In Lateinische Dichtungen des X. und XI. Jahrhunderts: Festgabe für Walther Bulst zum 80. Geburtstag, edited by Walther Bulst, Walter Berschin, and Reinhard Düchting, 251-79. Heidelberg: Schneider.
Bischoff, Bernhard, ed. 1984. Anecdota Novissima: Texte des vierten bis sechzehnten Jahrhunderts. Stuttgart: Hiersemann. (ChansLas)
Blumenthal, Peter, and Achim Stein, eds. 2002. Tobler-Lommatzsch: Altfranzösisches Wörterbuch. Stuttgart: Steiner.
Brunel, Clovis. 1926. Les Plus Anciennes Chartes en langue provençale. Recueil des pièces originales antérieures au XIIIe siècle, publiées avec une étude morphologique. Paris: A. Picard.
Brunel-Lobrichon, Geneviève. 2003. Le manuscrit du Sponsus et ses poésies bilingues. Édition et traduction de deux poèmes à la Vierge (XIe siècle). In La tradition vive: mélanges d’histoire des textes en l’honneur de Louis Holtz, edited by Pierre Lardet, 401-15. Turnhout: Brepols. (PrDieu, PrVierge2, PrVierge3)
De Poerck, Guy. 1955. Le sermon bilingue sur Jonas du ms. Valenciennes 521 (475). Romanica Gandensia 4: 31-66. (Jonas)
De Poerck, Guy. 1963. Les plus anciens textes de la langue française comme témoins de l’époque. Revue de Linguistique Romane 27: 1-34.
De Poerck, Guy. 1964. Le ms. Clermont-Ferrand 240 (anc. 189), les Scriptoria d’Auvergne et les origines spirituelles de la vie française de Saint Léger. Scriptorium 18: 11-33. (Passion, SLeger)
Foerster, Wendelin. 1879. Épitre farcie de la Saint-Étienne. En vieux français du XIIe siècle. Revue des langues romanes 16: 5-15. (EpSEt)
Frank, Barbara, and Jörg Hartmann, eds. 1997. Inventaire systématique des premiers documents des langues romanes. 5 vols. Tübingen: Narr.
Gersbach, M. 1965. Eine altfranzösische Formel zu einem Gottesurteil. Vox Romanica 24: 64-75. (EpreuveJudic)
Guillot-Barbance, Céline, Serge Heiden, and Alexei Lavrentiev. 2017. Base de français médiéval : une base de référence de sources médiévales ouverte et libre au service de la communauté scientifique. Diachroniques 7: 168-84.
Hilty, Gerold. 1995. Les plus anciens monuments de la langue occitane. In Cantarem d’aquestz trobadors: studi occitanici in onore di Giuseppe Tavani, edited by Luciano Rossi and Giuseppe Tavani, 25-45. Alessandria: Edizioni dell’Orso.
Lafont, Robert, ed. 1998. La Chanson de sainte Foi : texte occitan du XIe siècle. Geneva: Droz. (SFoi)
Lazzerini, Lucia. 1986. À propos de l’aube de Fleury. Romania 107: 552-53. (Alba)
Lazzerini, Lucia. 1993. A proposito di due Liebesstrophen pretrobadoriche. Cultura Neolatina 53: 123-34. (ChansLas)
Linskill, Joseph. 1937. Saint Léger : étude de la langue du manuscrit de Clermont-Ferrand. Paris: Droz. (SLeger)
Meneghetti, Maria Luisa. 1998. L’Alba di Fleury: un Osterlied. In Miscellanea Mediaevalia: Mélanges Offerts à Philippe Ménard, edited by Jean-Claude Faucon, 969-83. Paris: Champion. (Alba)
Mölk, Ulrich. 1996. Zwei Fragmente galloromanischer weltlicher Lyrik des 11. Jahrhunderts. In Ensi firent li ancessor : mélanges de philologie médiévale offerts à Marc-René Jung, edited by Marc René Jung, Luciano Rossi, Christine Jacob-Hugon, and Ursula Bähler, 47-51. Alessandria: Edizioni dell’Orso. (ChansLas)
Mölk, Ulrich, and Günter Holtus. 1999. Alberics Alexanderfragment. Neuausgabe und Kommentar. Zeitschrift für romanische Philologie (ZrP) 115: 582-625. (AlexAlb)
Monaci, Ernesto, ed. 1910. Facsimili di documenti per la storia delle lingue e delle letterature romanze. Rome: Anderson.
Paden, William D. 2007. The language of the tenth‑century Occitan charms from Clermont‑Ferrand. In L’Art de la philologie : Mélanges en l’honneur de Leena Löfstedt, edited by Juhani Härmä, Elina Suomela-Härmä, and Olli Välikangas, 185-97. Helsinki: Société néophilologique. (BenClerm)
Rainsford, Thomas. 2022. Old Gallo-Romance (OGR) Corpus : annotation phonologique et métrique des plus anciens textes gallo-romans. SHS Web of Conferences 138: 02007. https://doi.org/10.1051/shsconf/202213802007.
Rainsford, Thomas. 2024. Proclisis and enclisis in early Gallo-Romance: evidence from sandhi phenomena. In Historical and Sociolinguistic Approaches to the French Language, edited by Janice Carruthers, Mairi McLaughlin, and Olivia Walsh, 25-48. Oxford: Oxford University Press. https://doi.org/10.1093/oso/9780192894366.003.0002.
Rainsford, Thomas, and Christiane Marchello-Nizia, eds. 2024. Vie de Saint Alexis. Lyon: ENS de Lyon. http://catalog.bfm-corpus.org/AlexisRaM.
Rainsford, Thomas, and Olga Scrivner. 2014. Metrical Annotation for a Verse Treebank. In Proceedings of the Thirteenth International Workshop on Treebanks and Linguistic Theories (TLT13), edited by Verena Henrich, Erhard Hinrichs, Daniël de Kok, Petya Osenova, and Adam Przepiórkowski, 149-59. Tübingen: University of Tübingen. http://dx.doi.org/10.18419/opus-15350.
Storey, Christopher, ed. 1968. La Vie de Saint Alexis: texte du manuscrit de Hildesheim (L). Paris: Minard. (Alexis)
Thomas, Antoine, ed. 1925. La Chanson de Sainte Foi d’Agen : poème provençal du XIe siècle. Paris: Champion. (SFoi)
Thomas, Lucien-Paul, ed. 1951. Le ‘Sponsus’: mystère des vierges sages et des vierges folles. Paris: Presses universitaires de France. (Spons)
Zufferey, François. 2007. Perspectives nouvelles sur l’Alexandre d’Auberi de Besançon. Zeitschrift für romanische Philologie (ZrP) 123: 385-418.
Zufferey, François. 2020. La chanson de saint Alexis: essai d’édition critique de la version primitive, avec apparat synoptique de tous les témoins. Paris: Société des anciens textes français. (Alexis)
Zumthor, Paul. 1984. Un trompe‑l’œil linguistique ? Le refrain de l’aube bilingue de Fleury. Romania 105: 171-92. (Alba)

Website

This website is hosted in the bwCloud and is built with Hugo using the Whisper theme by Rob Austin.
