Recommendations for Encoding
Etymological Information Using TEI XML
Laurent Romary
INRIA
France
Jack T. Bowers
iljackb@gmail.com
COST ENeL WG2 Meeting Vienna
13/02/2015
revision 06/04/2015
General Overview of Project
We are creating a set of structural recommendations for
TEI lexical dictionaries, including information relevant to:
• phonetic and orthographic forms;
• grammatical information;
• semantic and meta-linguistic information;
• variation (on all levels);
• etymology;
• mono-, bi-, and multilingual dictionaries, as well as dictionaries
that include encyclopedic information and examples;
The models involve proposing changes to the TEI P5 Guidelines themselves
and defining our constraints on TEI in an ODD;
Goals for TEI Etymological Markup Recommendations
(i) address the lack of sufficient digital markup models and standards for
representing etymological information;
(ii) coherence in the treatment of the exact same linguistic information between
synchronic and diachronic data structures;
(iii) LMF- and ONTOLEX-compatible TEI structures;
(iv) make better use of linking mechanisms in TEI for:
• connecting cited forms in etymology and their project-internal
sources (where possible);
• making use of existing external resources for lexical and
conceptual information not internal to a given project or
corpus, e.g. open source lexical and ontological knowledge and
linked data resources;
(v) increase diversity in the types of etymological information that can be
treated and make more use of concepts from linguistics;
Working TEI Dictionary Metamodel (elements)
[Diagram: element hierarchy of the working metamodel, with a cardinality (0…1, 0…n, 1…1, 1…n) on each link. Elements shown: <TEI>, <entry>, <form>, <orth>, <pron>, <seg>, <c>, <oRef>, <pRef>, <gramGrp>, <pos>, <gen>, <number>, <case>, <per>, <tns>, <mood>, <gram>, <colloc>, <sense>, <def>, <usg>, <cit> (sense-level and etymological), <quote>, <etym> (sense-level and entry-level), <lang>, <lbl>, <num>, <gloss>, <date>, <note>, <bibl>, <ref>, <ptr>, <listChange>, <change>, <spanGrp>, <span>, <annotationGrp>, <annotations>.]
Two Potential Etymology Structures in TEI
<etym> in <sense>:
• if there are semantic implications for the etymological change;
<etym> in <entry>:
• if the etymological change has no semantic implications for existing
lexical items in the language;
• both may occur in the same entry to account for unrelated changes
that occurred at different stages;
[Diagram: element trees for the two structures, with cardinalities]
<etym> in <entry>
• If there are no semantic implications for the etymological change, and/or the
semantic change occurred in another language or proto-language stage;
Processes:
• Inheritance;
• Phonetic and phonological processes (non-exhaustive):
  • assimilation (place, manner);
  • epenthesis;
  • metathesis;
  • erosion/deletion (apocope, ...);
  • coalescence;
  • tone changes (has own internal categories);
  • neutralization;
• Borrowing*:
  • lexical item imported from another language;
[Diagram: element tree for entry-level <etym>, with cardinalities]
<etym> in <sense>
Used when there are semantic implications for the etymological change;
• Metaphor;
• Metonymy;
• Blending*;
• Compounding;
• Grammaticalization;
• several of these processes can co-occur;
• *where multiple etymological processes occur, some semantic in nature and
others phonetic, they may all be included in the sense-level <etym> if the
former permitted the latter.
[Diagram: element tree for sense-level <etym>, with cardinalities]
Etymological Processes: Inheritance
<entry xml:lang="it" xml:id="buono">
  <form type="lemma">
    <orth>buono</orth>
    <pron notation="ipa">'bwo.no</pron>
    <gramGrp>
      <pos>adj.</pos>
      <gen>masc.</gen>
    </gramGrp>
  </form>
  <sense>
    ....
  </sense>
  <etym type="inheritance">
    <cit type="etymon">
      <oRef xml:lang="la">bónŭ</oRef>
      <gramGrp>
        <pos>adj.</pos>
        <gen>masc.</gen>
        <case>nom.</case>
      </gramGrp>
    </cit>
  </etym>
</entry>
Italian < Vulgar Latin
buono < bŏnu
synchronic entry
diachronic
(etymological)
entry
Note: processes and changes are approximate and meant for
demonstrating markup rather than asserting precise etymological
diachrony of individual items;
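The link between the synchronic and the diachronic part of such an entry can be read straight off the tree. The following is a minimal Python sketch, not part of the original slides, that pulls the lemma, the etymological process, and the etymon out of an abridged copy of the 'buono' entry. It assumes the entry carries no TEI namespace declaration (as on the slide); the function name `summarize` and the dictionary keys are illustrative choices.

```python
import xml.etree.ElementTree as ET

# ElementTree stores xml:lang and xml:id under this namespace prefix.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Abridged copy of the 'buono' entry; no TEI namespace is declared (assumption).
ENTRY = """
<entry xml:lang="it" xml:id="buono">
  <form type="lemma">
    <orth>buono</orth>
  </form>
  <etym type="inheritance">
    <cit type="etymon">
      <oRef xml:lang="la">bónŭ</oRef>
    </cit>
  </etym>
</entry>
"""


def summarize(entry_xml):
    """Return the lemma, its language, the process type, and the etymon."""
    entry = ET.fromstring(entry_xml)
    etym = entry.find("etym")
    etymon = etym.find("cit[@type='etymon']/oRef")
    return {
        "lemma": entry.findtext("form[@type='lemma']/orth"),
        "language": entry.get(XML_NS + "lang"),
        "process": etym.get("type"),
        "etymon": etymon.text,
        "etymon_language": etymon.get(XML_NS + "lang"),
    }


summary = summarize(ENTRY)
print(summary)
```

Because the etymon uses the same <oRef> and xml:lang conventions as a synchronic form, the same accessors work on both halves of the entry.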
Etymological Processes: Inheritance & Phonological Changes
French < Vulgar Latin
bon < bónŭ
<entry xml:lang="fr" xml:id="bon">
  <form type="lemma">
    <orth>bon</orth>
    <pron notation="ipa">'bɔ̃</pron>
    <gramGrp>
      <pos>adj</pos>
      <gen>masc.</gen>
    </gramGrp>
  </form>
  <sense>
    ....
  </sense>
  <etym type="inheritance">
    <cit type="etymon" xml:id="bónŭ" next="#ˈbon">
      <oRef xml:lang="la">bónŭ</oRef>
      <gramGrp>
        <case>nom.</case>
      </gramGrp>
    </cit>
    <cit type="etymon" xml:id="ˈbon" prev="#bónŭ" next="#ˈbɔ̃">
      <pRef xml:lang="fro">ˈbon</pRef>
    </cit>
    <cit type="etymon" xml:id="ˈbɔ̃" prev="#ˈbon">
      <pRef xml:lang="fro">bɔ̃</pRef>
    </cit>
  </etym>
</entry>
(1) Root-level etymological process: bon < bónŭ
(2) Intermediate phonological change: ˈbonu > ˈbon
(3) Final phonological change: ˈbon > ˈbɔ̃
Note: processes and changes are approximate and
meant for demonstrating markup rather than asserting
precise etymological diachrony of individual items;
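The @prev and @next pointers make the order of the stages explicit, independently of document order. As a rough illustration (not from the slides), the Python sketch below walks such a chain from the stage without @prev to the stage without @next. The identifiers s1 to s3 are simplified stand-ins for the xml:id values used on the slide, the pointers are assumed to be written as "#id", and the <cit> elements are deliberately shuffled to show that the result does not depend on their position.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Stages in shuffled document order; s1..s3 are hypothetical identifiers.
ETYM = """
<etym type="inheritance">
  <cit type="etymon" xml:id="s3" prev="#s2"><pRef xml:lang="fr">ˈbɔ̃</pRef></cit>
  <cit type="etymon" xml:id="s1" next="#s2"><oRef xml:lang="la">bónŭ</oRef></cit>
  <cit type="etymon" xml:id="s2" prev="#s1" next="#s3"><pRef xml:lang="fro">ˈbon</pRef></cit>
</etym>
"""


def ordered_stages(etym_xml):
    """Follow the @next pointers from the first stage and return the forms in order."""
    etym = ET.fromstring(etym_xml)
    by_id = {cit.get(XML_ID): cit for cit in etym.iter("cit")}
    # The chain starts at the only stage that has no @prev.
    current = next(cit for cit in by_id.values() if cit.get("prev") is None)
    forms, seen = [], set()
    while current is not None:
        stage_id = current.get(XML_ID)
        if stage_id in seen:
            raise ValueError("cycle in @next chain at " + stage_id)
        seen.add(stage_id)
        forms.append("".join(current.itertext()).strip())
        pointer = current.get("next")
        current = by_id.get(pointer.lstrip("#")) if pointer else None
    return forms


stages = ordered_stages(ETYM)
print(" > ".join(stages))
```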
Etymological Processes: Borrowing*
Description of lexical process:
• where a language takes a lexical item from a different language;
• aka: loaning, importing;
• often has a historical and practical explanation for the need;
Key linguistic concepts:
• source language;
• source form(s): phonetic, orthographic;
• importing language;
• imported or borrowed form;
• semantic/meta-linguistic concept;
[Diagram: Source Language with Source Form(s) (orth(i..n), pron(i..n)); Importing Language with Borrowed Form(s) (orth(i..n), pron(i..n)); Meta-linguistic Concept]
Etymological Processes: Borrowing*
<entry xml:id="taxi" xml:lang="jpn">
  <form type="lemma">
    <orth type="transliterated" notation="romanji">takushī</orth>
    <orth notation="katakana">タクシー</orth>
    <pron notation="ipa">taku'shi:</pron>
    <gramGrp>
      <pos>noun</pos>
    </gramGrp>
  </form>
  <sense corresp="http://dbpedia.org/page/Taxicab">
    <usg type="dom">transportation</usg>
  </sense>
  <etym type="borrowing">
    <lbl>source</lbl> <lang>English</lang>
    <cit type="etymon">
      <oRef corresp="http://en.wiktionary.org/wiki/taxi" xml:lang="en">taxi</oRef>
      <pRef notation="ipa" corresp="http://en.wiktionary.org/wiki/taxi#Pronunciation" xml:lang="en-US">'tæksi</pRef>
    </cit>
  </etym>
</entry>
Japanese < English: taxi(cab)
TEI Model for Japanese ‘takushī’
Etymological Process: Borrowing
[Diagram: the lexical entry annotated with callouts. Concept labels: Importing Language; Borrowed Form(s); Meta-linguistic Concept; Source Language; Source Form(s); Ontological resource for entry; External lexical entry resource for source term; External pronunciation resource for source term. Elements labelled: <entry @xml:id>, <form type="lemma">, <orth @type @notation>, <pron @notation>, <gramGrp>, <pos>, <sense @corresp>, <usg type="dom">, <etym type="borrowing">, <lbl>, <lang>, <cit type="etymon">, <oRef @corresp @xml:lang>, <pRef @notation @corresp @xml:lang>.]
Etymological Processes: Metaphor
Description of process:
• Lexical innovation based in human cognition;
• Describe/understand one concept (x) in terms of another concept (y);
• Requires a change in semantic domains;
• Mapping between concepts is limited to certain salient attributes;
• Results in lexical polysemy;
Key components:
• Domain of concept (y): Source Domain;
• Domain of concept (x): Target Domain;
[Diagram: Source Concept and Target Concept linked by Salient Attributes, each with its Domain Profile; Lexical Source Form(s) (phonetic, orthographic) leading to Polysemous Lexical Form(s)]
Etymological Processes: Metaphor
Mixtepec-Mixtec ‘ntuchi’ (bean > kidney)
Source Concept: bean (Source Domain Profile: Legumes, Food)
Target Concept: kidney (Target Domain Profile: Body, Internal Organs)
Salient attributes: color, shape
Lexical Source Form(s) and Polysemous Lexical Form(s): ntuchi [ndù.ʧí]
<entry xml:id="kidney">
  <form type="lemma">
    <orth>ntuchi</orth>
    <pron notation="ipa">ndù.ʧí</pron>
    <!-- gramGrp cluster -->
  </form>
  <sense corresp="http://dbpedia.org/resource/Kidney">
    …..
    <usg type="dom">Body</usg>
    <usg type="dom">InternalOrgans</usg>
    <etym type="metaphor">
      <cit type="etymon">
        <oRef corresp="#bean">ntuchi</oRef>
        <pRef corresp="#bean">ndù.ʧí</pRef>
        <gloss>bean</gloss>
      </cit>
    </etym>
  </sense>
</entry>
<entry xml:id="bean">
  <form type="lemma">
    <orth>ntuchi</orth>
    <pron notation="ipa">ndù.ʧí</pron>
    <!-- gramGrp cluster -->
  </form>
  …..
  <sense corresp="http://dbpedia.org/resource/Pinto_bean">
    <usg type="dom">Legume</usg>
    <usg type="dom">Food</usg>
    …….
    <!-- translation info here -->
  </sense>
</entry>
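Since the etymon points at another entry of the same lexicon through @corresp, the source entry and its ontological resource can be recovered mechanically. The following Python sketch (an illustration added here, not the authors' tooling) resolves project-internal "#id" pointers inside <etym>. The wrapping <body> element, the abridged entries, and the function name `resolve_etymons` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abridged copies of the two 'ntuchi' entries; the <body> wrapper is hypothetical.
LEXICON = """
<body>
  <entry xml:id="kidney">
    <form type="lemma"><orth>ntuchi</orth></form>
    <sense corresp="http://dbpedia.org/resource/Kidney">
      <etym type="metaphor">
        <cit type="etymon">
          <oRef corresp="#bean">ntuchi</oRef>
          <gloss>bean</gloss>
        </cit>
      </etym>
    </sense>
  </entry>
  <entry xml:id="bean">
    <form type="lemma"><orth>ntuchi</orth></form>
    <sense corresp="http://dbpedia.org/resource/Pinto_bean"/>
  </entry>
</body>
"""


def resolve_etymons(lexicon_xml):
    """List (entry id, process, source entry id, source concept URI) tuples."""
    root = ET.fromstring(lexicon_xml)
    entries = {entry.get(XML_ID): entry for entry in root.iter("entry")}
    links = []
    for entry_id, entry in entries.items():
        for etym in entry.iter("etym"):
            for ref in etym.iter("oRef"):
                pointer = ref.get("corresp", "")
                # Only project-internal pointers of the form "#id" are resolved.
                if pointer.startswith("#") and pointer[1:] in entries:
                    source = entries[pointer[1:]]
                    concept = source.find("sense").get("corresp")
                    links.append((entry_id, etym.get("type"), pointer[1:], concept))
    return links


links = resolve_etymons(LEXICON)
print(links)
```

External pointers (for example the DBpedia URIs) are left untouched by this sketch; dereferencing them would be a separate step.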
Etymological Processes: Metaphor
TEI Model for Mixtepec-Mixtec ‘ntuchi’
Etymological process: Metaphor
[Diagram: the lexical entry (kidney) and the source entry (bean) annotated with callouts. Callout labels: Lexical entry; Source entry; Ontological resource for entry (kidney): DBpedia ontology entry for ‘kidney’; Ontological resource for source entry (bean): DBpedia ontology entry for ‘pinto bean’; pointer to entry for ‘bean’. Elements labelled: <entry @xml:id>, <form type="lemma">, <orth>, <pron @notation>, <gramGrp>, <pos>, <sense @corresp>, <usg type="dom">, <cit type="translation" @xml:lang>, <etym type="metaphor">, <lbl>, <cit type="etymon">, <oRef @corresp>, <pRef @corresp @notation>, <gloss>.]
Etymological Processes: Metonymy
Description of lexical process:
• concept (y) stands for concept (x);
• no change in semantic domains;
• one “vehicle” entity provides mental access to another (i.e. a target)
within the same domain;
• results in (synchronic) polysemy;
Key linguistic concepts:
• source concept (cognitive);
• target concept (cognitive);
• source form (lexical);
• target form (lexical);
[Diagram: Vehicle Concept and Target Concept within a single Domain (X)]
Etymological Processes: Metonymy
Mixtepec-Mixtec: ‘kiti’ (horse)
<entry xml:id="animal">
  <form type="lemma">
    <orth>kiti</orth>
    <pron notation="ipa">kì.tí</pron>
    <!-- gramGrp here -->
  </form>
  <sense corresp="http://dbpedia.org/resource/Animal">
    <usg type="dom">Living Beings</usg>
    <usg type="dom">Animal</usg>
    <cit type="translation" xml:lang="eng">
      <oRef>animal</oRef>
    </cit>
    <!-- other translations here -->
  </sense>
</entry>
<entry xml:id="animal-horse">
  <form type="lemma">
    <orth>kiti</orth>
    <pron notation="ipa">kì.t̪í</pron>
    <!-- gramGrp here -->
  </form>
  <sense corresp="http://dbpedia.org/resource/Horse">
    <usg type="dom">Animal</usg>
    <etym type="metonymy">
      <date notBefore="1517"/>
      <cit type="etymon">
        <oRef corresp="#animal">kiti</oRef>
        <pRef notation="ipa" corresp="#animal">kì.t̪í</pRef>
        <gloss>animal</gloss>
      </cit>
      <note>In this lexical item, the language reflects history: since
        there were no horses in Mexico until the arrival of the Spanish,
        there was no Mixtecan word for 'horse'; thus the categorical noun
        for 'animal' was used to describe the unnamed animal.
      </note>
    </etym>
    <cit type="translation" xml:lang="eng">
      <oRef>horse</oRef>
    </cit>
    <!-- other translations here -->
  </sense>
</entry>
TEI Model for Mixtepec-Mixtec ‘kiti’ (horse)
Etymological process: Metonymy
[Diagram: the lexical entry (target concept) and the source entry (vehicle concept) annotated with callouts. Callout labels: Lexical entry; Source entry; Ontological resource for entry; Ontological resource for source entry. Elements labelled: <entry @xml:id>, <form type="lemma">, <orth>, <pron @notation>, <gramGrp>, <pos>, <sense @corresp>, <usg type="dom">, <cit type="translation" @xml:lang>, <oRef>, <etym type="metonymy">, <date @notBefore>, <cit type="etymon">, <oRef @corresp>, <pRef @corresp @notation>, <gloss>, <note>.]
Etymological Processes: Compounding
Description of lexical process:
• Combines the surface forms of two lexical items to form a new one;
• The result becomes the sum of its lexical and semantic parts;
• Can involve metaphor, metonymy, and/or grammaticalization;
[Diagram: Etymon(i)* and Etymon(ii)*, each with its own grammatical info, semantic/meta-linguistic info, and etymological process(es) (0..n)]
Etymological Processes: Compounding (with Metonymy)
Mixtepec-Mixtec: Yucha Nchu’u ‘Puebla State’
Salient attribute of location = “the presence of hummingbirds”
<entry xml:id="Puebla-state" xml:lang="mix" type="compound">
  <form type="lemma">
    <orth><seg corresp="#lake">Yucha</seg> <seg corresp="#hummingbird">Nchu’u</seg></orth>
    <!-- <gramGrp> here -->…..
  </form>
  <sense corresp="http://dbpedia.org/resource/Puebla_State">
    <!-- (Primary) etymological process: compounding -->
    <etym type="compounding">
      <!-- Etymon(1) -->
      <cit type="etymon">
        <oRef corresp="#lake">Yucha</oRef>
        <gramGrp>
          <pos>concreteNoun</pos>
        </gramGrp>
        <gloss>lake</gloss>
      </cit>
      <!-- Etymon(2); etymological process(ii): metonymy -->
      <etym type="metonymy">
        <cit type="etymon">
          <oRef corresp="#hummingbird">Nchu’u</oRef>
          <gramGrp>
            <pos>concreteNoun</pos>
          </gramGrp>
          <gloss>hummingbird</gloss>
        </cit>
      </etym>
    </etym>
    ….
  </sense>
</entry>
Etymological Processes: Compounding
TEI model for Mixtepec-Mixtec “Yucha Nchu’u”
[Diagram: the lexical entry annotated with callouts. Callout labels: Lexical entry; Ontological resource for entry. Elements labelled: <entry @xml:id type="compound">, <form type="lemma">, <orth>, <seg @corresp> (twice), <gramGrp>, <pos>, <sense @corresp>, <etym type="compounding">, <cit type="etymon"> (twice, each with <oRef @corresp>, <gramGrp>, <pos>, <gloss>), <etym type="metonymy">.]
Alt (2006) LMF etymology extension proposal, merged with the LMF Core package
[Diagram: class diagram with a cardinality (0…1, 0…n, 1…1, 1…n) on each association. Classes shown: Lexical Resource, Global Information, Lexical DB, Lexical Entry, Form, Form Representation, Sense, Definition, Statement, Text Representation, Etymology, Etymon, Etymological Link.]
Alt (2006) LMF Etymology Extension: Borrowing Stage
Etymology of French ‘pamplemousse’, from Trésor de la Langue Française (TLF)
Synchronic (Modern French): pamplemousse
Diachronic (Dutch): pompelmoes < pompel + limoes
Etymological stage: Loan Word (e.g., Borrowing)
/etymologicalLink/
    /source/=“..” /target/=“…”
    /etymologicalClass/=/loan word/
    /biblSource/=“TLF”
/etymon/
    /orth/=“pompelmoes”
    /language/=“nl”
    /pos/=“commonNoun”
    /gender/=“feminine”
    /gloss/=“Citrus Maxima”
Etymological stage: Composition (e.g., Compounding)
/etymologicalLink/
    /source/=“..” /target/=“…”
    /etymologicalClass/=/composition/
    /biblSource/=“Boulan, König…”
    /confidenceScore/=“probable”
/etymon/
    /orth/=“pompel”
    /language/=“nl”
    /pos/=“adjective”
    /gloss/=“gros, enflé”
/etymon/
    /orth/=“limoes”
    /language/=“nl”
    /pos/=“commonNoun”
    /gloss/=“citron”
Alt (2006) LMF Etymology Extension: Borrowing Stage
Converted TEI Markup
Dutch pompelmoes > Modern French pamplemousse
<entry xml:id="LE1" xml:lang="fr">
  <form type="lemma">
    <orth>pamplemousse</orth>
    ....
  </form>
  <sense>
    ....
  </sense>
  <etym type="borrowing">
    …..
    <ref target="#TLF">TLF</ref>
    …..
    <cit type="etymon" xml:id="L2">
      <oRef xml:lang="nl">pompelmoes</oRef>
      <gloss xml:lang="lat">Citrus maxima</gloss>
      <gramGrp>
        <pos>commonNoun</pos>
        <gen>feminine</gen>
      </gramGrp>
      <note>probablement de l’origine
        tamoule, De Vries, Nederl</note>
    </cit>
    <!-- ‘compounding’ section goes here -->
    …..
  </etym>
</entry>
≈ LMF:
/etymologicalLink/
    /source/=“..” /target/=“…”
    /etymologicalClass/=/loan word/
    /biblSource/=“TLF”
/etymon/
    /orth/=“pompelmoes”
    /language/=“nl”
    /pos/=“commonNoun”
    /gender/=“feminine”
    /gloss/=“Citrus Maxima”
Note: our TEI structures do not explicitly use an equivalent of
/etymologicalLink/ (or /source/=“..” /target/=“…”), as this link is
implicitly present in the XML data structure.
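To make the "implicit link" point concrete: the values that LMF states explicitly in /etymologicalLink/ can be derived from the nesting of the TEI entry. The Python sketch below is one possible reading, not the authors' conversion procedure. It assumes that the link target is the lemma of the enclosing <entry>, the source is each <cit type="etymon"> inside an <etym>, and the class is the @type of that <etym>; the function name `implicit_links` and the abridged entry are illustrative.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the converted 'pamplemousse' entry (borrowing stage only).
ENTRY = """
<entry xml:id="LE1" xml:lang="fr">
  <form type="lemma"><orth>pamplemousse</orth></form>
  <etym type="borrowing">
    <ref target="#TLF">TLF</ref>
    <cit type="etymon" xml:id="L2">
      <oRef xml:lang="nl">pompelmoes</oRef>
    </cit>
  </etym>
</entry>
"""


def implicit_links(entry_xml):
    """Derive LMF-style link records from the nesting of <entry>, <etym>, <cit>."""
    entry = ET.fromstring(entry_xml)
    target = entry.findtext("form[@type='lemma']/orth")
    links = []
    for etym in entry.findall("etym"):
        sources = [ref.text for ref in etym.findall("ref")]
        for cit in etym.findall("cit[@type='etymon']"):
            links.append({
                "source": cit.findtext("oRef"),          # the etymon
                "target": target,                        # the enclosing lemma
                "etymologicalClass": etym.get("type"),   # process named on <etym>
                "biblSource": sources,
            })
    return links


links = implicit_links(ENTRY)
print(links)
```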
Alt (2006) LMF Etymology Extension: Compounding Stage
Etymology of French ‘pamplemousse’, from Trésor de la Langue Française (TLF)
Synchronic (Modern French): pamplemousse
Diachronic (Dutch): pompelmoes < pompel + limoes
Etymological stage: Composition (e.g., Compounding)
/etymologicalLink/
    /source/=“..” /target/=“…”
    /etymologicalClass/=/composition/
    /biblSource/=“Boulan, König…”
    /confidenceScore/=“probable”
/etymon/
    /orth/=“pompel”
    /language/=“nl”
    /pos/=“adjective”
    /gloss/=“gros, enflé”
/etymon/
    /orth/=“limoes”
    /language/=“nl”
    /pos/=“commonNoun”
    /gloss/=“citron”
Etymological stage: Loan Word (e.g., Borrowing)
/etymologicalLink/
    /source/=“..” /target/=“…”
    /etymologicalClass/=/loan word/
    /biblSource/=“TLF”
/etymon/
    /orth/=“pompelmoes”
    /language/=“nl”
    /pos/=“commonNoun”
    /gender/=“feminine”
    /gloss/=“Citrus Maxima”
…ation of Alt (2006) LMF Etymology Extension: Compounding Stage
Historical Dutch pompel + limoes > Modern French pamplemousse
<entry xml:id="LE1" xml:lang="fr">
  <form type="lemma">
    <orth>pamplemousse</orth>
    ....
  </form>
  <sense>
    ....
  </sense>
  <etym type="borrowing">
    <!-- ‘borrowing’ section goes here -->
    ……
    …..
  </etym>
</entry>
Compounding section of the etymology:
<etym type="compounding">
  <ref target="#Boulan-König">Boulan, König...</ref>
  <cit type="etymon">
    <oRef xml:lang="nl">pompel</oRef>
    <gramGrp>
      <pos>adjective</pos>
    </gramGrp>
    <gloss>gros, enflé</gloss>
  </cit>
  <cit type="etymon">
    <oRef xml:lang="nl">limoes</oRef>
    <gramGrp>
      <pos>commonNoun</pos>
    </gramGrp>
    <gloss>citron</gloss>
  </cit>
</etym>
≈ LMF:
/etymologicalLink/
    /source/=“..” /target/=“…”
    /etymologicalClass/=/composition/
    /biblSource/=“Boulan, König…”
    /confidenceScore/=“probable”
/etymon/
    /orth/=“pompel”
    /language/=“nl”
    /pos/=“adjective”
    /gloss/=“gros, enflé”
/etymon/
    /orth/=“limoes”
    /language/=“nl”
    /pos/=“commonNoun”
    /gloss/=“citron”
Note: our TEI structures do not explicitly use an equivalent of
/etymologicalLink/ (or /source/=“..” /target/=“…”), as this link is
implicitly present in the XML data structure.
Etymological Processes: Borrowing & Compounding
TEI model for ‘pamplemousse’
as converted from LMF (Alt 2006)
[Diagram: the lexical entry annotated with callouts. Elements labelled: <entry @xml:id type="compound">, <form type="lemma">, <orth>, <seg @corresp> (twice), <c>, <gramGrp>, <pos>, <sense> (0…n), <etym type="borrowing"> with <lbl>, <lang>, <ref @target>, and <cit type="etymon"> (<oRef @xml:lang>, <gloss @xml:lang>, <gramGrp>, <pos>, <gen>, <note>); <etym type="compounding"> with <ref @target> and <cit type="etymon"> (<oRef @xml:lang>, <gloss @xml:lang>, <gramGrp>, <pos>).]
Étymol. et Hist. 1. 1re moitié du xiies. put cel estre (Psautier Oxford, 54, 13 ds T.-L.); ca 1160 puet
estre (Eneas, 9003, ibid.); début xves. peut-estre (Quinze joies mariage, éd. J. Rychner, XII, 12);
1824 peut-être bien (Joubert, loc. cit.); 2. 1636 employé elliptiquement pour répondre évasivement à
une question (Corneille, Le Cid, I, 2); 3. 1775 détaché en fin de phrase, exprimant le défi, l'ironie
(Beaumarchais, Barbier de Séville, II, 2); 4. fin xiies. puet estre que (Flore et Blancheflor, éd. J.-L.
Leclanche, 407); 1641 peut-estre que (Corneille, Cinna, III, 1); 5. 1637 subst. un peut-estre (Id., La
Place royale, IV, 6). Comp. de peut, 3epers. du sing. de l'ind. prés. de pouvoir* et de être*.
<entry xml:id="peut-être" xml:lang="fr" type="compound">
<form type="lemma">
<orth><seg corresp="#pouvoir-3s-pres-ind">peut</seg><c>-</c><seg corresp="#être">être</seg></orth>
<gramGrp>
<pos>adv.</pos>
</gramGrp>
</form>
…
</entry>
PEUT-ÊTRE, adv.
Encoding from existing sources:
synchronic portion of entry
Trésor de la Langue Française
For “compound” entry types, @corresp can
(optionally) be used in the <seg> element to point
to the individual sub components of the item within
a project or externally;
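As a small illustration of how these <seg> pointers could be consumed, the Python sketch below (an added example, not part of the original slides) splits the <orth> of ‘peut-être’ into its components and their @corresp targets while still recovering the full surface form. The function name `compound_parts` is an illustrative choice.

```python
import xml.etree.ElementTree as ET

# The <orth> of the 'peut-être' entry, copied from the encoding above.
ORTH = (
    '<orth><seg corresp="#pouvoir-3s-pres-ind">peut</seg>'
    '<c>-</c><seg corresp="#être">être</seg></orth>'
)


def compound_parts(orth_xml):
    """Return the surface form and a (text, pointer) pair for every <seg>."""
    orth = ET.fromstring(orth_xml)
    surface = "".join(orth.itertext())
    parts = [(seg.text, seg.get("corresp")) for seg in orth.findall("seg")]
    return surface, parts


surface, parts = compound_parts(ORTH)
print(surface)
print(parts)
```

The hyphen in <c> is kept in the surface form but is not reported as a component, since it carries no @corresp pointer.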
PEUT-ÊTRE, adv.
Encoding from existing sources:
non-linguistic content portion of diachronic entry
….
<etym xml:id="PEUT-ÊTRE-adv-Étym-et-Hist">
<lbl>Étymol. et Hist.</lbl>
<num>1.</num>
……
<num>2.</num>
…..
<num>3.</num>
……
<num>4.</num>
…..
<num>5.</num>
……
<note> Comp. de peut, 3epers. du sing. de l'ind. prés. de pouvoir* et de être*. </note>
</etym>
…
Trésor de la Langue Française
Étymol. et Hist.
1. 1re moitié du xiies. put cel estre (Psautier Oxford, 54, 13 ds T.-L.); ca 1160 puet estre (Eneas, 9003, ibid.); début
xves. peut-estre (Quinze joies mariage, éd. J. Rychner, XII, 12); 1824 peut-être bien (Joubert, loc. cit.);
2. 1636 employé elliptiquement pour répondre évasivement à une question (Corneille, Le Cid, I, 2);
3. 1775 détaché en fin de phrase, exprimant le défi, l'ironie (Beaumarchais, Barbier de Séville, II, 2);
4. fin xiies. puet estre que (Flore et Blancheflor, éd. J.-L. Leclanche, 407); 1641 peut-estre que (Corneille, Cinna, III, 1);
5. 1637 subst. un peut-estre (Id., La Place royale, IV, 6).
Comp. de peut, 3epers. du sing. de l'ind. prés. de pouvoir* et de être*.
PEUT-ÊTRE, adv.
Encoding from existing sources:
diachronic portion of entry
….
<sense>
<etym xml:id="PEUT-ÊTRE-adv-Étym-et-Hist" type="inheritance">
<lbl>Étymol. et Hist.</lbl>
<num>1.</num>
……
<num>2.</num>
…..
<num>3.</num>
……
<num>4.</num>
…..
<num>5.</num>
……
<note> Comp. de peut, 3epers. du sing. de l'ind. prés. de pouvoir* et de être*. </note>
</etym>
</sense>
…
Trésor de la Langue Française
Template:
<cit type="attestation">
  <date> </date>
  <oRef> </oRef>
  <gramGrp>
    <!-- appropriate element here -->
  </gramGrp>
  <bibl> </bibl>
  <note> </note>
</cit>
….
1. 1re moitié du xiies. put cel estre (Psautier Oxford, 54, 13 ds T.-L.);
ca 1160 puet estre (Eneas, 9003, ibid.);
début xves. peut-estre (Quinze joies mariage, éd. J. Rychner, XII, 12);
1824 peut-être bien (Joubert, loc. cit.);
2. 1636 employé elliptiquement pour répondre évasivement à une question (Corneille, Le Cid, I, 2);
3. 1775 détaché en fin de phrase, exprimant le défi, l'ironie (Beaumarchais, Barbier de Séville, II, 2);
4. fin xiies. puet estre que (Flore et Blancheflor, éd. J.-L. Leclanche, 407);
1641 peut-estre que (Corneille, Cinna, III, 1);
5. 1637 subst. un peut-estre (Id., La Place royale, IV, 6).
Encoding from existing sources:
diachronic portion of entry
Trésor de la Langue Française
1. 1re moitié du xiies. put cel estre (Psautier Oxford, 54, 13 ds T.-L.);
début xves. peut-estre (Quinze joies mariage, éd. J. Rychner, XII, 12);
1824 peut-être bien (Joubert, loc. cit.);
ISO 639-3 code for Old French (842 - ca. 1400): fro
ISO 639-3 code for Middle French (ca. 1400 - 1600): frm
<cit type="attestation">
  <date notBefore="1100" notAfter="1150">1re moitié du xiies</date>
  <oRef xml:lang="fro">put cel estre</oRef>
  <bibl>(Psautier Oxford, 54, 13 ds T.-L.)</bibl>
</cit>
<cit type="attestation">
  <date notBefore="1400" notAfter="1450">début xves</date>
  <oRef xml:lang="frm">peut-estre</oRef>
  <bibl>(Quinze joies mariage, éd. J. Rychner, XII, 12)</bibl>
</cit>
<cit type="attestation">
  <date when="1824">1824</date>
  <oRef>peut-être bien</oRef>
  <bibl>(Joubert, loc. cit.)</bibl>
</cit>
….
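Turning the dictionary's prose dates into @notBefore and @notAfter values can be partly automated. The Python sketch below is only an illustration under stated assumptions: it handles just the two phrasings that appear in the attestations above ("1re moitié du … s." and "début … s."), it counts the twelfth century (xiie s.) as 1100 to 1200, and it maps both phrasings to the first fifty years of the century, which is an editorial convention assumed for this example rather than a rule from the TLF or the TEI Guidelines. The names `roman_to_int` and `century_range` are hypothetical.

```python
import re

ROMAN_VALUES = {"i": 1, "v": 5, "x": 10}

# Matches e.g. "1re moitié du xiies" or "début xves" (final dot optional).
PHRASE = re.compile(
    r"^(?P<part>1re moitié du|début)\s+(?P<century>[ivx]+)e\s*s\.?$",
    re.IGNORECASE,
)


def roman_to_int(numeral):
    """Convert a Roman numeral built from i, v, x to an integer."""
    total, previous = 0, 0
    for char in reversed(numeral.lower()):
        value = ROMAN_VALUES[char]
        if value < previous:
            total -= value   # subtractive notation, e.g. the i in "xiv"
        else:
            total += value
            previous = value
    return total


def century_range(phrase):
    """Return (notBefore, notAfter) years for a supported date phrase."""
    match = PHRASE.match(phrase.strip())
    if match is None:
        raise ValueError("unsupported date phrase: " + phrase)
    start = (roman_to_int(match.group("century")) - 1) * 100
    # Assumption: both supported phrasings map to the first half of the century.
    return start, start + 50


print(century_range("1re moitié du xiies"))
print(century_range("début xves"))
```

Anything the pattern does not recognise raises an error instead of guessing, so unsupported phrasings (for example "fin xiies.") stay with a human encoder.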
Conclusions and Summary
Our TEI recommendations can facilitate:
• linking and integrating corresponding data structures between
the synchronic and diachronic levels;
• the use of open source lexical resources and ontological
information;
• a more principled and consistent set of TEI guidelines for digitally
encoding etymological information;
• better compatibility between information traditionally kept and
formatted separately in etymological dictionaries, lexical dictionaries,
and linguistic analyses;
• models for encoding ubiquitous processes of linguistic change for
multiple levels of language;
• theoretically agnostic data structures;
• a more diverse set of etymological examples for the TEI guidelines;

More Related Content

What's hot

BL Demo Day - July2011 - (6) Language Tools for IMPACT
BL Demo Day - July2011 - (6) Language Tools for IMPACTBL Demo Day - July2011 - (6) Language Tools for IMPACT
BL Demo Day - July2011 - (6) Language Tools for IMPACTIMPACT Centre of Competence
 
Acl reading@2016 10-26
Acl reading@2016 10-26Acl reading@2016 10-26
Acl reading@2016 10-26sekizawayuuki
 
MORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYA
MORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYAMORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYA
MORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYAijnlc
 
A Repository of Free Lexical Resources for African Languages: The Project and...
A Repository of Free Lexical Resources for African Languages: The Project and...A Repository of Free Lexical Resources for African Languages: The Project and...
A Repository of Free Lexical Resources for African Languages: The Project and...Guy De Pauw
 
Code Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging haiCode Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging haiIIIT Hyderabad
 
Lectura 3.5 word normalizationintwitter finitestate_transducers
Lectura 3.5 word normalizationintwitter finitestate_transducersLectura 3.5 word normalizationintwitter finitestate_transducers
Lectura 3.5 word normalizationintwitter finitestate_transducersMatias Menendez
 
Is the Calvet Language Barometer useful to measure linguistic justice?
Is the Calvet Language Barometer useful to measure linguistic justice?Is the Calvet Language Barometer useful to measure linguistic justice?
Is the Calvet Language Barometer useful to measure linguistic justice?Federico Gobbo
 
Exploring Natural Language Processing in Ruby
Exploring Natural Language Processing in RubyExploring Natural Language Processing in Ruby
Exploring Natural Language Processing in RubyKevin Dias
 
Linguistic markup and processing of transclusion in XML documents (Notes)
Linguistic markup and processing of transclusion in XML documents (Notes)Linguistic markup and processing of transclusion in XML documents (Notes)
Linguistic markup and processing of transclusion in XML documents (Notes)Simon Dew
 
Basic techniques in nlp
Basic techniques in nlpBasic techniques in nlp
Basic techniques in nlpSumit Sony
 
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...Facultad de Informática UCM
 
Parallel Corpora in (Machine) Translation: goals, issues and methodologies
Parallel Corpora in (Machine) Translation: goals, issues and methodologiesParallel Corpora in (Machine) Translation: goals, issues and methodologies
Parallel Corpora in (Machine) Translation: goals, issues and methodologiesAntonio Toral
 
A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech Tagging
A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech TaggingA Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech Tagging
A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech TaggingGuy De Pauw
 
Trends In Languages 2010
Trends In Languages 2010Trends In Languages 2010
Trends In Languages 2010Markus Voelter
 

What's hot (20)

BL Demo Day - July2011 - (6) Language Tools for IMPACT
BL Demo Day - July2011 - (6) Language Tools for IMPACTBL Demo Day - July2011 - (6) Language Tools for IMPACT
BL Demo Day - July2011 - (6) Language Tools for IMPACT
 
About XML
About XMLAbout XML
About XML
 
NLP new words
NLP new wordsNLP new words
NLP new words
 
Ijetcas14 575
Ijetcas14 575Ijetcas14 575
Ijetcas14 575
 
Acl reading@2016 10-26
Acl reading@2016 10-26Acl reading@2016 10-26
Acl reading@2016 10-26
 
Ivan Derganskyi
Ivan DerganskyiIvan Derganskyi
Ivan Derganskyi
 
MORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYA
MORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYAMORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYA
MORPHOLOGICAL SEGMENTATION WITH LSTM NEURAL NETWORKS FOR TIGRINYA
 
A Repository of Free Lexical Resources for African Languages: The Project and...
A Repository of Free Lexical Resources for African Languages: The Project and...A Repository of Free Lexical Resources for African Languages: The Project and...
A Repository of Free Lexical Resources for African Languages: The Project and...
 
Code Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging haiCode Mixing computationally bahut challenging hai
Code Mixing computationally bahut challenging hai
 
Intro to NLP. Lecture 2
Intro to NLP.  Lecture 2Intro to NLP.  Lecture 2
Intro to NLP. Lecture 2
 
Lectura 3.5 word normalizationintwitter finitestate_transducers
Lectura 3.5 word normalizationintwitter finitestate_transducersLectura 3.5 word normalizationintwitter finitestate_transducers
Lectura 3.5 word normalizationintwitter finitestate_transducers
 
Is the Calvet Language Barometer useful to measure linguistic justice?
Is the Calvet Language Barometer useful to measure linguistic justice?Is the Calvet Language Barometer useful to measure linguistic justice?
Is the Calvet Language Barometer useful to measure linguistic justice?
 
Exploring Natural Language Processing in Ruby
Exploring Natural Language Processing in RubyExploring Natural Language Processing in Ruby
Exploring Natural Language Processing in Ruby
 
Linguistic markup and processing of transclusion in XML documents (Notes)
Linguistic markup and processing of transclusion in XML documents (Notes)Linguistic markup and processing of transclusion in XML documents (Notes)
Linguistic markup and processing of transclusion in XML documents (Notes)
 
Basic techniques in nlp
Basic techniques in nlpBasic techniques in nlp
Basic techniques in nlp
 
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
Languages, Ontologies and Automatic Grammar Generation - Prof. Pedro Rangel H...
 
Parallel Corpora in (Machine) Translation: goals, issues and methodologies
Parallel Corpora in (Machine) Translation: goals, issues and methodologiesParallel Corpora in (Machine) Translation: goals, issues and methodologies
Parallel Corpora in (Machine) Translation: goals, issues and methodologies
 
A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech Tagging
A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech TaggingA Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech Tagging
A Knowledge-Light Approach to Luo Machine Translation and Part-of-Speech Tagging
 
Trends In Languages 2010
Trends In Languages 2010Trends In Languages 2010
Trends In Languages 2010
 
Versioning theory
Versioning theoryVersioning theory
Versioning theory
 

Etymology Markup in TEI XML

  • 1. Recommendations for Encoding Etymological Information Using TEI XML. Laurent Romary (INRIA, France); Jack T. Bowers (iljackb@gmail.com). COST ENeL WG2 Meeting, Vienna, 13/02/2015; revision 06/04/2015.
  • 2. General Overview of Project. We are creating a set of structural recommendations for TEI lexical dictionaries, covering information relevant to: • phonetic and orthographic forms; • grammatical information; • semantic and meta-linguistic information; • variation (on all levels); • etymology; • mono-, bi-, and multilingual dictionaries, as well as dictionaries that include encyclopedic information and examples. The models involve proposing changes to the TEI P5 Guidelines themselves and defining our constraints on the TEI in an ODD.
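The ODD mentioned on this slide is not shown in the deck. As a rough illustration only, a customization that restricts the values of the type attribute on <etym> might look like the sketch below. The schema identifier tei_etym_sketch, the module selection, and the value list (taken from the type values that appear in this deck's own examples) are assumptions, not the authors' actual ODD.

```xml
<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="tei_etym_sketch" start="TEI">
  <!-- Modules assumed sufficient for a dictionary document -->
  <moduleRef key="tei"/>
  <moduleRef key="header"/>
  <moduleRef key="core"/>
  <moduleRef key="textstructure"/>
  <moduleRef key="dictionaries"/>
  <!-- linking supplies corresp, next, and prev, which the deck's examples use -->
  <moduleRef key="linking"/>

  <!-- Hypothetical constraint: close the list of etymological process types -->
  <elementSpec ident="etym" module="dictionaries" mode="change">
    <attList>
      <attDef ident="type" mode="change">
        <valList type="closed" mode="add">
          <valItem ident="inheritance"/>
          <valItem ident="borrowing"/>
          <valItem ident="metaphor"/>
          <valItem ident="metonymy"/>
          <valItem ident="compounding"/>
        </valList>
      </attDef>
    </attList>
  </elementSpec>
</schemaSpec>
```

A closed list makes a validator reject misspelled or unplanned process names. An open list would be the gentler choice while the typology is still being worked out.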
  • 3. Goals for TEI Etymological Markup Recommendations: (i) address the lack of sufficient digital markup models and standards for representing etymological information; (ii) treat the same linguistic information coherently across synchronic and diachronic data structures; (iii) provide LMF- and ONTOLEX-compatible TEI structures; (iv) make better use of linking mechanisms in TEI for: • connecting cited forms in etymology with their project-internal sources (where possible); • making use of existing external resources for lexical and conceptual information not internal to a given project or corpus, e.g. open source lexical and ontological knowledge and linked data resources; (v) increase diversity in the types of etymological information that can be treated and make more use of concepts from linguistics.
  • 4. Working TEI Dictionary Metamodel (elements). [Diagram: tree of TEI dictionary elements annotated with cardinalities (0…1, 0…n, 1…1, 1…n). Elements shown: <entry>, <form>, <orth>, <pron>, <gramGrp>, <pos>, <gen>, <number>, <case>, <tns>, <mood>, <per>, <gram>, <colloc>, <sense>, <def>, <usg>, <num>, <lbl>, <cit> (in <sense> and in <etym>), <quote>, <etym> (in <entry> and in <sense>), <oRef>, <pRef>, <seg>, <c>, <lang>, <gloss>, <date>, <bibl>, <ref>, <ptr>, <note>, <listChange>, <change>, <spanGrp>, <span>, <annotationGrp>, <annotations>.]
  • 5. Two Potential Etymology Structures in TEI. • <etym> within <sense>: used if there are semantic implications for the etymological change; • <etym> within <entry>: used if the etymological change has no semantic implications for existing lexical items in the language; • both may occur in the same entry to account for unrelated changes that occurred at different stages. [Diagram: the two <etym> placements in the element tree, with cardinalities; elements shown around them include <cit>, <oRef>, <pRef>, <lang>, <lbl>, <gloss>, <date>, <bibl>, <ref>, <ptr>, <note>, <def>, <num>, <usg>, <gramGrp>.]
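The two placements can be sketched as a single skeleton entry. This is a minimal illustration that follows the element conventions used elsewhere in this deck (<cit type="etymon">, <oRef>, <gloss>). The identifiers example-entry and source-entry, the language code, and the elided content are placeholders, not data from the project.

```xml
<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="example-entry" xml:lang="und">
  <form type="lemma">
    <orth>...</orth>
  </form>
  <sense>
    <def>...</def>
    <!-- Sense-level etym: the change has semantic implications
         (for example metaphor or metonymy), so it sits inside the
         sense it gave rise to. -->
    <etym type="metaphor">
      <cit type="etymon">
        <oRef corresp="#source-entry">...</oRef>
        <gloss>...</gloss>
      </cit>
    </etym>
  </sense>
  <!-- Entry-level etym: the change has no semantic implications
       (for example inheritance with phonological change), so it
       applies to the entry as a whole. -->
  <etym type="inheritance">
    <cit type="etymon">
      <oRef>...</oRef>
    </cit>
  </etym>
</entry>
```

Keeping both in one entry, as the slide allows, lets unrelated changes from different stages be recorded without forcing them into a single chain.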
  • 6. <etym> within <entry>. Used if there are no semantic implications for the etymological change, and/or the semantic change occurred in another language or proto-language stage. Processes: • Inheritance; • Phonetic and phonological processes (non-exhaustive): assimilation (place, manner); epenthesis; metathesis; erosion/deletion (apocope, ...); coalescence; tone changes (which have their own internal categories); neutralization; • Borrowing*: lexical item imported from another language. [Diagram: the element tree with the entry-level <etym> highlighted.]
  • 7. <etym> within <sense>. Used when there are semantic implications for the etymological change. Processes: • Metaphor; • Metonymy; • Blending*; • Compounding; • Grammaticalization; • several of these processes can co-occur. *Where multiple etymological processes occur and some are semantic in nature and others phonetic, they may all be included in the sense-level <etym> if the former permitted the latter. [Diagram: the element tree with the sense-level <etym> highlighted.]
  • 8. Etymological Processes: Inheritance. Italian < Vulgar Latin: buono < bŏnu. Slide labels: synchronic entry; diachronic (etymological) entry.
    <entry xml:lang="it" xml:id="buono">
      <form type="lemma">
        <orth>buono</orth>
        <pron notation="ipa">'bwo.no</pron>
        <gramGrp>
          <pos>adj.</pos>
          <gen>masc.</gen>
        </gramGrp>
      </form>
      <sense> .... </sense>
      <etym type="inheritance">
        <cit type="etymon">
          <oRef xml:lang="la">bónŭ</oRef>
          <gramGrp>
            <pos>adj.</pos>
            <gen>masc.</gen>
            <case>nom.</case>
          </gramGrp>
        </cit>
      </etym>
    </entry>
    Note: processes and changes are approximate and meant for demonstrating markup rather than asserting precise etymological diachrony of individual items.
  • 9. Etymological Processes: Inheritance & Phonological Changes. French < Vulgar Latin: bon < bónŭ. (1) Root-level etymological process; (2) intermediate phonological change: ˈbonu > ˈbon; (3) final phonological change: ˈbon > ˈbɔ̃.
    <entry xml:lang="fr" xml:id="bon">
      <form type="lemma">
        <orth>bon</orth>
        <pron notation="ipa">'bɔ̃</pron>
        <gramGrp>
          <pos>adj</pos>
          <gen>masc.</gen>
        </gramGrp>
      </form>
      <sense> .... </sense>
      <etym type="inheritance">
        <!-- next and prev are pointers, so they take a leading # -->
        <cit type="etymon" xml:id="bónŭ" next="#ˈbon">
          <oRef xml:lang="la">bónŭ</oRef>
          <gramGrp>
            <case>nom.</case>
          </gramGrp>
        </cit>
        <cit type="etymon" xml:id="ˈbon" prev="#bónŭ" next="#ˈbɔ̃">
          <pRef xml:lang="fro">ˈbon</pRef>
        </cit>
        <cit type="etymon" xml:id="ˈbɔ̃" prev="#ˈbon">
          <pRef xml:lang="fro">bɔ̃</pRef>
        </cit>
      </etym>
    </entry>
    Note: processes and changes are approximate and meant for demonstrating markup rather than asserting precise etymological diachrony of individual items.
  • 10. Etymological Processes: Borrowing*. Description of lexical process: • a language takes a lexical item from a different language; • also known as loaning or importing; • there is often a historical and practical explanation for the need. Key linguistic concepts: • source language; • source form(s): phonetic, orthographic; • importing language; • imported or borrowed form; • semantic/meta-linguistic concept. [Diagram: source language with source form(s) orth(i..n), pron(i..n); importing language with borrowed form(s) orth(i..n), pron(i..n); both linked to a meta-linguistic concept.]
  • 11. Etymological Processes: Borrowing*. Japanese < English: taxi(cab). Slide labels: importing language; borrowed form(s); meta-linguistic concept; source language; source form(s).
    <entry xml:id="taxi" xml:lang="jpn">
      <form type="lemma">
        <orth type="transliterated" notation="romanji">takushī</orth>
        <orth notation="katakana">タクシー</orth>
        <pron notation="ipa">taku'shi:</pron>
        <gramGrp>
          <pos>noun</pos>
        </gramGrp>
      </form>
      <sense corresp="http://dbpedia.org/page/Taxicab">
        <usg type="dom">transportation</usg>
      </sense>
      <etym type="borrowing">
        <lbl>source</lbl>
        <lang>English</lang>
        <cit type="etymon">
          <oRef corresp="http://en.wiktionary.org/wiki/taxi" xml:lang="en">taxi</oRef>
          <pRef notation="ipa" corresp="http://en.wiktionary.org/wiki/taxi#Pronunciation" xml:lang="en-US">'tæksi</pRef>
        </cit>
      </etym>
    </entry>
  • 12. TEI Model for Japanese ‘takushī’. Etymological process: Borrowing. [Diagram: element tree of the lexical entry: <entry @xml:id>; <form type="lemma"> with <orth @type @notation>, <pron @notation>, and <gramGrp> / <pos>; <sense @corresp> with <usg type="dom">; <etym type="borrowing"> with <lbl>, <lang>, and <cit type="etymon"> containing <oRef @corresp @xml:lang> and <pRef @notation @corresp @xml:lang>. Callouts: ontological resource for the entry (<sense @corresp>); external lexical entry resource for the source term (<oRef @corresp>); external pronunciation resource for the source term (<pRef @corresp>).]
  • 13. Etymological Processes: Metaphor. Description of process: • lexical innovation based in human cognition; • one concept (x) is described or understood in terms of another concept (y); • requires a change in semantic domains; • the mapping between concepts is limited to certain salient attributes; • results in lexical polysemy. Key components: • domain of concept (y): source domain; • domain of concept (x): target domain. [Diagram: a source concept mapped to a target concept through salient attributes, each concept with its domain profile; lexical source form(s) and the resulting polysemous lexical form(s), phonetic and orthographic.]
  • 14. Etymological Processes: Metaphor. Mixtepec-Mixtec ‘ntuchi’ (bean > kidney). Source concept: bean; source domain profile: Legumes, Food. Target concept: kidney; target domain profile: Body, Internal Organs. Salient attributes: color, shape. Lexical source form(s) and polysemous lexical form(s): [ndù.ʧí], ntuchi.
  • 15. Etymological Processes: Metaphor. Slide labels: dbpedia ontology entry for ‘kidney’ (<sense corresp> of the first entry); pointer to entry for ‘bean’ (corresp="#bean"); dbpedia ontology entry for ‘pinto bean’ (<sense corresp> of the second entry).
    <entry xml:id="kidney">
      <form type="lemma">
        <orth>ntuchi</orth>
        <pron notation="ipa">ndù.ʧí</pron>
        <!-- gramGrp cluster -->
      </form>
      <sense corresp="http://dbpedia.org/resource/Kidney">
        <!-- ... -->
        <usg type="dom">Body</usg>
        <usg type="dom">InternalOrgans</usg>
        <etym type="metaphor">
          <cit type="etymon">
            <oRef corresp="#bean">ntuchi</oRef>
            <pRef corresp="#bean">ndù.ʧí</pRef>
            <gloss>bean</gloss>
          </cit>
        </etym>
        <!-- remainder of the sense is not shown on the slide -->
      </sense>
    </entry>
    <entry xml:id="bean">
      <form type="lemma">
        <orth>ntuchi</orth>
        <pron notation="ipa">ndù.ʧí</pron>
        <!-- gramGrp cluster -->
      </form>
      <!-- ... -->
      <sense corresp="http://dbpedia.org/resource/Pinto_bean">
        <usg type="dom">Legume</usg>
        <usg type="dom">Food</usg>
        <!-- ... -->
        <!-- translation info here -->
      </sense>
    </entry>
  • 16. TEI Model for Mixtepec-Mixtec ‘ntuchi’. Etymological process: Metaphor. [Diagram: element trees of the lexical entry and the source entry. Lexical entry: <entry @xml:id>; <form type="lemma"> with <orth>, <pron @notation>, and <gramGrp> / <pos>; <sense @corresp> with <usg type="dom">, <cit type="translation" @xml:lang>, and <etym type="metaphor"> containing <lbl> and <cit type="etymon"> with <oRef @corresp>, <pRef @corresp @notation>, and <gloss>. Source entry: <entry @xml:id>; <form type="lemma"> with <orth>, <pron @notation>, and <gramGrp> / <pos>; <sense @corresp> with <usg type="dom"> and <cit @type @xml:lang>. Callouts: ontological resource for the entry (kidney); ontological resource for the source entry (bean).]
  • 17. Etymological Processes: Metonymy. Description of lexical process: • concept (y) stands for concept (x); • no change in semantic domains; • one "vehicle" entity provides mental access to another (i.e. a target) within the same domain. Key linguistic concepts: • source concept (cognitive); • target concept (cognitive); • source form (lexical); • target form (lexical); • results in (synchronic) polysemy. [Diagram: vehicle concept and target concept within a single domain (X).]
  • 18. Etymological Processes: Metonymy. Mixtepec-Mixtec: ‘kiti’ (horse). Slide labels: vehicle concept entry (first entry); target concept entry (second entry).
    <entry xml:id="animal">
      <form type="lemma">
        <orth>kiti</orth>
        <pron notation="ipa">kì.tí</pron>
        <!-- gramGrp here -->
      </form>
      <sense corresp="http://dbpedia.org/resource/Animal">
        <usg type="dom">Living Beings</usg>
        <usg type="dom">Animal</usg>
        <cit type="translation" xml:lang="eng">
          <oRef>animal</oRef>
        </cit>
        <!-- other translations here -->
      </sense>
    </entry>
    <entry xml:id="animal-horse">
      <form type="lemma">
        <orth>kiti</orth>
        <pron notation="ipa">kì.t̪í</pron>
        <!-- gramGrp here -->
      </form>
      <sense corresp="http://dbpedia.org/resource/Horse">
        <usg type="dom">Animal</usg>
        <etym type="metonymy">
          <date notBefore="1517"/>
          <cit type="etymon">
            <oRef corresp="#animal">kiti</oRef>
            <pRef notation="ipa" corresp="#animal">kì.t̪í</pRef>
            <gloss>animal</gloss>
          </cit>
          <note>In this lexical item, the language reflects the history: since there were no horses in Mexico until the arrival of the Spanish, there was no Mixtecan word for 'horse', thus the categorical noun for 'animal' was used to describe the unnamed animal.</note>
        </etym>
        <cit type="translation" xml:lang="eng">
          <oRef>horse</oRef>
        </cit>
        <!-- other translations here -->
      </sense>
    </entry>
  • 19. TEI Model for Mixtepec-Mixtec ‘kiti’ (horse). Etymological process: Metonymy. [Diagram: element trees of the lexical entry and the source entry. Lexical entry: <entry @xml:id>; <form type="lemma"> with <orth>, <pron @notation>, and <gramGrp> / <pos>; <sense @corresp> with <usg type="dom">, <cit type="translation" @xml:lang> / <oRef>, and <etym type="metonymy"> containing <date @notBefore>, <cit type="etymon"> with <oRef @corresp>, <pRef @corresp @notation>, and <gloss>, and <note>. Source entry: <entry @xml:id>; <form type="lemma"> with <orth>, <pron @notation>, and <gramGrp> / <pos>; <sense @corresp> with <usg type="dom"> and <cit type="translation" @xml:lang> / <oRef>. Callouts: ontological resource for the entry; ontological resource for the source entry.]
  • 20. Etymological Processes: Compounding. Description of lexical process: • combines the surface forms of two lexical items to form a new one; • the result becomes the sum of its lexical and semantic parts; • can involve metaphor, metonymy, and/or grammaticalization. [Diagram: Etymon(i)* and Etymon(ii)*, each with grammatical info, semantic/meta-linguistic info, and etym. process (0..n).]
  • 21. Etymological Processes: Compounding (with Metonymy). Mixtepec-Mixtec: Yucha Nchu’u ‘Puebla State’. Salient attribute of location = "the presence of hummingbirds". (Primary) etymological process: Compounding; etymological process (ii): Metonymy.
    <entry xml:id="Puebla-state" xml:lang="mix" type="compound">
      <form type="lemma">
        <orth><seg corresp="#lake">Yucha</seg> <seg corresp="#hummingbird">Nchu’u</seg></orth>
        <!-- gramGrp here -->
      </form>
      <sense corresp="http://dbpedia.org/resource/Puebla_State">
        <etym type="compounding">
          <!-- Etymon(1) -->
          <cit type="etymon">
            <oRef corresp="#lake">Yucha</oRef>
            <gramGrp>
              <pos>concreteNoun</pos>
            </gramGrp>
            <gloss>lake</gloss>
          </cit>
          <!-- Etymon(2), with etymological process (ii): metonymy -->
          <etym type="metonymy">
            <cit type="etymon">
              <oRef corresp="#hummingbird">Nchu’u</oRef>
              <gramGrp>
                <pos>concrete noun</pos>
              </gramGrp>
              <gloss>hummingbird</gloss>
            </cit>
          </etym>
        </etym>
        <!-- ... -->
      </sense>
    </entry>
  • 22. Etymological Processes: Compounding. TEI model for Mixtepec-Mixtec "Yucha Nchu’u". [Diagram: element tree of the lexical entry: <entry @xml:id type="compound">; <form type="lemma"> with <orth> containing two <seg @corresp>, and <gramGrp> / <pos>; <sense @corresp> with <etym type="compounding">, which holds a <cit type="etymon"> (<oRef @corresp>, <gramGrp> / <pos>, <gloss>) and an <etym type="metonymy"> with its own <cit type="etymon"> (<oRef @corresp>, <gramGrp> / <pos>, <gloss>). Callout: ontological resource for the entry.]
  • 23. Alt (2006) LMF etymology extension proposal, merged with the LMF Core package. [Diagram: class model with cardinalities (0…1, 0…n, 1…1, 1…n). Classes shown: Lexical Resource, Global Information, Lexical DB, Lexical Entry, Form Representation, Text Representation, Sense, Definition, Statement, and the etymology extension classes Etymology, Etymological Link, Etymon.]
  • 24. Alt (2006) LMF Etymology Extension: Borrowing Stage. Etymology of French ‘pamplemousse’, from the Trésor de la Langue Française (TLF). Diachronic (Dutch): pompel + limoes > pompelmoes; synchronic (Modern French): pamplemousse.
    Etymological stage: Composition (e.g., Compounding)
      /etymon/ /orth/="pompel" /language/="nl" /pos/="adjective" /gloss/="gros, enflé"
      /etymon/ /orth/="limoes" /language/="nl" /pos/="commonNoun" /gloss/="citron"
      /etymologicalLink/ /source/=".." /target/="…" /etymologicalClass/=/composition/ /biblSource/="Boulan, König…" /confidenceScore/="probable"
    Etymological stage: Loan Word (e.g., Borrowing)
      /etymon/ /orth/="pompelmoes" /language/="nl" /pos/="commonNoun" /gender/="feminine" /gloss/="Citrus Maxima"
      /etymologicalLink/ /source/=".." /target/="…" /etymologicalClass/=/loan word/ /biblSource/="TLF"
  • 25. Alt (2006) LMF Etymology Extension: Borrowing Stage. Converted TEI Markup. Dutch pompelmoes > Modern French pamplemousse. The LMF structures
      /etymon/ /orth/="pompelmoes" /language/="nl" /pos/="commonNoun" /gender/="feminine" /gloss/="Citrus Maxima"
      /etymologicalLink/ /source/=".." /target/="…" /etymologicalClass/=/loan word/ /biblSource/="TLF"
    correspond approximately (≈) to the following TEI markup:
    <entry xml:id="LE1" xml:lang="fr">
      <form type="lemma">
        <orth>pamplemousse</orth> ....
      </form>
      <sense> .... </sense>
      <etym type="borrowing">
        <cit type="etymon" xml:id="L2">
          <oRef xml:lang="nl">pompelmoes</oRef>
          <gloss xml:lang="lat">Citrus maxima</gloss>
          <gramGrp>
            <pos>commonNoun</pos>
            <gen>feminine</gen>
          </gramGrp>
          <note>probablement de l’origine tamoule, De Vries, Nederl</note>
        </cit>
        <ref target="#TLF">TLF</ref>
        <!-- 'compounding' section goes here -->
      </etym>
    </entry>
    Note: our TEI structures do not explicitly use an equivalent of /etymologicalLink/ (or of /source/=".." /target/="…"), as this link is implicitly present in the XML data structure.
  • 26. Alt (2006) LMF Etymology Extension: Compounding Stage. Etymology of French ‘pamplemousse’, from the Trésor de la Langue Française (TLF). [Diagram: Dutch pompel + limoes yield pompelmoes (etymological stage: composition, e.g., compounding), and Dutch pompelmoes yields Modern French pamplemousse (etymological stage: loan word, e.g., borrowing), laid out along a synchronic/diachronic axis.] Composition link: /etymologicalLink/ /source/=".." /target/="…" /etymologicalClass/=/composition/ /biblSource/="Boulan, König…" /confidenceScore/="probable". Its etymons: /etymon/ /orth/="limoes" /language/="nl" /pos/="commonNoun" /gloss/="citron"; /etymon/ /orth/="pompel" /language/="nl" /pos/="adjective" /gloss/="gros, enflé". Loan-word link: /etymologicalLink/ /source/=".." /target/="…" /etymologicalClass/=/loan word/ /biblSource/="TLF". Its etymon: /etymon/ /orth/="pompelmoes" /language/="nl" /pos/="commonNoun" /gender/="feminine" /gloss/="Citrus Maxima".
  • 27. […]ation of Alt (2006) LMF Etymology Extension: Compounding Stage. Historical Dutch pompel + limoes to Modern French pamplemousse. LMF source: /etymologicalLink/ /source/=".." /target/="…" /etymologicalClass/=/composition/ /biblSource/="Boulan, König…" /confidenceScore/="probable"; /etymon/ /orth/="pompel" /language/="nl" /pos/="adjective" /gloss/="gros, enflé"; /etymon/ /orth/="limoes" /language/="nl" /pos/="commonNoun" /gloss/="citron". ≈ TEI entry skeleton: <entry xml:id="LE1" xml:lang="fr"> <form type="lemma"> <orth>pamplemousse</orth> .... </form> <sense> .... </sense> <etym type="borrowing"> …… ….. </etym> </entry>. Compounding section: <!-- 'borrowing' section goes here --> <etym type="compounding"> <ref target="#Boulan-König">Boulan, König...</ref> </etym>. Etymons: <cit type="etymon"> <oRef xml:lang="nl">pompel</oRef> <gramGrp> <pos>adjective</pos> </gramGrp> <gloss>gros, enflé</gloss> </cit> <cit type="etymon"> <oRef xml:lang="nl">limoes</oRef> <gramGrp> <pos>commonNoun</pos> </gramGrp> <gloss>citron</gloss> </cit>. Note: our TEI structures do not explicitly use an equivalent of /etymologicalLink/ (or of /source/=".." /target/="…"), as this link is implicitly present in the XML data structure.
  • 28. Etymological Processes: Borrowing & Compounding. TEI model for ‘pamplemousse’ as converted from LMF (Alt 2006). [Element diagram; the exact nesting is not recoverable from this transcript. Lexical entry: <entry @xml:id type="compound">, <form type="lemma">, <orth>, <seg @corresp>, <c>, <gramGrp>, <pos>, <sense>. Borrowing: <etym type="borrowing">, <lbl>, <lang>, <ref @target>, <cit type="etymon">, <oRef @xml:lang>, <gloss @xml:lang>, <gramGrp>, <pos>, <gen>, <note>. Compounding: <etym type="compounding">, <ref @target>, 0…n <cit type="etymon">, <oRef @xml:lang>, <gloss @xml:lang>, <gramGrp>, <pos>.]
  • 29. PEUT-ÊTRE, adv. Encoding from existing sources: synchronic portion of entry. Source: Trésor de la Langue Française. Étymol. et Hist. 1. 1re moitié du xiie s. put cel estre (Psautier Oxford, 54, 13 ds T.-L.); ca 1160 puet estre (Eneas, 9003, ibid.); début xve s. peut-estre (Quinze joies mariage, éd. J. Rychner, XII, 12); 1824 peut-être bien (Joubert, loc. cit.); 2. 1636 employé elliptiquement pour répondre évasivement à une question (Corneille, Le Cid, I, 2); 3. 1775 détaché en fin de phrase, exprimant le défi, l'ironie (Beaumarchais, Barbier de Séville, II, 2); 4. fin xiie s. puet estre que (Flore et Blancheflor, éd. J.-L. Leclanche, 407); 1641 peut-estre que (Corneille, Cinna, III, 1); 5. 1637 subst. un peut-estre (Id., La Place royale, IV, 6). Comp. de peut, 3e pers. du sing. de l'ind. prés. de pouvoir* et de être*. TEI: <entry xml:id="peut-être" xml:lang="fr" type="compound"> <form type="lemma"> <orth><seg corresp="#pouvoir-3s-pres-ind">peut</seg><c>-</c><seg corresp="#être">être</seg></orth> <gramGrp> <pos>adv.</pos> </gramGrp> </form> … </entry>. For “compound” entry types, @corresp can optionally be used on the <seg> element to point to the individual subcomponents of the item, either within the project or externally.
  • 30. PEUT-ÊTRE, adv. Encoding from existing sources: non-linguistic content portion of diachronic entry. Source: Trésor de la Langue Française. Étymol. et Hist. 1. 1re moitié du xiie s. put cel estre (Psautier Oxford, 54, 13 ds T.-L.); ca 1160 puet estre (Eneas, 9003, ibid.); début xve s. peut-estre (Quinze joies mariage, éd. J. Rychner, XII, 12); 1824 peut-être bien (Joubert, loc. cit.); 2. 1636 employé elliptiquement pour répondre évasivement à une question (Corneille, Le Cid, I, 2); 3. 1775 détaché en fin de phrase, exprimant le défi, l'ironie (Beaumarchais, Barbier de Séville, II, 2); 4. fin xiie s. puet estre que (Flore et Blancheflor, éd. J.-L. Leclanche, 407); 1641 peut-estre que (Corneille, Cinna, III, 1); 5. 1637 subst. un peut-estre (Id., La Place royale, IV, 6). Comp. de peut, 3e pers. du sing. de l'ind. prés. de pouvoir* et de être*. TEI: …. <etym xml:id="PEUT-ÊTRE-adv-Étym-et-Hist"> <lbl>Étymol. et Hist.</lbl> <num>1.</num> …… <num>2.</num> ….. <num>3.</num> …… <num>4.</num> ….. <num>5.</num> …… <note>Comp. de peut, 3e pers. du sing. de l'ind. prés. de pouvoir* et de être*.</note> </etym> …
  • 31. PEUT-ÊTRE, adv. Encoding from existing sources: diachronic portion of entry. Source: Trésor de la Langue Française. 1. 1re moitié du xiie s. put cel estre (Psautier Oxford, 54, 13 ds T.-L.); ca 1160 puet estre (Eneas, 9003, ibid.); début xve s. peut-estre (Quinze joies mariage, éd. J. Rychner, XII, 12); 1824 peut-être bien (Joubert, loc. cit.); 2. 1636 employé elliptiquement pour répondre évasivement à une question (Corneille, Le Cid, I, 2); 3. 1775 détaché en fin de phrase, exprimant le défi, l'ironie (Beaumarchais, Barbier de Séville, II, 2); 4. fin xiie s. puet estre que (Flore et Blancheflor, éd. J.-L. Leclanche, 407); 1641 peut-estre que (Corneille, Cinna, III, 1); 5. 1637 subst. un peut-estre (Id., La Place royale, IV, 6). TEI: …. <sense> <etym xml:id="PEUT-ÊTRE-adv-Étym-et-Hist" type="inheritance"> <lbl>Étymol. et Hist.</lbl> <num>1.</num> …… <num>2.</num> ….. <num>3.</num> …… <num>4.</num> ….. <num>5.</num> …… <note>Comp. de peut, 3e pers. du sing. de l'ind. prés. de pouvoir* et de être*.</note> </etym> </sense> … Template for each attestation: <cit type="attestation"> <date> </date> <oRef> </oRef> <gramGrp> <!-- appropriate element here --> </gramGrp> <bibl> </bibl> <note> </note> </cit> ….
  • 32. Encoding from existing sources: diachronic portion of entry. Source: Trésor de la Langue Française, item 1: 1re moitié du xiie s. put cel estre (Psautier Oxford, 54, 13 ds T.-L.); début xve s. peut-estre (Quinze joies mariage, éd. J. Rychner, XII, 12); 1824 peut-être bien (Joubert, loc. cit.). ISO 639-3 codes: fro, Old French (842-ca. 1400); frm, Middle French (ca. 1400-1600). TEI: <cit type="attestation"> <date notBefore="1100" notAfter="1150">1re moitié du xiie s.</date> <oRef xml:lang="fro">put cel estre</oRef> <bibl>(Psautier Oxford, 54, 13 ds T.-L.)</bibl> </cit> <cit type="attestation"> <date notBefore="1400" notAfter="1450">début xve s.</date> <oRef xml:lang="frm">peut-estre</oRef> <bibl>(Quinze joies mariage, éd. J. Rychner, XII, 12)</bibl> </cit> <cit type="attestation"> <date when="1824">1824</date> <oRef>peut-être bien</oRef> <bibl>(Joubert, loc. cit.)</bibl> </cit> ….
  • 33. Conclusions and Summary. Our TEI recommendations can facilitate: • linking and integrating corresponding data structures between the synchronic and diachronic levels; • the use of open source lexical resources and ontological information; • a more principled and consistent set of TEI guidelines for digitally encoding etymological information; • better compatibility between information traditionally kept and formatted separately in etymological dictionaries, lexical dictionaries, and linguistic analyses; • models for encoding ubiquitous processes of linguistic change at multiple levels of language; • theoretically agnostic data structures; • a more diverse set of etymological examples for the TEI guidelines.
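
The attestation pattern from slides 31 and 32 can be read back out of the markup with very little code. The following is a minimal sketch, not part of the original slides: the function name `read_attestations` and the default language `fr` for an `<oRef>` without `@xml:lang` are assumptions, the TEI namespace is left out for brevity, and "1re moitié du xiie s." is written as the range 1100 to 1150.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predeclared xml: prefix to this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# Attestations modeled on the slide-32 pattern. The TEI namespace is
# omitted to keep the sketch short (an assumption; real TEI declares it).
TEI_SNIPPET = """
<etym type="inheritance">
  <cit type="attestation">
    <date notBefore="1100" notAfter="1150">1re moitié du xiie s.</date>
    <oRef xml:lang="fro">put cel estre</oRef>
    <bibl>(Psautier Oxford, 54, 13 ds T.-L.)</bibl>
  </cit>
  <cit type="attestation">
    <date notBefore="1400" notAfter="1450">début xve s.</date>
    <oRef xml:lang="frm">peut-estre</oRef>
    <bibl>(Quinze joies mariage, éd. J. Rychner, XII, 12)</bibl>
  </cit>
  <cit type="attestation">
    <date when="1824">1824</date>
    <oRef>peut-être bien</oRef>
    <bibl>(Joubert, loc. cit.)</bibl>
  </cit>
</etym>
"""


def read_attestations(xml_text, default_lang="fr"):
    """Return one dict per <cit type="attestation">, in document order."""
    root = ET.fromstring(xml_text)
    rows = []
    for cit in root.iter("cit"):
        if cit.get("type") != "attestation":
            continue
        date = cit.find("date")
        o_ref = cit.find("oRef")
        # A single-year @when is treated as a one-year range.
        start = date.get("notBefore") or date.get("when")
        end = date.get("notAfter") or date.get("when")
        rows.append({
            "form": o_ref.text,
            # default_lang is a hypothetical fallback for the entry language.
            "lang": o_ref.get(XML_NS + "lang", default_lang),
            "earliest": int(start),
            "latest": int(end),
        })
    return rows


attestations = read_attestations(TEI_SNIPPET)
for row in attestations:
    print(row)
```

Because every attestation carries machine-readable date attributes and a language code, the historical forms can be sorted or filtered by period without parsing the human-readable date strings.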

Editor's Notes

  1. > The benefit of a consistent and limited set of elements and data organization is that the data can be found and identified more easily, which increases its long-term reusability and accessibility; (a) (old) synchronic: (<form>) <orth>, <pron>; (old) diachronic: <mentioned>, <gloss>; (new) synchronic: <orth>, <pron>; (new) diachronic: <cit>; <oRef>, <pRef> (allowing text; using @corresp); <gloss>, <gramGrp>..; (b) (old) <xr>, <ref>, <bibl>; (new) <ref @target>, <ptr @target>, <bibl>; (c) examples from the TEI guidelines;
  2. A transparent element indicates that the element is not essential, or is less essential, to the model;
  3. Involves only phonetic and/or phonological changes; for cases where the etymology of the entry does not involve any significant semantic change and the form is inherited from a ‘parent’ or ‘proto’ language (at least according to the sources consulted or the researchers creating the dictionary); - These etymologies occur as direct children of <entry>; - Their top-level <etym> element should have the attribute-value pair type="inheritance"; - For each individual phonetic/phonological and/or orthographic change documented (these are not necessarily mutually exclusive, as orthography is our best clue to historical phonetics and phonology), it is possible to include embedded <etym> elements with the specific phonological change as the value of @type. Any combination of changes where at least one involves a change in semantics and/or syntax (but no changes to any form other than the entry itself, thus no chain shifts or analogical leveling…); - Can also include phonetic/phonological changes;
  4. Notes: simplified example. Only <oRef> is used because there are obviously no recordings (although “misspellings” often help in inferring pronunciation). If there is no ISO 639-1/2 code for VL, use a date range in the attributes;
  5. Notes: simplified example
  6. May contain info relevant to: anthropology; semantics; morphosyntax; phonetics/phonology;
  7. Note: the terminology is flexible; it could be called ‘loanword’, ‘loaning’, ‘importing’, etc.; the main necessity is that it is used consistently. Ontologically, we have chosen to make the value of <etym @type> the etymological process rather than its result (in which case the value of the attribute here would be ‘loanword’); the result is inherently implied by the label of the process; > the value of @xml:lang should be the ISO language code of the source language; > Loanwords are a good place to keep track of <pron> in both the source and target languages, because how the importing language does or doesn’t change the pronunciation may give insight into its phonology….
  8. According to cognitive linguistics, metaphor is a universal cognitive process that takes place first at the cognitive level, then at the lexical level. Lexical innovation is based in human cognition; one concept (x) is described/understood in terms of another concept (y); this requires a change in semantic domains; the mapping between concepts is limited to certain salient attributes; it results in lexical polysemy. Domain of concept (y): Source Domain; domain of concept (x): Target Domain
  9. - In the <oRef> and <pRef> elements, the value of @corresp points to the entry (#bean) within the same document, which is of course the lexical source of ‘kidney’ in Mixtepec-Mixtec; their identical orthographic and phonetic forms (respectively) are the manifestations of synchronic polysemy in the language; - in the <sense> element, the value of @corresp points to an external ontological definition of the respective source and target concepts (DBpedia); the mapping of metaphor concepts could be done using programmed extraction processes on external ontological sources (mirroring the knowledge base of speakers performing these mental operations); however, such programmed processes have yet to be created; - this pointing mechanism not only helps us organize the data, but also comes closer to integrating the data structure with at least some of the conceptual structure relevant to these etymological links; optionally, it is possible to provide the etymological process classification in a human-readable format, e.g., <lbl>Metaphor</lbl> (however, this is not necessary if the process is marked in the attribute values)
  10. - In the <oRef> and <pRef> elements, the value of @corresp points to the entry (#bean) within the same document, which is of course the lexical source of ‘kidney’ in Mixtepec-Mixtec; their identical orthographic and phonetic forms (respectively) are the manifestations of synchronic polysemy in the language; - in the <sense> element, the value of @corresp points to an external ontological definition of the respective source and target concepts (DBpedia); the mapping of metaphor concepts could be done using programmed extraction processes on external ontological sources (mirroring the knowledge base of speakers performing these mental operations); however, such programmed processes have yet to be created; - this pointing mechanism not only helps us organize the data, but also comes closer to integrating the data structure with at least some of the conceptual structure relevant to these etymological links; optionally, it is possible to provide the etymological process classification in a human-readable format, e.g., <lbl>Metaphor</lbl> (however, this is not necessary if the process is marked in the attribute values)
  11. Polysemy (first click) highlights the connection between the entry and its lexical source;
  12. Note: target form is the same as source unless process took place in past and target and source forms have since undergone grammaticalization and phonetic/phonological changes
  13. Metonymy: this is not a metaphor because, since a horse is an animal, there is no change in conceptual domain as in the previous example; instead this is an example of (whole for part) metonymy (e.g., meronymy); - the use of <date> and <note> here: the former is used to represent the absolute earliest that this lexical innovation could have taken place, given that the entity ‘horse’ was not known to the native peoples of the Americas until the arrival of the Spanish. This date could be further refined with some research into the year/dates of the first contact between Europeans and Mixtec peoples (present-day Oaxaca, Puebla, and Guerrero states, Mexico);
  14. *information included in the ‘etymon’ diagram is non-exhaustive; Note: this model’s concepts are also relevant to the decomposition of forms derived from multiple morphemes;
  15. *information included in the ‘etymon’ diagram is non-exhaustive; Note: this model’s concepts are also relevant to the decomposition of forms derived from multiple morphemes;
  16. Etymology is not addressed in the core LMF; Alt (2006) attempted to create an LMF etymology extension (however, it was never finalized and integrated into the LMF). Note: LMF is about to go through a revision, and we are working with someone within that community who will be making decisions in that process, in order to make the revisions more compatible with the TEI; thus this LMF diagram will not be valid for long and will of course be revised once the new standards are published;
  17. NOTE: the diagram is taken from Alt (2006); the following changes and additions were made: - the parent elements /etymologicalLink/ and /etymon/ were added at the top of the lists to reflect the way the XML structure was implemented; - “/orth/” and the corresponding value were also added to reflect the XML implementation in Alt (2006); - indications of language (Modern French, Dutch); - specification of ‘stage’ and ‘process’ below the diagram; - a ‘synchronic’-‘diachronic’ span below the ‘stage’-‘process’ labels; - the spelling of /gloss/ was changed from the original /glose/. The LMF and TEI <form> and <orth> elements correspond; since the entry ‘pamplemousse’ is an import word, the <etym> goes in <sense> (although in the Alt version it is always outside of <sense>); the LMF element <etymon> is represented in our TEI as the value of the @type attribute on the <cit> element; the second <form> within <etymon> is represented by the <oRef> element (which we have proposed to revamp within the TEI); @xml:lang is used the same way in our <oRef> as it is in Alt’s LMF <orth>; however, we use the ISO 639 language code for Dutch, which is “nl”; whereas Alt embeds a separate <sense> element within the <etymon>, we use the native TEI <gloss> with @xml:lang added, its value being “lat” for Latin; we use the TEI <note> in the same manner as the LMF does; we point to the source of the etymological portion, which we assume would be a bibliography entry within the document, using the TEI <ref @target> (this was referred to in the Alt (2006) paper as the source but not explicitly referenced in the sample XML entry); we do not use an element corresponding to <etymologicalLink>, as the relationship between the source etymon and the French entry is implied. Finally, instead of <etymologicalClass>, we use @type with the value “borrowing” on the <etym> element; in choosing this value we decided to consistently refer to the etymological process rather than the result of that process (e.g., “borrowing” instead of “loanword”); in the original diagram the gloss of ‘pompelmoes’ (the Dutch form) was given in French, “gros citron”, but in the actual source and in the XML implementation of the entry the gloss was given in Latin, thus I changed it to the following: /gloss/="Citrus Maxima"
  18. The LMF and TEI <form> and <orth> elements correspond; since the entry ‘pamplemousse’ is an import word, the <etym> goes in <sense> (although in the Alt version it is always outside of <sense>); the LMF element <etymon> is represented in our TEI as the value of the @type attribute on the <cit> element; the second <form> within <etymon> is represented by the <oRef> element (which we have proposed to revamp within the TEI); @xml:lang is used the same way in our <oRef> as it is in Alt’s LMF <orth>; however, we use the ISO 639 language code for Dutch, which is “nl”; whereas Alt embeds a separate <sense> element within the <etymon>, we use the native TEI <gloss> with @xml:lang added, its value being “lat” for Latin; we use the TEI <note> in the same manner as the LMF does; we point to the source of the etymological portion, which we assume would be a bibliography entry within the document, using the TEI <ref @target> (this was referred to in the Alt (2006) paper as the source but not explicitly referenced in the sample XML entry); we do not use an element corresponding to <etymologicalLink>, as the relationship between the source etymon and the French entry is implied. Finally, instead of <etymologicalClass>, we use @type with the value “borrowing” on the <etym> element; in choosing this value we decided to consistently refer to the etymological process rather than the result of that process (e.g., “borrowing” instead of “loanword”);
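
The LMF-to-TEI correspondences described in notes 17 and 18 can be sketched as a small conversion routine. This is an illustration under stated assumptions, not the authors' tooling: the flat dictionary layout for an /etymon/ feature bundle and the names `etymon_to_cit`, `link_to_etym`, and `CLASS_TO_PROCESS` are hypothetical, and the TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predeclared xml: prefix to this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

# LMF /etymologicalClass/ values mapped to the process-oriented @type
# values that the slides use on <etym>.
CLASS_TO_PROCESS = {"loan word": "borrowing", "composition": "compounding"}

# Feature bundle for the Dutch etymon, as listed on the borrowing slide.
# The dict layout itself is an assumption made for this sketch.
POMPELMOES = {
    "orth": "pompelmoes",
    "language": "nl",
    "pos": "commonNoun",
    "gender": "feminine",
    "gloss": "Citrus maxima",
}


def etymon_to_cit(etymon, gloss_lang=None):
    """Turn one flat /etymon/ feature dict into a <cit type="etymon">."""
    cit = ET.Element("cit", {"type": "etymon"})
    o_ref = ET.SubElement(cit, "oRef", {XML_NS + "lang": etymon["language"]})
    o_ref.text = etymon["orth"]
    gloss = ET.SubElement(cit, "gloss")
    if gloss_lang:
        gloss.set(XML_NS + "lang", gloss_lang)
    gloss.text = etymon["gloss"]
    gram_grp = ET.SubElement(cit, "gramGrp")
    ET.SubElement(gram_grp, "pos").text = etymon["pos"]
    if "gender" in etymon:
        ET.SubElement(gram_grp, "gen").text = etymon["gender"]
    return cit


def link_to_etym(etymological_class, etymons, bibl_id, gloss_lang=None):
    """Build the <etym> for one LMF link; the link itself becomes nesting."""
    etym = ET.Element("etym", {"type": CLASS_TO_PROCESS[etymological_class]})
    for etymon in etymons:
        etym.append(etymon_to_cit(etymon, gloss_lang))
    # The bibliographic source is referenced by pointer, as in the slides.
    ref = ET.SubElement(etym, "ref", {"target": "#" + bibl_id})
    ref.text = bibl_id
    return etym


etym = link_to_etym("loan word", [POMPELMOES], bibl_id="TLF", gloss_lang="lat")
print(ET.tostring(etym, encoding="unicode"))
```

No element is generated for /etymologicalLink/ itself: the link is expressed only by placing the etymon's `<cit>` inside the `<etym>`, which mirrors the statement in the notes that the relationship is implied by the XML structure.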