ArtsSemNet: From Bilingual Dictionary to Bilingual Semantic Network

Ivanka Atanassova [1], Svetlin Nakov [2], Preslav Nakov [3]

[1] University of Veliko Turnovo “St. Cyril and St. Methodius”, V. Turnovo, Bulgaria, ivanka49@mail.bg
[2] Sofia University “St. Kliment Ohridski”, FMI, Sofia, Bulgaria, nakov@fmi.uni-sofia.com
[3] University of California at Berkeley, EECS, Berkeley CA 94720, USA, nakov@eecs.berkeley.edu


Abstract: The paper presents two bilingual lexicographical resources for the terminology of fine arts, the ArtsDict electronic dictionary and the ArtsSemNet semantic network, and describes the process of transforming the former into the latter. ArtsDict combines a broad range of information sources and is currently the most complete dictionary of fine arts terminology for both Bulgarian and Russian: not only among electronic dictionaries, but in general. It contains 2,900 Bulgarian and 2,644 Russian terms, each annotated with complete dictionary definitions. These are further augmented with various terminological relations (polysemy, synonymy, homonymy, antonymy and hyponymy) and organised into a bilingual semantic network similar to WordNet. In addition, a specialised hypertext browser is implemented to enable intuitive querying of, and navigation through, the network.

Keywords: semantic network, terminology, polysemy, homonymy, hyponymy, antonymy, synonymy.

1. Introduction

Contemporary dictionary development has been deeply affected by the wide spread of personal computers. A fast-growing number of users have abandoned tedious lookups in huge paper dictionaries and started using their computer equivalents. Although the first computer dictionaries were often worse than the traditional ones, their potential was beyond question. As early as 1992 the creators of the Oxford English Dictionary [OED] invested $13.5 million in a five-year project to enable the development of an electronic version. It soon became clear that computer dictionaries could provide far richer capabilities. In the meantime, other resources arose, such as thesauri (e.g. Roget's thesaurus [RT]), which provided users with synonymy information. Soon lexicographers started combining dictionaries and thesauri, which resulted in semantic networks (e.g. WordNet [Fellbaum,1998; Miller&al.,1990; WordNet]) that include not just term glosses and synonym lists, but also links to antonyms, hyponyms etc.

The work presented below progressed in a similar fashion: we started with electronic dictionaries and later transformed them into semantic networks with various terminological relations. We concentrated on the fine arts terminology of two closely related and easy-to-combine Slavonic languages suitable for comparative research: Bulgarian and Russian. Although we initially focused on Bulgarian, Russian support was added for two reasons: to illustrate the multilingual support (at present the dictionary interface is bilingual, while the semantic network allows several languages to be used in parallel) and to make use of the rich language material for Russian we already had. Adding other Balkan languages in combination with, or instead of, Bulgarian and Russian would be attractive once the necessary data is collected and made available.

2. ARTSDICT: Bilingual Terminological Dictionary

ArtsDict was created to allow easy creation and usage of parallel bilingual terminological dictionaries for the purpose of lexicographical research. The dictionary data consists of a set of navigable dictionary entries: a term (single-word term, SWT, or multi-word term, MWT) and one or more glosses describing its sense(s). The main screen of ArtsDict is split both horizontally (between the dictionaries) and vertically: the SWT and MWT, including doublets and variants, appear on the left in alphabetical order, while their glosses are listed on the right. Although the user interface imposes no such restrictions, we enforced strict rules for the contents of the separate fields. For example, after the term we add in brackets its origin, when it is a foreign word, and its singular form, when it is presented in the plural. The doublets (footnote 1) and variants (footnote 2) appear on the same line, comma-separated, after the term. Similarly, after a neutral term its stylistic relative synonyms are

Footnote 1: We consider the doublets and the variants as absolute synonyms, the difference being that the former share the same root, while the latter do not.

Footnote 2: In fact the phonetic and orthographic variants are lexico-grammatical variants of the same word (allolexes), not distinct words (synonyms). We treat them as separate words (i.e. synonyms) for two reasons: 1. to preserve the unified approach to all groups of variants, which represent distinct words or terminological collocations; 2. because the phonetic and graphemic variants could be stylistic relative synonyms. It is not possible for the lexico-grammatical variants of a word to be related to different styles, e.g. in the fine arts terminology: б. зограф – изограф (the dialect for зографа).

   Term (with origin), followed by its gloss on the indented lines:

   Аквамарин (нем. Aquamarin, по лат. aqua 'вода' + marinus 'морски')
       Минерал, разновидност на берила, силикат на берилия и алуминия, скъпоценен камък, с цвят от светлозелен до небесносин, използван като материал за художествени изделия.
   Акварел (рус. акварель, фр. aquarelle, от ит. acquarello, от лат. aqua 'вода')
       1. Акварелни бои - бои, състоящи се от пигмент и свързващо вещество (растително лепило с примеси на мед, захар, глицерин);
       2. Акварелна техника - живописна техника, използваща акварелни бои;
       3. Произведение на живописта, изпълнено с акварелна техника.
   Акварелен портрет
       Разновидност на портретния жанр, включваща портрети, изпълнени в акварелна техника.
   Акварелист (от ит. acquarello)
       вж. Художник-акварелист
   Акварелистка (от акварелист, от ит. acquarello)
       вж. Художничка-акварелистка.
   Акварелна техника
       вж. Акварел във 2 знач.
   Акварелни бои, Водни бои
       вж. Акварел в 1 знач.

                                  Table 1. Extract from the Bulgarian dictionary contents.

   Term (with origin), followed by its gloss on the indented lines:

   Аквамарин (нем. Aquamarin, по лат. aqua marina 'морская вода')
       Минерал, прозрачная разновидность берилла, синевато-зеленой или голубой окраски, драгоценный камень, применяемый как материал для художественных изделий.
   Акварелист (ит. acquarello)
       см. Художник-акварелист.
   Акварелистка (от акварелист, от ит. acquarello)
       см. Художница-акварелистка.
   Акварель (фр. aquarelle, ит. acquarello, от лат. aqua 'вода')
       1. Красочный материал, предназначенный для акварельной живописи, состоящий из пигмента и большого процента клеящих веществ в качестве связующего (которым служит растительный клей с примесью меда, сахара, глицерина);
       2. Техника живописи, выполняемая акварельными красками;
       3. Произведение искусства, выполненное акварельными красками в соответствующей технике.
   Акварельная живопись
       см. Акварельная техника.
   Акварельная техника, Акварельная живопись, Живопись акварелью, Живопись водяными красками
       см. Акварель во 2 знач.
   Акварельные краски (ед. ч. краска), Водяные краски
       см. Акварель в 1 знач.

                                      Table 2. Extract from the Russian dictionary contents.

   Олово (Bulgarian)
       Тежък мек ковък метал със сивосинкав цвят, използван като материал за художествени произведения.
   Олово (Russian)
       Химический элемент, мягкий, ковкий, серебристо-белый металл, применяемый в изобразительном искусстве как материал для художественных изделий. На български се превежда калай.

                                  Table 3. Example of translingual homonymy (Russian).
listed, since they represent the same notion (again comma-separated).

The presented arrangement of variants, doublets and stylistic synonyms allows equivalent terms in the two dictionaries (i.e. the two languages) to be examined in parallel, for the short entries, and sequentially, for the longer ones (see Tables 1, 2). The parallel exploration simplifies not only the unification of the dictionaries (by adding the corresponding equivalent: see Table 5) but also the search for translingual homonyms (see Table 3).

We would like to note that the dictionaries presented here are the most complete fine arts terminological ones for both Bulgarian and Russian and have been built using a broad range of resources: scientific, popular-scientific, fine arts, publicist, social-political and other (journals, specialised scientific and popular-scientific literature, catalogues, etc., [Flerov,1981; Odnoralova,1982; Pavlovsky,1975; Tsonev,1957; Vinner,1954]). In addition, Russian and Bulgarian dictionaries have been used: terminological (e.g. [SDFAT,1965; SDFAT,1970]), encyclopaedic (e.g. [EFAB,1987]), orthographical, etymological, dictionaries of foreign words, term lists from fine arts sources etc. Terms, professional slang and nomenclatures are grouped together and considered within a unified terminological framework (see [Atanasova,2003] for details).

Figure 1. Screenshot from ArtsDict.

3. ARTSSEMNET: Semantic Network

3.1. Creation

The ArtsSemNet semantic network was built around the contents of the ArtsDict dictionaries. For this purpose, we investigated and completely annotated several important terminological relations: polysemy, homonymy, synonymy, antonymy and hyponymy. The annotation was manual, but partially automated using the formal and semantic techniques described below. The result is a semantic network of the WordNet type, hierarchically organised around the hyponymy relation. At the time of writing it contained:

    •   lexemes: 2,900 Bulgarian and 2,644 Russian;
    •   hyponym chains: 276 Bulgarian and 283 Russian;
    •   antonym chains: 157 Bulgarian and 134 Russian;
    •   absolute synonym chains: 483 Bulgarian and 458 Russian;
    •   relative synonym chains: 136 Bulgarian and 114 Russian;
    •   homonyms: 14 Bulgarian and 6 Russian;
    •   polysemous words: see Table 4.

The direct extraction of homonyms, synonyms (stylistic and relative) and polysemous terms from the dictionary entries was simplified by the organisation of ArtsDict. The hyponyms and antonyms posed a problem, though. For the extraction of hyponyms sharing a common term-element (root/stem, affix, word as a component of an MWT or another complex word, MWT), not necessarily shared by the hypernym as well, a formal technique was used. ArtsDict was given a hyponym or hypernym, expressed as an SWT or MWT, and it produced chains of SWT and MWT containing the target term-element. These were further investigated and the hyponyms were sieved by the lexicological researcher [Atanassova&al.,2002]. A similar technique was used to facilitate the extraction of antonyms sharing a common term-element, as well as of shared-root synonyms (also with a common suffix or prefix).
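
The formal technique for terms sharing a term-element can be pictured with a short sketch. It is not the authors' implementation: the function name `candidate_chain`, the case-insensitive substring rule and the small term list (taken from Tables 1 and 6) are assumptions made for illustration only. The output is a candidate chain that the lexicographer still has to sieve by hand.

```python
def candidate_chain(term_element, terms):
    """Return dictionary terms (SWT or MWT) that contain term_element.

    Assumption: a term-element is matched as a case-insensitive substring.
    The paper does not specify the matching rule, so this is illustrative.
    The term that is identical to the element itself is left out.
    """
    needle = term_element.casefold()
    return sorted(
        term for term in terms
        if needle in term.casefold() and term.casefold() != needle
    )


# A handful of Bulgarian terms copied from the paper's tables.
terms = [
    "Автопортрет", "Акварелен портрет", "Акварелист", "Групов портрет",
    "Морски пейзаж", "Градски пейзаж", "Портрет",
]

chain = candidate_chain("портрет", terms)
for term in chain:
    print(term)  # candidates only; true hyponyms are selected manually
```

A plain substring test over-generates on purpose: it is cheap, and the manual sieving step described above removes the false candidates.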
  Senses count     1       2      3     4    5    6    7
  Bulgarian      2,571    273    49     4    2    1    0
  Russian        2,313    263    56     9    2    0    1

               Table 4. Terms polysemy.

For the extraction of hyponyms sharing no term-element we used latent semantic analysis (LSA). This is a popular technique for indexing, retrieval and analysis of textual data, which assumes a set of mutual latent dependencies between the terms and the contexts they are used in. This permits LSA to deal successfully with synonymy and partially with polysemy, which are the major problems of word-based text processing techniques (due to the freedom and variability of expression). LSA is a two-stage process comprising learning and analysis. During the learning phase it is given a text collection and produces a real-valued vector for each term and for each document. During the analysis phase the proximity between a pair of documents or terms is calculated as the dot product of their normalised LSA vectors (see [Landauer&al.,1998] for an introduction to LSA).

We tried both raw and segmented words as features (after removal of stop-words and infrequent words; the SWT and MWT from the dictionary were treated as single words), and the former were found to be more suitable for our task (see [Atanassova&Nakov,2001a] for details). During both training and analysis the engine was used with one language at a time: Bulgarian or Russian.

In the analysis phase, LSA was given a hyponym or a hypernym, expressed as an SWT or MWT, and it produced a ranked list sorted by semantic proximity to the target. The lexicographer manually investigated the result and kept only the true hyponyms. Although LSA was intended to find hyponyms with no shared term-elements, the returned list could also contain hyponyms that do share one, as long as the LSA engine considered them semantically close enough (see [Nakov&Atanassova,2001]).

The dualistic nature of LSA allowed us to measure the proximity not only between terms (SWT or MWT) but also between their glosses (see [Atanassova&Nakov,2001b]). We used as the target either the glosses of the target hypernym (or the glosses of some of its known hyponyms) or the hypernym itself (using some of its known hyponyms was another option we found useful). In the latter case we compared the target against the term vectors, while in the former case we compared it against the document vectors. Querying with terms performed better, but the two variants were used in parallel since they proposed different arrangements of the potential hyponyms, and each was useful for the lexicographer, who was not willing to miss any potential hyponym.

3.2. Functionality

The primary purpose of ArtsSemNet is to assist lexicographers in their work by providing a tool for fast and easy access to rich fine arts terminology (see [Atanassova&al.,2003]). When a search for a particular term is performed, ArtsSemNet displays its glosses, homonyms, synonyms (both absolute and relative) and synonym chains, antonyms and antonym chains, as well as the hyponym chains the target term is part of (either as a hyponym or as a hypernym). ArtsSemNet offers a clean and intuitive interface. The user can input a term to be explored, change the language being used or specify different search criteria. The information displayed for a given term includes:

    •   term glosses list;
    •   homonyms list;
    •   absolute synonym chains;
    •   relative synonym chains;
    •   antonym chains;
    •   hyponym chains with the target term as a hypernym;
    •   hyponym chains with the target term as a co-hyponym.

The system offers several options: whether the term is to be searched exactly or partial matches should be considered as well (e.g. root or prefix); and whether the homonyms, synonyms and synonym chains, antonyms and antonym chains, and hyponyms and hyponym chains should be displayed.

Glosses are presented as plain text, one per line, with numbers added in front if there is more than one gloss for the target term. Homonyms are listed one per line. Absolute synonyms, relative synonyms and antonyms are hyphen-separated. If a relative synonym of the target term has absolute synonyms, these are listed after it, comma-separated. So are the absolute synonyms of the antonyms.

Hyponym chains are listed as term lists where the hypernym is displayed first, followed by its hyponyms. Again, if a term has absolute synonyms, these are shown along with it, separated by commas. If a polysemous term is the hypernym of more than one hyponym chain, the corresponding gloss is displayed in brackets for each of them. This is similar to the synsets in WordNet. The user interface also allows displaying separately each hyponym that is the hypernym of hyponym chains of its own, as well as showing these chains.

In all cases, when term lists are displayed, each distinct term is presented as a hyperlink. When the link is followed, the target term changes and the corresponding information about the new one is displayed (it in turn contains hyperlinks to other terms and so on). The navigation mechanism is similar to the one provided by
a standard Web browser: even the standard forward and backward buttons are present, visualised as left and right arrows, so that the user can navigate back to already visited terms and then go forward again. Figure 2 shows ArtsSemNet after a successful search for the Bulgarian term надлъжна гравюра.

Figure 2. Screenshot from ArtsSemNet.

ArtsSemNet is implemented in Borland Delphi and uses the relational database management system Microsoft Access for the storage and retrieval of the fine arts terms. The database is designed to ensure efficient processing for the kinds of queries needed.

4. Related Work

WordNet. WordNet has been developed by psycholinguists from the Cognitive Science Laboratory of Princeton University as a computational model of the human lexical memory. Since then the project has evolved into a general lexical reference system comprising thousands of words and their corresponding glosses, organised into a semantic network. The terms (lexemes) in WordNet are represented as one or more synsets (i.e. synonym sets). A synset groups a term with some of its synonyms, which taken as a whole represent a particular lexical sense of that term (see [Fellbaum,1998; Miller&al.,1990]). A lexically ambiguous term is included in more than one synset: one for each of its senses (according to the sense granularity level chosen by the network). The synsets are hierarchically interconnected according to the hyponymy and the meronymy (part-whole) relations and are further distinguished by more specific properties. The work on the project continues and the latest version 2.0 of WordNet includes 115,424 synsets: 79,689 nouns, 13,508 verbs, 18,563 adjectives and 3,664 adverbs [WordNet]. WordNet is among the most important resources for natural language processing, machine translation, word sense disambiguation, information extraction, information retrieval etc.

EuroWordNet. The success of WordNet provoked interest in the development of similar resources for other languages. In 1996 the European Commission funded
   Натюрморт (Bulgarian)
       1. Един от жанровете на изобразителното изкуство, който изобразява битови предмети, зеленчуци, плодове, убит дивеч, цветя и др.;
       2. Отделно произведение от този жанр.
   Натюрморт (Russian)
       1. Один из жанров изобразительного искусства, посвященный воспроизведению предметов обихода, снеди (овощи, мясо, битая дичь, фрукты), цветов и пр.;
       2. Отдельное произведение этого жанра.

                                 Table 5. Parallel notions in Bulgarian and Russian.
the EuroWordNet project, covering 7 European languages in parallel (see [EuroWordNet; Vossen,1998]): Czech, Dutch, Estonian, French, German, Italian and Spanish. Each part of EuroWordNet uses its own language-specific synsets, but all are interconnected by means of a common index based on WordNet, so that navigation between similar words in different languages is possible in all directions. While the EuroWordNet project was finished in 1999 (as opposed to WordNet, which has always been active), the work on other European languages continues. There are already WordNets available for Basque, Portuguese and Swedish. Under development are ones for Bulgarian, Danish, Greek, Icelandic, Latvian, Moldavian, Norwegian, Romanian, Russian (see [RWN]), Serbian, Slovenian, Swedish and Turkish. Several non-European languages have projects under development (see the Web page of the Global WordNet Association for details, [GWA]).

There have also been some attempts to integrate domain-specific terminologies into EuroWordNet [Magnini&Speranza,2001; Stamou&al.,2002].

BalkaNet. This is an ongoing project whose aim is the creation of a multilingual lexical database consisting of WordNets for the following, mostly Balkan, languages: Greek, Turkish, Romanian, Bulgarian, Czech and Serbian (Czech is not a Balkan language, but is Slavonic just like Bulgarian and Serbian). The objective is to collect some 15,000 comparable synsets (around 30,000 literals) in each language, covering generic vocabulary, distributed into the following POS categories: 65% nouns, 25% verbs, 5% adjectives and 5% adverbs (see [BalkaNet]). The data will later be incorporated into EuroWordNet.

The first attempts to build a Bulgarian WordNet focused on automatic construction from English-Bulgarian and Bulgarian-English electronic dictionaries (see [Nikolov&Petrova,2001]). For the BalkaNet project, though, everything has been created from scratch. At the time of writing the Bulgarian WordNet contained about 8,000 synsets (see [BWN]).

5. ARTSSEMNET and WORDNET

WordNet and ArtsSemNet have similar functionality, but there are also some important differences. As we mentioned above, the terms in WordNet are represented not as entities of their own but as synsets. Although this is a clean way to express the lexical relations as holding between senses and not between the terms themselves, it is also partly due to the fact that WordNet was designed for English, where the same word can often belong to several different parts of speech (e.g. noun, adjective and verb), which implies different senses according to WordNet. This is highly unlikely for Slavonic languages: while they are rich in homographs, these involve mostly inflected wordforms and only occasionally hold between two or more lemmas. In addition, at present ArtsSemNet focuses on nouns only, while the homographs in the Slavonic languages involve mostly words with different POS.

The synset organisation of WordNet also implies some interface differences. When the user enters a query word, WordNet displays all synsets it is included in along with their glosses. In addition, the synonyms, co-hyponyms, hyponyms and hyponym chains, meronyms/holonyms, antonyms and coordinated words can be shown. All this information is related to the corresponding synsets of the target. A summary of the major differences between ArtsSemNet and WordNet follows:

     • ArtsSemNet is term-centred, while WordNet is built on synsets (senses). ArtsSemNet includes some internal organisation similar to synsets as well, but only when it is really needed to split the term for a particular relation (e.g. hyponymy, see Tables 6, 7). The synsets do not necessarily correspond to different glosses. Even when a term has different glosses (i.e. senses), this does not imply a difference for all the relations it is involved in (e.g. due to systematic relations). If one followed the WordNet approach for a focused domain-specific terminological network, this would result in several parallel sense-sense relations (see Tables 6, 7), which we wanted to avoid.
     • WordNet does not distinguish between absolute and relative synonyms as ArtsSemNet does, which, in our opinion, is an important distinction for a domain-specific terminology. Examples of absolute synonyms: Bulgarian (готически стил – готика; изумруд – смарагд; историческо платно – историческа картина; накити – бижу; торсо – торс; морски пейзаж – марина; разяждане – ецване) and Russian (муштабель – палка; арабеска – арабеск; барбы – заусенцы; восковая живопись – энкаустика; гематит – кровавик; отпечаток – оттиск; оклад – басма; мягкий краке-
  Пейзаж, Ландшафт (жанр)
      Градски пейзаж – Исторически пейзаж – Морски пейзаж, Марина – Парков пейзаж
  Пейзаж, Ландшафт (произведение)
      Ведута – Морски пейзаж, Марина
  Портрет (жанр)
      Автопортрет – Акварелен портрет – Бюст, Бюстов портрет – Групов портрет – Кавалетен портрет – Камерен портрет – Ктиторски портрети – Параден портрет – Психологически портрет – Скулптурен портрет – Социален портрет – Фаюмски портрет – Херма
  Портрет (произведение)
      Автопортрет – Бюст, Бюстов портрет – Херма

                             Table 6. Pseudosynsets and parallel homonymy in Bulgarian.

  Перо (инструмент)                     Гусиное перо – Рейсфедер – Рондо – Тростниковое перо, Калам
  Перо (техника)                        Гусиное перо – Тростниковое перо, Калам

                              Table 7. Pseudosynsets and parallel homonymy in Russian.
люр – плывучий кракелюр). Examples of relative synonyms: Bulgarian (бристол – ватман – торшон; кукери – бабугери; мартеница – китица – гадалушка; пафти – чапрази – куки; златарство – куюмджийство; ножарство – бучакчийство) and Russian (мастихин – шпатель; картинная галерея – пинакотека; гиацинт – жёлтый яхонт; рубин – красный яхонт).

     • WordNet does not explicitly distinguish between homonymy and polysemy, a distinction that has been shown to be important for some applications, e.g. information retrieval (see [Krovetz,1993]).

     • ArtsSemNet does not support the meronymy/holonymy relation (“X is part of Y”), which is present in WordNet. This is because we follow the Bulgarian and Russian linguistic tradition, where meronymy is considered a special kind of hyponymy/hypernymy and not a separate relation.

     • The user interface of WordNet does not provide automated hyperlink-based navigation between terms (as ArtsSemNet does), but it has a programming interface. ArtsSemNet is kept in a relational database, which allows simple programmatic access, although a specialised interface is not supported at the moment.

     • ArtsSemNet supports both Bulgarian and Russian, while the original WordNet is for English only. EuroWordNet supports a further set of 7 European languages, which at the moment includes neither Bulgarian nor Russian, although both are already under development.

     We would like to point out that the two networks are separate, with no links between them. They are accessed via the same interface, so that a term can be looked up in either language, but there is no common index. Many terms are present in both languages, but they do not necessarily represent parallel notions /Table 5/ and may also be translingual homonyms /Table 3/ etc. The lack of a common index is due to language-specific terminology (crafts, materials, instruments, techniques) originating from differences in culture, traditions, climate etc. Examples of Russian terms with no analogues in Bulgarian are: клееварка (клеянка), портретная (room for portraits), резьба по газопенобетону, резьба по ганчу, хохломская роспись (хохлома), палехская миниатюра, сграффито с инкрустацией цветных штукатурок. Some terms specific to Bulgarian include: каменина, ковано желязо, пастирска резба (овчарска резба), чипровски килим. Another source of differences is the language-specific absence of whole classes of terms, e.g. particular female professionals: Bulgarian-only (графичка, декораторка, дизайнерка, експресионистка, калиграфка, керамичка, маринистка, натуралистка, реставраторка) and Russian-only (лепщица, медальерка, миниатюристка, силуэтистка, юмористка). Unlike EuroWordNet, which is a general semantic network, we wanted to build one that is both specialised and as complete as possible. We were not willing to sacrifice coverage in either language for the sake of a cross-language index.

6. Availability and Usage

     Both ArtsDict and ArtsSemNet are freely available for research purposes, and the latest versions (the applications and the database for Bulgarian and Russian) can be found on the Web: www.cs.berkeley.edu/~nakov/artssemnet.

     There are two variants of distribution: 1) a Microsoft Access .mdb file; and 2) an SQL script that creates the database schema and populates the data. The first one is oriented to Windows applications and is suitable even for users who are not familiar with relational databases. The second variant can be used by a software developer to import the data into a standard RDBMS (e.g. MySQL, Oracle, SQL Server) and then access it using his/her favourite programming language (e.g. Java, Perl, C++, C#).

     Technically, the software part of ArtsSemNet (both the application and the database) is not limited in any way to Bulgarian/Russian or to fine arts terminology. It can be used with any terminology in any language (except when the alphabet used may be of concern, e.g. Chinese) as long as information about the
terms, glosses and relations is available. Since the data is currently stored in a format that is compatible with MS Access, Access itself can be used as an alternative way to explore and edit the data: to add a new term, gloss or relation, or even a new language. The changes will then be automatically recognised and ready to use by the ArtsSemNet interface presented above.

7. Future Work

     There are several directions for further improvement and development of ArtsSemNet. First of all, some minor functional additions are possible, e.g. enabling direct search for co-hyponyms. Second, it would be good to provide more intuitive navigation, e.g. displaying the hyponymy hierarchy in the form of tree/graph(s), thus providing a better visual idea of the relations holding between the different terms. Other relations, e.g. holonymy, can also benefit from a hierarchical visualisation. A suitable graphical representation similar to the one used in the QuickGO browser (see [QuickGO]) for the Gene Ontology Web interface is another interesting option. It would be good to allow for editing/adding/deleting terms, glosses and relations directly from the browser interface. It would also be nice to try to interconnect (maybe partially) the two languages, similarly to EuroWordNet. Adding more languages is another possibility.

8. References

   [Atanasova,2003] Atanasova I. Fine Arts Terminology in Russian and Bulgarian (semasiological and onomasiological aspect). Ph.D. thesis. Veliko Turnovo, Bulgaria, 2003.
   [Atanassova&al.,2003] Atanassova I., Nakov S., Nakov P. ArtsSemNet: A Bilingual Semantic Network for Bulgarian and Russian Fine Arts Terminology. Proceedings of BulMET, Varna, Bulgaria, 2003.
   [Atanassova&al.,2002] Atanassova I., Nakov P., Nakov S. Information Technologies Helping the Linguist-Explorer. Proc. VIIIth International Symposium MAPRIAL 2002. pp. 304-309. Veliko Turnovo, Bulgaria, 2002.
   [Atanassova&Nakov,2001a] Atanassova I., Nakov P. The Impact of the Segmentation on the Automatic Hyponyms Extraction from Terminological Dictionaries. Proc. Conference on Contemporary Achievements in the Philological Sciences and the Foreign Language University Education. Veliko Turnovo, Bulgaria, 2001.
   [Atanassova&Nakov,2001b] Atanassova I., Nakov P. Term and Document from the Point of View of the Latent Semantic Analysis. Proc. International Conference “Technologies, Safety and Ecology”, pp.(69)193-205. Veliko Turnovo, Bulgaria, 2001.
   [BalkaNet] BalkaNet: http://www.ceid.upatras.gr/Balkanet/
   [BWN] Bulgarian WordNet: http://www.ibl.bas.bg/balk_en.htm
   [EFAB,1987] Encyclopaedia of Fine Arts in Bulgaria (Bulgarian: Енциклопедия на изобразителните изкуства в България). vol. I-II. Sofia, 1987.
   [EuroWordNet] EuroWordNet: http://www.illc.uva.nl/EuroWordNet/
   [Fellbaum,1998] Fellbaum C. (ed.). WordNet: An Electronic Lexical Database. MIT Press, 1998.
   [Flerov,1981] Flerov A. Material Knowledge and Technology of the Artistic Treatment of Metals (Russian: Материаловедение и технология художественной обработки металлов). Vysshaya shkola. Moscow, 1981.
   [GWA] Global WordNet Association: http://www.globalwordnet.org/
   [Krovetz,1993] Krovetz R. Viewing Morphology as an Inference Process. Proc. 16th ACM SIGIR Conf. on R&D in IR. pp. 191-202. ACM. New York, 1993.
   [Landauer&al.,1998] Landauer T., Foltz P., Laham D. Introduction to LSA. Discourse Processes, vol. 25, pp. 259-284, 1998.
   [Magnini&Speranza,2001] Magnini B., Speranza M. Integrating Generic and Specialized Wordnets. Proc. Euroconference RANLP. pp. 149-153. Tzigov Chark, Bulgaria, 2001.
   [Miller&al.,1990] Miller G., Beckwith R., Fellbaum C., Gross D., Miller K. Introduction to WordNet: An on-line lexical database. Journal of Lexicography, 3(4), pp. 235-244, 1990.
   [Nakov&Atanassova,2001] Nakov P., Atanassova I. Automatic hyponymy extraction from Bulgarian and Russian terminological dictionaries. Proc. Naval Scientific Forum, vol. 3, pp. 327-335. Varna, Bulgaria, 2001.
   [Nikolov&Petrova,2001] Nikolov T., Petrova K. Towards Building Bulgarian WordNet. Proc. Euroconference Recent Advances in Natural Language. Eds. G. Angelova, K. Bontcheva, R. Mitkov, N. Nicolov, N. Nikolov. pp. 199-203. Tzigov Chark, Bulgaria, 2001.
   [Novikov,1982] Novikov L. Semantika russkogo yazyka (Семантика русского языка). Vysshaya shkola. Moscow, 1982.
   [Odnoralova,1982] Odnoralova N. Sculpture and Sculptural Materials (Russian: Скульптура и скульптурные материалы). Izobrazitel’noe iskusstvo. Moscow, 1982.
   [OED] Oxford English Dictionary: http://www.oed.com
   [Pavlovsky,1975] Pavlovsky A. Monumental Decorative Arts Materials and Technique (Russian: Материалы и техника монументально-декоративного искусства). Sovetsky hudozhnik. Moscow, 1975.
   [QuickGO] QuickGO: GO Browser: http://www.ebi.ac.uk/ego
   [RT] Roget’s Thesaurus: http://www.bartleby.com/thesauri
   [RWN] Russian WordNet: http://www.phil.pu.ru/depts/12/RN/Main.html
   [SDFAT,1970] Short Dictionary of Fine Arts Terminology (Bulgarian: Кратък речник на термините в изобразителното изкуство). Bulgarski Hudozhnik. Sofia, 1970.
   [SDFAT,1965] Short Dictionary of Fine Arts Terminology (Russian: Краткий словарь терминов изобразительного искусства). Sovremenniy hudozhnik. Moscow, 1965.
   [Stamou&al.,2002] Stamou S., Ntoulas A., Kyriakopoulou M., Christodoulakis D. Expanding EuroWordNet with Domain-Specific Terminology Using Common Lexical Resources: Vocabulary Completeness and Coverage Issues. Proc. First International WordNet Conference. Mysore, India, 2002.
   [Tsonev,1957] Tsonev K. Painter’s Technical Guide (Bulgarian: Технически наръчник на художника). Nauka i izkustvo, 1957.
   [Vossen,1998] Vossen P. (ed.). EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Kluwer Academic Publishers, Dordrecht, 1998.
   [Vinner,1954] Vinner A. Art of Painting Materials (Russian: Материалы живописи). Sovetsky hudozhnik. Moscow, 1954.
   [WordNet] WordNet: http://www.cogsci.princeton.edu/~wn
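
As an illustration of the programmatic access mentioned in Section 6, and of the co-hyponym search and tree display listed in Section 7, the following is a minimal sketch in Python. It is not the actual ArtsSemNet schema or code: the table names (term, relation), the column names, the two functions and the English toy terms are all assumptions made for this example, and an in-memory SQLite database stands in for whichever RDBMS the SQL script is imported into.

```python
import sqlite3

# Hypothetical schema: table and column names are assumptions for this
# sketch, not the real ArtsSemNet database layout.
SCHEMA = """
CREATE TABLE term (
    id    INTEGER PRIMARY KEY,
    lang  TEXT NOT NULL,
    lemma TEXT NOT NULL
);
CREATE TABLE relation (
    kind      TEXT NOT NULL,     -- e.g. 'hyponym', 'synonym'
    source_id INTEGER NOT NULL REFERENCES term(id),
    target_id INTEGER NOT NULL REFERENCES term(id)
);
"""

# Toy data invented for this sketch (not taken from ArtsSemNet).
TERMS = [
    (1, "en", "tool"),
    (2, "en", "brush"),
    (3, "en", "palette knife"),
    (4, "en", "flat brush"),
]
# (kind, source_id, target_id): the source term is a hyponym of the target.
RELATIONS = [
    ("hyponym", 2, 1),
    ("hyponym", 3, 1),
    ("hyponym", 4, 2),
]


def build_demo_db():
    """Create an in-memory database holding the toy network."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO term (id, lang, lemma) VALUES (?, ?, ?)", TERMS)
    conn.executemany(
        "INSERT INTO relation (kind, source_id, target_id) VALUES (?, ?, ?)",
        RELATIONS,
    )
    return conn


def co_hyponyms(conn, lemma, lang):
    """Return the terms that share a direct hypernym with the given term."""
    rows = conn.execute(
        """
        SELECT DISTINCT sibling.lemma
        FROM term AS t
        JOIN relation AS up
             ON up.source_id = t.id AND up.kind = 'hyponym'
        JOIN relation AS down
             ON down.target_id = up.target_id AND down.kind = 'hyponym'
        JOIN term AS sibling ON sibling.id = down.source_id
        WHERE t.lemma = ? AND t.lang = ? AND sibling.id <> t.id
        ORDER BY sibling.lemma
        """,
        (lemma, lang),
    ).fetchall()
    return [row[0] for row in rows]


def hyponym_tree(conn, lemma, lang, depth=0):
    """Return the hyponymy hierarchy below a term as indented text lines.

    Assumes the hierarchy is acyclic; homonyms (several terms with the
    same lemma) are not told apart in this sketch.
    """
    lines = ["  " * depth + lemma]
    children = conn.execute(
        """
        SELECT child.lemma
        FROM term AS parent
        JOIN relation AS r
             ON r.target_id = parent.id AND r.kind = 'hyponym'
        JOIN term AS child ON child.id = r.source_id
        WHERE parent.lemma = ? AND parent.lang = ?
        ORDER BY child.lemma
        """,
        (lemma, lang),
    ).fetchall()
    for (child,) in children:
        lines.extend(hyponym_tree(conn, child, lang, depth + 1))
    return lines


if __name__ == "__main__":
    demo = build_demo_db()
    print(co_hyponyms(demo, "brush", "en"))
    print("\n".join(hyponym_tree(demo, "tool", "en")))
```

Storing every relation as a (kind, source, target) row keeps the sketch independent of language and domain, in the spirit of Section 6: a new term, relation type or language would only add rows, not tables.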

ArtsDict has been created in order to allow for easy cre- Keywords: semantic network, terminology, polysemy, homo- ation and usage of parallel bilingual terminological dic- nymy, hyponymy, antonymy, synonymy. tionaries for the purpose of lexicographical research. The dictionary data consists of a set of navigable dictio- 1. Introduction nary entries: a term (single-word term, SWT or multi- The contemporary dictionary development has been de- word term, MWT) and one or more glosses describing eply affected by the wide spread of personal computers. its sense(es). The main screen of ArtsDict is split both Nowadays, a fast growing number of users already for- horizontally (between the dictionaries) and vertically: got the annoying lookups in huge paper-based dictiona- the SWT and MWT, including doublets and variants, ries and started using their computer equivalents. Altho- appear on the left in alphabetical order, while their glos- ugh the first computer dictionaries were often worse ses are listed on the right. Although the user interface than the traditional ones their potential was out of ques- imposes no such restrictions, we enforced strict rules tion. As early as in 1992 the creators of the Oxford Eng- for the contents of the separate fields. For example, af- lish Dictionary [OED] invested $13.5 millions in a five ter the term we add in brackets its origin, when it is a years project to enable the development of an electronic foreign word, and the form for singular, when it is version. It soon became clear that the computer dictio- presented in plural. The doublets1 and variants2 appear naries could potentially provide by far richer capabiliti- horizontally comma separated after the term. Similarly, es. In the mean time, some other resources, such as the- after a neutral term its stylistic relative synonyms are sauri, arose (e.g. the Roget’s thesaurus [RT]), which provided the users with synonymy information. 
Soon, 1 We consider the doublets and the variants as absolute syno- the lexicographers started combining dictionaries and nyms, the difference being that the former share the same thesauri, which resulted in semantic networks (e.g. root, while the latter do not. 2 WordNet [Fellbaum,1998; Miller&al.,1990; WordNet]), In fact the phonetic and orthographic variants are lexico- including not just term glosses and synonyms lists, but grammatical variants of the same word (allolexes), not dis- also links to antonyms, hyponyms etc. tinct words (synonyms). We treat them as separate words (i.e. The work presented below progressed in a similar synonyms) for two reasons: 1. to preserve the unified appro- fashion: we started with electronic dictionaries and later ach to all groups of variant, which represent distinct words or terminological collocations; 2. because the phonetic and gra- transformed them into semantic networks with various phemic variants could be stylistic relative synonyms. It is not terminological relations. We concentrated on the fine possible for the lexico-grammatical variants of a word to be arts terminology for two closely related and easy-to- related to different styles, e.g. in the fine arts terminology: б. combine Slavonic languages suitable for a comparative зограф – изограф (the dialect for зографа).
  • 2. Минерал, разновидност на берила, силикат на берилия и алуминия, Аквамарин (нем. Aquamarin, по лат. скъпоценен камък, с цвят от светлозелен до небесносин, използван като aqua 'вода' + marinus 'морски') материал за художествени изделия. 1. Акварелни бои - бои, състоящи се от пигмент и свързващо вещество Акварел (рус. акварель, фр. aquarelle, (растително лепило с примеси на мед, захар, глицерин); от ит. acquarello, от лат. aqua 'вода') 2. Акварелна техника - живописна техника, използваща акварелни бои; 3. Произведение на живописта, изпълнено с акварелна техника. Разновидност на портретния жанр, включваща портрети, изпълнени в Акварелен портрет акварелна техника. Акварелист (от ит. acquarello) вж. Художник-акварелист Акварелистка (от акварелист, от ит. вж. Художничка-акварелистка. acquarello) Акварелна техника вж. Акварел във 2 знач. Акварелни бои, Водни бои вж. Акварел в 1 знач. Table 1. Extract from the Bulgarian dictionary contents. Минерал, прозрачная разновидность берилла, синевато-зеленой или Аквамарин (нем. Aquamarin, по лат. голубой окраски, драгоценный камень, применяемый как материал для aqua marina 'морская вода') художественных изделий. Акварелист (ит. acquarello) см. Художник-акварелист. Акварелистка (от акварелист, от ит. см. Художница-акварелистка. acquarello) 1. Красочный материал, предназначенный для акварельной живописи, состоящий из пигмента и большого процента клеящих веществ в качестве связующего (которым служит растительный клей с примесью меда, Акварель (фр. aquarelle, ит. сахара, глицерина); acquarello, от лат. aqua 'вода') 2. Техника живописи, выполняемая акварельными красками; 3. Произведение искусства, выполненное акварельными красками в соответствующей технике. Акварельная живопись см. Акварельная техника. Акварельная техника, Акварельная см. Акварель во 2 знач. живопись, Живопись акварелью, Живопись водяными красками Акварельные краски (ед. ч. краска), см. Акварель в 1 знач. Водяные краски Table 2. 
Extract from the Russian dictionary contents. Олово Тежък мек ковък метал със сивосинкав цвят, използван като материал за художествени произве- (Bulgarian) дения. Олово Химический элемент, мягкий, ковкий, серебристо-белый металл, применяемый в изобразитель- (Russian) ном искусстве как материал для художественных изделий. На български се превежда калай. Table 3. Example of translingual homonymy (Russian). listed, since they represent the same notion (again com- We would like to note that the dictionaries presen- ma separated). ted here are the most complete fine arts terminological The presented arrangement of variants, doublets ones for both Bulgarian and Russian and have been bui- and stylistic synonyms allows equivalent terms in the lt using a broad range of resources: scientific, popular- two dictionaries (i.e. the two languages) to be examined scientific, fine arts, publicist, social-political and other in parallel, for the short entries, and sequentially, for the (journals, specialised scientific and popular-scientific li- longer ones (see Tables 1, 2). The parallel exploration terature, catalogues, etc., [Flerov,1981; Odnoralova, simplifies not only the unification of the dictionaries 1982; Pavlovsky, 1975; Tsonev,1957; Vinner,1954]). In (by means of addition the corresponding equivalent: see addition, Russian and Bulgarian dictionaries have been Table 5) but also the search for translingual homonyms used: terminological (e.g. [SDFAT,1965; SDFAT, (see Table 3). 1970]), encyclopaedic (e.g. [EFAB,1987]), orthographi-
  • 3. Figure 1. Screenshot from ArtsDict. cal, etymological, dictionaries of foreign words, terms • absolute synonyms chains: 483 Bulgarian and lists of fine arts sources etc. Terminological terms, pro- 458 Russian; fessional slang and nomenclatures are grouped together • relative synonyms chains: 136 Bulgarian and and considered within a unified terminological frame- 114 Russian; work (see [Atanasova,2003] for details). • homonyms: 14 Bulgarian and 6 Russian; 3. ARTSSEMNET: Semantic Network • polysemous words: see Table 4. 3.1. Creation The direct extraction of homonyms, synonyms (sty- The ArtsSemNet semantic network was built around listic and relative) and polysemous terms from the dicti- the ArtsDict dictionaries contents. For the purpose, we onary entries was simplified because of the organisation investigated and completely annotated (manually, but of ArtsDict. The hyponyms and antonyms posed a prob- with a partial computer automation using a formal and a lem though. For the extraction of hyponyms sharing a semantic techniques described below) several important common term-element (root/stem, affix, word as a com- terminological relations: polysemy, homonymy, syno- ponent of MWT or another complex word, MWT), not nymy, antonymy and hyponymy. As a result a semantic necessarily shared also by the hypernym, a formal tech- network of the type of WordNet, hierarchically organi- nique was used. ArtsDict was given a hyponym/hyper- sed around the hyponymy relation, was obtained. At the nym, expressed through SWT or MWT, and it produced moment of preparation of the paper it contained: chains of SWT and MWT containing the target term- element. These were further investigated and the hypo- • lexemes: 2,900 Bulgarian and 2,644 Russian; nyms were sieved by the lexicological researcher • hyponyms chains: 276 Bulgarian and 283 Rus- [Atanassova&al.,2002]. 
A similar technique was used sian; to facilitate the extraction of antonyms sharing a com- mon term-element as well as for shared-root synonyms • antonyms chains: 157 Bulgarian and 134 Rus- (also with common suffix or prefix). sian;
  • 4. 3.2. Functionality Senses 1 2 3 4 5 6 7 The primary purpose of ArtsSemNet is to assist the lexi- count Bulgarian 2,571 273 49 4 2 1 0 cographer with his work by providing him with a tool Russian 2,313 263 56 9 2 0 1 for fast and easy access to rich fine arts terminology (see [Atanassova&al.,2003]). When a search for a parti- Table 4. Terms polysemy. cular term is performed ArtsSemNet displays its glosses, homonyms, synonyms (both absolute and relative) and For the extraction of hyponyms sharing no term-ele- synonyms chains, antonyms and antonyms chains, as ment we used latent semantic analysis (LSA). This is a well as hyponyms chains the target term is part of (both popular technique for indexing, retrieval and analysis of as hyponym or hypernym). ArtsSemNet offers a clean textual data, and assumes a set of mutual latent depen- and intuitive interface. The user can input a term to be dencies between the terms and the contexts they are us- explored, change the language being used or specify ed in. This permits LSA to deal successfully with syno- different search criteria. The information displayed for nymy and partially with polysemy, which are the major a given term includes: problems with the word-based text processing tech- niques (due to the freedom and variability of expressi- • term glosses list; on). LSA is a two-stage process including learning and • homonyms list; analysis. During the learning phase it is given a text col- lection and it produces a real-valued vector for each • absolute synonyms chains; term and for each document. The second phase is the • relative synonyms chains; analysis when the proximity between a pair of docu- • antonyms chains; ments or terms is calculated as the dot product between • hyponyms chains with the target term as a hy- their normalised LSA vectors (see [Landauer&al.,1998] pernym; for an introduction to LSA). 
We tried to use as features raw or segmented words • hyponyms chains with the target term as a co- (after stop-words and infrequent words removal; the hyponym. SWT and MWT from the dictionary were considered as The system offers several options: whether the term single words) and the former have been found to be mo- is to be searched exactly or partial matches should be re suitable for our task (see [Atanassova&Nakov,2001a] considered as well (e.g. root or prefix); whether the for details). During both training and analysis the engi- homonyms, synonyms and synonyms chains, antonyms ne has been used with one language at a time: Bulgarian and antonyms chains, and hyponyms and hyponym or Russian. chains should be displayed. In the analysis phase, LSA was given a hyponym or Glosses are presented as plain text one per line with a hypernym, expressed as SWT or MWT, and it produ- numbers added in front, in case there is more than one ced a ranked list as a result, sorted according to the se- gloss for the target term. Homonyms are listed one per mantic proximity to the target. The lexicographer ma- line. Absolute synonyms, relative synonyms and anto- nually investigated the result and kept only the true hy- nyms are hyphen-separated. If a relative synonym of the ponyms. Although LSA was intended to focus on hypo- target term has some absolute synonyms these are listed nyms with no shared term elements the returned list co- after it comma-separated. So are the absolute synonyms uld possibly contain such, as long as they are conside- of the antonyms. red semantically close enough by the LSA engine (see Hyponyms chains are listed as terms lists where the [Nakov&Atanassova,2001]). hypernym is displayed first, followed by its hyponyms. The dualistic nature of LSA allowed us to measure Again, if a term has absolute synonyms, these are sho- the proximity not only between terms (SWT or MWT) wn along with it separated by commas. 
If a polysemous but also between their glosses (see [Atanassova&Na- term is the hypernym of more than one hyponyms chain kov,2001b]). We used as target the glosses of the target the corresponding gloss is displayed in brackets for hypernym (or the glosses of some of its known hypo- each of them. This is similar to the synsets in WordNet. nyms) but also the hypernym itself (using some of its The user interface allows also displaying separately known hyponyms was another option we found useful). each hyponym, which is the hypernym of hyponyms In the latter case we compared it against the term vec- chains of its own as well as showing these chains. tors while in the former – against the document vectors. In any case, when the terms lists are displayed each Querying using terms performed better but the two vari- distinct one is presented as a hyperlink. When the latter ants have been used in parallel since they proposed dif- is followed the target term changes and the correspon- ferent arrangement of the potential hyponyms and each ding information about the new one is displayed (it in of them was useful for the lexicographer who was not turn contains hyperlinks to other terms and so on). The willing to miss any potential hyponym. navigation mechanism is similar to the one provided by
  • 5. Figure 2. Screenshot from ArtsSemNet. a standard Web browser: even the standard forward and WordNet are represented as one or more synsets (i.e. sy- backward buttons are present, visualised as left and nonym sets). A synset groups a term with some of its right arrows, so that the user can navigate back to the synonyms, which taken as a whole represent a particu- already visited terms and then can go forth. Figure 2 lar lexical sense of that term (see [Fellbaum,1998; Mil- shows ArtsSemNet after a successful search for the Bul- ler&al.,1990]). A lexically ambiguous term is included garian term надлъжна гравюра. in more than one synsets: one for each of its senses ArtsSemNet is implemented in Borland Delphi us- (according to the sense granularity level chosen by the ing the relational database management system Micro- network). The synsets are hierarchically interconnected soft Access for the storage and retrieval of the fine arts according to the hyponymy and the meronymy (part- terminological terms, designed in a way to ensure effi- whole) relations and are further distinguished by more cient processing for the kinds of queries needed. specific properties. The work on the project continues and the latest version 2.0 of WordNet includes 115,424 4. Related Work synsets – 79,689 nouns, 13,508 verbs, 18,563 adjectives and 3,664 adverbs [WordNet]. WordNet is among the WordNet. WordNet has been developed by psycholin- most important resources for natural language proces- guists from the Cognitive Science Laboratory of the sing, machine translation, word sense disambiguation, Princeton University as a computational model of the information extraction, information retrieval etc. human lexical memory. Since then the project evaluated into a general lexical reference system comprising thou- EuroWordNet. 
The success of WordNet provoked inte- sands of words and their corresponding glosses, organi- rest in the development of similar resources for other sed into a semantic network. The terms (lexemes) in languages. In 1996 the European Commission funded
  • 6. 1. Един от жанровете на изобразителното изкуство, който изобразява битови предмети, зеленчуци, Натюрморт плодове, убит дивеч, цветя и др.; (Bulgarian) 2. Отделно произведение от този жанр. 1. Один из жанров изобразительного искусства, посвященный воспроизведению предметов обихода, Натюрморт снеди (овощи, мясо, битая дичь, фрукты), цветов и пр.; (Russian) 2. Отдельное произведение этого жанра. Table 5. Parallel notions in Bulgarian and Russian. the EuroWordNet project, covering 7 European langua- as entities of their own but as synsets. Although this is a ges in parallel (see [EuroWordNet; Vossen,1998]): Cze- clean way to express the lexical relations as holding ch, Dutch, Estonian, French, German, Italian and Spa- between senses and not between the terms themselves, nish. Each part of EuroWordNet uses its own language- it is also partly due to the fact that WordNet was desig- specific synsets but all are interconnected by means of a ned for English where the same word could often be- common index based on WordNet, so that the navigati- long to several different parts of speech (e.g. noun, ad- on between the similar words in different languages is jective and verb), which implies different senses accor- possible in all directions. While the EuroWordNet pro- ding to WordNet. This is highly unlikely for Slavonic ject was finished in 1999 (as opposed to WordNet whi- languages: while they are rich in homographs, these in- ch has always been active) the work on other European volve mostly inflected wordforms and only occasional- languages continues. There are already WordNets avai- ly hold between two or more lemmas. In addition, at lable for Basque, Portuguese and Swedish. Under deve- present ArtsSemNet focuses on nouns only, while the lopment are ones for Bulgarian, Danish, Greek, Icelan- homographs in the Slavonic languages involve mostly dic, Latvian, Moldavian, Norwegian, Romanian, Russi- words with different POS. 
an (see [RWN]), Serbian, Slovenian, Swedish and Tur- The synset organisation of WordNet implies also kish. Several non-European languages have projects un- some interface differences. When the user enters a que- der development (see the Web page of the Global ry word, WordNet displays all synsets it is included in WordNet Association for details, [GWA]). along with their glosses. In addition, the synonyms, co- There have been also some attempts to integrate do- hyponyms, hyponyms and hyponyms chains, mero- main-specific terminologies into EuroWordNet [Magni- nyms/holonyms, antonyms and coordinated words can ni&Speranza,2001; Stamou&al.,2002]. be shown. All this information is related to the corres- ponding synsets of the target. A summary of the major BalkaNet. This is an ongoing project whose aim is the differences between ArtsSemNet and WordNet follows: creation of a multilingual lexical database consisting of WordNets for the following mostly Balkan languages: • ArtsSemNet is term-centred, while WordNet is Greek, Turkish, Romanian, Bulgarian, Czech and Serbi- built on synsets (senses). ArtsSemNet includes some an (in fact Czech is not a Balkan language, but is Slavo- internal organisation similar to synsets as well but only nic just like Bulgarian and Serbian). The objective is to when it is really needed to split the term for a particular collect some 15,000 comparable synsets (around 30,000 relation (e.g. hyponymy, see Tables 6,7). The synsets literals) in each language, covering generic vocabulary, do not necessarily correspond to different glosses. Even distributed into the following POS categories: 65% when a term has different glosses (i.e. senses) this does nouns, 25% verbs, 5% adjectives and 5% adverbs (see not imply that this will make difference for all the rela- [BalkaNet]). The data will be later incorporated into tions it is involved in (e.g. due to systematic relations). EuroWordNet. 
If one followed the WordNet approach for a focused The first attempts to build a Bulgarian WordNet domain-specific terminological network this would re- focused on automatic construction from English-Bul- sult in several parallel sense-sense relations (see Tables garian and Bulgarian-English electronic dictionaries 6,7), which we wanted to avoid. (see [Nikolov&Petrova,2001]). For the BalkaNet project • WordNet does not distinguish between absolute though, everything has been created from scratch. At and relative synonyms as ArtsSemNet does, which, in the moment of preparation of the present paper the our opinion, is an important distinction for a domain- Bulgarian WordNet contained about 8,000 synsets (see specific terminology. Examples of absolute synonyms: [BWN]). Bulgarian (готически стил – готика; изумруд – сма- рагд; историческо платно – историческа картина; 5. ARTSSEMNET and WORDNET накити – бижу; торсо – торс; морски пейзаж – ма- рина; разяждане – ецване) and Russian (муштабель WordNet and ArtsSemNet have similar functionality but – палка; арабеска – арабеск; барбы – заусенцы; вос- there are also some important differences. As we menti- ковая живопись – энкаустика; гематит – кровавик; oned above, the terms in WordNet are represented not отпечаток – оттиск; оклад – басма; мягкий краке-
  • 7. Градски пейзаж – Исторически пейзаж – Морски пейзаж, Марина – Парков Пейзаж, Ландшафт (жанр) пейзаж Пейзаж, Ландшафт (произведение) Ведута – Морски пейзаж, Марина Автопортрет – Акварелен портрет – Бюст, Бюстов портрет – Групов портрет – Кавалетен портрет – Камерен портрет – Ктиторски портрети – Параден Портрет (жанр) портрет – Психологически портрет – Скулптурен портрет – Социален портрет – Фаюмски портрет – Херма Портрет (произведение) Автопортрет – Бюст, Бюстов портрет – Херма Table 6. Pseudosynsets and parallel homonymy in Bulgarian. Перо (инструмент) Гусиное перо – Рейсфедер – Рондо – Тростниковое перо, Калам Перо (техника) Гусиное перо – Тростниковое перо, Калам Table 7. Pseudosynsets and parallel homonymy in Russian. люр – плывучий кракелюр). Examples of relative syno- es in Bulgarian are: клееварка (клеянка), портретная nyms: Bulgarian (бристол – ватман – торшон; куке- (room for portraits), резьба по газопенобетону, резь- ри – бабугери; мартеница – китица – гадалушка; ба по ганчу, хохломская роспись (хохлома), палехс- пафти – чапрази – куки; златарство – куюмджий- кая миниатюра, сграффито с инкрустацией цвет- ство; ножарство – бучакчийство) and Russian ных штукатурок. Some terms specific to Bulgarian (мастихин – шпатель; картинная галерея – пинако- include: каменина, ковано желязо, пастирска резба тека; гиацинт – жёлтый яхонт; рубин – красный (овчарска резба), чипровски килим. Another source of яхонт). differences is the language-specific deficiency of whole • WordNet does not explicitly distinguish betwe- classes of terms, e.g. particular female professionals: en homonymy and polysemy, which has been shown im- Bulgarian-only (графичка, декораторка, дизайнерка, portant for some applications, e.g. information retrieval експресионистка, калиграфка, керамичка, маринис- (see [Krovetz,1993]). 
тка, натуралистка, реставраторка) and Russian • ArtsSemNet does not support the meronymy/ho- only (лепщица, медальерка, миниатюристка, силу- lonymy relation (“X is part of Y”), present in WordNet. этистка, юмористка). Unlike EuroWordNet, which This is because we follow the Bulgarian and Russian is a general semantic network, we wanted to build one linguistics tradition, where meronymy is considered as that is both specialised and as complete as possible. We a special kind of hyponymy/hypernymy and not a sepa- were not willing to sacrifice coverage in some langua- rate relation. ge, for the sake of cross-language index. • The user interface of WordNet does not provide automated hyperlink-based navigation between terms 6. Availability and Usage (as ArtsSemNet does), but has a programming interface. Both ArtsDict and ArtsSemNet are freely available for ArtsSemNet is kept in a relational database, which al- research purposes and the latest versions can be found lows a simple programming access, although a speciali- on the Web (the applications and database for Bulgarian sed interface is not supported at the moment. and Russian): www.cs.berkeley.edu/~nakov/artssemnet. • ArtsSemNet supports both Bulgarian and Russi- There are two variants of distribution: 1) Microsoft an, while the original WordNet is for English only (and Access .mdb file; and 2) SQL-script to create the EuroWordNet supports another set of 7 European database schema and populate the data. The first one is languages, but at the moment – neither Bulgarian nor oriented to Windows applications and is suitable even Russian, but these are already under development). for users that are not familiar with relational databases. We would like to point out that we have two sepa- The second variant could be used by a software develo- rate networks though without links between them. Al- per to import the data into a standard RDBMS (e.g. 
though they are accessed via the same interface, so that MySQL, Oracle, SQL Server) and then access it using a term can be looked up in either language (a lot of the his/her favourite programming language (e.g. Java, terms are present in both, but do not necessarily repre- Perl, C++, C#). sent parallel notions /Table 5/, but also translingual Technically, the software part of ArtsSemNet (both homonyms /Table 3/ etc.), there is no common index. the application and the database) is not limited in any This is because of problems due to language-specific way neither to Bulgarian/Russian nor to fine arts termi- terminology (crafts, materials, instruments, techniques) nology. It can be used with any terminology in any lan- originating from differences of culture, traditions, cli- guage (except when the alphabet used may be of con- mate etc. Examples for Russian terms with no analogu- cern, e.g. Chinese) as long as information about the
  • 8. terms, glosses and relations is available. Since the data [Fellbaum,1998] Fellbaum C. (ed.). WordNet: An Electronic is currently stored in format that is compatible with MS Lexical Database, MIT Press, 1998. Access, it can be used as an alternative way to explore [Flerov,1981] Flerov A. Material Knowledge and Technology of the Artistic Treatment of Metals (Russian: Материаловедение и and edit the data, to add a new term, gloss or relation, технология художественной обработки металлов). Vysshaya even a new language. The changes will be then automa- shkola. Moscow, 1981. tically recognised and ready to use by the ArtsSemNet [GWA] Global WordNet Association: interface presented above. http://www.globalwordnet.org/ [Krovetz,1993] Krovetz R. Viewing Morphology as an Inference 7. Future Work Process. Proc. 16th ACM SIGIR Conf. on R&D in IR. pp. 191-202. ACM. New York. 1993. There are several directions for further improvement [Landauer&al.,1998] Landauer T., P. Foltz, D. Laham. Introduc- and development of ArtsSemNet. First of all, some mi- tion to LSA. Discourse Processes, vol. 25, pp. 259-284, 1998. nor functional additions are possible: e.g. enable direct [Magnini&Speranza,2001] Magnini B., Speranza M. Integrating search for co-hyponyms. Second, it would be good to Generic and Specialized Wordnets. Proc. Euroconference RANLP. pp. 149-153, Tzigov Chark, Bulgaria, 2001. provide a more intuitive navigation: e.g. display the hy- ponymy hierarchy in the form of tree/graph(s) thus [Miller&al.,1990] Miller G., Beckwith R., Fellbaum C., Gross D., Miller K. Introduction to WordNet: An on-line lexical database. providing a better visual idea of the relations holding Journal of Lexicography, 3(4), pp. 235-244, 1990. between the different terms. Other relations, e.g. holo- [Nakov&Atanassova,2001] Nakov P., Atanassova I. Automatic nymy can also benefit from a hierarchical visualisation. 
hyponymy extraction from Bulgarian and Russian terminological A suitable graphical representation similar to the one dictionaries. Proc. Naval Scientific Forum, vol. 3, pp.327-335. used in the QuickGO browser (see [QuickGO]) for the Varna, Bulgaria, 2001. Gene Ontology Web interface is another interesting op- [Nikolov&Petrova,2001] Nikolov T., K. Petrova. Towards Buil- tion. It would be good to allow for editing/adding/dele- ding Bulgarian WordNet. Proc. Euroconference Recent Advances in Natural Language. Eds. G.Angelova, K.Bontcheva, R.Mitkov, N.Ni- ting terms, glosses and relations directly from the brow- colov, N.Nikolov. pp.199-203, Tzigov Chark, Bulgaria, 2001. ser interface. It would be also nice to try to interconnect [Novikov,1982] Novikov L. Semantika russkogo yazyka (maybe partially) the two languages similarly to (Семантика русского языка). Vysshaya shkola. Moscow, 1982. EuroWordNet. Adding more languages is another possi- [Odnoralova,1982] Odnoralova N. Sculpture and Sculptural bility. materials. (Russian: Скульптура и скульптурные материалы.), Izobrazitel’noe iskusstvo. Moscow, 1982. 8. References [OED] Oxford English Dictionary http://www.oed.com [Pavlovsky,1975] Pavlovsky A. Monumental Decorative Arts [Atanasova,2003] Atanasova I. Fine Arts Terminology in Russian Materials and Technique (Russian: Материалы и техника and Bulgarian (semasiological and onomasiological aspect. Ph.D. монументально-декоративного искусства). Sovetsky hudozhnik. thesis. Veliko Turnovo, Bulgaria, 2003. Moscow, 1975. [Atanassova&al.,2003] Atanassova I., S. Nakov, P. Nakov. Arts- [QuickGO] QuickGO: GO Browser http://www.ebi.ac.uk/ego SemNet: A Bilingual Semantic Network for Bulgarian and Russian Fine Arts Terminology. Proceedings of BulMET, Varna, Bulgaria, [RT] Roget’s Thesaurus: http://www.bartleby.com/thesauri 2003 [RWN] Russian WordNet: [Atanassova&al.,2002] Atanassova I., Nakov P, Nakov S. 
In- http://www.phil.pu.ru/depts/12/RN/Main.html formation Technologies Helping the Linguist-Explorer. Proc. VIIIth [SDFAT,1970] Short Dictionary of Fine Arts Terminology (in International Simposium MAPRIAL 2002. pp. 304-309. Veliko Tur- Bulgarian: Кратък речник на термините в изобразителното novo, Bulgaria, 2002. изкуство). Bulgarski Hudozhnik. Sofia, 1970. [Atanassova&Nakov,2001a] Atanassova I., Nakov P. The Impact [SDFAT,1965] Short Dictionary of Fine Arts Terminology (Rus- of the Segmentation on the Automatic Hyponyms Extraction from sian: Краткий словарь терминов изобразительного искусства). Terminological Dictionaries. Proc. Conference on Contemporary Sovremenniy hudozhnik. Moscow, 1965. Achievements in the Philological Sciences and the Foreign [Stamou&al.,2002] Stamou, S., Ntoulas, A., Kyriakopoulou, M., Language University Education. Veliko Turnovo, Bulgaria, 2001. Christodoulakis D. Expanding EuroWordNet with Domain-Specific [Atanassova&Nakov,2001b] Atanassova I., Nakov P. Term and Terminology Using Common Lexical Resources: Vocabulary Document from the Point of View of the Latent Semantic Analysis. Completeness and Coverage Issues. Proc. First International Proc. International Conference “Technologies, Safety and Ecology”, WordNet Conference. Mysore, India, 2002. pp.(69)193-205. Veliko Turnovo, Bulgaria, 2001. [Tsonev,1957] Tsonev K. Painter’s Technical Guide (Bulgarian: [BalkaNet] BalkaNet: http://www.ceid.upatras.gr/Balkanet/ Технически наръчник на художника). Nauka i izkustvo,1957. [BWN] Bulgarian WordNet: http://www.ibl.bas.bg/balk_en.htm [Vossen,1998] Vossen P. (ed.). EuroWordNet: A Multilingual [EFAB,1987] Encyclopaedia of Fine Arts in Bulgaria (Bulgarian: Database with Lexical Semantic Networks, Kluwer Academic Енциклопедия на изобразителните изкуства в България.) vol. I- Publishers, Dordrecht. 1998. II. Sofia, 1987. [Vinner,1954] Vinner A. Art of Painting Materials. (Russian: [EuroWordNet] EuroWordNet: Материалы живописи). Sovetsky hudozhnik. 
Moscow, 1954. http://www.illc.uva.nl/EuroWordNet/ [WordNet] http://www.cogsci.princeton.edu/~wn