SlideShare une entreprise Scribd logo
1  sur  23
Télécharger pour lire hors ligne
LOHAI: Providing a baseline for
         KOS based
     automatic indexing

                Kai Eckert
      Mannheim University Library, Germany
         eckert@bib.uni-mannheim.de

               First workshop on
      Semantic Digital Archives (SDA 2011),
              Sep 29th 2011, Berlin
Motivation

●   General KOS based indexer for various (even
    serious) purposes:
       –   Document exploration
       –   Thesaurus examination
       –   Automatic Indexing
       –   ...
●   No free and open source implementation was
    available.
●   LOHAI: Low Hanging Fruits Automatic Indexer
Design Principles

●   Simplicity over quality
       –   Easy to use
       –   Easy to understand
       –   Easy to improve
●   Knowledge-poor and without any training
       –   Must not rely on any additional sources
       –   No training step, to be usable in a setting
            where no preindexed documents are
            available.
Indexing Pipeline
Indexing Pipeline




              Covered in this talk.
Required Disambiguation
●   Compound terms:
       –   “insur” >> “insurance”, “insurance market”
●   Overstemming:
       –   „nation“ >> “nationalism”, “nationality”, “nation”
●   Homonyms:
       –   “bank”: the financial institution
       –   “bank”: a raised portion of seabed or sloping
             ground along the edge of a stream, river, or lake
Compound Term Detection



Money   Insur   Market   Cross-Concordance
Compound Term Detection
        Ambigous


Money
Money Transfer
...

   Money           Insur   Market   Cross-Concordance
Compound Term Detection
                         Ambigous


Money            Insurance
Money Transfer   Insurance Market
...              ...

   Money              Insur         Market   Cross-Concordance
Compound Term Detection
                                           Ambigous


Money            Insurance          Market
Money Transfer   Insurance Market   Insurance Market
...              ...                ...

   Money              Insur             Market         Cross-Concordance
Compound Term Detection
                                                               Unique: STOP


Money            Insurance          Market             Cross-Concordance
Money Transfer   Insurance Market   Insurance Market
...              ...                ...

   Money              Insur             Market         Cross-Concordance
Compound Term Detection

Money            Insurance          Market             Cross-Concordance
Money Transfer   Insurance Market   Insurance Market
...              ...                ...

   Money              Insur             Market         Cross-Concordance


                                                       No match



   Money              Insur             Market
Compound Term Detection

Money            Insurance          Market             Cross-Concordance
Money Transfer   Insurance Market   Insurance Market
...              ...                ...

   Money              Insur             Market         Cross-Concordance


                                                       No match



   Money              Insur             Market
Compound Term Detection

Money            Insurance          Market             Cross-Concordance
Money Transfer   Insurance Market   Insurance Market
...              ...                ...

   Money              Insur             Market         Cross-Concordance


                                                       MATCH



   Money              Insur             Market           >> Money
Compound Term Detection

Money            Insurance          Market             Cross-Concordance
Money Transfer   Insurance Market   Insurance Market
...              ...                ...

   Money              Insur             Market         Cross-Concordance




   Money              Insur             Market           >> Money

                                                       MATCH



   Money              Insur             Market           >> Insurance Market
Unstemming
●   Overstemming:
          –   „nation“ >> “nationalism”, “nationality”, “nation”
●   Compare unstemmed terms.
●   If they match:
          –   Assign them.
●   If not:
          –   Continue with Word Sense Disambiguation.
Word Sense Disambiguation

Yarowsky's assumptions:
●   One sense per collocation
       –   Collocated terms are unique for each possible
            sense of a given term.
●   One sense per discourse
       –   Only one sense for a given word is used
            throughout a whole document.
Jaccard Measure

●   Term Environments:
       –   Document: 100 words before and after the
            term occurrence form set W.
       –   KOS: All labels of the concept, its direct
            children, siblings and parents form set C.
●   Jaccard Measure:
                               ∣W ∪C∣
             Jaccard W , C =
                               ∣W ∩C∣
●   Assign one sense per document, based on the
    Jaccard value of all occurrences.
Example Result
Hardin, Russell: Contractarianism: Wistful Thinking
The contract metaphor in political and moral theory is misguided. It is a poor
metaphor both descriptively and normatively, but here I address its normative
problems. Normatively, contractarianism is supposed to give justifications for
political institutions and for moral rules, just as contracting in the law is
supposed to give justification for claims of obligation based on consent or
agreement. This metaphorical association fails for several reasons. First,
actual contracts generally govern prisoner’s dilemma, or exchange, relations;
the so-called social contract governs these and more diverse interactions as
well. Second, agreement, which is the moral basis of contractarianism, is not
right-making per se. Third, a contract in law gives information on what are the
interests of the parties; a hypothetical social contract requires such
knowledge, it does not reveal it. Hence, much of contemporary contractarian
theory is perversely rationalist at its base because it requires prior, rational
derivation of interests or other values. Finally, contractarian moral theory has
the further disadvantage that, unlike contract in the law, its agreements cannot
be connected to relevant motivations to abide by them.
Constitutional Political Economy, 1 (2) 1990: 35-52
Example Result

Manual assignment              LOHAI

●   Constitutional economics   ●   Contract Law (1.21)
●   Influence of government    ●   Contract (0.76)
●   Ethics                     ●   Social contract (0.64)
●   Theory                     ●   Law (0.51)
                               ●   Politics (0.37)
                               ●   Prisoner’s dilemma (0.34)
                               ●   Theory (0.32)
                               ●   Rationalism (0.24)
                               ●   Association (0.23)
                               ●   Exchange (0.20)
                               ●   Knowledge (0.19)
                               ●   Government (0.16)
                               ●   Information (0.12)
Example Result
The contract metaphor in political and moral theory is misguided. It is a
poor metaphor both descriptively and normatively, but here I address
its normative problems. Normatively, contractarianism is supposed to
give justifications for political institutions and for moral rules, just as
contracting in the law is supposed to give justification for claims of
obligation based on consent or agreement. This metaphorical
association fails for several reasons. First, actual contracts generally
govern prisoner’s dilemma, or exchange, relations; the so-called social
contract governs these and more diverse interactions as well. Second,
agreement, which is the moral basis of contractarianism, is not right-
making per se. Third, a contract in law gives information on what are
the interests of the parties; a hypothetical social contract requires such
knowledge, it does not reveal it. Hence, much of contemporary
contractarian theory is perversely rationalist at its base because it
requires prior, rational derivation of interests or other values. Finally,
contractarian moral theory has the further disadvantage that, unlike
contract in the law, its agreements cannot be connected to relevant
motivations to abide by them.
Conclusion
●   Reasonable results without „Black Box“ Effect.
●   Directly usable for new KOS and document
    sets.
●   No training step needed.
●   Low hanging fruits, but a good baseline.
●   ~ 500 LoC in (quite verbose) Java.
●   Easy to adapt and to improve.
●   Free and open source:
        –   https://github.com/kaiec/LOHAI
Thank you.



       Questions/Ideas?

            Kai Eckert
Mannheim University Library, Germany


eckert@bib.uni-mannheim.de

Contenu connexe

En vedette

Linked Data nach dem Hype
Linked Data nach dem HypeLinked Data nach dem Hype
Linked Data nach dem HypeKai Eckert
 
Specialising the EDM for Digitised Manuscript (SWIB13)
Specialising the EDM for Digitised Manuscript (SWIB13)Specialising the EDM for Digitised Manuscript (SWIB13)
Specialising the EDM for Digitised Manuscript (SWIB13)Kai Eckert
 
SWIB 2010: Linked Open Projects
SWIB 2010: Linked Open ProjectsSWIB 2010: Linked Open Projects
SWIB 2010: Linked Open ProjectsKai Eckert
 
Thesaurusvisualisierung mit ICE-Map und SEMTINEL
Thesaurusvisualisierung mit ICE-Map und SEMTINELThesaurusvisualisierung mit ICE-Map und SEMTINEL
Thesaurusvisualisierung mit ICE-Map und SEMTINELKai Eckert
 
Bibliotheken und Linked Open Data - Erfahrungen und Ideen aus der UB Mannheim
Bibliotheken und Linked Open Data - Erfahrungen und Ideen aus der UB Mannheim Bibliotheken und Linked Open Data - Erfahrungen und Ideen aus der UB Mannheim
Bibliotheken und Linked Open Data - Erfahrungen und Ideen aus der UB Mannheim Kai Eckert
 
RDF Application Profiles
RDF Application ProfilesRDF Application Profiles
RDF Application ProfilesKai Eckert
 

En vedette (6)

Linked Data nach dem Hype
Linked Data nach dem HypeLinked Data nach dem Hype
Linked Data nach dem Hype
 
Specialising the EDM for Digitised Manuscript (SWIB13)
Specialising the EDM for Digitised Manuscript (SWIB13)Specialising the EDM for Digitised Manuscript (SWIB13)
Specialising the EDM for Digitised Manuscript (SWIB13)
 
SWIB 2010: Linked Open Projects
SWIB 2010: Linked Open ProjectsSWIB 2010: Linked Open Projects
SWIB 2010: Linked Open Projects
 
Thesaurusvisualisierung mit ICE-Map und SEMTINEL
Thesaurusvisualisierung mit ICE-Map und SEMTINELThesaurusvisualisierung mit ICE-Map und SEMTINEL
Thesaurusvisualisierung mit ICE-Map und SEMTINEL
 
Bibliotheken und Linked Open Data - Erfahrungen und Ideen aus der UB Mannheim
Bibliotheken und Linked Open Data - Erfahrungen und Ideen aus der UB Mannheim Bibliotheken und Linked Open Data - Erfahrungen und Ideen aus der UB Mannheim
Bibliotheken und Linked Open Data - Erfahrungen und Ideen aus der UB Mannheim
 
RDF Application Profiles
RDF Application ProfilesRDF Application Profiles
RDF Application Profiles
 

Similaire à LOHAI: Providing a baseline for KOS based automatic indexing

Reinsurance Dispute Resolution/NAIW
Reinsurance Dispute Resolution/NAIWReinsurance Dispute Resolution/NAIW
Reinsurance Dispute Resolution/NAIWTheresa Hajost
 
Derivatives and Pakistani Market
Derivatives and Pakistani MarketDerivatives and Pakistani Market
Derivatives and Pakistani Markethadishaikh2
 
The Law on Token sales
The Law on Token salesThe Law on Token sales
The Law on Token salesMathew Chacko
 
Statementby Christopher Whalen Sbc 062209
Statementby Christopher Whalen Sbc 062209Statementby Christopher Whalen Sbc 062209
Statementby Christopher Whalen Sbc 062209bartonp
 
Contract theory
Contract theoryContract theory
Contract theoryav190286
 
Discussion paper series - wacc using market value or estimated value
Discussion paper series - wacc using market value or estimated valueDiscussion paper series - wacc using market value or estimated value
Discussion paper series - wacc using market value or estimated valueFuturum2
 
PPT - Law Easy Writing Services PowerPoint Presentation, Free Download
PPT - Law Easy Writing Services PowerPoint Presentation, Free DownloadPPT - Law Easy Writing Services PowerPoint Presentation, Free Download
PPT - Law Easy Writing Services PowerPoint Presentation, Free DownloadSue Ganguli
 
Derivatives - Basics
Derivatives - Basics Derivatives - Basics
Derivatives - Basics BFSI academy
 
Ronnie goodin.ronnie contracts
Ronnie goodin.ronnie contractsRonnie goodin.ronnie contracts
Ronnie goodin.ronnie contractsNASAPMC
 
Custom Writing Pros Cheap Cust
Custom Writing Pros Cheap CustCustom Writing Pros Cheap Cust
Custom Writing Pros Cheap CustTraci Webb
 
FIM - Currency Futures
FIM - Currency FuturesFIM - Currency Futures
FIM - Currency FuturesBishnu Kumar
 
Good Intros Start Essay - Dissertationsmean.X.Fc2.Com
Good Intros Start Essay - Dissertationsmean.X.Fc2.ComGood Intros Start Essay - Dissertationsmean.X.Fc2.Com
Good Intros Start Essay - Dissertationsmean.X.Fc2.ComSue Jones
 
Satire Essay - GCSE English - Marked By Teachers.Com
Satire Essay - GCSE English - Marked By Teachers.ComSatire Essay - GCSE English - Marked By Teachers.Com
Satire Essay - GCSE English - Marked By Teachers.ComDenise Enriquez
 
Eurex Institutional insights 201307
Eurex Institutional insights 201307Eurex Institutional insights 201307
Eurex Institutional insights 201307Barbara Kearney
 
Using Cross Asset Information To Improve Portfolio Risk Estimation
Using Cross Asset Information To Improve Portfolio Risk EstimationUsing Cross Asset Information To Improve Portfolio Risk Estimation
Using Cross Asset Information To Improve Portfolio Risk Estimationyamanote
 

Similaire à LOHAI: Providing a baseline for KOS based automatic indexing (20)

Reinsurance Dispute Resolution/NAIW
Reinsurance Dispute Resolution/NAIWReinsurance Dispute Resolution/NAIW
Reinsurance Dispute Resolution/NAIW
 
Essay Cruelty To Animals
Essay Cruelty To AnimalsEssay Cruelty To Animals
Essay Cruelty To Animals
 
Derivatives and Pakistani Market
Derivatives and Pakistani MarketDerivatives and Pakistani Market
Derivatives and Pakistani Market
 
Lecture 7 financial_engineering
Lecture 7 financial_engineeringLecture 7 financial_engineering
Lecture 7 financial_engineering
 
The Law on Token sales
The Law on Token salesThe Law on Token sales
The Law on Token sales
 
Derivatives market
Derivatives marketDerivatives market
Derivatives market
 
Statementby Christopher Whalen Sbc 062209
Statementby Christopher Whalen Sbc 062209Statementby Christopher Whalen Sbc 062209
Statementby Christopher Whalen Sbc 062209
 
Contract theory
Contract theoryContract theory
Contract theory
 
Discussion paper series - wacc using market value or estimated value
Discussion paper series - wacc using market value or estimated valueDiscussion paper series - wacc using market value or estimated value
Discussion paper series - wacc using market value or estimated value
 
PPT - Law Easy Writing Services PowerPoint Presentation, Free Download
PPT - Law Easy Writing Services PowerPoint Presentation, Free DownloadPPT - Law Easy Writing Services PowerPoint Presentation, Free Download
PPT - Law Easy Writing Services PowerPoint Presentation, Free Download
 
Derivatives - Basics
Derivatives - Basics Derivatives - Basics
Derivatives - Basics
 
Ronnie goodin.ronnie contracts
Ronnie goodin.ronnie contractsRonnie goodin.ronnie contracts
Ronnie goodin.ronnie contracts
 
Custom Writing Pros Cheap Cust
Custom Writing Pros Cheap CustCustom Writing Pros Cheap Cust
Custom Writing Pros Cheap Cust
 
FIM - Currency Futures
FIM - Currency FuturesFIM - Currency Futures
FIM - Currency Futures
 
Good Intros Start Essay - Dissertationsmean.X.Fc2.Com
Good Intros Start Essay - Dissertationsmean.X.Fc2.ComGood Intros Start Essay - Dissertationsmean.X.Fc2.Com
Good Intros Start Essay - Dissertationsmean.X.Fc2.Com
 
Satire Essay - GCSE English - Marked By Teachers.Com
Satire Essay - GCSE English - Marked By Teachers.ComSatire Essay - GCSE English - Marked By Teachers.Com
Satire Essay - GCSE English - Marked By Teachers.Com
 
Eurex Institutional insights 201307
Eurex Institutional insights 201307Eurex Institutional insights 201307
Eurex Institutional insights 201307
 
Using Cross Asset Information To Improve Portfolio Risk Estimation
Using Cross Asset Information To Improve Portfolio Risk EstimationUsing Cross Asset Information To Improve Portfolio Risk Estimation
Using Cross Asset Information To Improve Portfolio Risk Estimation
 
Products in derivatives market
Products in derivatives marketProducts in derivatives market
Products in derivatives market
 
Products in derivatives market
Products in derivatives marketProducts in derivatives market
Products in derivatives market
 

Plus de Kai Eckert

Judaica link und der FID Jüdische Studien
Judaica link und der FID Jüdische StudienJudaica link und der FID Jüdische Studien
Judaica link und der FID Jüdische StudienKai Eckert
 
Linked Open Citation Database (LOC-DB)
Linked Open Citation Database (LOC-DB)Linked Open Citation Database (LOC-DB)
Linked Open Citation Database (LOC-DB)Kai Eckert
 
JudaicaLink: Linked Data in the Jewish Studies FID
JudaicaLink: Linked Data in the Jewish Studies FIDJudaicaLink: Linked Data in the Jewish Studies FID
JudaicaLink: Linked Data in the Jewish Studies FIDKai Eckert
 
JudaicaLink: Linked Data from Jewish Encyclopediae
JudaicaLink: Linked Data from Jewish EncyclopediaeJudaicaLink: Linked Data from Jewish Encyclopediae
JudaicaLink: Linked Data from Jewish EncyclopediaeKai Eckert
 
Towards Interoperable Metadata Provenance
Towards Interoperable Metadata ProvenanceTowards Interoperable Metadata Provenance
Towards Interoperable Metadata ProvenanceKai Eckert
 
Linked Open Projects (DCMI Library Community)
Linked Open Projects (DCMI Library Community)Linked Open Projects (DCMI Library Community)
Linked Open Projects (DCMI Library Community)Kai Eckert
 
Metadata Provenance
Metadata ProvenanceMetadata Provenance
Metadata ProvenanceKai Eckert
 
Linked Open Projects (DGI-Konferenz)
Linked Open Projects (DGI-Konferenz)Linked Open Projects (DGI-Konferenz)
Linked Open Projects (DGI-Konferenz)Kai Eckert
 
Linked Open Projects
Linked Open ProjectsLinked Open Projects
Linked Open ProjectsKai Eckert
 
Crowdsourcing the Assembly of Concept Hierarchies
Crowdsourcing the Assembly of Concept HierarchiesCrowdsourcing the Assembly of Concept Hierarchies
Crowdsourcing the Assembly of Concept HierarchiesKai Eckert
 
A Unified Approach for Representing Metametadata
A Unified Approach for Representing MetametadataA Unified Approach for Representing Metametadata
A Unified Approach for Representing MetametadataKai Eckert
 
Semantic Web, SKOS und Linked Data
Semantic Web, SKOS und Linked DataSemantic Web, SKOS und Linked Data
Semantic Web, SKOS und Linked DataKai Eckert
 

Plus de Kai Eckert (12)

Judaica link und der FID Jüdische Studien
Judaica link und der FID Jüdische StudienJudaica link und der FID Jüdische Studien
Judaica link und der FID Jüdische Studien
 
Linked Open Citation Database (LOC-DB)
Linked Open Citation Database (LOC-DB)Linked Open Citation Database (LOC-DB)
Linked Open Citation Database (LOC-DB)
 
JudaicaLink: Linked Data in the Jewish Studies FID
JudaicaLink: Linked Data in the Jewish Studies FIDJudaicaLink: Linked Data in the Jewish Studies FID
JudaicaLink: Linked Data in the Jewish Studies FID
 
JudaicaLink: Linked Data from Jewish Encyclopediae
JudaicaLink: Linked Data from Jewish EncyclopediaeJudaicaLink: Linked Data from Jewish Encyclopediae
JudaicaLink: Linked Data from Jewish Encyclopediae
 
Towards Interoperable Metadata Provenance
Towards Interoperable Metadata ProvenanceTowards Interoperable Metadata Provenance
Towards Interoperable Metadata Provenance
 
Linked Open Projects (DCMI Library Community)
Linked Open Projects (DCMI Library Community)Linked Open Projects (DCMI Library Community)
Linked Open Projects (DCMI Library Community)
 
Metadata Provenance
Metadata ProvenanceMetadata Provenance
Metadata Provenance
 
Linked Open Projects (DGI-Konferenz)
Linked Open Projects (DGI-Konferenz)Linked Open Projects (DGI-Konferenz)
Linked Open Projects (DGI-Konferenz)
 
Linked Open Projects
Linked Open ProjectsLinked Open Projects
Linked Open Projects
 
Crowdsourcing the Assembly of Concept Hierarchies
Crowdsourcing the Assembly of Concept HierarchiesCrowdsourcing the Assembly of Concept Hierarchies
Crowdsourcing the Assembly of Concept Hierarchies
 
A Unified Approach for Representing Metametadata
A Unified Approach for Representing MetametadataA Unified Approach for Representing Metametadata
A Unified Approach for Representing Metametadata
 
Semantic Web, SKOS und Linked Data
Semantic Web, SKOS und Linked DataSemantic Web, SKOS und Linked Data
Semantic Web, SKOS und Linked Data
 

Dernier

Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfSherif Taha
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.pptRamjanShidvankar
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfDr Vijay Vishwakarma
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxJisc
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Association for Project Management
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...pradhanghanshyam7136
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...Nguyen Thanh Tu Collection
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxJisc
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxPooja Bhuva
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Pooja Bhuva
 

Dernier (20)

Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Towards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptxTowards a code of practice for AI in AT.pptx
Towards a code of practice for AI in AT.pptx
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 

LOHAI: Providing a baseline for KOS based automatic indexing

  • 1. LOHAI: Providing a baseline for KOS based automatic indexing Kai Eckert Mannheim University Library, Germany eckert@bib.uni-mannheim.de First workshop on Semantic Digital Archives (SDA 2011), Sep 29th 2011, Berlin
  • 2. Motivation ● General KOS based indexer for various (even serious) purposes: – Document exploration – Thesaurus examination – Automatic Indexing – ... ● No free and open source implementation was available. ● LOHAI: Low Hanging Fruits Automatic Indexer
  • 3. Design Principles ● Simplicity over quality – Easy to use – Easy to understand – Easy to improve ● Knowledge-poor and without any training – Must not rely on any additional sources – No training step, to be usable in a setting where no preindexed documents are available.
  • 5. Indexing Pipeline Covered in this talk.
  • 6. Required Disambiguation ● Compound terms: – “insur” >> “insurance”, “insurance market” ● Overstemming: – „nation“ >> “nationalism”, “nationality”, “nation” ● Homonyms: – “bank”: the financial institution – “bank”: a raised portion of seabed or sloping ground along the edge of a stream, river, or lake
  • 7. Compound Term Detection Money Insur Market Cross-Concordance
  • 8. Compound Term Detection Ambigous Money Money Transfer ... Money Insur Market Cross-Concordance
  • 9. Compound Term Detection Ambigous Money Insurance Money Transfer Insurance Market ... ... Money Insur Market Cross-Concordance
  • 10. Compound Term Detection Ambigous Money Insurance Market Money Transfer Insurance Market Insurance Market ... ... ... Money Insur Market Cross-Concordance
  • 11. Compound Term Detection Unique: STOP Money Insurance Market Cross-Concordance Money Transfer Insurance Market Insurance Market ... ... ... Money Insur Market Cross-Concordance
  • 12. Compound Term Detection Money Insurance Market Cross-Concordance Money Transfer Insurance Market Insurance Market ... ... ... Money Insur Market Cross-Concordance No match Money Insur Market
  • 13. Compound Term Detection Money Insurance Market Cross-Concordance Money Transfer Insurance Market Insurance Market ... ... ... Money Insur Market Cross-Concordance No match Money Insur Market
  • 14. Compound Term Detection Money Insurance Market Cross-Concordance Money Transfer Insurance Market Insurance Market ... ... ... Money Insur Market Cross-Concordance MATCH Money Insur Market >> Money
  • 15. Compound Term Detection Money Insurance Market Cross-Concordance Money Transfer Insurance Market Insurance Market ... ... ... Money Insur Market Cross-Concordance Money Insur Market >> Money MATCH Money Insur Market >> Insurance Market
  • 16. Unstemming ● Overstemming: – „nation“ >> “nationalism”, “nationality”, “nation” ● Compare unstemmed terms. ● If they match: – Assign them. ● If not: – Continue with Word Sense Disambiguation.
  • 17. Word Sense Disambiguation Yarowsky's assumptions: ● One sense per collocation – Collocated terms are unique for each possible sense of a given term. ● One sense per discourse – Only one sense for a given word is used throughout a whole document.
  • 18. Jaccard Measure ● Term Environments: – Document: 100 words before and after the term occurrence form set W. – KOS: All labels of the concept, its direct children, siblings and parents form set C. ● Jaccard Measure: ∣W ∪C∣ Jaccard W , C = ∣W ∩C∣ ● Assign one sense per document, based on the Jaccard value of all occurrences.
  • 19. Example Result Hardin, Russell: Contractarianism: Wistful Thinking The contract metaphor in political and moral theory is misguided. It is a poor metaphor both descriptively and normatively, but here I address its normative problems. Normatively, contractarianism is supposed to give justifications for political institutions and for moral rules, just as contracting in the law is supposed to give justification for claims of obligation based on consent or agreement. This metaphorical association fails for several reasons. First, actual contracts generally govern prisoner’s dilemma, or exchange, relations; the so-called social contract governs these and more diverse interactions as well. Second, agreement, which is the moral basis of contractarianism, is not right-making per se. Third, a contract in law gives information on what are the interests of the parties; a hypothetical social contract requires such knowledge, it does not reveal it. Hence, much of contemporary contractarian theory is perversely rationalist at its base because it requires prior, rational derivation of interests or other values. Finally, contractarian moral theory has the further disadvantage that, unlike contract in the law, its agreements cannot be connected to relevant motivations to abide by them. Constitutional Political Economy, 1 (2) 1990: 35-52
  • 20. Example Result Manual assignment LOHAI ● Constitutional economics ● Contract Law (1.21) ● Influence of government ● Contract (0.76) ● Ethics ● Social contract (0.64) ● Theory ● Law (0.51) ● Politics (0.37) ● Prisoner’s dilemma (0.34) ● Theory (0.32) ● Rationalism (0.24) ● Association (0.23) ● Exchange (0.20) ● Knowledge (0.19) ● Government (0.16) ● Information (0.12)
  • 21. Example Result The contract metaphor in political and moral theory is misguided. It is a poor metaphor both descriptively and normatively, but here I address its normative problems. Normatively, contractarianism is supposed to give justifications for political institutions and for moral rules, just as contracting in the law is supposed to give justification for claims of obligation based on consent or agreement. This metaphorical association fails for several reasons. First, actual contracts generally govern prisoner’s dilemma, or exchange, relations; the so-called social contract governs these and more diverse interactions as well. Second, agreement, which is the moral basis of contractarianism, is not right- making per se. Third, a contract in law gives information on what are the interests of the parties; a hypothetical social contract requires such knowledge, it does not reveal it. Hence, much of contemporary contractarian theory is perversely rationalist at its base because it requires prior, rational derivation of interests or other values. Finally, contractarian moral theory has the further disadvantage that, unlike contract in the law, its agreements cannot be connected to relevant motivations to abide by them.
  • 22. Conclusion ● Reasonable results without „Black Box“ Effect. ● Directly usable for new KOS and document sets. ● No training step needed. ● Low hanging fruits, but a good baseline. ● ~ 500 LoC in (quite verbose) Java. ● Easy to adapt and to improve. ● Free and open source: – https://github.com/kaiec/LOHAI
  • 23. Thank you. Questions/Ideas? Kai Eckert Mannheim University Library, Germany eckert@bib.uni-mannheim.de