Semantic Transforms Using
 Collaborative Knowledge Bases


Yegin Genc, Winter Mason, Jeffrey V. Nickerson

          Stevens Institute of Technology
Overview


• Automatically understand online information

• Use network artifacts, such as Wikipedia, to help with that task
Topic Models
       Algorithms that understand and
       organize documents by
       uncovering the semantic structure
       of a document collection

       • Discover hidden themes –
         patterns of word use
       • Connect documents that
         exhibit similar patterns
Latent Dirichlet Allocation (LDA)

   “In the computer science field of artificial intelligence, a genetic algorithm (GA) is a
   search heuristic that mimics the process of natural evolution. This heuristic is
   routinely used to generate useful solutions to optimization and search problems.
   Genetic algorithms belong to the larger class of evolutionary algorithms (EA), which
   generate solutions to optimization problems using techniques inspired by natural
   evolution, such as inheritance, mutation, selection, and crossover.”¹


            Algorithms      – 0.28               Genetic         – 0.18
            Optimization    – 0.28               Natural         – 0.18
            Algorithm       – 0.14               Evolution       – 0.18
            Computer        – 0.14               Evolutionary    – 0.09
            Techniques      – 0.14               …
            …
¹ http://en.wikipedia.org/wiki/Genetic_algorithm
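LDA assumes a generative story: each topic is a distribution over words, each document is a mixture of topics, and each word is drawn from one of those topics. The sketch below simulates that story in plain Python. It is an illustration only: the symmetric Dirichlet concentration `alpha`, the two toy topics, and the helper names are hypothetical, not taken from this work.

```python
import random

def sample_dirichlet(rng, alphas):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(rng, topics, alpha, length):
    """Generate one document under the LDA generative process.

    topics: list of dicts mapping word -> probability (one dict per topic)
    alpha:  symmetric Dirichlet concentration for the document's topic mixture
    Returns (theta, words, assignments).
    """
    theta = sample_dirichlet(rng, [alpha] * len(topics))  # document-topic mixture
    words, assignments = [], []
    for _ in range(length):
        z = rng.choices(range(len(topics)), weights=theta)[0]  # pick a topic
        vocab = list(topics[z])
        w = rng.choices(vocab, weights=[topics[z][v] for v in vocab])[0]  # pick a word
        words.append(w)
        assignments.append(z)
    return theta, words, assignments

# Two hypothetical toy topics, loosely modeled on the example above.
TOPICS = [
    {"algorithms": 0.4, "optimization": 0.4, "computer": 0.2},
    {"genetic": 0.4, "evolution": 0.4, "natural": 0.2},
]

if __name__ == "__main__":
    rng = random.Random(0)
    theta, words, z = generate_document(rng, TOPICS, alpha=0.5, length=10)
    print([round(t, 2) for t in theta])
    print(list(zip(words, z)))
```

Fitting LDA runs this story in reverse: given only the words, it infers plausible topic mixtures and topics.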
Topics from LDA
     computer          chemistry           cortex             orbit           infection
     methods            synthesis         stimulus            dust            immune
      number            oxidation             fig            jupiter             aids
         two            reaction            vision             line            infected
      principle          product           neuron            system              viral
       design            organic         recordings           solar              cells
Five topics from a 50-topic LDA model fit to Science from 1980 to 2002 (Blei and Lafferty, 2009)



   methods    k           of        the   for        the   the    operations   the
   the        the         objects   of    the        o     and    the          of
   a          of          to        a     linear     we    of     functional   a
   of         algorithm   and       to    problem    and   to     requires     is
   problems   for         the       we    problems   a     that   and          in
Ten randomly chosen topics from a 50-topic LDA model fit to abstracts from the Journal of
the ACM (JACM) from the years 1987 to 2004 (Blei et al., 2010).
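Topics like these are obtained by fitting LDA with approximate posterior inference, typically Gibbs sampling or variational inference. Below is a minimal collapsed Gibbs sampler in pure Python, assuming symmetric priors `alpha` and `eta` and a tiny hypothetical corpus. It is a teaching sketch, not the code behind the models cited above; real toolkits add burn-in, convergence checks, and hyperparameter tuning.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, eta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of word strings.
    Returns (theta, beta): per-document topic proportions and
    per-topic word distributions (dicts word -> probability).
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    index = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    ndk = [[0] * K for _ in docs]      # document-topic counts
    nkw = [[0] * V for _ in range(K)]  # topic-word counts
    nk = [0] * K                       # tokens per topic
    z = []                             # topic assignment of every token

    for d, doc in enumerate(docs):     # random initialization
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][index[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                v, k = index[w], z[d][n]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = j | other assignments), up to a constant.
                weights = [(ndk[d][j] + alpha) * (nkw[j][v] + eta) / (nk[j] + V * eta)
                           for j in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][n] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    beta = [{vocab[v]: (nkw[k][v] + eta) / (nk[k] + V * eta) for v in range(V)}
            for k in range(K)]
    return theta, beta

if __name__ == "__main__":
    docs = [
        "genetic evolution natural genetic evolution".split(),
        "orbit solar jupiter orbit solar".split(),
        "genetic natural evolution mutation".split(),
        "solar system orbit jupiter".split(),
    ]
    theta, beta = lda_gibbs(docs, num_topics=2)
    for k, dist in enumerate(beta):
        print(k, sorted(dist, key=dist.get, reverse=True)[:3])
```

Note that nothing in the sampler names a topic: the output is just a ranked word list per topic, which is exactly the interpretation problem raised next.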
The interpretation problem
1. Labeling the topics is difficult (J. Chang et al.,
   2009)
2. The relationships between topics are not
   identified
3. The information in the topics is based solely
   on the input corpus
4. The external validity of the topics may be
   limited
Collaborative Knowledge Bases
1. Topics are labeled
2. Topics are connected to each other in a meaningful way
3. Pages contain rich, focused information on
   particular topics
4. Pages contain fresh, up-to-date information about
   practically everything
Wikipedia Pages as Topics
LDA topic: orbit, dust, jupiter, line, system, solar, gas, atmospheric, mars, field

Wikipedia page: Solar System
   “The Solar System consists of the Sun and the astronomical objects
   gravitationally bound in orbit around it, all of which formed from the
   collapse of a giant molecular cloud approximately 4.6 billion years ago…”
   (http://en.wikipedia.org/wiki/Solar_System)
Wikipedia Pages as Topics
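Topics are characterized as distributions over the observed words in
Wikipedia pages:

$$\beta_k = p(W_i \mid k) = \frac{\lvert\{W_i \in k\}\rvert}{\sum_{j=1}^{N} \lvert\{W_j \in k\}\rvert}$$

where |{W_i ∈ k}| is the number of occurrences of word W_i in the Wikipedia page
for topic k, N is the number of words in the vocabulary, and β_k is the
per-topic word distribution.

  Word          Freq. in Wikipedia page   β_k
  orbit            34                     0.12
  dust              7                     0.02
  jupiter          36                     0.12
  line              0                     0.00
  system           76                     0.26
  solar           110                     0.38
  gas              11                     0.04
  atmospheric       1                     0.00
  mars              8                     0.03
  field             8                     0.03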
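The normalization amounts to dividing each word's count in the topic's Wikipedia page by the page's total count over the vocabulary. A minimal Python sketch, assuming the vocabulary is just the ten words in the table (their counts sum to 291, which reproduces the table's probabilities); the function name is hypothetical:

```python
def topic_word_distribution(counts):
    """Normalize raw word counts from one Wikipedia page into beta_k = p(W_i | k)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("page contains none of the vocabulary words")
    return {word: n / total for word, n in counts.items()}

# Word counts for the "Solar System" page, taken from the table above.
SOLAR_SYSTEM_COUNTS = {
    "orbit": 34, "dust": 7, "jupiter": 36, "line": 0, "system": 76,
    "solar": 110, "gas": 11, "atmospheric": 1, "mars": 8, "field": 8,
}

beta_k = topic_word_distribution(SOLAR_SYSTEM_COUNTS)
for word, p in beta_k.items():
    print(f"{word:12s} {p:.2f}")
```

Unlike an LDA topic, β_k here comes with a label for free: the title of the page it was computed from.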
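Figure: matrix view of LDA versus the Wikipedia-based method (WIKI)
LDA:  DOCUMENT – TOPIC Θ (D × K), DOCUMENT – WORD W (D × W), TOPIC – WORD β (K × W); Z_{d,n} is the topic assigned to word W_{d,n}
WIKI: DOCUMENT – TOPIC (D × K) = DOCUMENT – WORD W (D × W) * Wiki (W × K)
D: Documents   K: Topics   W: Words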
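In matrix terms, the WIKI side replaces LDA's inference step with a single product: the document–word count matrix W (D × W) times the word–topic matrix Wiki (W × K) gives document–topic scores (D × K). A minimal pure-Python sketch, assuming each column of Wiki holds the β_k of one Wikipedia page, that rows are rescaled to sum to 1, and that a document's label is its highest-scoring page; these choices, the toy data, and the function names are illustrative assumptions:

```python
def doc_topic_scores(doc_word, wiki):
    """Multiply a D x W document-word matrix by a W x K word-topic matrix -> D x K."""
    num_words, num_topics = len(wiki), len(wiki[0])
    scores = []
    for row in doc_word:
        if len(row) != num_words:
            raise ValueError("document row length must match the vocabulary size")
        scores.append([sum(row[w] * wiki[w][k] for w in range(num_words))
                       for k in range(num_topics)])
    return scores

def normalize_rows(matrix):
    """Rescale each row to sum to 1 (all-zero rows stay zero)."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out

# Hypothetical toy example: 4-word vocabulary, 2 Wikipedia pages as topics.
VOCAB = ["orbit", "solar", "genetic", "evolution"]
TOPICS = ["Solar System", "Genetic algorithm"]
WIKI = [            # rows: words, columns: topics (each column sums to 1)
    [0.5, 0.0],     # orbit
    [0.5, 0.1],     # solar
    [0.0, 0.5],     # genetic
    [0.0, 0.4],     # evolution
]
DOCS = [
    [2, 1, 0, 0],   # word counts for document 0
    [0, 0, 3, 1],   # word counts for document 1
]

theta = normalize_rows(doc_topic_scores(DOCS, WIKI))
for d, row in enumerate(theta):
    best = max(range(len(TOPICS)), key=lambda k: row[k])
    print(d, TOPICS[best], [round(x, 2) for x in row])
```

The design choice is that no sampling or optimization is needed: once Wiki is built from the pages, labeling a new document is one matrix product.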
Experiment
Data
617 abstracts from the Journal of the ACM
Classified into 80 categories by their authors
53 categories have corresponding Wikipedia pages

Abstracts
{Article Name:        On the (Im)possibility of Obfuscating Programs,
    Category:         D.4 Operating Systems
    Add. Category:    F.1 Computation by Abstract Devices
    …
}

Category Mappings
    Category                                Wikipedia Page
    D.4 Operating Systems:                  Operating System
    F.1 Computation by Abstract Devices:    Abstract Machine
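With the category mapping in place, scoring a method becomes a lookup. A minimal sketch, assuming a method outputs one Wikipedia page per abstract and that an abstract counts as identified when that page equals the page mapped from the author's primary category (or, in the looser setting, from any of the author's categories); this matching rule and the function name are illustrative assumptions, not a stated procedure:

```python
# Mapping from ACM categories to Wikipedia pages, as in the examples above.
CATEGORY_TO_PAGE = {
    "D.4 Operating Systems": "Operating System",
    "F.1 Computation by Abstract Devices": "Abstract Machine",
}

def match_outcome(predicted_page, primary, additional):
    """Return (primary_hit, any_hit) for one abstract.

    primary:    the author's primary ACM category
    additional: list of the author's additional ACM categories
    Categories without a mapped Wikipedia page can never be matched.
    """
    primary_page = CATEGORY_TO_PAGE.get(primary)
    all_pages = {CATEGORY_TO_PAGE.get(c) for c in [primary, *additional]} - {None}
    primary_hit = primary_page is not None and predicted_page == primary_page
    any_hit = predicted_page in all_pages
    return primary_hit, any_hit

# The example abstract: primary D.4, additional F.1.
print(match_outcome("Abstract Machine",
                    "D.4 Operating Systems",
                    ["F.1 Computation by Abstract Devices"]))
```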
Three variations of our method



- Inbound links: Wikipedia pages that link to the topic page
- Outbound links: Wikipedia pages that the topic page links to
- Text-based: uses only the word distributions in the topic pages
Results
      Method                    Primary                   Primary or Additional

         Text                 182 (29.5%)                      314 (50.8%)

   Inbound links              131 (21.2%)                      249 (40.0%)

  Outbound links               79 (12.8%)                      166 (26.9%)



The number (and percentage) of abstracts for which each method identified the
authors’ primary ACM category, or any of the authors’ primary or additional
ACM categories.

LDA cannot be compared directly: it would first need an additional step that
maps its word distributions to ACM categories.
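The percentages in the results table are shares of the 617 abstracts. Recomputing them from the counts is a quick consistency check; a small Python sketch (the `percent` helper is hypothetical):

```python
TOTAL_ABSTRACTS = 617

# Counts from the results table: (primary, primary or additional).
RESULTS = {
    "Text": (182, 314),
    "Inbound links": (131, 249),
    "Outbound links": (79, 166),
}

def percent(count, total=TOTAL_ABSTRACTS):
    """Share of abstracts, in percent, rounded to one decimal place."""
    return round(100 * count / total, 1)

for method, (primary, either) in RESULTS.items():
    print(f"{method:15s} {percent(primary):5.1f}% {percent(either):5.1f}%")
```

Dividing by 617 reproduces four of the six percentages exactly. The 314 and 249 "primary or additional" counts work out to about 50.9% and 40.4% rather than the 50.8% and 40.0% shown, which suggests truncation or a typo in one of those cells.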
Results (Qualitative)
Concluding Remarks
The Wikipedia-derived categories often match the categories chosen by the
authors. When they don’t match, they generally appear plausible.

Among the variations of our method, the text-based approach performed better
than the link-based approaches.

Among the link-based approaches, inbound links performed better than outbound
links.
Next Steps

Dependent topic structures

Combine heuristics with generative models:
  Use Wikipedia as a prior for the topic distribution
  Learn from the observed documents

Speaker notes

  1. Blei: “Much of my research is in topic models, which are a suite of algorithms to uncover the hidden thematic structure of a collection of documents. These algorithms help us develop new ways to search, browse and summarize large archives of texts.”
  2. Here is an example of a paragraph. We assume that some number of topics exist in a document set. Each document is a mixture of these corpus-wide topics, each topic is a distribution over words, and each word is drawn from one of those topics.
  3. Describing what they mean is a different matter.
  4. Use posterior expectations / approximate posterior inference: Gibbs sampling, variational inference.
  5. We chose this dataset so that we can validate our results.
  6. Pause… Thank you.