Web Science & Technologies
                        University of Koblenz ▪ Landau, Germany




Topic Discovery in Unstructured Data:
          The Next Generation

   Christoph Kling, Sergej Sizov, Steffen Staab
Understanding Social Media: Example Yahoo News Comments

• Many comments
• More opinions
• Commenting on different (sub)topics




 WeST           Steffen Staab   Topic Detection - TNG
                                2 of 25               10/09/2012        2
Discovering topics using LDA
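
As a rough illustration of what discovering topics with LDA involves, here is a minimal collapsed Gibbs sampler in plain Python. It is a sketch under assumptions, not the setup behind these slides: the function name `lda_gibbs`, the hyperparameters `alpha` and `beta`, the iteration count and the toy comments are all invented for illustration.

```python
import random


def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA on tokenised documents (toy scale)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)

    # Count tables: topics per document, words per topic, tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        assignments.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            assignments[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the token out of the counts before resampling it.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalised conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


if __name__ == "__main__":
    # Hypothetical toy comments, already tokenised.
    comments = [
        "chevrolet pontiac chevrolet buick".split(),
        "bmw audi mercedes bmw".split(),
        "pontiac chevrolet cadillac".split(),
        "audi bmw fiat".split(),
    ]
    doc_topic, topic_word = lda_gibbs(comments, num_topics=2)
    for k, counts in enumerate(topic_word):
        top = sorted(counts, key=counts.get, reverse=True)[:3]
        print("topic", k, top)
```

The output is the topic-document view discussed in the talk: `doc_topic` counts how many tokens of each document were assigned to each topic, and `topic_word` gives the word counts per topic.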




Browse by topic




We have: Topic-Document – All Fine?

How do we understand the topics?
Are all topics of the same value?
Is there structured data to correlate?
  • Space
  • Time
  • Network information

We work on:
  • Opinions about topics
  • Diversity of opinions
  • Localisation of topics
      • Time-varying topic models (Blei, Lafferty)
      • ....
      • Geo-varying topic models




Geo-located social media content




[Figure: map with geo-located mentions of car brands: BMW, Audi, Chevrolet, Pontiac, Mercedes, Citroen, Peugeot, Renault, Fiat]
Geo-located social media content




[Figure: car-brand map with regional topic word lists: (chevrolet, pontiac, bmw, mercedes, audi); (citroen, renault, peugeot, bmw); (bmw, audi, mercedes, fiat, citroen)]
Related work




[Figure: car-brand map with regional topic word lists]

 LGTA, Yin et al. 2011

Problem

Geographical distribution of topics




[Figure panels: Language areas | Dominating religion]
Our approach




[Figure: car-brand map with regional topic word lists]
Our approach

[Figure: two copies of the car-brand map, each with its regional topic word lists]
Geographical network construction




[Figure panels: Data points | Spatial region centroids | Geographical network]
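
The three panels suggest a pipeline from raw points to region centroids to a network over those centroids. The slide does not say which clustering or adjacency rule is used, so the following is only a sketch under assumptions: plain k-means for the centroids, links to the nearest centroids for the network, and Euclidean distance on the coordinates. The names `kmeans` and `neighbour_graph` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Cluster 2-D points and return the k region centroids (assumed method)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


def neighbour_graph(centroids, n_neighbours=2):
    """Link every centroid to its n nearest centroids (undirected edges)."""
    edges = set()
    for i, c in enumerate(centroids):
        others = sorted(
            (j for j in range(len(centroids)) if j != i),
            key=lambda j: math.dist(c, centroids[j]),
        )
        for j in others[:n_neighbours]:
            edges.add((min(i, j), max(i, j)))
    return edges


if __name__ == "__main__":
    # Hypothetical data points as (x, y) coordinates.
    points = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (21, 0)]
    centroids = kmeans(points, k=3)
    print(centroids)
    print(neighbour_graph(centroids, n_neighbours=1))
```

For real latitude and longitude data, a great-circle distance would be a more appropriate choice than `math.dist`.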



Topic detection




                           Topic assignments




Topic detection

Topic exchange between adjacent clusters:



[Figure: adjacent spatial clusters labelled with car brands: Pontiac; Chevrolet; BMW; Pontiac, Chevrolet; BMW]
[Figure: the clusters labelled as spatial regions A, B, C and D, with document 1 marked]
[Figure: spatial regions A, B, C and D with document 1; below it, a diagram relating document 1 to regions A, B and C]

… is drawn from … with equal probability
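
One possible reading of this step, given the heading "Topic exchange between adjacent clusters", is that the region a document takes its topics from is picked uniformly from its own region and the adjacent ones. The slide's formula did not survive, so this is an interpretation rather than the authors' definition. The function `draw_source_region` and the adjacency map are hypothetical.

```python
import random


def draw_source_region(home, adjacency, rng):
    """Pick the home region or one of its neighbours, each equally likely."""
    candidates = [home] + sorted(adjacency.get(home, ()))
    return rng.choice(candidates)


if __name__ == "__main__":
    # Hypothetical adjacency: region A borders regions B and C.
    adjacency = {"A": {"B", "C"}}
    rng = random.Random(0)
    counts = {"A": 0, "B": 0, "C": 0}
    for _ in range(3000):
        counts[draw_source_region("A", adjacency, rng)] += 1
    print(counts)  # each region is chosen roughly a third of the time
```

Under this reading, a topic that is strong in one region can spread to documents in neighbouring regions, which is what lets topic regions grow beyond a single cluster.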
Visualisation




chevrolet   0.35                 bmw                  0.29
bmw         0.18                 audi                 0.18
cadillac    0.16                 fiat                 0.10
pontiac     0.09                 citroen              0.09
gmc         0.07                 renault              0.09
buick       0.06                 peugeot              0.08
audi        0.05                 mercedesbenz         0.06
                                 chevrolet            0.05




Visualisation




bmw            0.63     fiat      0.66     renault   0.28     pontiac   0.92
mercedesbenz   0.17     bmw       0.10     citroen   0.22
audi           0.13     citroen   0.09     peugeot   0.15
                        renault   0.05     bmw       0.10
                                           audi      0.09
                                           fiat      0.07


Topic Detection: The next generation

GeoMTD
• Better understandability: "nicer regions"
• Improved quality
   • Better explanation of the data
   • Measured in terms of reduced perplexity
       • Perplexity is about half that of related work
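
For readers unfamiliar with the measure: perplexity is the exponential of the negative mean log-probability that a model assigns to held-out tokens, so lower is better. A minimal sketch, assuming the per-token probabilities are already available (the function name `perplexity` and the numbers are illustrative, not results from the talk):

```python
import math


def perplexity(token_probs):
    """Perplexity from the model probability of each held-out token."""
    mean_log_prob = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-mean_log_prob)


if __name__ == "__main__":
    # A model that is as uncertain as a uniform choice among four words.
    print(perplexity([0.25, 0.25, 0.25, 0.25]))
    # A model that gives every token twice that probability.
    print(perplexity([0.5, 0.5, 0.5, 0.5]))
```

The first call evaluates to 4 and the second to 2, up to floating-point rounding. Halving perplexity therefore means the model gives held-out tokens twice the probability on (geometric) average.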




Topic Detection: The next generation


Other next-generation mechanisms for understanding social media:
• Opinions
   • adding vocabularies with meaning (LIWC, POMS,...)
• Diversity
   • maximizing for spread of topics and opinions
• Author-topic-time...



Need to balance between complexity of model and sparsity of data!





Thank you for your attention!
References

Hierarchical Dirichlet processes.
by: Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei
In: Journal of the American Statistical Association, Vol. 101 (2006), p. 1566-1581.

GeoFolk: latent spatial semantics in web 2.0 social media.
by: Sergej Sizov
In: WSDM, ACM (2010), p. 281-290.

Geographical topic discovery and comparison.
by: Zhijun Yin, Liangliang Cao, Jiawei Han, Chengxiang Zhai, and Thomas S. Huang
In: WWW, ACM (2011), p. 247-256.

A Nonparametric Bayesian Model of Multi-Level Category Learning.
by: Kevin Robert Canini and Thomas L. Griffiths
In: AAAI, AAAI Press (2011).

FREuD: Feature-Centric Sentiment Diversification of Online Discussions.
by: Nasir Naveed, Thomas Gottron, Sergej Sizov, and Steffen Staab
In: WebSci'12: Proceedings of the 4th International Conference on Web Science, ACM (2012).

ATTention: Understanding Authors and Topics in Context of Temporal Evolution.
by: Nasir Naveed, Sergej Sizov, and Steffen Staab
In: European Conference on Information Retrieval 2011, Springer (2011), p. 733-737.

Further papers about our work are currently in preparation. Contact us if interested.


 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

Topic Discovery in Unstructured Data: The Next Generation

  • 1. Web Science & Technologies, University of Koblenz ▪ Landau, Germany. Topic Discovery in Unstructured Data: The Next Generation. Christoph Kling, Sergej Sizov, Steffen Staab
  • 2. Understanding Social Media: Example Yahoo News Comments • Many comments • More opinions • Commenting on different (sub)topics (WeST, Steffen Staab, Topic Detection - TNG, 10/09/2012)
  • 3. Discovering topics using LDA
  • 4. Browse by topic
  • 5. We have: Topic-Document – All Fine? How do we understand the topics? Are all topics of the same value? Is there structured data to correlate? • Space • Time • Network information. We work on: • Opinions about topics • Diversity of opinions • Localisation of topics • Time-varying topic models (Blei, Lafferty) • .... • Geo-varying topic models
  • 6. Geo-located social media content [Figure: geo-located posts labelled with car brands: BMW, Audi, Citroen, Chevrolet, Peugeot, Renault, Pontiac, Mercedes, Fiat]
  • 7. Geo-located social media content [Figure: the same brand labels with lowercase term lists overlaid: chevrolet, citroen, pontiac, renault, bmw, peugeot, mercedes, audi, fiat]
  • 8. Related work: LGTA, Yin et al. 2011 [Figure: the same brand labels and term lists]
  • 9. Problem: Geographical distribution of topics • Language areas • Dominating religion
  • 10. Our approach [Figure: the same brand labels and term lists]
  • 11. Our approach [Figure: the brand labels and term lists, shown twice]
  • 12. Geographical network construction [Figure panels: Data points; Spatial region centroids; Geographical network]
  • 13. Topic detection: Topic assignments
  • 14. Topic detection: Topic assignments
  • 15. Topic detection: Topic assignments
  • 16. Topic detection: Topic assignments
  • 17. Topic detection: Topic assignments
  • 18. Topic detection: Topic exchange between adjacent clusters [Figure labels: Pontiac, Chevrolet, BMW]
  • 19. Topic detection: Topic exchange between adjacent clusters [Figure labels: spatial regions A, B, C, D; document 1; Pontiac, Chevrolet, BMW]
  • 20. Topic detection: Topic exchange between adjacent clusters [Figure labels: spatial regions A, B, C, D; document 1; Pontiac, Chevrolet, BMW]
  • 21. Topic detection [Figure labels: regions A, B, C, D; document 1; Pontiac, Chevrolet, BMW] 1 is drawn from [symbols missing in the transcript] with equal probability
  • 22. Visualisation. First term list: chevrolet 0.35, bmw 0.18, cadillac 0.16, pontiac 0.09, gmc 0.07, buick 0.06, audi 0.05. Second term list: bmw 0.29, audi 0.18, fiat 0.10, citroen 0.09, renault 0.09, peugeot 0.08, mercedesbenz 0.06, chevrolet 0.05
  • 23. Visualisation. Term lists: bmw 0.63, mercedesbenz 0.17, audi 0.13 | fiat 0.66, bmw 0.10, citroen 0.09, renault 0.05 | renault 0.28, citroen 0.22, peugeot 0.15, bmw 0.10, audi 0.09, fiat 0.07 | pontiac 0.92
  • 24. Topic Detection: The next generation. GeoMTD • Better understandability: "nicer regions" • Improved quality: better explanation of the data, measured in terms of reduced perplexity (about half compared to related work)
  • 25. Topic Detection: The next generation. Other next generation mechanisms for understanding social media: • Opinions: adding vocabularies with meaning (LIWC, POMS, ...) • Diversity: maximizing for spread of topics and opinions • Author-topic-time ... Need to balance between complexity of model and sparsity of data!
  • 26. Web Science & Technologies, University of Koblenz ▪ Landau, Germany. Thank you for your attention!
  • 27. References
      Teh, Y. W.; Jordan, M. I.; Beal, M. J.; Blei, D. M. (2006): Hierarchical Dirichlet processes. In: Journal of the American Statistical Association, Vol. 101, p. 1566-1581.
      Sizov, Sergej (2010): GeoFolk: latent spatial semantics in web 2.0 social media. In: WSDM. ACM, p. 281-290.
      Yin, Zhijun; Cao, Liangliang; Han, Jiawei; Zhai, Chengxiang; Huang, Thomas S. (2011): Geographical topic discovery and comparison. In: WWW. ACM, p. 247-256.
      Canini, Kevin Robert; Griffiths, Thomas L. (2011): A Nonparametric Bayesian Model of Multi-Level Category Learning. In: AAAI. AAAI Press.
      Naveed, Nasir; Gottron, Thomas; Sizov, Sergej; Staab, Steffen (2012): FREuD: Feature-Centric Sentiment Diversification of Online Discussions. In: WebSci'12: Proceedings of the 4th International Conference on Web Science. ACM.
      Naveed, Nasir; Sizov, Sergej; Staab, Steffen (2011): ATTention: Understanding Authors and Topics in Context of Temporal Evolution. In: European Conference on Information Retrieval 2011. Springer, p. 733-737.
      Further papers about our work currently in preparation. Contact us if interested.
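
Slide 24 reports quality as reduced perplexity. As a minimal sketch of what that number means (not the authors' evaluation code), the snippet below computes perplexity as the exponential of the negative mean log-probability per held-out token. The function name `perplexity` and all probability values are hypothetical, chosen only for illustration.

```python
import math


def perplexity(token_probs):
    """Perplexity of held-out tokens, given the model's probability for each token."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    # Sum of log-probabilities over all held-out tokens.
    log_likelihood = sum(math.log(p) for p in token_probs)
    # Exponential of the negative average log-probability per token.
    return math.exp(-log_likelihood / len(token_probs))


# Hypothetical per-token probabilities from two models on the same held-out text.
baseline_probs = [0.01, 0.02, 0.005, 0.01]
# Assumption for illustration: the second model doubles every token probability.
improved_probs = [2 * p for p in baseline_probs]

baseline = perplexity(baseline_probs)
improved = perplexity(improved_probs)
print(f"baseline perplexity: {baseline:.1f}")
print(f"improved perplexity: {improved:.1f}")
print(f"ratio: {improved / baseline:.2f}")
```

Under this definition, halving perplexity corresponds to doubling the geometric-mean probability that the model assigns to each held-out token, which is one way to read "better explanation of the data".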