Adapting Ajax Solr to Compare
 different sets of documents
 Joan Codina-Filba, Barcelona Media-Innovation Center
                      19-Oct-2011
What I Will Cover
 ▪ More than just facets
 ▪ We do research on opinion mining
 ▪ Giving an overview of thousands of opinions
 ▪ After analyzing their contents:
   • How to explore the key concepts expressed
     in the opinions
   • Are the key concepts the most common
     ones, or the most differentiating?
 ▪ Using Solr and Ajax Solr for more
   intelligent data navigation

My Background
 ▪ Joan Codina Filbà
 ▪ Barcelona Media – Innovation Center
 ▪ Senior Researcher
 ▪ Experience in
     •   Data => Text => Web => Opinion mining
     •   Integration of NLP tools
     •   Multimodal search
     •   Some teaching

The Challenge
 ▪ Companies want to know what the Web 2.0
   says about them
 ▪ Too many writers, in too many places, and not enough time
   to read it all
    – Detect hot topics (quality, price, service?)
    – Detect strengths and weaknesses
    – Gain knowledge about the market
 ▪ Use NLP to detect key concepts in text
 ▪ User-generated content is noisy, ironic, and dispersed
 ▪ From data to information and knowledge

Use Case Example
 ▪ Analyze the users of MySpace
 ▪ Combination of demographic data and
   analyzed text
 ▪ Possibility to "navigate" the data to gain
   "knowledge" about the users
 ▪ Exploring thousands of messages before
   sending them to a "blind", "automatic" text
   mining tool to find relevant aspects
 ▪ An "OLAP" for data and text: a kind of cube view
   of the opinions

Solr Facets
 ▪ Simple statistics...
    – For fields with a single value:
        SELECT field, COUNT(*) FROM table GROUP BY field

       • Simple aggregation statistic
    – For text fields?
 ▪ Facets vs. IDF
    – The most common facets give less
      information
    – The rare ones are not useful

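The GROUP BY analogy corresponds to a Solr request that asks only for facet counts. The sketch below builds such a request URL with the standard Solr parameters `q`, `rows`, `wt`, `facet`, `facet.field`, `facet.limit` and `facet.mincount`. The host URL and the field names `gender` and `text` are assumptions made for illustration; they are not taken from the talk.

```python
from urllib.parse import urlencode


def facet_query_url(base_url, query, facet_fields, limit=20):
    """Build a Solr select URL that returns facet counts and no documents."""
    params = [
        ("q", query),
        ("rows", 0),            # skip the document list, keep only the counts
        ("wt", "json"),
        ("facet", "true"),
        ("facet.limit", limit),
        ("facet.mincount", 1),  # hide values that never occur in the result
    ]
    # One facet.field parameter per field, like one GROUP BY per column.
    params += [("facet.field", name) for name in facet_fields]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Hypothetical host and field names.
url = facet_query_url("http://localhost:8983/solr", "country:MX", ["gender", "text"])
print(url)
```

For a single-valued field such as `gender`, the counts match the SQL aggregation. For a tokenized text field, each document contributes to one count per distinct term it contains.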
Information given by Facets
 [Chart: three curves (Facets, idf, information) over a horizontal axis from 0 to 1,
  with a left vertical axis from 0 to 8 and a right vertical axis from 0 to 1.
  The low end of the horizontal axis is annotated "Too rare" and the high end "Too common".]

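The slides do not state the formula behind the "information" curve. One reading that is consistent with the "Too rare" and "Too common" annotations is the binary entropy of a term's document frequency, which vanishes at both ends and peaks in the middle, while idf keeps growing as terms get rarer. The sketch below compares the two under that assumption.

```python
import math


def idf(p):
    """Inverse document frequency (base 2) for a term found in a fraction p of documents."""
    return -math.log2(p)


def binary_entropy(p):
    """Bits gained by learning whether a document contains the term."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


# Rare terms have a high idf but carry little information about the collection;
# very common terms carry little information as well.
for p in (0.01, 0.1, 0.5, 0.9, 0.99):
    print(f"p={p:4.2f}  idf={idf(p):5.2f}  information={binary_entropy(p):4.2f}")
```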
The Information is in the
     "not so common" terms
 ▪ Running a search and looking at the facets to
   characterize the data:
    – For most queries, the most common
      facets do not change significantly
    – Sorting by frequency:
       • The order is almost the same
       • The changes are difficult to see
 ▪ The information is not in the term frequencies
   but in the relative changes in those frequencies

Example

 ▪ Exploring MySpace posts
 ▪ Demographic data: Age, Gender, Country
 ▪ Text content

Example

 ▪ After a query (country = MX)

             Find the 10 differences!

Example

 ▪ After a query (gender = Male)

             Find the 10 differences!

The Information is in the
            differences
 ▪ What do we want to know?
   – Which terms do males and/or
     people from Mexico use?
   – OR
   – Which terms best identify or
     differentiate them?

Example

 ▪ After a query (country = MX)

                 Subtract them!

Example
    After a query (country = MX)
Faceted Results: The words most used by the Mexicans

Statistical Results: The words that differentiate the Mexicans
                   from the rest of the world

Example
    After a query (gender = M)
Faceted Results: The words that males use most

Statistical Results: The words that differentiate the males from
                          the females

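The contrast between faceted and statistical results can be sketched as a subtraction of relative frequencies: the share of documents containing a term in the subset minus its share in the whole collection. The talk does not say which statistic was used, so plain subtraction is an assumption here, and the counts below are made-up toy numbers.

```python
def relative_freq(counts, total_docs):
    """Turn raw facet counts into the fraction of documents containing each term."""
    return {term: n / total_docs for term, n in counts.items()}


def differentiating_terms(subset_counts, subset_docs, global_counts, global_docs, top=10):
    """Rank terms by subset share minus global share, largest first."""
    sub = relative_freq(subset_counts, subset_docs)
    glob = relative_freq(global_counts, global_docs)
    diff = {term: sub.get(term, 0.0) - glob[term] for term in glob}
    return sorted(diff.items(), key=lambda item: item[1], reverse=True)[:top]


# Toy facet counts (invented for illustration only).
global_counts = {"love": 500, "music": 400, "amor": 60, "fiesta": 40}
mx_counts = {"love": 48, "music": 35, "amor": 45, "fiesta": 30}

most_used = max(mx_counts, key=mx_counts.get)
ranked = differentiating_terms(mx_counts, 100, global_counts, 1000)
print("most used in subset:", most_used)
print("most differentiating:", [term for term, _ in ranked])
```

In this toy data the most used word in the subset is also common everywhere, so it falls to the bottom of the differentiating list, while words that are rare globally but frequent in the subset rise to the top.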
The information is in the
                difference
 ▪ We can compare a subset with the full set
 ▪ But we can also compare different subsets

The information is in the
                difference
 ▪ We can compare by gender, running different
   queries

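Comparing two subsets directly, rather than a subset against the full set, can be sketched by taking the facet counts returned by two queries and scoring each term by a smoothed log ratio of its relative frequencies. The log ratio and the smoothing constant are choices made for this sketch, not something the slides specify, and the counts are invented.

```python
import math


def log_ratio(counts_a, docs_a, counts_b, docs_b, smoothing=0.5):
    """Score terms by log2 of (share in set A / share in set B), with additive smoothing."""
    scores = {}
    for term in set(counts_a) | set(counts_b):
        share_a = (counts_a.get(term, 0) + smoothing) / (docs_a + 2 * smoothing)
        share_b = (counts_b.get(term, 0) + smoothing) / (docs_b + 2 * smoothing)
        scores[term] = math.log2(share_a / share_b)
    return scores


# Toy facet counts from two hypothetical queries (gender = M, gender = F).
males = {"game": 80, "love": 100, "shopping": 10}
females = {"game": 20, "love": 110, "shopping": 60}

scores = log_ratio(males, 400, females, 400)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Positive scores mark terms more typical of the first set, negative scores mark terms more typical of the second, and scores near zero mark terms both sets use about equally.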
Data Charts
 ▪ We can apply differences to charts and show the
   difference with the global set

Data Charts
 ▪ Data charts can show the differences
   [Charts: Males, Females]

Data Charts
 ▪ Even through queries
   [Charts: Males, Females]

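Ajax Solr
    Single Query / Resultset
 [Diagram: in the user's browser, the JavaScript front end (interface) sends a
  query via Ajax to Solr on the server and receives a result set back.]

Ajax Solr Extension
    Multiple Queries / Resultsets
 [Diagram: same layout as above, with queries and result sets exchanged via Ajax
  between the interface in the user's browser and Solr on the server.]
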
Ajax Solr Extension:
            Multiple Sets
 ▪ Each set has its own query
    – A query is composed of different query terms
    – The terms that define the set cannot be
      deleted
    – Query terms can be copied or deleted across
      the different sets
 ▪ Widgets can view and compare data from the
   different result sets

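The rules above can be sketched as a small data model: each set keeps locked defining terms plus a list of removable terms, and a helper copies a term from one set to another. The author's code was not published, so the class and function names here are hypothetical and only illustrate the described behavior.

```python
from dataclasses import dataclass, field


@dataclass
class DocumentSet:
    """One comparison set: locked defining terms plus user-added terms."""
    name: str
    defining_terms: tuple                      # these define the set and stay fixed
    extra_terms: list = field(default_factory=list)

    def add_term(self, term):
        if term not in self.defining_terms and term not in self.extra_terms:
            self.extra_terms.append(term)

    def remove_term(self, term):
        if term in self.defining_terms:
            raise ValueError(f"{term!r} defines {self.name} and cannot be deleted")
        self.extra_terms.remove(term)

    def query_terms(self):
        return list(self.defining_terms) + self.extra_terms


def copy_term(term, source, target):
    """Copy a query term from one set to another."""
    if term not in source.query_terms():
        raise ValueError(f"{term!r} is not part of {source.name}")
    target.add_term(term)


males = DocumentSet("Set 1", ("gender:M",))
females = DocumentSet("Set 2", ("gender:F",))
males.add_term("country:MX")
copy_term("country:MX", males, females)
print(males.query_terms(), females.query_terms())
```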
Ajax Solr Extension:
             Multiple Sets
 ▪ Set 0: independent widgets, base data
 ▪ Set 1: widgets related to this set, own filter query
 ▪ Set 2: widgets related to this set, own filter query

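One way to read this layout is that every set shares the base query of Set 0 and adds its own filter query, so each set produces its own Solr request. The sketch below builds one parameter string per set using the standard Solr `q` and `fq` parameters. The field names and the match-all base query are assumptions for illustration.

```python
from urllib.parse import urlencode


def set_request(base_query, filter_queries, facet_field):
    """Build the Solr parameter string for one set: shared q plus the set's own fq."""
    params = [("q", base_query)]
    params += [("fq", fq) for fq in filter_queries]
    params += [
        ("rows", 0),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("wt", "json"),
    ]
    return urlencode(params)


base_query = "*:*"  # Set 0: the base data shared by all sets (assumed match-all)
set_filters = {
    "set1": ["gender:M"],  # hypothetical filter query for Set 1
    "set2": ["gender:F"],  # hypothetical filter query for Set 2
}

requests = {
    name: set_request(base_query, filters, "text")
    for name, filters in set_filters.items()
}
for name, query_string in requests.items():
    print(name, query_string)
```

Widgets bound to a set would read only that set's response, while a comparison widget would read both responses and compute the differences between them.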
Conclusions
 ▪ Facets give basic statistics
 ▪ The information is in the differences
 ▪ Ajax Solr can be adapted to manage different
   sets of data
 ▪ Charts and graphics help us get an
   overview of the data: information

Sources
 ▪ The code is not ready to be published
      • It was written for SolrJs
      • It is being migrated to AjaxSolr (Lucene-Solr 4.0)
 ▪ But it will be...
 ▪ Links
    – http://www.barcelonamedia.org/personal/joan.codina/en

Comparing Document Sets with Ajax Solr

  • 1. Adapting Ajax Solr to Compare different sets of documents
      – Joan Codina-Filba, Barcelona Media-Innovation Center
      – 19-Oct-2011
  • 2. What I Will Cover
      – More than just facets
      – We do research on opinion mining
      – Giving an overview of thousands of opinions
      – After analyzing their contents:
          • How to explore the key concepts expressed in the opinions
          • Are the key concepts the most common ones, or the most differentiating ones?
      – Using Solr and Ajax Solr for more intelligent data navigation
  • 3. My Background
      – Joan Codina Filbà
      – Barcelona Media – Innovation Center
      – Senior Researcher
      – Experience in:
          • Data => Text => Web => Opinion mining
          • Integration of NLP tools
          • Multimodal search
          • Some teaching
  • 4. The Challenge
      – Companies want to know what Web 2.0 says about them
      – Too many writers, in too many places, and not enough time to read:
          • Detect hot topics (quality, price, service?)
          • Detect strengths and weaknesses
          • Gain knowledge about the market
      – Use NLP to detect key concepts in text
      – User-generated content is noisy, ironic, and dispersed
      – From data to information and knowledge
  • 5. Use Case Example
      – Analyze the users of MySpace
      – Combine demographic data with analyzed text
      – "Navigate" the data to gain "knowledge" about the users
      – Explore thousands of messages before sending them to a "blind", "automatic" text mining tool to find relevant aspects
      – An "OLAP" for data and text: a kind of cube view of the opinions
  • 6. Solr Facets
      – A simple statistic:
          • For fields with a single value: SELECT field, count(x) FROM table GROUP BY field (a simple aggregation statistic)
          • For text fields?
      – Facets vs. IDF:
          • The most common facets give less information
          • The rare ones are not useful
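The GROUP BY analogy on slide 6 can be illustrated with a minimal Python sketch. The documents and the field names `country` and `text` are hypothetical, and the whitespace tokenization is an assumption made for the example; Solr computes facet counts itself on the server.

```python
from collections import Counter

# Hypothetical documents: one single-valued field and one free-text field.
docs = [
    {"country": "MX", "text": "hola amigos hola"},
    {"country": "MX", "text": "hola mundo"},
    {"country": "US", "text": "hello world"},
]

def facet_counts(docs, field):
    """Count documents per value of a single-valued field (like GROUP BY)."""
    return Counter(doc[field] for doc in docs)

def text_facet_counts(docs, field):
    """Count documents per distinct term of a text field.

    Each document is counted at most once per term, so the result is a
    document frequency, not a raw term frequency.
    """
    counts = Counter()
    for doc in docs:
        counts.update(set(doc[field].split()))
    return counts

country_facets = facet_counts(docs, "country")
term_facets = text_facet_counts(docs, "text")
```

For a text field, the facet count of a term is the number of documents that contain it, which is why very common terms dominate the list.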
  • 7. Information given by Facets
      – [Chart: Facets, idf, and information curves plotted over the range 0 to 1; left axis 0 to 8, right axis 0 to 1]
  • 8. Information given by Facets
      – [Chart: Facets, idf, and information curves plotted over the range 0 to 1; left axis 0 to 8, right axis 0 to 1; regions annotated "Too rare" and "Too common"]
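The slides do not say how the "information" curve in the chart is computed. As a rough sketch only, the code below assumes that it is the document fraction weighted by its idf, that is, -p ln(p), and that idf is the natural log of the inverse document fraction. Both formulas are assumptions chosen to reproduce the "too rare" and "too common" regions, not the author's stated method.

```python
import math

def idf(doc_fraction):
    """Inverse document frequency for a term found in this fraction of documents."""
    return math.log(1.0 / doc_fraction)

def information(doc_fraction):
    """Assumed score: document fraction weighted by idf, i.e. -p * ln(p)."""
    return doc_fraction * idf(doc_fraction)

# Hypothetical document fractions, from very rare to very common terms.
fractions = [0.01, 0.1, 0.37, 0.5, 0.9, 0.99]
scores = {p: information(p) for p in fractions}

# The fraction with the highest assumed information score.
best = max(scores, key=scores.get)
```

Under this assumption the score is low at both ends and peaks for terms that appear in roughly a third of the documents.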
  • 9. The Information is in the "not so common" terms
      – When running a search and looking at the facets to characterize the data:
          • For most queries, the most common facets do not change significantly
          • Sorted by frequency, the order is almost the same, so the changes are difficult to see
      – The information is not in a term's frequency but in the relative changes in these frequencies
  • 10. Example
      – Exploring MySpace posts
      – Demographic data: age, gender, country
      – Text content
  • 11. Example
      – After a query (country = MX)
  • 12. Example
      – After a query (country = MX)
      – Find the 10 differences!
  • 13. Example
      – After a query (gender = Male)
  • 14. Example
      – After a query (gender = Male)
      – Find the 10 differences!
  • 15. The Information is in the differences
      – What do we want to know?
          • Which terms do males and/or people from Mexico use?
          • Or: which terms best identify or differentiate them?
  • 16. Example
      – After a query (country = MX)
      – Subtract them!
  • 17. Example
      – After a query (country = MX)
      – Faceted results: the words most used by Mexicans
      – Statistical results: the words that differentiate Mexicans from the rest of the world
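The "subtract them" idea on slides 16 and 17 can be sketched as follows. The slides do not give the exact statistic, so this example assumes a plain difference of relative document frequencies between the subset and the full set; the counts and the function names are hypothetical.

```python
def relative_freqs(counts, total_docs):
    """Turn raw facet counts into fractions of the document set."""
    return {term: n / total_docs for term, n in counts.items()}

def differentiating_terms(subset_counts, subset_size, global_counts, global_size, top=2):
    """Rank subset terms by how much more frequent they are than in the full set."""
    sub = relative_freqs(subset_counts, subset_size)
    glob = relative_freqs(global_counts, global_size)
    diffs = {term: sub[term] - glob.get(term, 0.0) for term in sub}
    ranked = sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)
    return [term for term, _ in ranked[:top]]

# Hypothetical facet counts: the full set and the country = MX subset.
global_counts = {"love": 400, "friend": 300, "hola": 60, "amigo": 50}
mx_counts = {"love": 42, "friend": 31, "hola": 30, "amigo": 25}

# Plain faceting: the most used words in the subset.
most_used = sorted(mx_counts, key=mx_counts.get, reverse=True)[:2]

# Difference view: the words that set the subset apart from the full set.
most_different = differentiating_terms(mx_counts, 100, global_counts, 1000)
```

With these made-up numbers the faceted view still puts "love" and "friend" on top, while the difference view surfaces "hola" and "amigo".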
  • 18. Example
      – After a query (gender = M)
      – Faceted results: the words that males use most
      – Statistical results: the words that differentiate males from females
  • 19. The information is in the difference
      – We can compare a subset with the full set
      – But we can also compare different subsets
  • 20. The information is in the difference
      – We can compare by gender by running different queries
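Comparing two subsets obtained from separate queries can be sketched in the same spirit. This example assumes a signed difference of relative frequencies, so that terms typical of one set end up at one end of the ranking and terms typical of the other set at the opposite end. The counts, the set sizes, and the name `compare_sets` are hypothetical.

```python
def compare_sets(counts_a, size_a, counts_b, size_b):
    """Return (term, difference) pairs sorted from most typical of B to most typical of A."""
    terms = set(counts_a) | set(counts_b)
    diff = {
        term: counts_a.get(term, 0) / size_a - counts_b.get(term, 0) / size_b
        for term in terms
    }
    return sorted(diff.items(), key=lambda kv: kv[1])

# Hypothetical facet counts returned by two queries, one per gender.
male_counts = {"game": 80, "music": 100, "love": 60}    # 200 documents
female_counts = {"game": 10, "music": 50, "love": 60}   # 100 documents

ranked = compare_sets(male_counts, 200, female_counts, 100)
typical_of_females = ranked[0][0]
typical_of_males = ranked[-1][0]
```

Dividing by the size of each set matters here: "love" has the same raw count in both sets but is twice as frequent, relatively, in the smaller one.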
  • 21. Data Charts
      – We can apply differences to charts
  • 22. Data Charts
      – We can apply differences to charts and show the difference from the global set
  • 23. Data Charts
      – Data charts can show the differences
      – [Chart panels: Males, Females]
  • 24. Data Charts
      – Even through queries
      – [Chart panels: Males, Females]
  • 25. Ajax Solr
      – Single query / result set
      – [Diagram labels: User, Browser, JavaScript front end, Ajax Interface, Query, Resultset, Server, SOLR]
  • 26. Ajax Solr Extension
      – Multiple queries / result sets
      – [Diagram labels: User, Browser, JavaScript front end, Ajax Interface, Query, Resultset, Server, SOLR]
  • 27. Ajax Solr Extension: Multiple Sets
      – Each set has its own query:
          • A query is composed of different query terms
          • The terms that define the set cannot be deleted
          • Query terms can be copied or deleted across the different sets
      – Widgets can view and compare data from the different result sets
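The rules on slide 27 can be modeled with a small sketch. This is not the author's code, which the deck says is unpublished, and it is not the Ajax Solr API; the class `QuerySet`, the function `copy_term`, and the example terms are all hypothetical. The only real Solr names used are the `q` and `fq` request parameters.

```python
from dataclasses import dataclass, field

@dataclass
class QuerySet:
    """One document set: locked defining terms plus freely editable terms."""
    name: str
    defining_terms: tuple          # terms that define the set; cannot be deleted
    extra_terms: list = field(default_factory=list)

    def add_term(self, term):
        """Add a query term unless the set already has it."""
        if term not in self.defining_terms and term not in self.extra_terms:
            self.extra_terms.append(term)

    def remove_term(self, term):
        """Delete an editable term; defining terms are protected."""
        if term in self.defining_terms:
            raise ValueError(f"{term!r} defines {self.name} and cannot be deleted")
        self.extra_terms.remove(term)

    def solr_params(self):
        """Build request parameters, one filter query (fq) per term."""
        return {"q": "*:*", "fq": list(self.defining_terms) + self.extra_terms}

def copy_term(term, source, target):
    """Copy a query term from one set to another."""
    if term in source.defining_terms or term in source.extra_terms:
        target.add_term(term)

set1 = QuerySet("Set 1", ("gender:M",))
set2 = QuerySet("Set 2", ("gender:F",))
set1.add_term("country:MX")
copy_term("country:MX", set1, set2)
params2 = set2.solr_params()
```

Keeping the defining terms in an immutable tuple is one simple way to make sure each set keeps its identity while the shared terms change.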
  • 28. Ajax Solr Extension: Multiple Sets
      – Set 0: base data, independent widgets
      – Set 1: own filter query, widgets related to this set
      – Set 2: own filter query, widgets related to this set
  • 29. Conclusions
      – Facets give basic statistics
      – The information is in the differences
      – Ajax Solr can be adapted to manage different sets of data
      – Charts and graphics help us get an overview of the data: information
  • 30. Sources
      – The code is not ready to be published:
          • It was done for SolrJs
          • Migrating to AjaxSolr (Lucene-Solr 4.0)
      – But it will be...
      – Links: http://www.barcelonamedia.org/personal/joan.codina/en