SlideShare une entreprise Scribd logo
1  sur  29
Télécharger pour lire hors ligne
Text Analytics in Enterprise Search
         Daniel Ling (Findwise)
What will I cover?
   Intro
   About Text Analytics
   Benefits and possibilities
   Examples
   Solution Techniques to Examples
   Conclusions




                            3
My Background
   Daniel Ling
   Findwise
   Enterprise Search and Findability Consultant
   Experience and expertise
      5+ years of Enterprise Search Experience
      20+ enterprise search implementations, ranging industries
      Lucene, FAST ESP, Solr
      Apache Solr my primary search platform
      Focus areas includes Findability and Search Architecture and
       Implementation, Text Analytics, Document Processing.




                                    4
About Text Analytics




          5
Text Analytics in the Enterprise
Challenges:
 80% of data in the Enterprise is unstructured.
 Reduce the time looking for information (currently 9.6 hours per week)
 Reduce the time reading documents / e-mails (currently 14.5 hours per
  week)

Benefits:
 More predictable scale and domain
 Well-understood domain
 Supporting content for analytics can be identified




                                   6
Text Analytics
The definition


   A set of linguistic, statistical and machine learning techniques
   used to model and structure information content of textual
   source.

      - Wikipedia.org




                                7
Types of Applications


•   Entity Extraction
•   Document Categorization
•   Sentiment Analysis
•   Summarization




                              8
Frameworks and Techniques


Framework                          Techniques

Solr                               Statistics, Lingustics

Mallet, Classifier4j, etc, etc..   Statistical natural language processing

Mahout (Hadoop)                    Machine Learning, Statistics

GATE                               General language processing framework

UIMA                               Content analytics, text mining, pipeline

OpenNLP                            Machine learning toolkit for NLP


                                              9
Benefits and possibilities




            10
Benefits and possibilities

 Text analytics can bring some structure to the unstructured content
 Enhance discovery and findability of content
   • Works well together with search
 Increase relevance and precision with extracted keywords and meta-
  data
 Generating content for dynamic pages / topic pages
   • Selection of documents and extracts from documents
 Track and discover sentiments
 Reduce the time for user to analyze content




                                 11
Examples




   12
Entity Extraction

 Types of Entities for Extraction:
   • Dates
   • Places
   • Companies
   • Objects (Product names, etc)
   • People
   • Events




                                  13
Example – Presenting the data




               14
Example – Presenting the data




              15
Example – Facets on the data




               16
Example Solution: Entity Extraction
 Rule-based entity extraction
    Combination of lists and regular expressions
 Works within well-understood domains.
 Requires maintaining lists.
 Lists from: Country lists from World Factbook, Public Companies from
  Google Finance, Customers from CRM.
 Workflow: Document for indexing > Update Request Handler >
  Update Chain (lookup and match entities) > Writes to index



             Update Chain
                     (processor)                                   Lucene Index
        (lists | input fields | entity fields)
                                                 (entity fields)




                                                          17
Example Solution: Entity Extraction
 Register a custom class to lookup resources and extract found entities
  to specific Solr fields, setup in solrconfig.xml:




                                     18
Document Categorization

   To assign a label to the document / content / data.
   Labels for the category or for the sentiment.
   Threshold values for matching a category before labeling.
   Statistics and “knowledge” from previous examples can be used.




                                  19
Example – Facets from Categories




                 20
Example Solution: Document
                Categorization


                                               *

 Training the component, Mallet (Machine Learning for Language
  Toolkit).
   • Alternative components includes Lucene (TFIDF) index
      (MoreLikeThis), OpenNLP, Textcat, Classifier4j.
 Running the new documents against the model/index of trained
  documents.
 Training from interface, adhoc, or index pre-categorized.

* Figure from the book Taming Text.


                                      21
Example Solution: Document
             Categorization
 Mallet and the process of setup and train:




                                   22
Example Solution: Document
              Categorization
 Evaluation of new document:




 Setting the evaluated category tag to the document in pipeline:


            Update Chain
                 (processor)                        Lucene Index
              (input document)
                                 (category field)




                                            23
Document Summarization

 Summarize a document, at index time or on-demand.
 Leverage from the knowledge and term statistics of the document
  and the index.
 Picks the “most important” sentences based on the statistics and
  displays those.




                                 24
Example – Summarize content


Static Summaries




Dynamic Summaries




                    25
Example – Summarize content - 1




                   26
Example – Summarize content - 2




                  27
Example Solution: Document
           Summarization
 Custom RequestHandler that receives document ID and field to
  summarize.
 Custom Search Component making the selection of top sentences.
 Selecting a subset of sentences and sends these back in a field.




               RequestHandler                         Lucene Index
          (SearchComponent for summariziation)




                                                 28
Wrap Up

• Examples: Entity Extraction, Document Categorization,
  Summarization.
• Technology: You can take small steps and get a great
  deal of gain, since you can leverage from features and
  components of Solr and Lucene (as well as other open
  source NLP frameworks).
• Value: Benefits from text analytics includes the increase
  in discovery, findability and productivity from the
  solution.




                                29
Questions ?



daniel.ling@findwise.com
www.findabilityblog.com




            30

Contenu connexe

Tendances

Tdm information retrieval
Tdm information retrievalTdm information retrieval
Tdm information retrievalKU Leuven
 
Information Retrieval
Information RetrievalInformation Retrieval
Information Retrievalrchbeir
 
Tdm recent trends
Tdm recent trendsTdm recent trends
Tdm recent trendsKU Leuven
 
Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Kira
 
Techniques of information retrieval
Techniques of information retrieval Techniques of information retrieval
Techniques of information retrieval Tariq Hassan
 
Multidimensioal database
Multidimensioal  databaseMultidimensioal  database
Multidimensioal databaseTPO TPO
 
Text mining presentation in Data mining Area
Text mining presentation in Data mining AreaText mining presentation in Data mining Area
Text mining presentation in Data mining AreaMahamudHasanCSE
 
ATLAS.ti Training - Covering the Basics (Mac edition)
ATLAS.ti Training - Covering the Basics (Mac edition)ATLAS.ti Training - Covering the Basics (Mac edition)
ATLAS.ti Training - Covering the Basics (Mac edition)Arun Verma
 
ATLAS.ti training presentation: Covering the basics
ATLAS.ti training presentation: Covering the basics ATLAS.ti training presentation: Covering the basics
ATLAS.ti training presentation: Covering the basics Arun Verma
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemTrey Grainger
 
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsIntent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsTrey Grainger
 
Reflected intelligence evolving self-learning data systems
Reflected intelligence  evolving self-learning data systemsReflected intelligence  evolving self-learning data systems
Reflected intelligence evolving self-learning data systemsTrey Grainger
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformTrey Grainger
 
The Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesThe Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesTrey Grainger
 
Information retrieval concept, practice and challenge
Information retrieval   concept, practice and challengeInformation retrieval   concept, practice and challenge
Information retrieval concept, practice and challengeGan Keng Hoon
 
Candidate selection tutorial
Candidate selection tutorialCandidate selection tutorial
Candidate selection tutorialYiqun Liu
 
Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Trey Grainger
 

Tendances (19)

Tdm information retrieval
Tdm information retrievalTdm information retrieval
Tdm information retrieval
 
Information Retrieval
Information RetrievalInformation Retrieval
Information Retrieval
 
Tdm recent trends
Tdm recent trendsTdm recent trends
Tdm recent trends
 
Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)Tutorial 1 (information retrieval basics)
Tutorial 1 (information retrieval basics)
 
Techniques of information retrieval
Techniques of information retrieval Techniques of information retrieval
Techniques of information retrieval
 
Text Indexing and Retrieval
Text Indexing and RetrievalText Indexing and Retrieval
Text Indexing and Retrieval
 
Multidimensioal database
Multidimensioal  databaseMultidimensioal  database
Multidimensioal database
 
Text mining presentation in Data mining Area
Text mining presentation in Data mining AreaText mining presentation in Data mining Area
Text mining presentation in Data mining Area
 
ATLAS.ti Training - Covering the Basics (Mac edition)
ATLAS.ti Training - Covering the Basics (Mac edition)ATLAS.ti Training - Covering the Basics (Mac edition)
ATLAS.ti Training - Covering the Basics (Mac edition)
 
ATLAS.ti training presentation: Covering the basics
ATLAS.ti training presentation: Covering the basics ATLAS.ti training presentation: Covering the basics
ATLAS.ti training presentation: Covering the basics
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data Ecosystem
 
Text mining
Text miningText mining
Text mining
 
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsIntent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
 
Reflected intelligence evolving self-learning data systems
Reflected intelligence  evolving self-learning data systemsReflected intelligence  evolving self-learning data systems
Reflected intelligence evolving self-learning data systems
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
 
The Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesThe Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation Engines
 
Information retrieval concept, practice and challenge
Information retrieval   concept, practice and challengeInformation retrieval   concept, practice and challenge
Information retrieval concept, practice and challenge
 
Candidate selection tutorial
Candidate selection tutorialCandidate selection tutorial
Candidate selection tutorial
 
Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...Crowdsourced query augmentation through the semantic discovery of domain spec...
Crowdsourced query augmentation through the semantic discovery of domain spec...
 

Similaire à Text Analytics in Enterprise Search

Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...
Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...
Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...Cre-Aid
 
Scoping Level of Effort and Getting the Right Resources for the Job
Scoping Level of Effort and Getting the Right Resources for the JobScoping Level of Effort and Getting the Right Resources for the Job
Scoping Level of Effort and Getting the Right Resources for the JobJason Kaufman
 
Machine Learned Relevance at A Large Scale Search Engine
Machine Learned Relevance at A Large Scale Search EngineMachine Learned Relevance at A Large Scale Search Engine
Machine Learned Relevance at A Large Scale Search EngineSalford Systems
 
Using Computer as a Research Assistant in Qualitative Research
Using Computer as a Research Assistant in Qualitative ResearchUsing Computer as a Research Assistant in Qualitative Research
Using Computer as a Research Assistant in Qualitative ResearchJoshuaApolonio1
 
CNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data CommonsCNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data CommonsAnita de Waard
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxShanmugasundaram M
 
Web_Mining_Overview_Nfaoui_El_Habib
Web_Mining_Overview_Nfaoui_El_HabibWeb_Mining_Overview_Nfaoui_El_Habib
Web_Mining_Overview_Nfaoui_El_HabibEl Habib NFAOUI
 
Applying ocr to extract information : Text mining
Applying ocr to extract information  : Text miningApplying ocr to extract information  : Text mining
Applying ocr to extract information : Text miningSaurabh Singh
 
Search Solutions 2011: Successful Enterprise Search By Design
Search Solutions 2011: Successful Enterprise Search By DesignSearch Solutions 2011: Successful Enterprise Search By Design
Search Solutions 2011: Successful Enterprise Search By DesignMarianne Sweeny
 
Text mining and analytics v6 - p1
Text mining and analytics   v6 - p1Text mining and analytics   v6 - p1
Text mining and analytics v6 - p1Dave King
 
Abacá: Technically Assisted Sensitivity Review of Digital Records
Abacá: Technically Assisted Sensitivity Review of Digital RecordsAbacá: Technically Assisted Sensitivity Review of Digital Records
Abacá: Technically Assisted Sensitivity Review of Digital RecordsProjectAbaca
 
Lecture2 big data life cycle
Lecture2 big data life cycleLecture2 big data life cycle
Lecture2 big data life cyclehktripathy
 
Frameworks provide structure. The core objective of the Big Data Framework is...
Frameworks provide structure. The core objective of the Big Data Framework is...Frameworks provide structure. The core objective of the Big Data Framework is...
Frameworks provide structure. The core objective of the Big Data Framework is...RINUSATHYAN
 

Similaire à Text Analytics in Enterprise Search (20)

Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...
Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...
Jeroen Kleinhoven (Treparel), Turn Big Content into Business Insights - Data ...
 
intro.ppt
intro.pptintro.ppt
intro.ppt
 
Scoping Level of Effort and Getting the Right Resources for the Job
Scoping Level of Effort and Getting the Right Resources for the JobScoping Level of Effort and Getting the Right Resources for the Job
Scoping Level of Effort and Getting the Right Resources for the Job
 
Machine Learned Relevance at A Large Scale Search Engine
Machine Learned Relevance at A Large Scale Search EngineMachine Learned Relevance at A Large Scale Search Engine
Machine Learned Relevance at A Large Scale Search Engine
 
qualitative.ppt
qualitative.pptqualitative.ppt
qualitative.ppt
 
Using Computer as a Research Assistant in Qualitative Research
Using Computer as a Research Assistant in Qualitative ResearchUsing Computer as a Research Assistant in Qualitative Research
Using Computer as a Research Assistant in Qualitative Research
 
CNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data CommonsCNI 2018: A Research Object Authoring Tool for the Data Commons
CNI 2018: A Research Object Authoring Tool for the Data Commons
 
Self Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docxSelf Study Business Approach to DS_01022022.docx
Self Study Business Approach to DS_01022022.docx
 
Web_Mining_Overview_Nfaoui_El_Habib
Web_Mining_Overview_Nfaoui_El_HabibWeb_Mining_Overview_Nfaoui_El_Habib
Web_Mining_Overview_Nfaoui_El_Habib
 
Applying ocr to extract information : Text mining
Applying ocr to extract information  : Text miningApplying ocr to extract information  : Text mining
Applying ocr to extract information : Text mining
 
Search Solutions 2011: Successful Enterprise Search By Design
Search Solutions 2011: Successful Enterprise Search By DesignSearch Solutions 2011: Successful Enterprise Search By Design
Search Solutions 2011: Successful Enterprise Search By Design
 
IR with lucene
IR with luceneIR with lucene
IR with lucene
 
Text mining and analytics v6 - p1
Text mining and analytics   v6 - p1Text mining and analytics   v6 - p1
Text mining and analytics v6 - p1
 
Welsh Government Workshop
Welsh Government WorkshopWelsh Government Workshop
Welsh Government Workshop
 
Abacá: Technically Assisted Sensitivity Review of Digital Records
Abacá: Technically Assisted Sensitivity Review of Digital RecordsAbacá: Technically Assisted Sensitivity Review of Digital Records
Abacá: Technically Assisted Sensitivity Review of Digital Records
 
Dissertation literature search
Dissertation literature searchDissertation literature search
Dissertation literature search
 
Lecture2 big data life cycle
Lecture2 big data life cycleLecture2 big data life cycle
Lecture2 big data life cycle
 
Final presentation
Final presentationFinal presentation
Final presentation
 
Frameworks provide structure. The core objective of the Big Data Framework is...
Frameworks provide structure. The core objective of the Big Data Framework is...Frameworks provide structure. The core objective of the Big Data Framework is...
Frameworks provide structure. The core objective of the Big Data Framework is...
 
Prototype Design of Open Access Institutional Repository
Prototype Design of Open Access Institutional RepositoryPrototype Design of Open Access Institutional Repository
Prototype Design of Open Access Institutional Repository
 

Plus de Findwise

White Arkitekter - Findability Day Roadshow 2017
White Arkitekter - Findability Day Roadshow 2017White Arkitekter - Findability Day Roadshow 2017
White Arkitekter - Findability Day Roadshow 2017Findwise
 
AI och maskininlärning - Findability Day Roadshow 2017
AI och maskininlärning - Findability Day Roadshow 2017AI och maskininlärning - Findability Day Roadshow 2017
AI och maskininlärning - Findability Day Roadshow 2017Findwise
 
De kognitiva eran med IBM Watson - Findability Day Roadshow 2017
De kognitiva eran med IBM Watson - Findability Day Roadshow 2017De kognitiva eran med IBM Watson - Findability Day Roadshow 2017
De kognitiva eran med IBM Watson - Findability Day Roadshow 2017Findwise
 
Findwise and IBM Watson
Findwise and IBM WatsonFindwise and IBM Watson
Findwise and IBM WatsonFindwise
 
Findability Day 2016 - Enterprise Search and Findability Survey 2016
Findability Day 2016 - Enterprise Search and Findability Survey 2016Findability Day 2016 - Enterprise Search and Findability Survey 2016
Findability Day 2016 - Enterprise Search and Findability Survey 2016Findwise
 
Findability Day 2016 - Enterprise Search and Findability Survey 2016
Findability Day 2016 - Enterprise Search and Findability Survey 2016Findability Day 2016 - Enterprise Search and Findability Survey 2016
Findability Day 2016 - Enterprise Search and Findability Survey 2016Findwise
 
Findability Day 2016 - Big data analytics and machine learning
Findability Day 2016 - Big data analytics and machine learningFindability Day 2016 - Big data analytics and machine learning
Findability Day 2016 - Big data analytics and machine learningFindwise
 
Findability Day 2016 - Enterprise social collaboration
Findability Day 2016 - Enterprise social collaborationFindability Day 2016 - Enterprise social collaboration
Findability Day 2016 - Enterprise social collaborationFindwise
 
Findability Day 2016 - SKF case study
Findability Day 2016 - SKF case studyFindability Day 2016 - SKF case study
Findability Day 2016 - SKF case studyFindwise
 
Findability Day 2016 - Structuring content for user experience
Findability Day 2016 - Structuring content for user experienceFindability Day 2016 - Structuring content for user experience
Findability Day 2016 - Structuring content for user experienceFindwise
 
Findability Day 2016 - Augmented intelligence
Findability Day 2016 - Augmented intelligenceFindability Day 2016 - Augmented intelligence
Findability Day 2016 - Augmented intelligenceFindwise
 
Findability Day 2016 - What is GDPR?
Findability Day 2016 - What is GDPR?Findability Day 2016 - What is GDPR?
Findability Day 2016 - What is GDPR?Findwise
 
Findability Day 2016 - Get started with GDPR
Findability Day 2016 - Get started with GDPRFindability Day 2016 - Get started with GDPR
Findability Day 2016 - Get started with GDPRFindwise
 
Digital workplace och informationshantering i office 365
Digital workplace och informationshantering i office 365Digital workplace och informationshantering i office 365
Digital workplace och informationshantering i office 365Findwise
 
Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...
Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...
Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...Findwise
 
Findability Day 2015 - Abby Covert - Keynote - How to make sense of any mess
Findability Day 2015 - Abby Covert - Keynote - How to make sense of any messFindability Day 2015 - Abby Covert - Keynote - How to make sense of any mess
Findability Day 2015 - Abby Covert - Keynote - How to make sense of any messFindwise
 
Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...
Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...
Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...Findwise
 
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...Findwise
 
Findability Day 2015 - Martin White - The future is search!
Findability Day 2015 - Martin White - The future is search!Findability Day 2015 - Martin White - The future is search!
Findability Day 2015 - Martin White - The future is search!Findwise
 
Findability Day 2015 Liam Holley - Dassault systems - Insight and discovery...
Findability Day 2015   Liam Holley - Dassault systems - Insight and discovery...Findability Day 2015   Liam Holley - Dassault systems - Insight and discovery...
Findability Day 2015 Liam Holley - Dassault systems - Insight and discovery...Findwise
 

Plus de Findwise (20)

White Arkitekter - Findability Day Roadshow 2017
White Arkitekter - Findability Day Roadshow 2017White Arkitekter - Findability Day Roadshow 2017
White Arkitekter - Findability Day Roadshow 2017
 
AI och maskininlärning - Findability Day Roadshow 2017
AI och maskininlärning - Findability Day Roadshow 2017AI och maskininlärning - Findability Day Roadshow 2017
AI och maskininlärning - Findability Day Roadshow 2017
 
De kognitiva eran med IBM Watson - Findability Day Roadshow 2017
De kognitiva eran med IBM Watson - Findability Day Roadshow 2017De kognitiva eran med IBM Watson - Findability Day Roadshow 2017
De kognitiva eran med IBM Watson - Findability Day Roadshow 2017
 
Findwise and IBM Watson
Findwise and IBM WatsonFindwise and IBM Watson
Findwise and IBM Watson
 
Findability Day 2016 - Enterprise Search and Findability Survey 2016
Findability Day 2016 - Enterprise Search and Findability Survey 2016Findability Day 2016 - Enterprise Search and Findability Survey 2016
Findability Day 2016 - Enterprise Search and Findability Survey 2016
 
Findability Day 2016 - Enterprise Search and Findability Survey 2016
Findability Day 2016 - Enterprise Search and Findability Survey 2016Findability Day 2016 - Enterprise Search and Findability Survey 2016
Findability Day 2016 - Enterprise Search and Findability Survey 2016
 
Findability Day 2016 - Big data analytics and machine learning
Findability Day 2016 - Big data analytics and machine learningFindability Day 2016 - Big data analytics and machine learning
Findability Day 2016 - Big data analytics and machine learning
 
Findability Day 2016 - Enterprise social collaboration
Findability Day 2016 - Enterprise social collaborationFindability Day 2016 - Enterprise social collaboration
Findability Day 2016 - Enterprise social collaboration
 
Findability Day 2016 - SKF case study
Findability Day 2016 - SKF case studyFindability Day 2016 - SKF case study
Findability Day 2016 - SKF case study
 
Findability Day 2016 - Structuring content for user experience
Findability Day 2016 - Structuring content for user experienceFindability Day 2016 - Structuring content for user experience
Findability Day 2016 - Structuring content for user experience
 
Findability Day 2016 - Augmented intelligence
Findability Day 2016 - Augmented intelligenceFindability Day 2016 - Augmented intelligence
Findability Day 2016 - Augmented intelligence
 
Findability Day 2016 - What is GDPR?
Findability Day 2016 - What is GDPR?Findability Day 2016 - What is GDPR?
Findability Day 2016 - What is GDPR?
 
Findability Day 2016 - Get started with GDPR
Findability Day 2016 - Get started with GDPRFindability Day 2016 - Get started with GDPR
Findability Day 2016 - Get started with GDPR
 
Digital workplace och informationshantering i office 365
Digital workplace och informationshantering i office 365Digital workplace och informationshantering i office 365
Digital workplace och informationshantering i office 365
 
Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...
Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...
Findability Day 2015 - Mickel Grönroos - Findwise - How to increase safety on...
 
Findability Day 2015 - Abby Covert - Keynote - How to make sense of any mess
Findability Day 2015 - Abby Covert - Keynote - How to make sense of any messFindability Day 2015 - Abby Covert - Keynote - How to make sense of any mess
Findability Day 2015 - Abby Covert - Keynote - How to make sense of any mess
 
Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...
Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...
Findability Day 2015 - Noel Garry - IBM - Information governance and a 360 de...
 
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...
 
Findability Day 2015 - Martin White - The future is search!
Findability Day 2015 - Martin White - The future is search!Findability Day 2015 - Martin White - The future is search!
Findability Day 2015 - Martin White - The future is search!
 
Findability Day 2015 Liam Holley - Dassault systems - Insight and discovery...
Findability Day 2015   Liam Holley - Dassault systems - Insight and discovery...Findability Day 2015   Liam Holley - Dassault systems - Insight and discovery...
Findability Day 2015 Liam Holley - Dassault systems - Insight and discovery...
 

Dernier

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Dernier (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Text Analytics in Enterprise Search

  • 1. Text Analytics in Enterprise Search Daniel Ling (Findwise)
  • 2. What will I cover?  Intro  About Text Analytics  Benefits and possibilities  Examples  Solution Techniques to Examples  Conclusions 3
  • 3. My Background  Daniel Ling  Findwise  Enterprise Search and Findability Consultant  Experience and expertise  5+ years of Enterprise Search Experience  20+ enterprise search implementations, ranging industries  Lucene, FAST ESP, Solr  Apache Solr my primary search platform  Focus areas includes Findability and Search Architecture and Implementation, Text Analytics, Document Processing. 4
  • 5. Text Analytics in the Enterprise Challenges:  80% of data in the Enterprise is unstructured.  Reduce the time looking for information (currently 9.6 hours per week)  Reduce the time reading documents / e-mails (currently 14.5 hours per week) Benefits:  More predictable scale and domain  Well-understood domain  Supporting content for analytics can be identified 6
  • 6. Text Analytics The definition A set of linguistic, statistical and machine learning techniques used to model and structure information content of textual source. - Wikipedia.org 7
  • 7. Types of Applications • Entity Extraction • Document Categorization • Sentiment Analysis • Summarization 8
  • 8. Frameworks and Techniques Framework Techniques Solr Statistics, Lingustics Mallet, Classifier4j, etc, etc.. Statistical natural language processing Mahout (Hadoop) Machine Learning, Statistics GATE General language processing framework UIMA Content analytics, text mining, pipeline OpenNLP Machine learning toolkit for NLP 9
  • 10. Benefits and possibilities  Text analytics can bring some structure to the unstructured content  Enhance discovery and findability of content • Works well together with search  Increase relevance and precision with extracted keywords and meta- data  Generating content for dynamic pages / topic pages • Selection of documents and extracts from documents  Track and discover sentiments  Reduce the time for user to analyze content 11
  • 11. Examples 12
  • 12. Entity Extraction  Types of Entities for Extraction: • Dates • Places • Companies • Objects (Product names, etc) • People • Events 13
  • 13. Example – Presenting the data 14
  • 14. Example – Presenting the data 15
  • 15. Example – Facets on the data 16
  • 16. Example Solution: Entity Extraction  Rule-based entity extraction  Combination of lists and regular expressions  Works within well-understood domains.  Requires maintaining lists.  Lists from: Country lists from World Factbook, Public Companies from Google Finance, Customers from CRM.  Workflow: Document for indexing > Update Request Handler > Update Chain (lookup and match entities) > Writes to index Update Chain (processor) Lucene Index (lists | input fields | entity fields) (entity fields) 17
  • 17. Example Solution: Entity Extraction  Register a custom class to lookup resources and extract found entities to specific Solr fields, setup in solrconfig.xml: 18
  • 18. Document Categorization  To assign a label to the document / content / data.  Labels for the category or for the sentiment.  Threshold values for matching a category before labeling.  Statistics and “knowledge” from previous examples can be used. 19
  • 19. Example – Facets from Categories 20
  • 20. Example Solution: Document Categorization *  Training the component, Mallet (Machine Learning for Language Toolkit). • Alternative components includes Lucene (TFIDF) index (MoreLikeThis), OpenNLP, Textcat, Classifier4j.  Running the new documents against the model/index of trained documents.  Training from interface, adhoc, or index pre-categorized. * Figure from the book Taming Text. 21
  • 21. Example Solution: Document Categorization  Mallet and the process of setup and train: 22
  • 22. Example Solution: Document Categorization  Evaluation of new document:  Setting the evaluated category tag to the document in pipeline: Update Chain (processor) Lucene Index (input document) (category field) 23
  • 23. Document Summarization  Summarize a document, at index time or on-demand.  Leverage from the knowledge and term statistics of the document and the index.  Picks the “most important” sentences based on the statistics and displays those. 24
  • 24. Example – Summarize content Static Summaries Dynamic Summaries 25
  • 25. Example – Summarize content - 1 26
  • 26. Example – Summarize content - 2 27
  • 27. Example Solution: Document Summarization  Custom RequestHandler that receives document ID and field to summarize.  Custom Search Component making the selection of top sentences.  Selecting a subset of sentences and sends these back in a field. RequestHandler Lucene Index (SearchComponent for summariziation) 28
  • 28. Wrap Up • Examples: Entity Extraction, Document Categorization, Summarization. • Technology: You can take small steps and get a great deal of gain, since you can leverage from features and components of Solr and Lucene (as well as other open source NLP frameworks). • Value: Benefits from text analytics includes the increase in discovery, findability and productivity from the solution. 29