Search, Signals & Sense:
  An Analytics Fueled Vision

Seth Grimes
@sethgrimes
A Sense Making Story

                       New York Times,
                       September 30, 2012
Valium: Starting a Chain of Connections

                           New York Times,
                           September 8, 1957
H.P. Luhn

By H.P. Luhn, in IBM Journal, April, 1958

http://altaplana.com/ibm-luhn58-LiteratureAbstracts.pdf
Modelling Text

Luhn’s analysis of Messengers of the Nervous System, a Scientific American article

http://wordle.net, applied to the NY Times article




“Statistical information derived from word frequency and distribution is
used by the machine to compute a relative measure of significance, first
for individual words and then for sentences. Sentences scoring highest in
significance are extracted and printed out to become the auto-abstract.”
 -- H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
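
The scoring idea in the quote can be sketched roughly as follows. This is a simplified illustration, not Luhn's exact procedure: it treats each whole sentence as a single cluster, and the stop-word list, the `min_count` threshold, and the function names are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
              "it", "for", "on", "by", "that", "this", "with", "was"}

def split_sentences(text):
    # Naive split on ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def words(sentence):
    return re.findall(r"[a-z']+", sentence.lower())

def auto_abstract(text, n_sentences=1, min_count=2):
    sentences = split_sentences(text)
    # Word significance: frequency across the document, ignoring stop words.
    counts = Counter(w for s in sentences for w in words(s)
                     if w not in STOP_WORDS)
    significant = {w for w, c in counts.items() if c >= min_count}

    def score(sentence):
        toks = words(sentence)
        if not toks:
            return 0.0
        hits = sum(1 for t in toks if t in significant)
        # Squared count of significant words over sentence length.
        return hits * hits / len(toks)

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Return the top sentences in their original document order.
    return [sentences[i] for i in sorted(ranked[:n_sentences])]

text = ("Nerve cells send chemical messengers. "
        "The weather was pleasant that day. "
        "Chemical messengers carry signals between nerve cells.")
abstract = auto_abstract(text, n_sentences=1)
```

Here the first sentence scores highest because it packs the most repeated content words into the fewest tokens.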
Luhn’s Example

                 New York Times,
                 September 8, 1957
Close Reading
Can Software Make the Connection?




               Mark Lombardi, George W. Bush, Harken Energy
                      and Jackson Stephens, c. 1979-90, Detail
There and Back Again: Modelling Text, 2


The text content of a document can be considered an
 unordered “bag of words.”
Particular documents are points in a high-dimensional
 vector space.
         Salton, Wong & Yang, “A Vector Space Model for Automatic Indexing,” November 1975.
Modelling Text, 3


We might construct a document-term matrix...
  • D1 = “I like databases”
  • D2 = “I hate hate databases”

                 I        like               hate              databases
       D1        1        1                  0                 1
       D2        1        0                  2                 1
                              http://en.wikipedia.org/wiki/Term-document_matrix


and use a weighting such as TF-IDF (term frequency–
 inverse document frequency)…
in computing the cosine of the angle between
  weighted doc-vectors to determine similarity.
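
A minimal sketch of the steps above, using the two example documents D1 and D2. The smoothed IDF formula is one common variant chosen for illustration (plain log(N/df) would zero out terms that appear in every document); the helper names are assumptions, not part of the slides.

```python
import math
from collections import Counter

docs = {"D1": "I like databases", "D2": "I hate hate databases"}
vocab = ["I", "like", "hate", "databases"]  # column order from the table

def term_counts(text):
    counts = Counter(text.split())
    return [counts[t] for t in vocab]

# Document-term matrix: one row of raw term frequencies per document.
matrix = {name: term_counts(text) for name, text in docs.items()}

def idf(col):
    # Smoothed inverse document frequency (an assumed variant).
    n_docs = len(matrix)
    df = sum(1 for row in matrix.values() if row[col] > 0)
    return math.log((1 + n_docs) / (1 + df)) + 1

def tfidf(row):
    return [tf * idf(col) for col, tf in enumerate(row)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

raw_similarity = cosine(matrix["D1"], matrix["D2"])
weighted_similarity = cosine(tfidf(matrix["D1"]), tfidf(matrix["D2"]))
```

With this weighting the distinguishing terms ("like", "hate") count for more than the shared ones, so the weighted similarity comes out lower than the raw-count similarity.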
Modelling Text, 4


In the form of query-document similarity, this is
  Information Retrieval 101.
  • See, for instance, Salton & Buckley, “Term-Weighting
    Approaches in Automatic Text Retrieval,” 1988.
  • A useful basic tech paper: Russ Albright, SAS, “Taming Text
    with the SVD,” 2004.
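
Query-document similarity can be sketched by treating the query as a very short document and ranking documents by cosine similarity. This is an illustrative toy using raw term counts; the function names and the sample query are assumptions.

```python
import math
from collections import Counter

def vectorize(text):
    # Bag of words: lower-cased term -> count.
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

def rank(query, docs):
    q = vectorize(query)
    scored = [(cosine(q, vectorize(text)), name)
              for name, text in docs.items()]
    # Highest similarity first.
    return sorted(scored, reverse=True)

docs = {"D1": "I like databases", "D2": "I hate hate databases"}
results = rank("hate databases", docs)
ranking = [name for _, name in results]
```

D2 ranks first for this query because it shares both query terms, while D1 shares only one.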
Given the complexity of human language, statistical
 models may fall short.
  “Reading from text in general is a hard problem, because it
  involves all of common sense knowledge.”
                -- Expert systems pioneer Edward A. Feigenbaum
From Text to Data: Features


Analytical methods make text tractable.
  Latent semantic indexing utilizing singular value
    decomposition for term reduction / feature selection.
Classification technologies / methods:
  • Naive Bayes.
  • Support Vector Machine.
  • K-nearest neighbor.
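
As an illustration of the first method listed, here is a minimal multinomial Naive Bayes sketch with add-one (Laplace) smoothing. The training sentences and labels are invented toy data, and the function names are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    class_docs = Counter()               # documents per class
    word_counts = defaultdict(Counter)   # word counts per class
    vocab = set()
    for text, label in labeled_docs:
        class_docs[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return class_docs, word_counts, vocab

def classify(text, model):
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, float("-inf")
    for label in class_docs:
        # Log prior plus summed log likelihoods of known words.
        score = math.log(class_docs[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for w in text.lower().split():
            if w not in vocab:
                continue  # skip words never seen in training
            score += math.log((word_counts[label][w] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy training set.
training = [
    ("i like databases", "positive"),
    ("i like search", "positive"),
    ("i hate hate databases", "negative"),
    ("i hate slow search", "negative"),
]
model = train(training)
prediction = classify("hate databases", model)
```

Working in log space avoids underflow when many small probabilities are multiplied together.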
“Reading from Text is a Hard Problem”

 Eugène Delacroix, St. Michael Defeats the Devil




          Thus the Orb he roam'd
With narrow search; and with inspection deep
   Consider'd every Creature, which of all
   Most opportune might serve his Wiles.
                     -- John Milton, Paradise Lost
Data, Search, Analysis, and Discovery

   Eugène Delacroix, St. Michael Defeats the Devil

   [Diagram labels overlaid on the image and quote: Data Space; For features; Analysis; Intent, Goals]

            Thus the Orb he roam'd
  With narrow search; and with inspection deep
     Consider'd every Creature, which of all
     Most opportune might serve his Wiles.
                       -- John Milton, Paradise Lost
The User Interface

“Search is the UI for data today.”
                  -- Grant Ingersoll, Chief Scientist, LucidWorks
                     Quoted by Gil Press in Forbes, “LucidWorks: Bringing Search to Big Data”
                     http://www.forbes.com/sites/gilpress/2012/09/24/lucidworks-bringing-search-to-big-data/




What’s beyond?
Search and Sensemaking

“It is convenient to divide the entire
information access process into two
main components: information retrieval
through searching and browsing, and
analysis and synthesis of results. This
broader process is often referred to in
the literature as sensemaking.
Sensemaking refers to an iterative
process of formulating a conceptual
representation from a large volume
of information. Search plays only one
part in this process.”
                   -- Marti Hearst, 2009
                          http://searchuserinterfaces.com/
Senseless Search

New but old: Dumb and siloed
Searcher Supplied Sense

Better?
Siloed signals.

More better?
Semantic Search Engines

Meh.
Clustered Clarity

Carrot2.
(open source)
Semanticized (Web) Search




Google
Knowledge
Graph
Search Fronted Analysis & Discovery


                                  Fusions,
                                  Signals
Toward Semantic Search Sensemaking

Old Search                              Sensemaking
Search on: keywords                     + identity, history & context
Sources: content/type silos             Unified
Indexed: terms                          + metadata (properties)
Returned: hit lists                     Categories / clusters / answers first
Relevance: PageRank                     (Inferred) intent
Prevalence: plenty of new platforms     Plenty of established search with
 with old(ish) search                    new(ish) capabilities, also wannabes.
The Back End

Platforms and ecosystems.
APIs and services.
Text and content analytics --
   Discerns and extracts features including relationships from
     source materials.
   Features = entities, key-value pairs, concepts, topics,
     events, sentiment, etc.
   Provide (for) BI on content-sourced data.
Data integration, record linkage, data fusion.
Text+ Technology Mashups

Text/content analytics generates semantics to bridge
   search, BI, and applications, enabling next-
   generation information systems.
[Venn diagram: circles for Search, BI, and Applications, with Text analytics as the inner circle.]
  • Semantic search (search + text)
  • Information access (search + text + BI)
  • Search based applications (search + text + apps)
  • Integrated analytics (text + BI)
  • NextGen CRM, EFM, MR, marketing, …
Analytical Assets (Open Source)




    >>> import nltk
    >>> sentence = """At eight o'clock on Thursday morning
    ... Arthur didn't feel very good."""
    >>> tokens = nltk.word_tokenize(sentence)
    >>> tokens
    ['At', 'eight', "o'clock", 'on', 'Thursday', 'morning',
    'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']
    >>> tagged = nltk.pos_tag(tokens)
    >>> tagged[0:6]
    [('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'),
    ('Thursday', 'NNP'), ('morning', 'NN')]

                                                        http://nltk.org/
tm: Text Mining Package
A framework for text mining
applications within R.
A Big Data Analytics Architecture

http://hpccsystems.com/ (GNU Affero GPL)




           http://www.geeklawblog.com/2011/12/lexis-advance-platform-launch-two.html
Commercial (Non-OS) Solutions Plug In
Drivers and Trends

Social media!
    … and personal-social-enterprise integration.
Via-API cloud services.
Big Data (even if you don’t like the term).
    Volume and velocity mean new analytical approaches.
    Variety: new types and a new fusion imperative.
Sentiment: Mood, opinions, emotions, intent.
Question answering.
Text Tech Initiatives

Now and near future.
    • Broader & deeper international language support.
    • Sentiment analysis, beyond polarity.
      Emotions, intent signals, etc.
    • Identity resolution & profile extraction.
      Online-social-enterprise data integration.
    • Semantic data integration, Complex Data.
    • Speech analytics.
    • Discourse analysis.
      Because isolated messages are not conversations.
    • Rich-media content analytics.
    • Augmented reality; new human-computer interfaces.
Personal. Mobile. Intelligent?




http://timoelliott.com/blog/2010/10/sap-businessobjects-augmented-explorer-now-available-resources-to-test-it.html
A Focus on Information & Applications

Now and near future.
    • Signal detection.
      Sentiment, emotion, identity, intent.
    • Semanticized applications.
      Linkable, mashable, enrichable.
    • Rich information.
      Context sensitive, situational.
Σ = Sensemaking.
Onward… to Q&A

Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Dernier (20)

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

Search, Signals & Sensemaking: An Analytics Vision

  • 1. Search, Signals & Sense: An Analytics Fueled Vision Seth Grimes @sethgrimes
  • 2. A Sense Making Story New York Times, September 30, 2012
  • 3. Valium: Starting a Chain of Connections New York Times, September 8, 1957
  • 4. H.P. Luhn By H.P. Luhn, in IBM Journal, April, 1958 http://altaplana.com/ibm-luhn58-LiteratureAbstracts.pdf
  • 6. Modelling Text
      Figure: Luhn’s analysis of Messengers of the Nervous System, a Scientific American article.
      Figure: http://wordle.net, applied to the NY Times article.
      “Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the auto-abstract.” -- H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
  • 7. Luhn’s Example New York Times, September 8, 1957
  • 10. Can Software Make the Connection? Mark Lombardi, George W. Bush, Harken Energy and Jackson Stephens, c. 1979-90, Detail
  • 11. There and Back Again: Modelling Text, 2 The text content of a document can be considered an unordered “bag of words.” Particular documents are points in a high-dimensional vector space. Salton, Wong & Yang, “A Vector Space Model for Automatic Indexing,” November 1975.
  • 12. Modelling Text, 3 We might construct a document-term matrix...
      • D1 = “I like databases”
      • D2 = “I hate hate databases”

            I  like  hate  databases
        D1  1  1     0     1
        D2  1  0     2     1

      http://en.wikipedia.org/wiki/Term-document_matrix
      and use a weighting such as TF-IDF (term frequency-inverse document frequency)… in computing the cosine of the angle between weighted doc-vectors to determine similarity.
  • 13. Modelling Text, 4 In the form of query-document similarity, this is Information Retrieval 101. • See, for instance, Salton & Buckley, “Term-Weighting Approaches in Automatic Text Retrieval,” 1988. • A useful basic tech paper: Russ Albright, SAS, “Taming Text with the SVD,” 2004. Given the complexity of human language, statistical models may fall short. “Reading from text in general is a hard problem, because it involves all of common sense knowledge.” -- Expert systems pioneer Edward A. Feigenbaum
  • 14. From Text to Data: Features Analytical methods make text tractable. Latent semantic indexing utilizing singular value decomposition for term reduction / feature selection. Classification technologies / methods: • Naive Bayes. • Support Vector Machine. • K-nearest neighbor.
  • 15. “Reading from Text is a Hard Problem” Eugène Delacroix, St. Michael Defeats the Devil Thus the Orb he roam'd With narrow search; and with inspection deep Consider'd every Creature, which of all Most opportune might serve his Wiles. -- John Milton, Paradise Lost
  • 16. Data, Search, Analysis, and Discovery Eugène Delacroix, St. Michael Defeats the Devil. Annotations on the passage: Data Space; For features; Analysis; Intent, Goals. Thus the Orb he roam'd With narrow search; and with inspection deep Consider'd every Creature, which of all Most opportune might serve his Wiles. -- John Milton, Paradise Lost
  • 17. The User Interface “Search is the UI for data today.” -- Grant Ingersoll, Chief Scientist, LucidWorks Quoted by Gil Press in Forbes, “LucidWorks: Bringing Search to Big Data” http://www.forbes.com/sites/gilpress/2012/09/24/lucidworks-bringing-search-to-big-data/ What’s beyond?
  • 18. Search and Sensemaking “It is convenient to divide the entire information access process into two main components: information retrieval through searching and browsing, and analysis and synthesis of results. This broader process is often referred to in the literature as sensemaking. Sensemaking refers to an iterative process of formulating a conceptual representation from a large volume of information. Search plays only one part in this process.” -- Marti Hearst, 2009 http://searchuserinterfaces.com/
  • 19. Senseless Search New but old: Dumb and siloed
  • 25. Search Fronted Analysis & Discovery Fusions, Signals
  • 26. Toward Semantic Search Sensemaking

                     Old Search                                   Sensemaking Search
        Search on:   keywords                                     + identity, history & context
        Sources:     content/type silos                           Unified
        Indexed:     terms                                        + metadata (properties)
        Returned:    hit lists                                    Categories / clusters / answers first
        Relevance:   PageRank                                     (Inferred) intent
        Prevalence:  plenty of new platforms with old(ish) search Plenty of established platforms with new(ish) search capabilities, also wannabes.
  • 27. The Back End Platforms and ecosystems. APIs and services. Text and content analytics -- Discerns and extracts features including relationships from source materials. Features = entities, key-value pairs, concepts, topics, events, sentiment, etc. Provide (for) BI on content-sourced data. Data integration, record linkage, data fusion.
  • 28. Text+ Technology Mashups Text/content analytics generates semantics to bridge search, BI, and applications, enabling next-generation information systems.
      Diagram labels:
      • Search; BI; Applications (outer circles)
      • Text analytics (inner circle)
      • Semantic search (search + text)
      • Information access (search + text + BI)
      • Search based applications (search + text + apps)
      • Integrated analytics (text + BI)
      • NextGen CRM, EFM, MR, marketing, …
  • 29. Analytical Assets (Open Source)
      >>> import nltk
      >>> sentence = """At eight o'clock on Thursday morning
      ... Arthur didn't feel very good."""
      >>> tokens = nltk.word_tokenize(sentence)
      >>> tokens
      ['At', 'eight', "o'clock", 'on', 'Thursday', 'morning', 'Arthur', 'did', "n't", 'feel', 'very', 'good', '.']
      >>> tagged = nltk.pos_tag(tokens)
      >>> tagged[0:6]
      [('At', 'IN'), ('eight', 'CD'), ("o'clock", 'JJ'), ('on', 'IN'), ('Thursday', 'NNP'), ('morning', 'NN')]
      http://nltk.org/
      tm: Text Mining Package. A framework for text mining applications within R.
  • 30. A Big Data Analytics Architecture http://hpccsystems.com/ (GNU Affero GPL) http://www.geeklawblog.com/2011/12/lexis-advance-platform-launch-two.html
  • 32. Drivers and Trends Social media! … and personal-social-enterprise integration. Via-API cloud services. Big Data (even if you don’t like the term). Volume and velocity mean new analytical approaches. Variety: new types and a new fusion imperative. Sentiment: Mood, opinions, emotions, intent. Question answering.
  • 33. Text Tech Initiatives Now and near future. • Broader & deeper international language support. • Sentiment analysis, beyond polarity. Emotions, intent signals. etc. • Identity resolution & profile extraction. Online-social-enterprise data integration. • Semantic data integration, Complex Data. • Speech analytics. • Discourse analysis. Because isolated messages are not conversations. • Rich-media content analytics. • Augmented reality; new human-computer interfaces.
  • 35. A Focus on Information & Applications Now and near future. • Signal detection. Sentiment, emotion, identity, intent. • Semanticized applications. Linkable, mashable, enrichable. • Rich information. Context sensitive, situational. Σ = Sensemaking.
  • 38. Search, Signals & Sense: An Analytics Fueled Vision Seth Grimes @sethgrimes
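
The document-term matrix and cosine similarity described on slide 12 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the slides do not say which TF-IDF variant is meant, so the smoothed IDF formula below, and the helper names `term_counts`, `smoothed_idf`, `tfidf`, and `cosine`, are choices made for this example rather than anything from the talk.

```python
import math
from collections import Counter

# The two example documents from slide 12.
docs = {
    "D1": "I like databases",
    "D2": "I hate hate databases",
}

def term_counts(text):
    """Bag of words: split on whitespace and count each term."""
    return Counter(text.split())

counts = {name: term_counts(text) for name, text in docs.items()}
vocab = ["I", "like", "hate", "databases"]  # column order used on the slide

# Document-term matrix: one row of raw term counts per document.
matrix = {name: [c[term] for term in vocab] for name, c in counts.items()}

def smoothed_idf(term):
    """Assumed IDF variant: log((1 + N) / (1 + df)) + 1.

    The smoothing keeps terms that occur in every document from
    being weighted down to zero.
    """
    n_docs = len(counts)
    df = sum(1 for c in counts.values() if c[term] > 0)
    return math.log((1 + n_docs) / (1 + df)) + 1.0

def tfidf(row):
    """Weight a row of raw counts by the assumed IDF of each term."""
    return [tf * smoothed_idf(term) for tf, term in zip(row, vocab)]

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

raw_similarity = cosine(matrix["D1"], matrix["D2"])
weighted_similarity = cosine(tfidf(matrix["D1"]), tfidf(matrix["D2"]))

print(matrix)
print(round(raw_similarity, 4), round(weighted_similarity, 4))
```

The choice of IDF variant matters a good deal on a corpus this small. With an unsmoothed log(N/df), the two terms shared by both documents ("I" and "databases") would receive zero weight and the similarity would drop to 0, since D1 and D2 have no other terms in common.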