SlideShare une entreprise Scribd logo
1  sur  65
Analysis of ‘Unstructured’ Data Seth Grimes Alta Plana Corporation 301-270-0795 --  http://altaplana.com ASA Chicago Chapter Proliferation of Digital Information and Recent Uses in Statistical Applications  May 15, 2009
Introduction ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Perspectives ,[object Object],[object Object],[object Object],[object Object],[object Object]
Context ,[object Object],[object Object],[object Object],[object Object],[object Object]
Are these data sets? “ Unstructured” data
Are these data sets? “ Unstructured” data
www.stanford.edu/%7ernusse/wntwindow.html Axin and Frat1 interact with dvl and GSK, bridging Dvl to GSK in Wnt-mediated regulation of LEF-1. Wnt proteins transduce their signals through dishevelled (Dvl) proteins to inhibit glycogen synthase kinase 3beta (GSK), leading to the accumulation of cytosolic beta-catenin and activation of TCF/LEF-1 transcription factors. To understand the mechanism by which Dvl acts through GSK to regulate LEF-1, we investigated the roles of Axin and Frat1 in Wnt-mediated activation of LEF-1 in mammalian cells. We found that Dvl interacts with Axin and with Frat1, both of which interact with GSK. Similarly, the Frat1 homolog GBP binds Xenopus Dishevelled in an interaction that requires GSK. We also found that Dvl, Axin and GSK can form a ternary complex bridged by Axin, and that Frat1 can be recruited into this complex probably by Dvl. The observation that the Dvl-binding domain of either Frat1 or Axin was able to inhibit Wnt-1-induced LEF-1 activation suggests that the interactions between Dvl and Axin and between Dvl and Frat may be important for this signaling pathway. Furthermore, Wnt-1 appeared to promote the disintegration of the Frat1-Dvl-GSK-Axin complex, resulting in the dissociation of GSK from Axin. Thus, formation of the quaternary complex may be an important step in Wnt signaling, by which Dvl recruits Frat1, leading to Frat1-mediated dissociation of GSK from Axin. www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=10428961&dopt=Abstract More “unstructured” data
 
 
Text descriptive statistics ,[object Object],[object Object],[object Object],[object Object]
New York Times , September 8, 1957
“ Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the auto-abstract.” H.P. Luhn,  The Automatic Creation of Literature Abstracts ,  IBM Journal , 1958.
Text-BI:  Back to the Future ,[object Object],[object Object],[object Object]
Document input and processing Knowledge handling is key
 
Text modelling The text content of a document can be considered an unordered “bag of words.” Particular documents are points in a high-dimensional vector space. Salton, Wong & Yang, “A Vector Space Model for Automatic Indexing,” November 1975.
Text modelling ,[object Object],[object Object],[object Object],[object Object],[object Object],http://en.wikipedia.org/wiki/Term-document_matrix I like hate databases D1 1 1 0 1 D2 1 0 2 1
Text modelling ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Text modelling ,[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Unstructured sources
Unstructured sources ,[object Object],[object Object],[object Object]
The “unstructured” data challenge ,[object Object],[object Object],[object Object]
Applications ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Search is not the answer Relevance? Concepts? Articles from a forum site Articles from 1987
Search ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Search ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Smarter search Text analytics enables results that suit the information and the user, e.g., answers –
Presentation of search results can be enhanced by discovery. This slide and the next show dynamic, clustered search results from Grokker… live.grokker.com/grokker.html?query=text%20analytics&Yahoo=true&Wikipedia=true&numResults=250
… with a zoomable display. Clustering here utilizes statistical (text) data mining techniques to identifying cohesive groupings of retrieved documents.
More results clustering... A dynamic network viz.: the Touch-Graph Google-Browser applet touchgraph.com/ TGGoogleBrowser.php ?start=text%20analytics
Beyond search
Data  Mining Text  Mining Data Retrieval Information Retrieval Search/Query (goal-oriented) ‏ Discovery (opportunistic) ‏ Fielded Data Documents Based on Je Wei Liang,  www.database.cis.nctu.edu.tw/seminars/2003F/TWM/slides/p.ppt Text mining
Semantic Search BI Search Data  Mining Text  Mining Data Retrieval Information Retrieval Search/Query (goal-oriented) Discovery (opportunistic) Fielded Data Documents Where’s Text Analytics? Text analytics
Text analytics ,[object Object],[object Object],[object Object],[object Object]
Text analytics ,[object Object],[object Object],[object Object],[object Object],[object Object]
Text analytics ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
New York Times , September 8, 1957 Anaphora / coreference External reference
 
 
 
Information extraction When we understand, for instance, parts of speech – <subject> <verb> <object> – we’re in a position to discern facts and relationships. Let's see text augmentation (tagging) in action.  We'll use GATE, an open-source tool...
 
 
 
 
Example: E-mail ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Example: E-mail ,[object Object],[object Object],[object Object]
Example: Survey The respondent is invited to explain his/her attitude:
Example: Survey ,[object Object],[object Object],[object Object],[object Object]
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Attitudinal data
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Attitudinal data
[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Sentiment and opinion
[object Object],[object Object],[object Object],Example: law enforcement
Example: law enforcement An Attensity  law- enforcement  example –  NLP to  identify roles and relationships.
Example: law enforcement
[object Object],[object Object],[object Object],[object Object],Case study: IBM’s MedTAKMI
MEDLINE from the National Center for Biotechnology Information hosts links to many widely used information sources such as the  PubMed database of 18 million biomedical journal abstracts.  Visit  www.ncbi.nlm.nih.gov . Case study: IBM’s MedTAKMI
 
 
[object Object],[object Object],[object Object],VOC research study
Information analyzed
ROI measured, planned & achieved
Solution providers What should a prospective user look for? Response Percent deep sentiment/opinion extraction  80% ability to use specialized dictionaries or taxonomies 76% broad information extraction capability 60% adaptation for particular sectors, e.g., hospitality, retail, health care,  communications 56% predictive-analytics integration 48% BI (business intelligence) integration 48% support for multiple languages 48% ability to create custom workflows 32% low cost 32% hosted or &quot;as a service&quot; option 32% specialized VoC analysis interface 24%
Key message ,[object Object],[object Object],[object Object],[object Object],[object Object]
The vendor marketplace

Contenu connexe

Tendances

Modern Business Intelligence - Design and Implementations
Modern Business Intelligence - Design and ImplementationsModern Business Intelligence - Design and Implementations
Modern Business Intelligence - Design and Implementations
David J Rosenthal
 

Tendances (20)

Modern Business Intelligence - Design and Implementations
Modern Business Intelligence - Design and ImplementationsModern Business Intelligence - Design and Implementations
Modern Business Intelligence - Design and Implementations
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Predictive analytics
Predictive analytics Predictive analytics
Predictive analytics
 
Microsoft Power BI | Brief Introduction | PPT
Microsoft Power BI | Brief Introduction | PPTMicrosoft Power BI | Brief Introduction | PPT
Microsoft Power BI | Brief Introduction | PPT
 
Lecture2 big data life cycle
Lecture2 big data life cycleLecture2 big data life cycle
Lecture2 big data life cycle
 
Business intelligence
Business intelligenceBusiness intelligence
Business intelligence
 
Sap BusinessObjects 4
Sap BusinessObjects 4Sap BusinessObjects 4
Sap BusinessObjects 4
 
A deep dive session on Tableau
A deep dive session on TableauA deep dive session on Tableau
A deep dive session on Tableau
 
Data Mining: Data processing
Data Mining: Data processingData Mining: Data processing
Data Mining: Data processing
 
Implementing Effective Data Governance
Implementing Effective Data GovernanceImplementing Effective Data Governance
Implementing Effective Data Governance
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Business Intelligence: Data Warehouses
Business Intelligence: Data WarehousesBusiness Intelligence: Data Warehouses
Business Intelligence: Data Warehouses
 
Data Visualization - A Brief Overview
Data Visualization - A Brief OverviewData Visualization - A Brief Overview
Data Visualization - A Brief Overview
 
Data analytics
Data analyticsData analytics
Data analytics
 
Infra Migration Proposal Draft from Oracle to Snowflake
Infra Migration Proposal Draft from Oracle to SnowflakeInfra Migration Proposal Draft from Oracle to Snowflake
Infra Migration Proposal Draft from Oracle to Snowflake
 
The Business Value of Metadata for Data Governance
The Business Value of Metadata for Data GovernanceThe Business Value of Metadata for Data Governance
The Business Value of Metadata for Data Governance
 
Introduction to Data Visualization
Introduction to Data VisualizationIntroduction to Data Visualization
Introduction to Data Visualization
 
Preparing a data migration plan: A practical guide
Preparing a data migration plan: A practical guidePreparing a data migration plan: A practical guide
Preparing a data migration plan: A practical guide
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
2021 gartner mq dsml
2021 gartner mq dsml2021 gartner mq dsml
2021 gartner mq dsml
 

Similaire à Analysis of ‘Unstructured’ Data

Clustering of Deep WebPages: A Comparative Study
Clustering of Deep WebPages: A Comparative StudyClustering of Deep WebPages: A Comparative Study
Clustering of Deep WebPages: A Comparative Study
ijcsit
 
Information Architecture Primer - Integrating search,tagging, taxonomy and us...
Information Architecture Primer - Integrating search,tagging, taxonomy and us...Information Architecture Primer - Integrating search,tagging, taxonomy and us...
Information Architecture Primer - Integrating search,tagging, taxonomy and us...
Dan Keldsen
 
Post 1What is text analytics How does it differ from text mini.docx
Post 1What is text analytics How does it differ from text mini.docxPost 1What is text analytics How does it differ from text mini.docx
Post 1What is text analytics How does it differ from text mini.docx
stilliegeorgiana
 
Post 1What is text analytics How does it differ from text mini
Post 1What is text analytics How does it differ from text miniPost 1What is text analytics How does it differ from text mini
Post 1What is text analytics How does it differ from text mini
anhcrowley
 

Similaire à Analysis of ‘Unstructured’ Data (20)

Text Analytics for Dummies 2010
Text Analytics for Dummies 2010Text Analytics for Dummies 2010
Text Analytics for Dummies 2010
 
Text Analytics Overview, 2011
Text Analytics Overview, 2011Text Analytics Overview, 2011
Text Analytics Overview, 2011
 
Predictive Text Analytics
Predictive Text AnalyticsPredictive Text Analytics
Predictive Text Analytics
 
An Introduction to Text Analytics: 2013 Workshop presentation
An Introduction to Text Analytics: 2013 Workshop presentationAn Introduction to Text Analytics: 2013 Workshop presentation
An Introduction to Text Analytics: 2013 Workshop presentation
 
12 Things the Semantic Web Should Know about Content Analytics
12 Things the Semantic Web Should Know about Content Analytics12 Things the Semantic Web Should Know about Content Analytics
12 Things the Semantic Web Should Know about Content Analytics
 
Data Science - Part XI - Text Analytics
Data Science - Part XI - Text AnalyticsData Science - Part XI - Text Analytics
Data Science - Part XI - Text Analytics
 
AAUP 2008: Making XML Work (T. Kerner)
AAUP 2008: Making XML Work (T. Kerner)AAUP 2008: Making XML Work (T. Kerner)
AAUP 2008: Making XML Work (T. Kerner)
 
Clustering of Deep WebPages: A Comparative Study
Clustering of Deep WebPages: A Comparative StudyClustering of Deep WebPages: A Comparative Study
Clustering of Deep WebPages: A Comparative Study
 
Empowering Search Through 3RDi Semantic Enrichment
Empowering Search Through 3RDi Semantic EnrichmentEmpowering Search Through 3RDi Semantic Enrichment
Empowering Search Through 3RDi Semantic Enrichment
 
Metadata and Analytics
Metadata and AnalyticsMetadata and Analytics
Metadata and Analytics
 
Text, Content, and Social Analytics: BI for the New World
Text, Content, and Social Analytics: BI for the New WorldText, Content, and Social Analytics: BI for the New World
Text, Content, and Social Analytics: BI for the New World
 
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery”
Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery”
 
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery”
Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”Smartlogic, Semaphore and Semantically Enhanced Search –  For “Discovery”
Smartlogic, Semaphore and Semantically Enhanced Search – For “Discovery”
 
Information Architecture Primer - Integrating search,tagging, taxonomy and us...
Information Architecture Primer - Integrating search,tagging, taxonomy and us...Information Architecture Primer - Integrating search,tagging, taxonomy and us...
Information Architecture Primer - Integrating search,tagging, taxonomy and us...
 
Comparison of Semantic and Syntactic Information Retrieval System on the basi...
Comparison of Semantic and Syntactic Information Retrieval System on the basi...Comparison of Semantic and Syntactic Information Retrieval System on the basi...
Comparison of Semantic and Syntactic Information Retrieval System on the basi...
 
Post 1What is text analytics How does it differ from text mini.docx
Post 1What is text analytics How does it differ from text mini.docxPost 1What is text analytics How does it differ from text mini.docx
Post 1What is text analytics How does it differ from text mini.docx
 
Post 1What is text analytics How does it differ from text mini
Post 1What is text analytics How does it differ from text miniPost 1What is text analytics How does it differ from text mini
Post 1What is text analytics How does it differ from text mini
 
Synthesys Technical Overview
Synthesys Technical OverviewSynthesys Technical Overview
Synthesys Technical Overview
 
Barga, roger. predictive analytics with microsoft azure machine learning
Barga, roger. predictive analytics with microsoft azure machine learningBarga, roger. predictive analytics with microsoft azure machine learning
Barga, roger. predictive analytics with microsoft azure machine learning
 
Competitive Intelligence Made easy
Competitive Intelligence Made easyCompetitive Intelligence Made easy
Competitive Intelligence Made easy
 

Plus de Seth Grimes

Plus de Seth Grimes (20)

Recent Advances in Natural Language Processing
Recent Advances in Natural Language ProcessingRecent Advances in Natural Language Processing
Recent Advances in Natural Language Processing
 
Creating an AI Startup: What You Need to Know
Creating an AI Startup: What You Need to KnowCreating an AI Startup: What You Need to Know
Creating an AI Startup: What You Need to Know
 
NLP 2020: What Works and What's Next
NLP 2020: What Works and What's NextNLP 2020: What Works and What's Next
NLP 2020: What Works and What's Next
 
Efficient Deep Learning in Natural Language Processing Production, with Moshe...
Efficient Deep Learning in Natural Language Processing Production, with Moshe...Efficient Deep Learning in Natural Language Processing Production, with Moshe...
Efficient Deep Learning in Natural Language Processing Production, with Moshe...
 
From Customer Emotions to Actionable Insights, with Peter Dorrington
From Customer Emotions to Actionable Insights, with Peter DorringtonFrom Customer Emotions to Actionable Insights, with Peter Dorrington
From Customer Emotions to Actionable Insights, with Peter Dorrington
 
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AIIntro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
Intro to Deep Learning for Medical Image Analysis, with Dan Lee from Dentuit AI
 
Emotion AI
Emotion AIEmotion AI
Emotion AI
 
Text Analytics Market Trends
Text Analytics Market TrendsText Analytics Market Trends
Text Analytics Market Trends
 
Text Analytics for NLPers
Text Analytics for NLPersText Analytics for NLPers
Text Analytics for NLPers
 
Our FinTech Future – AI’s Opportunities and Challenges?
Our FinTech Future – AI’s Opportunities and Challenges? Our FinTech Future – AI’s Opportunities and Challenges?
Our FinTech Future – AI’s Opportunities and Challenges?
 
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
 
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
 
Fairness in Machine Learning and AI
Fairness in Machine Learning and AIFairness in Machine Learning and AI
Fairness in Machine Learning and AI
 
Classification with Memes–Uber case study
Classification with Memes–Uber case studyClassification with Memes–Uber case study
Classification with Memes–Uber case study
 
Aspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion AnalysisAspect Detection for Sentiment / Emotion Analysis
Aspect Detection for Sentiment / Emotion Analysis
 
Content AI: From Potential to Practice
Content AI: From Potential to PracticeContent AI: From Potential to Practice
Content AI: From Potential to Practice
 
Text Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's NextText Analytics Market Insights: What's Working and What's Next
Text Analytics Market Insights: What's Working and What's Next
 
An Industry Perspective on Subjectivity, Sentiment, and Social
An Industry Perspective on Subjectivity, Sentiment, and SocialAn Industry Perspective on Subjectivity, Sentiment, and Social
An Industry Perspective on Subjectivity, Sentiment, and Social
 
The Insight Value of Social Sentiment
The Insight Value of Social SentimentThe Insight Value of Social Sentiment
The Insight Value of Social Sentiment
 
Text Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and ProvidersText Analytics 2014: User Perspectives on Solutions and Providers
Text Analytics 2014: User Perspectives on Solutions and Providers
 

Dernier

Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
Sheetaleventcompany
 
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
lizamodels9
 
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
dollysharma2066
 
Call Girls From Pari Chowk Greater Noida ❤️8448577510 ⊹Best Escorts Service I...
Call Girls From Pari Chowk Greater Noida ❤️8448577510 ⊹Best Escorts Service I...Call Girls From Pari Chowk Greater Noida ❤️8448577510 ⊹Best Escorts Service I...
Call Girls From Pari Chowk Greater Noida ❤️8448577510 ⊹Best Escorts Service I...
lizamodels9
 
Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
lizamodels9
 

Dernier (20)

Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
Chandigarh Escorts Service 📞8868886958📞 Just📲 Call Nihal Chandigarh Call Girl...
 
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Cracking the Cultural Competence Code.pptx
Cracking the Cultural Competence Code.pptxCracking the Cultural Competence Code.pptx
Cracking the Cultural Competence Code.pptx
 
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesMysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
 
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
 
B.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptx
B.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptxB.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptx
B.COM Unit – 4 ( CORPORATE SOCIAL RESPONSIBILITY ( CSR ).pptx
 
Call Girls Service In Old Town Dubai ((0551707352)) Old Town Dubai Call Girl ...
Call Girls Service In Old Town Dubai ((0551707352)) Old Town Dubai Call Girl ...Call Girls Service In Old Town Dubai ((0551707352)) Old Town Dubai Call Girl ...
Call Girls Service In Old Town Dubai ((0551707352)) Old Town Dubai Call Girl ...
 
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Majnu Ka Tilla, Delhi Contact Us 8377877756
 
Dr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdfDr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdf
 
Falcon Invoice Discounting platform in india
Falcon Invoice Discounting platform in indiaFalcon Invoice Discounting platform in india
Falcon Invoice Discounting platform in india
 
Call Girls From Pari Chowk Greater Noida ❤️8448577510 ⊹Best Escorts Service I...
Call Girls From Pari Chowk Greater Noida ❤️8448577510 ⊹Best Escorts Service I...Call Girls From Pari Chowk Greater Noida ❤️8448577510 ⊹Best Escorts Service I...
Call Girls From Pari Chowk Greater Noida ❤️8448577510 ⊹Best Escorts Service I...
 
Uneak White's Personal Brand Exploration Presentation
Uneak White's Personal Brand Exploration PresentationUneak White's Personal Brand Exploration Presentation
Uneak White's Personal Brand Exploration Presentation
 
Eluru Call Girls Service ☎ ️93326-06886 ❤️‍🔥 Enjoy 24/7 Escort Service
Eluru Call Girls Service ☎ ️93326-06886 ❤️‍🔥 Enjoy 24/7 Escort ServiceEluru Call Girls Service ☎ ️93326-06886 ❤️‍🔥 Enjoy 24/7 Escort Service
Eluru Call Girls Service ☎ ️93326-06886 ❤️‍🔥 Enjoy 24/7 Escort Service
 
Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Nelamangala Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
Russian Call Girls In Gurgaon ❤️8448577510 ⊹Best Escorts Service In 24/7 Delh...
 
A DAY IN THE LIFE OF A SALESMAN / WOMAN
A DAY IN THE LIFE OF A  SALESMAN / WOMANA DAY IN THE LIFE OF A  SALESMAN / WOMAN
A DAY IN THE LIFE OF A SALESMAN / WOMAN
 
Organizational Transformation Lead with Culture
Organizational Transformation Lead with CultureOrganizational Transformation Lead with Culture
Organizational Transformation Lead with Culture
 
Value Proposition canvas- Customer needs and pains
Value Proposition canvas- Customer needs and painsValue Proposition canvas- Customer needs and pains
Value Proposition canvas- Customer needs and pains
 
RSA Conference Exhibitor List 2024 - Exhibitors Data
RSA Conference Exhibitor List 2024 - Exhibitors DataRSA Conference Exhibitor List 2024 - Exhibitors Data
RSA Conference Exhibitor List 2024 - Exhibitors Data
 
How to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League CityHow to Get Started in Social Media for Art League City
How to Get Started in Social Media for Art League City
 

Analysis of ‘Unstructured’ Data

  • 1. Analysis of ‘Unstructured’ Data Seth Grimes Alta Plana Corporation 301-270-0795 -- http://altaplana.com ASA Chicago Chapter Proliferation of Digital Information and Recent Uses in Statistical Applications May 15, 2009
  • 2.
  • 3.
  • 4.
  • 5. Are these data sets? “ Unstructured” data
  • 6. Are these data sets? “ Unstructured” data
  • 7. www.stanford.edu/%7ernusse/wntwindow.html Axin and Frat1 interact with dvl and GSK, bridging Dvl to GSK in Wnt-mediated regulation of LEF-1. Wnt proteins transduce their signals through dishevelled (Dvl) proteins to inhibit glycogen synthase kinase 3beta (GSK), leading to the accumulation of cytosolic beta-catenin and activation of TCF/LEF-1 transcription factors. To understand the mechanism by which Dvl acts through GSK to regulate LEF-1, we investigated the roles of Axin and Frat1 in Wnt-mediated activation of LEF-1 in mammalian cells. We found that Dvl interacts with Axin and with Frat1, both of which interact with GSK. Similarly, the Frat1 homolog GBP binds Xenopus Dishevelled in an interaction that requires GSK. We also found that Dvl, Axin and GSK can form a ternary complex bridged by Axin, and that Frat1 can be recruited into this complex probably by Dvl. The observation that the Dvl-binding domain of either Frat1 or Axin was able to inhibit Wnt-1-induced LEF-1 activation suggests that the interactions between Dvl and Axin and between Dvl and Frat may be important for this signaling pathway. Furthermore, Wnt-1 appeared to promote the disintegration of the Frat1-Dvl-GSK-Axin complex, resulting in the dissociation of GSK from Axin. Thus, formation of the quaternary complex may be an important step in Wnt signaling, by which Dvl recruits Frat1, leading to Frat1-mediated dissociation of GSK from Axin. www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed&cmd=Retrieve&list_uids=10428961&dopt=Abstract More “unstructured” data
  • 8.  
  • 9.  
  • 10.
  • 11. New York Times , September 8, 1957
  • 12. “ Statistical information derived from word frequency and distribution is used by the machine to compute a relative measure of significance, first for individual words and then for sentences. Sentences scoring highest in significance are extracted and printed out to become the auto-abstract.” H.P. Luhn, The Automatic Creation of Literature Abstracts , IBM Journal , 1958.
  • 13.
  • 14. Document input and processing Knowledge handling is key
  • 15.  
  • 16. Text modelling The text content of a document can be considered an unordered “bag of words.” Particular documents are points in a high-dimensional vector space. Salton, Wong & Yang, “A Vector Space Model for Automatic Indexing,” November 1975.
  • 17.
  • 18.
  • 19.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24. Search is not the answer Relevance? Concepts? Articles from a forum site Articles from 1987
  • 25.
  • 26.
  • 27. Smarter search Text analytics enables results that suit the information and the user, e.g., answers –
  • 28. Presentation of search results can be enhanced by discovery. This slide and the next show dynamic, clustered search results from Grokker… live.grokker.com/grokker.html?query=text%20analytics&Yahoo=true&Wikipedia=true&numResults=250
  • 29. … with a zoomable display. Clustering here utilizes statistical (text) data mining techniques to identifying cohesive groupings of retrieved documents.
  • 30. More results clustering... A dynamic network viz.: the Touch-Graph Google-Browser applet touchgraph.com/ TGGoogleBrowser.php ?start=text%20analytics
  • 32. Data Mining Text Mining Data Retrieval Information Retrieval Search/Query (goal-oriented) ‏ Discovery (opportunistic) ‏ Fielded Data Documents Based on Je Wei Liang, www.database.cis.nctu.edu.tw/seminars/2003F/TWM/slides/p.ppt Text mining
  • 33. Semantic Search BI Search Data Mining Text Mining Data Retrieval Information Retrieval Search/Query (goal-oriented) Discovery (opportunistic) Fielded Data Documents Where’s Text Analytics? Text analytics
  • 34.
  • 35.
  • 36.
  • 37. New York Times , September 8, 1957 Anaphora / coreference External reference
  • 38.  
  • 39.  
  • 40.  
  • 41. Information extraction When we understand, for instance, parts of speech – <subject> <verb> <object> – we’re in a position to discern facts and relationships. Let's see text augmentation (tagging) in action. We'll use GATE, an open-source tool...
  • 42.  
  • 43.  
  • 44.  
  • 45.  
  • 46.
  • 47.
  • 48. Example: Survey The respondent is invited to explain his/her attitude:
  • 49.
  • 50.
  • 51.
  • 52.
  • 53.
  • 54. Example: law enforcement An Attensity law- enforcement example – NLP to identify roles and relationships.
  • 56.
  • 57. MEDLINE from the National Center for Biotechnology Information hosts links to many widely used information sources such as the PubMed database of 18 million biomedical journal abstracts. Visit www.ncbi.nlm.nih.gov . Case study: IBM’s MedTAKMI
  • 58.  
  • 59.  
  • 60.
  • 62. ROI measured, planned & achieved
  • 63. Solution providers What should a prospective user look for? Response Percent deep sentiment/opinion extraction 80% ability to use specialized dictionaries or taxonomies 76% broad information extraction capability 60% adaptation for particular sectors, e.g., hospitality, retail, health care, communications 56% predictive-analytics integration 48% BI (business intelligence) integration 48% support for multiple languages 48% ability to create custom workflows 32% low cost 32% hosted or &quot;as a service&quot; option 32% specialized VoC analysis interface 24%
  • 64.

Notes de l'éditeur

  1. This course is, in essence, about the information enterprises have and how they use it and how they could better use it. First we look at enterprise information in light of business goals in order to characterize the “unstructured” information gap. We then look at how that information, or at least the textual variety, may be structured for use. Then we look at a few uses, at enriching search, surely one of today’s killer apps, and at enhancing business intelligence via search.
  2. This course is, in essence, about the information enterprises have and how they use it and how they could better use it. First we look at enterprise information in light of business goals in order to characterize the “unstructured” information gap. We then look at how that information, or at least the textual variety, may be structured for use. Then we look at a few uses, at enriching search, surely one of today’s killer apps, and at enhancing business intelligence via search.
  3. We earlier used a diagram that showed the relationship between search and discovery and operations on fielded data and on free-text documents. We will take those two methods, search and discovery, and add a third, analysis to the picture. In the intersection of search and analysis we have BI search and in the intersection of search and discovery we have semantic search.