SlideShare une entreprise Scribd logo
1  sur  25
Télécharger pour lire hors ligne
Unstructured Text in Big Data
The Elephant in the Room
David Milward

ICIC, October 2013
Click toto edit Master title style
Unstructured Big style
Click edit Master titleData

• Big Data
– Volume, Variety, Velocity

80%

• Estimated 80% of all data
is unstructured
– Need to be able to make
decisions based on this
data

• Ever increasing amount of
data makes it harder and
harder to filter by hand
– Need more automation

2

© 2013 Linguamatics Ltd.

20%
Click toto edit Master Data style
From edit Master title style
Click Unstructured title to Structured

Identify
concepts and
relationships
Structured, relational data
Normalized concept
identifiers

Traditional text mining works in batch mode
Unstructured or
Semi-Structured
Content Sources

Linguamatics Confidential

Requires each extraction to be programmed in,
or learnt from a large amount of annotated
material
Click Agile Text Mining
I2E to edit Master title style
Click to edit Master title style

Querying

Results

Information
Consumers

Confidential
Click toto edit Master title style
Increasing Range style
Click edit Master titleOf Applications

Sentiment
Analysis in
Social
Media
Vocabulary
Development

Mining
Conference
FDA Drug
In the beginning...
Labels Abstract

Mining
Electronic
Medical
Records

Mining

SAR Competitive
ProteinExtractionIntelligenceProtein
Target
Interactions

Identification
&
Prioritization

Safety/Tox

Confidential

Plant
Metabolic
Pathways

Patent
Analysis
Extracting
Chemical Numerical
Search
and
Experimental
Data

Biomarker
Discovery
In-licensing
Opportunities

Key Opinion
Leader
Identification Patent FTO

Systems
Biology

Drug
Repositioning
Click toto edit Master title style
The edit Master title style
ClickDatabase Approach
• Can we predetermine what is required?
• One approach is to try to extract everything … but in practice have
to make assumptions, whether by hand or automatically
– Do we include negative statements?
– Do we include speculation?
– What context do we include?

• Need a good idea of how the data will be used to know what needs
to be extracted
• There are cases where this works well e.g.
– have useful structured data and want to add to it from free text
– have new records structured, but want to bring in information from legacy
free text
6

© 2013 Linguamatics Ltd.
Click toto edit Master title style
Extracting for a Relational Database
Click edit Master title style

• Extracting all relevant information from pathology reports

7

© 2013 Linguamatics Ltd.
Click toto edit Master title style
Extracting for a Semantic
Click edit Master title style Store

• Output as a set of triples

• Because all the parts have URIs, the parts are webaddressable
Treat
Peptide

Psoriasis

Express
Associate

8

© 2013 Linguamatics Ltd.

Web

Infectious
Disease
Click toto Hoc Approach
The edit Master title style
ClickAd edit Master title style
• No need to build a database
• Regard text mining queries as ways to create a database on
the fly

• Keep all the unstructured free text, and query for what you
need when you need it
• Even index structured data with text mining to link
information together
• Used very successfully by knowledge professionals:
– Just like a search engine, there are thousands of questions people
want to ask of the data
– You can’t predict them all beforehand
9

© 2013 Linguamatics Ltd.
ClickHoc Querying style style
Ad to edit Master title
Click to edit Master title
• Find relationships that you want
for any particular information
request

• Treat the text mining as an
“agile” database to answer
thousands of different questions
e.g. metabolites from fruit

10

© 2013 Linguamatics Ltd.
ClickWe NeedMaster title style
Do to edit Master title style
Click to edit More?

• I2E Agile Text Mining has opened up text mining to the end
user, but it is still mainly used by knowledge professionals
• Given the importance of unstructured big data, can we bring
the benefits of text mining to an even wider audience?

11

© 2013 Linguamatics Ltd.
Click toto edit Master title style
Workflow Automation
Click edit Master title style
• Successful queries converted into regular
workflows using Pipeline Pilot or KNIME
• Real-time workflows for up-to-date dashboards,
alerts
• Further analysis, visualization, integration with
other data

Pipeline Pilot

12
Click toto Trials Analysis from I2E
Clinical edit Master title
Click edit Master title style style

Text Mining Results
Colours
indicate
Phase

Using Pipeline
Pilot visualization

Dates
13

© 2013 Linguamatics Ltd.
Click toto edit I2Etitle style Web Apps
Embedding Master title
Click edit Master within style

10/21/2013
14
4734_8.pptx
Click toto edit Master title style
Click edit Master title style
Form-Based Querying

• Create web interfaces
connecting to I2E server
• Information
professionals develop
sophisticated queries

• Queries are
parameterized to allow
end users to customize
• Javascript allows
portability to mobile
devices such as iPads
Click toto edit Master title style
Enhancing Enterprise
Click edit Master title styleSearch

• Use text mining to annotate concepts to feed into a search
engine to:
– provide concept search
– provide concepts for facets
– provide intelligent thumbnails for documents, pulling out key
information

16

© 2013 Linguamatics Ltd.
Value

Click toto edit Master titledocuments
Semantic annotation
Click edit Master title styleof style

Relationships

Concepts

Keywords
Search Engine

NLP-based Text Mining
Click toto edit Master title style
Provide Concepts style
Click edit Master titleas Facets for

Enterprise Search

Document
Refine the search
information and
based on
teaser
terminologies

18

© 2013 Linguamatics Ltd.

Document preview
and additional
information
Click toto edit Master title style
Dictionary Matching Click edit Master title style Companies
Click toto edit MasterInstitutions
Pattern Matching title
Click edit Master title -style style

• Not a fixed list: can find previously unknown institutions
Click toto edit MasterMutations
Pattern Matching title
Click edit Master title -style style
Click toto edit MasterChemicals
Pattern Matching title
Click edit Master title -style style
• Assign structure to novel chemicals
• Distinguish exemplified compounds
• Distinguish compounds given properties
Click toto edit Master title style
Whatever the title style
Click edit MasterContent...
• ... I2E can mine and extract with precision
Scientific literature

Twitter
Click totoWherever style style
…. and edit Master
Click edit Master title title

Internal Content

MEDLINE Patents

Drug Labels
Clinical Trials
Content
Provider

Publisher

I2E OnDemand

I2E Enterprise
I2E Platform

External
Content

I2E Platform

• I2E 4.1 Linked Servers
• Users can query local information or information on the cloud
• Proprietary content can reside within Enterprise
• Standard sources e.g. patents can be hosted on the cloud, and shared by all users
• Data from content providers can reside on their sites
24
Click toto edit Master title style
Bringing Master title style
Click editthe Elephant Down to

Size

• Data warehouses for data that you know you need

• Embedding text mining in other applications e.g. Enterprise
Search to provide benefits of semantic search
• Building specialized new interfaces to provide self-service
applications for end-users
• Continuing role for ad hoc text mining
– text mining can ask almost any question of the data
– you can never predict every question

25

© 2013 Linguamatics Ltd.

Contenu connexe

Tendances

Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...
Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...
Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...Crossref
 
Fair webinar, Ted slater: progress towards commercial fair data products and ...
Fair webinar, Ted slater: progress towards commercial fair data products and ...Fair webinar, Ted slater: progress towards commercial fair data products and ...
Fair webinar, Ted slater: progress towards commercial fair data products and ...Pistoia Alliance
 
Security and Compliance in Office 365
Security and Compliance in Office 365Security and Compliance in Office 365
Security and Compliance in Office 365Joel Jeffery
 
Reproducible machine learning
Reproducible machine learningReproducible machine learning
Reproducible machine learningStephanie Locke
 
Knowledge Graphs : Shaping Our Data Future
Knowledge Graphs : Shaping Our Data FutureKnowledge Graphs : Shaping Our Data Future
Knowledge Graphs : Shaping Our Data FutureTim Williams
 
Recognizing the Standalone, and Reliable Journals
Recognizing the Standalone, and Reliable JournalsRecognizing the Standalone, and Reliable Journals
Recognizing the Standalone, and Reliable JournalsNabeel Salih Ali
 
Empowering the business for eDiscovery in Office 365 - BRK2112
Empowering the business for eDiscovery in Office 365 - BRK2112Empowering the business for eDiscovery in Office 365 - BRK2112
Empowering the business for eDiscovery in Office 365 - BRK2112Joanne Klein
 
Presentation forpd bj_1
Presentation forpd bj_1Presentation forpd bj_1
Presentation forpd bj_1Maori Ito
 
Using metadata repositories with search
Using metadata repositories with searchUsing metadata repositories with search
Using metadata repositories with searchJean Graef
 
Modeling & managing metadata for greater productivity
Modeling & managing metadata for greater productivityModeling & managing metadata for greater productivity
Modeling & managing metadata for greater productivityJean Graef
 
Sharepoint metadata workshop
Sharepoint metadata workshopSharepoint metadata workshop
Sharepoint metadata workshopSam Marshall
 

Tendances (13)

Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...
Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...
Crossref LIVE: The Benefits of Open Infrastructure (APAC time zones) - 29th O...
 
Fair webinar, Ted slater: progress towards commercial fair data products and ...
Fair webinar, Ted slater: progress towards commercial fair data products and ...Fair webinar, Ted slater: progress towards commercial fair data products and ...
Fair webinar, Ted slater: progress towards commercial fair data products and ...
 
SciBite
SciBiteSciBite
SciBite
 
IC-SDV 2019: OntoChem
IC-SDV 2019: OntoChemIC-SDV 2019: OntoChem
IC-SDV 2019: OntoChem
 
Security and Compliance in Office 365
Security and Compliance in Office 365Security and Compliance in Office 365
Security and Compliance in Office 365
 
Reproducible machine learning
Reproducible machine learningReproducible machine learning
Reproducible machine learning
 
Knowledge Graphs : Shaping Our Data Future
Knowledge Graphs : Shaping Our Data FutureKnowledge Graphs : Shaping Our Data Future
Knowledge Graphs : Shaping Our Data Future
 
Recognizing the Standalone, and Reliable Journals
Recognizing the Standalone, and Reliable JournalsRecognizing the Standalone, and Reliable Journals
Recognizing the Standalone, and Reliable Journals
 
Empowering the business for eDiscovery in Office 365 - BRK2112
Empowering the business for eDiscovery in Office 365 - BRK2112Empowering the business for eDiscovery in Office 365 - BRK2112
Empowering the business for eDiscovery in Office 365 - BRK2112
 
Presentation forpd bj_1
Presentation forpd bj_1Presentation forpd bj_1
Presentation forpd bj_1
 
Using metadata repositories with search
Using metadata repositories with searchUsing metadata repositories with search
Using metadata repositories with search
 
Modeling & managing metadata for greater productivity
Modeling & managing metadata for greater productivityModeling & managing metadata for greater productivity
Modeling & managing metadata for greater productivity
 
Sharepoint metadata workshop
Sharepoint metadata workshopSharepoint metadata workshop
Sharepoint metadata workshop
 

En vedette

ICIC 2013 Conference Proceedings Jan Baur STN
ICIC 2013 Conference Proceedings Jan Baur STNICIC 2013 Conference Proceedings Jan Baur STN
ICIC 2013 Conference Proceedings Jan Baur STNDr. Haxel Consult
 
ICIC 2013 New Product Introductions Stellarix
ICIC 2013 New Product Introductions StellarixICIC 2013 New Product Introductions Stellarix
ICIC 2013 New Product Introductions StellarixDr. Haxel Consult
 
ICIC 2013 New Product Introductions FIZ Karlsruhe
ICIC 2013 New Product Introductions FIZ KarlsruheICIC 2013 New Product Introductions FIZ Karlsruhe
ICIC 2013 New Product Introductions FIZ KarlsruheDr. Haxel Consult
 
ICIC 2013 New Product Introductions Parthys Reverse Informatics
ICIC 2013 New Product Introductions Parthys Reverse InformaticsICIC 2013 New Product Introductions Parthys Reverse Informatics
ICIC 2013 New Product Introductions Parthys Reverse InformaticsDr. Haxel Consult
 
ICIC 2013 Conference Proceedings Jutta Hausser EPO
ICIC 2013 Conference Proceedings Jutta Hausser EPOICIC 2013 Conference Proceedings Jutta Hausser EPO
ICIC 2013 Conference Proceedings Jutta Hausser EPODr. Haxel Consult
 
ICIC 2013 New Product Introductions CEPT
ICIC 2013 New Product Introductions CEPTICIC 2013 New Product Introductions CEPT
ICIC 2013 New Product Introductions CEPTDr. Haxel Consult
 

En vedette (7)

ICIC 2013 Conference Proceedings Jan Baur STN
ICIC 2013 Conference Proceedings Jan Baur STNICIC 2013 Conference Proceedings Jan Baur STN
ICIC 2013 Conference Proceedings Jan Baur STN
 
ICIC 2013 New Product Introductions Stellarix
ICIC 2013 New Product Introductions StellarixICIC 2013 New Product Introductions Stellarix
ICIC 2013 New Product Introductions Stellarix
 
ICIC 2013 New Product Introductions FIZ Karlsruhe
ICIC 2013 New Product Introductions FIZ KarlsruheICIC 2013 New Product Introductions FIZ Karlsruhe
ICIC 2013 New Product Introductions FIZ Karlsruhe
 
ICIC 2013 New Product Introductions Parthys Reverse Informatics
ICIC 2013 New Product Introductions Parthys Reverse InformaticsICIC 2013 New Product Introductions Parthys Reverse Informatics
ICIC 2013 New Product Introductions Parthys Reverse Informatics
 
ICIC 2013 Conference Proceedings Jutta Hausser EPO
ICIC 2013 Conference Proceedings Jutta Hausser EPOICIC 2013 Conference Proceedings Jutta Hausser EPO
ICIC 2013 Conference Proceedings Jutta Hausser EPO
 
ICIC 2013 New Product Introductions CEPT
ICIC 2013 New Product Introductions CEPTICIC 2013 New Product Introductions CEPT
ICIC 2013 New Product Introductions CEPT
 
II-SDV 2016 Linguamatics
II-SDV 2016 LinguamaticsII-SDV 2016 Linguamatics
II-SDV 2016 Linguamatics
 

Similaire à ICIC 2013 Conference Proceedings David Milward Linguamatics

II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh -...
II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh -...II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh -...
II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh -...Dr. Haxel Consult
 
Comcast Labs Connect - PHLAI Conference Philadelphia 2018
Comcast Labs Connect - PHLAI Conference Philadelphia 2018 Comcast Labs Connect - PHLAI Conference Philadelphia 2018
Comcast Labs Connect - PHLAI Conference Philadelphia 2018 Open Data Group
 
How to Apply Your Taxonomy to Your Content Automatically
How to Apply Your Taxonomy to Your Content AutomaticallyHow to Apply Your Taxonomy to Your Content Automatically
How to Apply Your Taxonomy to Your Content AutomaticallyAccess Innovations, Inc.
 
What is the Value of SAS Analytics?
What is the Value of SAS Analytics?What is the Value of SAS Analytics?
What is the Value of SAS Analytics?SAS Canada
 
conceptTermStoreManager – The Native SharePoint Utility to Manage Term Sets W...
conceptTermStoreManager – The Native SharePoint Utility to Manage Term Sets W...conceptTermStoreManager – The Native SharePoint Utility to Manage Term Sets W...
conceptTermStoreManager – The Native SharePoint Utility to Manage Term Sets W...Concept Searching, Inc
 
Model Confidence for Master Data with David Loshin
Model Confidence for Master Data with David LoshinModel Confidence for Master Data with David Loshin
Model Confidence for Master Data with David LoshinEmbarcadero Technologies
 
Analyzing Unstructured Data in Hadoop Webinar
Analyzing Unstructured Data in Hadoop WebinarAnalyzing Unstructured Data in Hadoop Webinar
Analyzing Unstructured Data in Hadoop WebinarDatameer
 
Climbing the Slippery Slope of SharePoint Migrations Webinar
Climbing the Slippery Slope of SharePoint Migrations WebinarClimbing the Slippery Slope of SharePoint Migrations Webinar
Climbing the Slippery Slope of SharePoint Migrations WebinarConcept Searching, Inc
 
The Nuts and Bolts of Metadata Tagging and Taxonomies Made Easy Webinar
The Nuts and Bolts of Metadata Tagging and Taxonomies Made Easy WebinarThe Nuts and Bolts of Metadata Tagging and Taxonomies Made Easy Webinar
The Nuts and Bolts of Metadata Tagging and Taxonomies Made Easy WebinarConcept Searching, Inc
 
SharePoint Saturday London - The Nuts and Bolts of Metadata Tagging and Taxon...
SharePoint Saturday London - The Nuts and Bolts of Metadata Tagging and Taxon...SharePoint Saturday London - The Nuts and Bolts of Metadata Tagging and Taxon...
SharePoint Saturday London - The Nuts and Bolts of Metadata Tagging and Taxon...Concept Searching, Inc
 
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersR+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersRevolution Analytics
 
Creating enterprise standards 09302010
Creating enterprise standards 09302010Creating enterprise standards 09302010
Creating enterprise standards 09302010ERwin Modeling
 
Playing Tag: Managed Metadata and Taxonomies in SharePoint 2010
Playing Tag: Managed Metadata and Taxonomies in SharePoint 2010Playing Tag: Managed Metadata and Taxonomies in SharePoint 2010
Playing Tag: Managed Metadata and Taxonomies in SharePoint 2010Henry Ong
 
Harnessing search engines for KM
Harnessing search engines for KMHarnessing search engines for KM
Harnessing search engines for KMInvotra
 
BI, Hive or Big Data Analytics?
BI, Hive or Big Data Analytics? BI, Hive or Big Data Analytics?
BI, Hive or Big Data Analytics? Datameer
 
Achieving a Single View of Business – Critical Data with Master Data Management
Achieving a Single View of Business – Critical Data with Master Data ManagementAchieving a Single View of Business – Critical Data with Master Data Management
Achieving a Single View of Business – Critical Data with Master Data ManagementDATAVERSITY
 
DITA and SEO
DITA and SEODITA and SEO
DITA and SEOIXIASOFT
 
data_blending
data_blendingdata_blending
data_blendingsubit1615
 
Introduction to Anzo Unstructured
Introduction to Anzo UnstructuredIntroduction to Anzo Unstructured
Introduction to Anzo UnstructuredCambridge Semantics
 
NHSPUG June 2015 - Must Love Term Sets: The New and Improved Managed Metadat...
NHSPUG June 2015  - Must Love Term Sets: The New and Improved Managed Metadat...NHSPUG June 2015  - Must Love Term Sets: The New and Improved Managed Metadat...
NHSPUG June 2015 - Must Love Term Sets: The New and Improved Managed Metadat...Jonathan Ralton
 

Similaire à ICIC 2013 Conference Proceedings David Milward Linguamatics (20)

II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh -...
II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh -...II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh -...
II-SDV 2014 The Road to Federated Text Mining: Are we there yet? (Guy Singh -...
 
Comcast Labs Connect - PHLAI Conference Philadelphia 2018
Comcast Labs Connect - PHLAI Conference Philadelphia 2018 Comcast Labs Connect - PHLAI Conference Philadelphia 2018
Comcast Labs Connect - PHLAI Conference Philadelphia 2018
 
How to Apply Your Taxonomy to Your Content Automatically
How to Apply Your Taxonomy to Your Content AutomaticallyHow to Apply Your Taxonomy to Your Content Automatically
How to Apply Your Taxonomy to Your Content Automatically
 
What is the Value of SAS Analytics?
What is the Value of SAS Analytics?What is the Value of SAS Analytics?
What is the Value of SAS Analytics?
 
conceptTermStoreManager – The Native SharePoint Utility to Manage Term Sets W...
conceptTermStoreManager – The Native SharePoint Utility to Manage Term Sets W...conceptTermStoreManager – The Native SharePoint Utility to Manage Term Sets W...
conceptTermStoreManager – The Native SharePoint Utility to Manage Term Sets W...
 
Model Confidence for Master Data with David Loshin
Model Confidence for Master Data with David LoshinModel Confidence for Master Data with David Loshin
Model Confidence for Master Data with David Loshin
 
Analyzing Unstructured Data in Hadoop Webinar
Analyzing Unstructured Data in Hadoop WebinarAnalyzing Unstructured Data in Hadoop Webinar
Analyzing Unstructured Data in Hadoop Webinar
 
Climbing the Slippery Slope of SharePoint Migrations Webinar
Climbing the Slippery Slope of SharePoint Migrations WebinarClimbing the Slippery Slope of SharePoint Migrations Webinar
Climbing the Slippery Slope of SharePoint Migrations Webinar
 
The Nuts and Bolts of Metadata Tagging and Taxonomies Made Easy Webinar
The Nuts and Bolts of Metadata Tagging and Taxonomies Made Easy WebinarThe Nuts and Bolts of Metadata Tagging and Taxonomies Made Easy Webinar
The Nuts and Bolts of Metadata Tagging and Taxonomies Made Easy Webinar
 
SharePoint Saturday London - The Nuts and Bolts of Metadata Tagging and Taxon...
SharePoint Saturday London - The Nuts and Bolts of Metadata Tagging and Taxon...SharePoint Saturday London - The Nuts and Bolts of Metadata Tagging and Taxon...
SharePoint Saturday London - The Nuts and Bolts of Metadata Tagging and Taxon...
 
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster AnswersR+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
R+Hadoop - Ask Bigger (and New) Questions and Get Better, Faster Answers
 
Creating enterprise standards 09302010
Creating enterprise standards 09302010Creating enterprise standards 09302010
Creating enterprise standards 09302010
 
Playing Tag: Managed Metadata and Taxonomies in SharePoint 2010
Playing Tag: Managed Metadata and Taxonomies in SharePoint 2010Playing Tag: Managed Metadata and Taxonomies in SharePoint 2010
Playing Tag: Managed Metadata and Taxonomies in SharePoint 2010
 
Harnessing search engines for KM
Harnessing search engines for KMHarnessing search engines for KM
Harnessing search engines for KM
 
BI, Hive or Big Data Analytics?
BI, Hive or Big Data Analytics? BI, Hive or Big Data Analytics?
BI, Hive or Big Data Analytics?
 
Achieving a Single View of Business – Critical Data with Master Data Management
Achieving a Single View of Business – Critical Data with Master Data ManagementAchieving a Single View of Business – Critical Data with Master Data Management
Achieving a Single View of Business – Critical Data with Master Data Management
 
DITA and SEO
DITA and SEODITA and SEO
DITA and SEO
 
data_blending
data_blendingdata_blending
data_blending
 
Introduction to Anzo Unstructured
Introduction to Anzo UnstructuredIntroduction to Anzo Unstructured
Introduction to Anzo Unstructured
 
NHSPUG June 2015 - Must Love Term Sets: The New and Improved Managed Metadat...
NHSPUG June 2015  - Must Love Term Sets: The New and Improved Managed Metadat...NHSPUG June 2015  - Must Love Term Sets: The New and Improved Managed Metadat...
NHSPUG June 2015 - Must Love Term Sets: The New and Improved Managed Metadat...
 

Plus de Dr. Haxel Consult

AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementAI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementDr. Haxel Consult
 
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...Dr. Haxel Consult
 
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...Dr. Haxel Consult
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...Dr. Haxel Consult
 
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...Dr. Haxel Consult
 
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...Dr. Haxel Consult
 
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...Dr. Haxel Consult
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...Dr. Haxel Consult
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...Dr. Haxel Consult
 
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...Dr. Haxel Consult
 
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...Dr. Haxel Consult
 
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...Dr. Haxel Consult
 
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...Dr. Haxel Consult
 
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...Dr. Haxel Consult
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...Dr. Haxel Consult
 
AI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterAI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterDr. Haxel Consult
 
AI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCAI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCDr. Haxel Consult
 
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...Dr. Haxel Consult
 
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...Dr. Haxel Consult
 

Plus de Dr. Haxel Consult (20)

AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering ManagementAI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
AI-SDV 2022: Henry Chang Patent Intelligence and Engineering Management
 
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
AI-SDV 2022: Creation and updating of large Knowledge Graphs through NLP Anal...
 
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
AI-SDV 2022: The race to net zero: Tracking the green industrial revolution t...
 
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
AI-SDV 2022: Accommodating the Deep Learning Revolution by a Development Proc...
 
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
AI-SDV 2022: Domain Knowledge makes Artificial Intelligence Smart Linda Ander...
 
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
AI-SDV 2022: Embedding-based Search Vs. Relevancy Search: comparing the new w...
 
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
AI-SDV 2022: Rolling out web crawling at Boehringer Ingelheim - 10 years of e...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...AI-SDV 2022: Machine learning based patent categorization: A success story in...
AI-SDV 2022: Machine learning based patent categorization: A success story in...
 
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
AI-SDV 2022: Finding the WHAT – Will AI help? Nils Newman (Search Technology,...
 
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
AI-SDV 2022: New Insights from Trademarks with Natural Language Processing Al...
 
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
AI-SDV 2022: Extracting information from tables in documents Holger Keibel (K...
 
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
AI-SDV 2022: Scientific publishing in the age of data mining and artificial i...
 
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
AI-SDV 2022: AI developments and usability Linus Wretblad (IPscreener / Uppdr...
 
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
AI-SDV 2022: Where’s the one about…? Looney Tunes® Revisited Jay Ven Eman (CE...
 
AI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance CenterAI-SDV 2022: Copyright Clearance Center
AI-SDV 2022: Copyright Clearance Center
 
AI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IPAI-SDV 2022: Lighthouse IP
AI-SDV 2022: Lighthouse IP
 
AI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOCAI-SDV 2022: New Product Introductions: CENTREDOC
AI-SDV 2022: New Product Introductions: CENTREDOC
 
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
AI-SDV 2022: Possibilities and limitations of AI-boosted multi-categorization...
 
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
AI-SDV 2022: Big data analytics platform at Bayer – Turning bits into insight...
 

Dernier

Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 

Dernier (20)

Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 

ICIC 2013 Conference Proceedings David Milward Linguamatics

  • 1. Unstructured Text in Big Data The Elephant in the Room David Milward ICIC, October 2013
  • 2. Click toto edit Master title style Unstructured Big style Click edit Master titleData • Big Data – Volume, Variety, Velocity 80% • Estimated 80% of all data is unstructured – Need to be able to make decisions based on this data • Ever increasing amount of data makes it harder and harder to filter by hand – Need more automation 2 © 2013 Linguamatics Ltd. 20%
  • 3. Click toto edit Master Data style From edit Master title style Click Unstructured title to Structured Identify concepts and relationships Structured, relational data Normalized concept identifiers Traditional text mining works in batch mode Unstructured or Semi-Structured Content Sources Linguamatics Confidential Requires each extraction to be programmed in, or learnt from a large amount of annotated material
  • 4. Click Agile Text Mining I2E to edit Master title style Click to edit Master title style Querying Results Information Consumers Confidential
  • 5. Click toto edit Master title style Increasing Range style Click edit Master titleOf Applications Sentiment Analysis in Social Media Vocabulary Development Mining Conference FDA Drug In the beginning... Labels Abstract Mining Electronic Medical Records Mining SAR Competitive ProteinExtractionIntelligenceProtein Target Interactions Identification & Prioritization Safety/Tox Confidential Plant Metabolic Pathways Patent Analysis Extracting Chemical Numerical Search and Experimental Data Biomarker Discovery In-licensing Opportunities Key Opinion Leader Identification Patent FTO Systems Biology Drug Repositioning
  • 6. Click toto edit Master title style The edit Master title style ClickDatabase Approach • Can we predetermine what is required? • One approach is to try to extract everything … but in practice have to make assumptions, whether by hand or automatically – Do we include negative statements? – Do we include speculation? – What context do we include? • Need a good idea of how the data will be used to know what needs to be extracted • There are cases where this works well e.g. – have useful structured data and want to add to it from free text – have new records structured, but want to bring in information from legacy free text 6 © 2013 Linguamatics Ltd.
  • 7. Click toto edit Master title style Extracting for a Relational Database Click edit Master title style • Extracting all relevant information from pathology reports 7 © 2013 Linguamatics Ltd.
  • 8. Click toto edit Master title style Extracting for a Semantic Click edit Master title style Store • Output as a set of triples • Because all the parts have URIs, the parts are webaddressable Treat Peptide Psoriasis Express Associate 8 © 2013 Linguamatics Ltd. Web Infectious Disease
  • 9. Click toto Hoc Approach The edit Master title style ClickAd edit Master title style • No need to build a database • Regard text mining queries as ways to create a database on the fly • Keep all the unstructured free text, and query for what you need when you need it • Even index structured data with text mining to link information together • Used very successfully by knowledge professionals: – Just like a search engine, there are thousands of questions people want to ask of the data – You can’t predict them all beforehand 9 © 2013 Linguamatics Ltd.
  • 10. ClickHoc Querying style style Ad to edit Master title Click to edit Master title • Find relationships that you want for any particular information request • Treat the text mining as an “agile” database to answer thousands of different questions e.g. metabolites from fruit 10 © 2013 Linguamatics Ltd.
  • 11. ClickWe NeedMaster title style Do to edit Master title style Click to edit More? • I2E Agile Text Mining has opened up text mining to the end user, but it is still mainly used by knowledge professionals • Given the importance of unstructured big data, can we bring the benefits of text mining to an even wider audience? 11 © 2013 Linguamatics Ltd.
  • 12. Click toto edit Master title style Workflow Automation Click edit Master title style • Successful queries converted into regular workflows using Pipeline Pilot or KNIME • Real-time workflows for up-to-date dashboards, alerts • Further analysis, visualization, integration with other data Pipeline Pilot 12
  • 13. Click toto Trials Analysis from I2E Clinical edit Master title Click edit Master title style style Text Mining Results Colours indicate Phase Using Pipeline Pilot visualization Dates 13 © 2013 Linguamatics Ltd.
  • 14. Click toto edit I2Etitle style Web Apps Embedding Master title Click edit Master within style 10/21/2013 14 4734_8.pptx
  • 15. Click toto edit Master title style Click edit Master title style Form-Based Querying • Create web interfaces connecting to I2E server • Information professionals develop sophisticated queries • Queries are parameterized to allow end users to customize • Javascript allows portability to mobile devices such as iPads
  • 16. Click toto edit Master title style Enhancing Enterprise Click edit Master title styleSearch • Use text mining to annotate concepts to feed into a search engine to: – provide concept search – provide concepts for facets – provide intelligent thumbnails for documents, pulling out key information 16 © 2013 Linguamatics Ltd.
  • 17. Value Click toto edit Master titledocuments Semantic annotation Click edit Master title styleof style Relationships Concepts Keywords Search Engine NLP-based Text Mining
  • 18. Click toto edit Master title style Provide Concepts style Click edit Master titleas Facets for Enterprise Search Document Refine the search information and based on teaser terminologies 18 © 2013 Linguamatics Ltd. Document preview and additional information
  • 19. Click toto edit Master title style Dictionary Matching Click edit Master title style Companies
  • 20. Click toto edit MasterInstitutions Pattern Matching title Click edit Master title -style style • Not a fixed list: can find previously unknown institutions
  • 21. Click toto edit MasterMutations Pattern Matching title Click edit Master title -style style
  • 22. Click toto edit MasterChemicals Pattern Matching title Click edit Master title -style style • Assign structure to novel chemicals • Distinguish exemplified compounds • Distinguish compounds given properties
  • 23. Click toto edit Master title style Whatever the title style Click edit MasterContent... • ... I2E can mine and extract with precision Scientific literature Twitter
  • 24. Click totoWherever style style …. and edit Master Click edit Master title title Internal Content MEDLINE Patents Drug Labels Clinical Trials Content Provider Publisher I2E OnDemand I2E Enterprise I2E Platform External Content I2E Platform • I2E 4.1 Linked Servers • Users can query local information or information on the cloud • Proprietary content can reside within Enterprise • Standard sources e.g. patents can be hosted on the cloud, and shared by all users • Data from content providers can reside on their sites 24
  • 25. Click toto edit Master title style Bringing Master title style Click editthe Elephant Down to Size • Data warehouses for data that you know you need • Embedding text mining in other applications e.g. Enterprise Search to provide benefits of semantic search • Building specialized new interfaces to provide self-service applications for end-users • Continuing role for ad hoc text mining – text mining can ask almost any question of the data – you can never predict every question 25 © 2013 Linguamatics Ltd.