II-SDV 2015, 20 - 21 April, in Nice

Dr. Haxel Consult

Text and Data Mining at CCC
Solving the Content Retrieval and Licensing Conundrums for TDM
Dr. Haralambos Marmanis
CTO & VP, Engineering
Copyright Clearance Center

Making Copyright Work – CCC and RightsDirect
Rightsholders: 600+ million rights from:
• Publishers
• Authors
• Creators
Content Users:
• 35,000 companies
• Employees worldwide
• Users in 180 countries
CCC and RightsDirect:
• Licensing Solutions
• Rights Management
• Content Delivery
• Copyright Education
4/22/2015

What Is Text and Data Mining?
• Automate the extraction of “Entities” from Text
• Find Relationships and Patterns
• Produce hypotheses of interest
• Drive decision making

Applications
• Biomarker discovery
• Drug repurposing
• Drug safety
• Competitive intelligence
• Sentiment analysis
• …….

The General Problem & Our Solution
Through An Example

“Drug Discovery” Process
• Goal: Develop new treatments for diseases through hypothesis formation.
• Methodology:
– Keyword/Database Searching
– Review Literature
– Find relationships
– Develop hypothesis
– Test
– Product development
– Etc.

General Overview of the Process
1. Identify a set of resources that are relevant to a particular research objective
2. Analyze and extract information specific to the research objective
3. Develop and explore the various relations between extracted objects of interest
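
The three steps above can be sketched on a toy scale. This is only an illustration under stated assumptions: the corpus text, the term dictionary, and the function names (`identify_resources`, `extract_entities`, `relate`) are invented here, and simple keyword filtering plus dictionary lookup stand in for whatever retrieval and extraction tools a real project would use.

```python
import re
from collections import Counter

# Toy corpus standing in for licensed full-text articles (invented text).
CORPUS = {
    "doc1": "Aspirin reduces inflammation and may lower the risk of colorectal cancer.",
    "doc2": "Metformin is used for diabetes; metformin may also slow colorectal cancer.",
    "doc3": "A survey of copyright licensing models for publishers.",
}

# Hypothetical dictionary mapping surface terms to entity types.
DICTIONARY = {
    "aspirin": "DRUG",
    "metformin": "DRUG",
    "inflammation": "CONDITION",
    "diabetes": "CONDITION",
    "colorectal cancer": "CONDITION",
}


def identify_resources(corpus, keywords):
    """Step 1: keep documents that mention at least one keyword."""
    return {
        doc_id: text
        for doc_id, text in corpus.items()
        if any(k in text.lower() for k in keywords)
    }


def extract_entities(text):
    """Step 2: dictionary lookup of known terms, matched on word boundaries."""
    lowered = text.lower()
    return {
        (term, kind)
        for term, kind in DICTIONARY.items()
        if re.search(r"\b" + re.escape(term) + r"\b", lowered)
    }


def relate(docs):
    """Step 3: count drug/condition pairs that co-occur in the same document."""
    pairs = Counter()
    for text in docs.values():
        entities = extract_entities(text)
        drugs = sorted(t for t, k in entities if k == "DRUG")
        conditions = sorted(t for t, k in entities if k == "CONDITION")
        for drug in drugs:
            for condition in conditions:
                pairs[(drug, condition)] += 1
    return pairs


relevant = identify_resources(CORPUS, ["cancer", "diabetes"])
relations = relate(relevant)
for (drug, condition), count in sorted(relations.items()):
    print(drug, "->", condition, count)
```

Co-occurrence counting is the simplest possible relation model; it only suggests candidate pairs for a researcher to examine, in line with the "produce hypotheses" goal.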

Data Processing Workflow: Information Retrieval and Knowledge Discovery
[Diagram: Software Platforms for TDM, covering Information Retrieval and Knowledge Discovery]
*http://www.jisc.ac.uk/reports/value-and-benefits-of-text-mining

Problem: Too Much Research
• 53M Records in Scopus
• 800,000 Journal Articles published per year

More Problems…
• Many sources of content
• Many formats
• Difficult to obtain full-text in XML
• Difficult to integrate content into TDM software.
• Hard to negotiate and manage licenses and feeds from all publishers.

The DirectPath Solution
• Speed up time to obtain properly licensed content for text mining
• Discover and download full-text in XML, not just abstracts
• Main corpus includes Subscribed and Not-Subscribed content
• Normalize XML format across many publishers
• Provide a Web UI and RESTful API services
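
To make "normalize XML format across many publishers" concrete, here is a minimal sketch. It assumes two invented publisher layouts and an invented common target layout; none of the tag names, the `MAPPINGS` table, or the `normalize` function come from DirectPath or from any real publisher schema.

```python
import xml.etree.ElementTree as ET

# Two invented publisher formats; neither is a real publisher schema.
PUBLISHER_A = """<article><front><article-title>Drug safety signals</article-title></front>
<body><p>First paragraph.</p><p>Second paragraph.</p></body></article>"""

PUBLISHER_B = """<doc><head><title>Biomarker discovery</title></head>
<text><para>Only paragraph.</para></text></doc>"""

# Hypothetical per-publisher mapping: where the title and paragraphs live.
MAPPINGS = {
    "A": {"title": "./front/article-title", "paragraphs": "./body/p"},
    "B": {"title": "./head/title", "paragraphs": "./text/para"},
}


def normalize(xml_text, publisher):
    """Convert a publisher-specific document to one common <article> layout."""
    source = ET.fromstring(xml_text)
    mapping = MAPPINGS[publisher]

    article = ET.Element("article", {"publisher": publisher})
    title = ET.SubElement(article, "title")
    title.text = source.findtext(mapping["title"], default="")

    body = ET.SubElement(article, "body")
    for para in source.findall(mapping["paragraphs"]):
        p = ET.SubElement(body, "p")
        # itertext() flattens any inline markup inside the paragraph.
        p.text = "".join(para.itertext()).strip()
    return article


normalized = [normalize(PUBLISHER_A, "A"), normalize(PUBLISHER_B, "B")]
for article in normalized:
    print(ET.tostring(article, encoding="unicode"))
```

Keeping the publisher differences in a mapping table rather than in code means a new publisher needs only a new table entry, so downstream TDM tools see a single layout.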

1. Publishers provide content and rights (<XML>)
2. Researchers create content sets by using search or other discovery criteria
3. Researchers slice and dice results and identify an appropriate corpus for their project
4. XML corpus can be imported into various TDM tools
[Diagram labels: Publishers, XML Article corpus, Researchers, TDM Software]

RESTful Services Based on Open Standards

Unique Features
• Custom analysis/indexing for each Project
– Custom stop-word lists; synonyms/dictionaries
– Custom analyzers
– The finest granularity at the analysis and indexing level
• Designed with multilingual support in mind
– Based on Lucene
• Search beyond TF-IDF (e.g. document ranking by citation)
• Retrieval beyond Search (e.g. nearest neighbors)
• Cost and Quality Optimization (roadmap/patent pending)
• Integration with text mining tools like Linguamatics I2E
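
The ideas of a per-project stop-word list, ranking beyond TF-IDF, and retrieval by nearest neighbors can be sketched in a few lines. This is not CCC's implementation and does not use Lucene. The documents, citation counts, smoothed idf formula, and multiplicative citation boost are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Invented documents and citation counts, for illustration only.
DOCS = {
    "a": "drug safety signals in clinical trials",
    "b": "drug repurposing for rare diseases",
    "c": "drug safety and drug repurposing reviews",
    "d": "copyright licensing for publishers",
}
CITATIONS = {"a": 3, "b": 500, "c": 120, "d": 7}
STOP_WORDS = {"in", "for", "and"}  # a per-project stop-word list


def tokenize(text):
    return [t for t in text.lower().split() if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return {doc_id: {term: tf * idf}} using a smoothed idf."""
    tokens = {d: tokenize(t) for d, t in docs.items()}
    df = Counter(term for toks in tokens.values() for term in set(toks))
    n = len(docs)
    vectors = {}
    for d, toks in tokens.items():
        tf = Counter(toks)
        vectors[d] = {
            t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()
        }
    return vectors


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query, vectors, citations, boost=0.0):
    """Rank by summed TF-IDF of query terms, scaled by 1 + boost * log(1 + citations)."""
    scores = {}
    for d, vec in vectors.items():
        text_score = sum(vec.get(t, 0.0) for t in tokenize(query))
        if text_score > 0:
            scores[d] = text_score * (1 + boost * math.log1p(citations[d]))
    return sorted(scores, key=scores.get, reverse=True)


def nearest_neighbors(doc_id, vectors, k=2):
    """Retrieval without a query: the k documents most similar to doc_id."""
    others = [
        (cosine(vectors[doc_id], v), d) for d, v in vectors.items() if d != doc_id
    ]
    return [d for _, d in sorted(others, reverse=True)[:k]]


VECTORS = tfidf_vectors(DOCS)
print(search("drug safety", VECTORS, CITATIONS))             # text relevance only
print(search("drug safety", VECTORS, CITATIONS, boost=1.0))  # citation-boosted
print(nearest_neighbors("a", VECTORS))
```

With the boost switched on, the heavily cited document "b" moves ahead of "a" even though "a" matches the query text better, which is the kind of reordering that citation-based ranking is meant to produce.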

TDM Product Roadmap
• Augment and Enrich the Inventory
• Workflow Integrations with 3rd Party Support
• Expand and enhance Metadata Normalization
• Introduce Content Metrics for Retrieval
• Cost Optimization
• Information Content Optimization
