OpenMinTeD
Building an Open Text and Data Mining Infrastructure
• Stelios Piperidis
• spip@ilsp.gr
• Institute for Language & Speech Processing
• Athena Research & Innovation Centre
● More than 1.08 billion websites and 3.46 billion internet users (as of 25 September 2016).
● More than 24 million wireless sensors and actuators worldwide (up 553% between 2011 and 2016).
● More than 16 zettabytes of useful data (16 trillion GB) expected by 2020.
● YouTube reports that 24 hours of video are uploaded every minute, making the site a hugely significant data aggregator.
● “Every second, on average, around 6,000 tweets are tweeted on Twitter, which
corresponds to over 350,000 tweets sent per minute, >500 million tweets per day
and around 200 billion tweets per year”.
● As of 30 May 2016, there were 74,200,000 pages on Facebook, and 7 million apps and websites were integrated with Facebook.
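Several of the figures above are unit conversions of a single quantity, so they can be cross-checked with simple arithmetic. The Python sketch below assumes a 365-day year and decimal (SI) prefixes (1 ZB = 10^21 bytes, 1 GB = 10^9 bytes); the variable names are illustrative only.

```python
# Back-of-the-envelope check of the volume figures quoted above.
# Assumptions: a 365-day year and decimal (SI) prefixes (1 ZB = 10**21 bytes).
TWEETS_PER_SECOND = 6_000  # average rate quoted on the slide

per_minute = TWEETS_PER_SECOND * 60
per_day = per_minute * 60 * 24
per_year = per_day * 365

ZETTABYTE = 10**21  # bytes
GIGABYTE = 10**9    # bytes
useful_data_gb = 16 * ZETTABYTE // GIGABYTE

print(f"tweets per minute: {per_minute:,}")
print(f"tweets per day:    {per_day:,}")
print(f"tweets per year:   {per_year:,}")
print(f"16 ZB in GB:       {useful_data_gb:,}")
```

This prints 360,000 tweets per minute, 518,400,000 per day and 189,216,000,000 per year, which agrees with "over 350,000", ">500 million" and "around 200 billion" (the yearly figure is loosely rounded up), and 16,000,000,000,000 GB, i.e. 16 trillion GB.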
The global research community generates over 1.5 million new scholarly articles per year.
The STM Report (2009)
… some 90% of papers … are never cited.
… 50% of papers are never read by anyone other than their authors,
referees and journal editors
… one paper published every 30 seconds
… 70,000 papers published on a single protein, the tumor suppressor p53
The STM Report (2009)
• Process textual sources, organise and classify them along various dimensions, and extract the main (indexical) information items
• Identify and extract entities and the relations between them, transforming unstructured textual sources into structured data
• Enable multidimensional analysis of the structured data to extract meaningful insights and improve predictive ability
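As a toy illustration of the second step, turning unstructured text into structured entity and relation records, the sketch below uses a hand-written gazetteer and a single surface pattern. The vocabulary, relation verbs and the `Relation`, `find_entities` and `find_relations` names are hypothetical; this is not the method used by OpenMinTeD services, which would typically rely on trained models rather than exact string matching.

```python
import re
from dataclasses import dataclass

# Hypothetical toy gazetteer: surface form -> entity type.
GAZETTEER = {
    "p53": "Protein",
    "MDM2": "Protein",
    "apoptosis": "Process",
}

# Hypothetical relation verbs to capture between two known entities.
RELATION_VERBS = ("regulates", "inhibits", "induces")


@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    obj: str


def find_entities(text):
    """Return (surface form, type) pairs for gazetteer terms present as whole words."""
    found = []
    for term, etype in GAZETTEER.items():
        if re.search(rf"\b{re.escape(term)}\b", text):
            found.append((term, etype))
    return found


def find_relations(text):
    """Match the surface pattern '<entity> <relation verb> <entity>'."""
    names = "|".join(re.escape(t) for t in GAZETTEER)
    verbs = "|".join(RELATION_VERBS)
    pattern = rf"\b({names})\s+({verbs})\s+({names})\b"
    # findall returns one (subject, verb, object) tuple per match.
    return [Relation(*groups) for groups in re.findall(pattern, text)]


print(find_relations("p53 induces apoptosis. MDM2 inhibits p53."))
```

Each match becomes a structured record, e.g. `Relation(subject='p53', predicate='induces', obj='apoptosis')`, which can then be stored and analysed like any other tabular data.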
Text Types: Newswire, Scientific Literature, Tweets/Blogs, Patents, Clinical/Medical Records, Textbooks and Monographs, Online Forums, …
Languages: English, French, German, Spanish, Portuguese, Italian, Polish
Tasks: Translation, Information Extraction, Semantic Search, Question Answering, Sentiment Analysis, Summarization, Knowledge Discovery
Domains: Finance/Business, Health, Biology, Social Sciences, Humanities, …
OpenMinTeD Project: building a TDM infrastructure
Establish an open and sustainable Text and Data Mining (TDM) platform and infrastructure where researchers can collaboratively create, discover, share and re-use knowledge from a wide range of text-based scientific and scholarly sources.
Text Mining Researchers
Content Providers
End Users
Computing Infrastructures
ACCESSIBLE CONTENT: via standardised programmatic interfaces and access rules
DISCOVERABLE SERVICES: well-documented, easily discoverable text mining services and workflows which process, analyse and annotate text
EFFICIENT PROCESSING: operate on public e-Infrastructures via standardised APIs
RESEARCH COMMUNITIES: different scientific communities have different challenges
VALUE-ADDED APPS: community-driven applications to illustrate the value of the infrastructure; engage with industry
From the very beginning…
Requirements, content, barriers, expected outcomes.
… to the very end
Create applications, validate and evaluate the results.
• Document literature content, language/knowledge resources, data category taxonomies and provenance information
• Document language processing/text mining services and workflows
• Generic and domain-specific metadata descriptions
• Combine services into workflows
• Combine content and language resources with services and workflows
• Combine automatic and manual/crowdsourcing annotation services
• Study IPR restrictions on the reuse of sources, as well as possible exceptions
• Promote clarity and standardisation of legal rights and obligations
• Translate the legal & policy aspects into specifications for lawful user-to-service and
service-to-service interactions
• Documenting, depositing, managing, publishing and sharing scientific content and data, text and data mining software tools, services and workflows, and language and knowledge resources
• Enabling, both technically and legally, the linking and pipelining of text mining tools, services and workflows, as well as language and knowledge resources
• Automatic analysis, annotation and extraction of important information from scientific content
• Composing, scheduling and orchestrating new processing workflows by combining existing text mining services and language/knowledge resources
• Services advising on the lawful use and combination of content, language resources and text mining services
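The idea of describing services with metadata and combining them into workflows can be sketched as follows. This is not the OpenMinTeD API: `Document`, `Service`, `compose` and the annotation type names are hypothetical, and the sketch assumes each service's metadata declares which annotation types it consumes and produces, so that a workflow can be checked for compatibility before it runs.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Document:
    text: str
    # Annotation layers keyed by type, e.g. "Token" -> list of strings.
    annotations: dict = field(default_factory=dict)


@dataclass(frozen=True)
class Service:
    """Hypothetical metadata record for a text mining service."""
    name: str
    requires: frozenset  # annotation types the service consumes
    produces: frozenset  # annotation types the service adds
    run: Callable[[Document], Document]


def compose(services):
    """Validate that every step's inputs are produced earlier, then return a workflow."""
    available = set()
    for s in services:
        missing = s.requires - available
        if missing:
            raise ValueError(f"{s.name} needs {sorted(missing)}, which no earlier step produces")
        available |= s.produces

    def workflow(doc):
        # Run each service in order, passing the annotated document along.
        for s in services:
            doc = s.run(doc)
        return doc

    return workflow


def tokenize(doc):
    doc.annotations["Token"] = doc.text.split()
    return doc


def tag_proteins(doc):
    vocab = {"p53", "MDM2"}  # hypothetical toy vocabulary
    doc.annotations["Protein"] = [t for t in doc.annotations["Token"] if t in vocab]
    return doc


tokenizer = Service("tokenizer", frozenset(), frozenset({"Token"}), tokenize)
tagger = Service("protein-tagger", frozenset({"Token"}), frozenset({"Protein"}), tag_proteins)

pipeline = compose([tokenizer, tagger])
result = pipeline(Document("MDM2 binds p53 in the nucleus"))
print(result.annotations["Protein"])  # ['MDM2', 'p53']
```

Checking declared inputs and outputs at composition time means an incompatible ordering, such as `compose([tagger, tokenizer])`, is rejected with a `ValueError` before any content is processed.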
1. End users
- Researchers, database curators, …
- Novice: use services to advance their science
- Advanced: integrate TDM services into complex workflows
2. Content and service providers
- Publishers, libraries, scientific database centres, …
- TDM researchers
- SMEs
twitter.com/openminted_eu
facebook.com/openminted
bit.do/openmintedlinkedin
vimeo.com/openminted
bit.do/openmintedplus
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructuresonikadigital1
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerPavel Šabatka
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024Becky Burwell
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introductionsanjaymuralee1
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?sonikadigital1
 

Dernier (17)

CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 
AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)AI for Sustainable Development Goals (SDGs)
AI for Sustainable Development Goals (SDGs)
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptx
 
MEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .pptMEASURES OF DISPERSION I BSc Botany .ppt
MEASURES OF DISPERSION I BSc Botany .ppt
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - Presentation
 
YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.YourView Panel Book.pptx YourView Panel Book.
YourView Panel Book.pptx YourView Panel Book.
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructure
 
Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024SFBA Splunk Usergroup meeting March 13, 2024
SFBA Splunk Usergroup meeting March 13, 2024
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?
 

OpenMinTeD Project - Building a TDM Infrastructure

  • 1. OpenMinTeD: Building an Open Text and Data Mining Infrastructure. Stelios Piperidis (spip@ilsp.gr), Institute for Language & Speech Processing, Athena Research & Innovation Centre.
  • 2. ● Over 1.08 billion websites and 3.46 billion internet users, as of 25 September 2016. ● Over 24 million wireless sensors and actuators worldwide (up 553% between 2011 and 2016). ● Over 16 zettabytes of useful data (16 trillion GB) projected by 2020. ● YouTube claims that 24 hours of video are uploaded every minute, making the site a hugely significant data aggregator. ● “Every second, on average, around 6,000 tweets are tweeted on Twitter, which corresponds to over 350,000 tweets sent per minute, >500 million tweets per day and around 200 billion tweets per year”. ● 74,200,000 pages existed on Facebook, with 7 million apps and websites integrated with Facebook, as of 30 May 2016.
  • 3. The global research community generates over 1.5 million new scholarly articles per annum (The STM Report, 2009). … some 90% of papers … are never cited. … 50% of papers are never read by anyone other than their authors, referees and journal editors. … one paper is published every 30 seconds. … 70,000 papers have been published on a single protein, the tumor suppressor p53 (The STM Report, 2009).
  • 4. Process textual sources, organise and classify them along various dimensions, and extract the main (indexical) information items. Identify and extract entities and the relations between them, facilitating the transformation of unstructured textual sources into structured data. Enable multidimensional analysis of the structured data to extract meaningful insights and improve predictive ability.
  • 5. Text types: Newswire; Scientific literature; Tweets/blogs; Patents; Clinical/medical records; Textbooks, monographs; Online forums; … Languages: English; French; German; Spanish; Portuguese; Italian; Polish. Tasks: Translation; Information extraction; Semantic search; Question answering; Sentiment analysis; Summarization; Knowledge discovery. Domains: Finance/Business; Health; Biology; Social sciences; Humanities; …
  • 7. Establish an open and sustainable Text and Data Mining (TDM) platform and infrastructure where researchers can collaboratively create, discover, share and re-use knowledge from a wide range of text-based scientific and scholarly sources.
  • 9. Text Mining Researchers, Content Providers, End Users, Computing Infrastructures
  • 10. ACCESSIBLE CONTENT: via standardised programmatic interfaces and access rules. DISCOVERABLE SERVICES: well-documented, easily discoverable text mining services and workflows which process, analyse and annotate text. EFFICIENT PROCESSING: operate on public e-Infrastructures via standardised APIs. RESEARCH COMMUNITIES: different scientific communities face different challenges. VALUE-ADDED APPS: community-driven applications to illustrate the value of the infrastructure; engage with industry.
  • 11. From the very beginning (requirements, content, barriers, expected outcomes) … to the very end (create applications, validate and evaluate the results).
  • 12. • Document literature content, language/knowledge resources, data category taxonomies and provenance information • Document language processing/text mining services and workflows • Provide generic and domain-specific metadata descriptions • Combine services into workflows • Combine content and language resources with services and workflows • Combine automatic and manual/crowdsourcing annotation services • Study IPR restrictions on the reuse of sources, as well as possible exceptions • Promote clarity and standardisation of legal rights and obligations • Translate the legal and policy aspects into specifications for lawful user-to-service and service-to-service interactions
  • 13. • Documenting, depositing, managing, publishing and sharing scientific content and data, text and data mining software tools, services and workflows, and language and knowledge resources • Enabling, both technically and legally, the linking and pipelining of text mining tools, services and workflows, as well as language and knowledge resources • Automatic analysis, annotation and extraction of important information from scientific content • Composing, scheduling and orchestrating new processing workflows by combining existing text mining services and language/knowledge resources • Services advising on the lawful use and combination of content, language resources and text mining services
  • 14. 1. End users: researchers, database curators, … Novice users apply services to advance their science; advanced users integrate TDM services into complex workflows. 2. Content and service providers: publishers, libraries, scientific database centres, …; TDM researchers; SMEs.
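Slides 12 and 13 call for combining text mining services into workflows, where each service adds annotations that later services can build on. The following is a minimal Python sketch of that idea under stated assumptions: the `Document` model, the service signature (a plain function from `Document` to `Document`), and the helpers `tokenize`, `make_gazetteer_tagger` and `run_workflow` are hypothetical illustrations, not the OpenMinTeD platform's actual API, and the dictionary-lookup tagger merely stands in for a real entity recognizer.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Document:
    """Hypothetical model: raw text plus stand-off annotations added by services."""
    text: str
    annotations: Dict[str, list] = field(default_factory=dict)


# Assumption for illustration: a "service" is any function Document -> Document.
Service = Callable[[Document], Document]


def tokenize(doc: Document) -> Document:
    """Whitespace tokenisation, recording (start, end) character offsets per token."""
    tokens: List[Tuple[int, int]] = []
    pos = 0
    for tok in doc.text.split():
        start = doc.text.index(tok, pos)
        tokens.append((start, start + len(tok)))
        pos = start + len(tok)
    doc.annotations["tokens"] = tokens
    return doc


def make_gazetteer_tagger(lexicon: Dict[str, str]) -> Service:
    """Build a toy entity tagger that labels tokens found in a fixed lexicon."""
    def tag_entities(doc: Document) -> Document:
        entities = []
        # Relies on the "tokens" annotation produced by an earlier service.
        for start, end in doc.annotations.get("tokens", []):
            word = doc.text[start:end].strip(".,;:")
            if word in lexicon:
                entities.append((word, lexicon[word]))
        doc.annotations["entities"] = entities
        return doc
    return tag_entities


def run_workflow(services: List[Service], doc: Document) -> Document:
    """Apply services in order; each one sees the annotations of its predecessors."""
    for service in services:
        doc = service(doc)
    return doc


if __name__ == "__main__":
    workflow = [tokenize, make_gazetteer_tagger({"p53": "PROTEIN"})]
    result = run_workflow(workflow, Document("The tumor suppressor p53 is widely studied."))
    print(result.annotations["entities"])  # prints [('p53', 'PROTEIN')]
```

Giving every service the same input and output type is what makes them freely chainable in this sketch; a real infrastructure would additionally need interoperable annotation schemas, descriptive metadata and checks on licensing, as slide 12 notes.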