Text and Data Mining in Europe: Defining the Challenges and Actions @ The Hague
Infrastructure crossroads
...and the way we walked them in DKPro
Richard Eckart de Castilho
UKP Lab
Technische Universität Darmstadt
PRESENTER
Dr. Richard Eckart de Castilho
•  Interoperability WP lead @ OpenMinTeD
•  Technical Lead @ UKP
•  Java developer
•  Open source guy
•  NLP software infrastructure researcher
•  Apache UIMA developer
•  DKPro person
@i_am_rec
https://github.com/reckart
Ubiquitous Knowledge Processing Lab
Technische Universität Darmstadt
Ubiquitous Knowledge Processing Lab
•  Argumentation Mining
•  Language Technology for Digital Humanities
•  Lexical-Semantic Resources & Algorithms
•  Text Mining & Analytics
•  Writing Assistance and Language Learning
@UKPLab
http://www.ukp.tu-darmstadt.de
Prof. Dr. Iryna Gurevych
Technische Universität Darmstadt
DKPro – reuse not reinvent
•  What?
    •  Collection of open-source projects related to NLP
    •  Community of communities
    •  Interoperability between projects
    •  Target group: programmers, researchers, application developers
•  Why?
    •  Flexibility and control – liberal licensing and redistributable software
    •  Sustainability – open community not bound to specific grants
    •  Replicability – portable software distributed through repositories
    •  Usability – take the edge out of installation
•  Projects
    •  DKPro Core – linguistic preprocessing, interoperable third-party tools
    •  DKPro TC – text classification experimentation suite
    •  UBY – unified semantic resource
    •  CSniper – integrated search and annotation
    •  … https://dkpro.github.io
… but why like this?
… how else could it be done?
Analytics stack
•  Analytics layer
    •  Analytics tools (tagger, parser, etc.)
•  Interoperability layer
    •  Input/output conversion
    •  Tool wrappers
    •  Pivot data model
•  Workflow layer
    •  Workflow descriptions
    •  Workflow engines
•  UI layer
    •  Workflow editors
    •  Annotation editors
    •  Exploration / visualization
(Figure label: Complete solution)
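The layering above can be illustrated with a small sketch in plain Java. It is not the DKPro or UIMA API: every class name here (`Document`, `Annotation`, `Component`, `Pipeline`, and the two toy components) is hypothetical and chosen only to show how a pivot data model lets independently wrapped tools cooperate under a minimal workflow engine.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the stack; none of these types come from DKPro or UIMA.
final class AnalyticsStackSketch {

    /** Pivot data model: the text plus stand-off annotations shared by all tools. */
    static final class Document {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();

        Document(String text) {
            this.text = text;
        }
    }

    /** One annotation: a typed span over the document text. */
    static final class Annotation {
        final String type;
        final int begin;
        final int end;

        Annotation(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Interoperability layer: every wrapped tool reads and writes the pivot model. */
    interface Component {
        void process(Document doc);
    }

    /** Tool wrapper around a stand-in analytics tool that splits on whitespace. */
    static final class WhitespaceTokenizer implements Component {
        @Override
        public void process(Document doc) {
            int i = 0;
            int n = doc.text.length();
            while (i < n) {
                while (i < n && Character.isWhitespace(doc.text.charAt(i))) i++;
                int begin = i;
                while (i < n && !Character.isWhitespace(doc.text.charAt(i))) i++;
                if (i > begin) doc.annotations.add(new Annotation("Token", begin, i));
            }
        }
    }

    /** Second wrapper: consumes Token annotations from upstream and adds its own layer. */
    static final class CapitalizedTagger implements Component {
        @Override
        public void process(Document doc) {
            List<Annotation> added = new ArrayList<>();
            for (Annotation a : doc.annotations) {
                if (a.type.equals("Token") && Character.isUpperCase(doc.text.charAt(a.begin))) {
                    added.add(new Annotation("Capitalized", a.begin, a.end));
                }
            }
            doc.annotations.addAll(added);
        }
    }

    /** Workflow layer: a minimal engine that runs components in order. */
    static final class Pipeline {
        static Document run(String text, Component... components) {
            Document doc = new Document(text);
            for (Component c : components) c.process(doc);
            return doc;
        }
    }

    public static void main(String[] args) {
        Document doc = Pipeline.run("This is a test",
                new WhitespaceTokenizer(), new CapitalizedTagger());
        for (Annotation a : doc.annotations) {
            System.out.println(a.type + " " + doc.text.substring(a.begin, a.end));
        }
    }
}
```

The two components never reference each other; they only agree on the pivot model, which is the point of the interoperability layer.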
Automatic text analysis
•  Pragmatic
    •  Gain insight about a particular field of interest
    •  Investigate data
    •  Use latest data available
    •  Results relevant for the moment
    •  No need for reproducibility
•  Principled
    •  Interest in reproducibility
    •  Investigate methods
    •  Use a fixed data set
    •  Results should be reproducible
Manual text analysis
•  Pragmatic
    •  Collaborative analysis
    •  Get as much done as quickly as possible
    •  All see/edit the same data / annotations
    •  No means of measuring quality / single truth
•  Principled
    •  Training data for supervised machine learning
    •  Evaluation of automatic methods
    •  Distributed analysis
    •  Guideline-driven process
    •  Multiple independent analyses/annotations
    •  Inter-annotator agreement as quality indicator
•  Human in the loop
    •  Analytics make suggestions / guide human
    •  Human input guides analytics
(Figure labels: Human, Machine)
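One common way to quantify inter-annotator agreement for two annotators is Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The slides do not say which measure is meant, so the following is only an illustrative sketch; the class name, the label data, and the convention of returning 1.0 when p_e equals 1 are assumptions made here.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; the slides do not prescribe a specific agreement measure.
final class CohenKappa {

    /** Cohen's kappa for two annotators who labelled the same items in the same order. */
    static double kappa(String[] a, String[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("need two non-empty label sequences of equal length");
        }
        int n = a.length;
        int agree = 0;
        Map<String, Integer> countA = new HashMap<>();
        Map<String, Integer> countB = new HashMap<>();
        for (int i = 0; i < n; i++) {
            if (a[i].equals(b[i])) agree++;
            countA.merge(a[i], 1, Integer::sum);
            countB.merge(b[i], 1, Integer::sum);
        }
        // Observed agreement: fraction of items with identical labels.
        double observed = (double) agree / n;
        // Chance agreement: sum over labels of the product of both annotators' label rates.
        double expected = 0.0;
        for (Map.Entry<String, Integer> e : countA.entrySet()) {
            int inB = countB.getOrDefault(e.getKey(), 0);
            expected += ((double) e.getValue() / n) * ((double) inB / n);
        }
        // Assumed convention: both annotators used one identical label throughout.
        if (expected == 1.0) return 1.0;
        return (observed - expected) / (1.0 - expected);
    }

    public static void main(String[] args) {
        // Made-up part-of-speech labels from two hypothetical annotators.
        String[] ann1 = {"NN", "VB", "NN", "NN", "VB", "NN", "VB", "NN", "NN", "VB"};
        String[] ann2 = {"NN", "VB", "VB", "NN", "VB", "NN", "NN", "NN", "NN", "VB"};
        System.out.printf("kappa = %.3f%n", kappa(ann1, ann2));
    }
}
```

In the made-up data the annotators agree on 8 of 10 items (p_o = 0.8) and each uses NN six times and VB four times (p_e = 0.36 + 0.16 = 0.52), so kappa = 0.28 / 0.48.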
Deployment
•  Distributed / static
    •  Service oriented
    •  High network traffic
    •  Running cost
    •  Risk of decay / limited availability of older versions
    •  More control to providers
•  Localized / dynamic
    •  Cloud computing
    •  Reduced cost
    •  Data locality
    •  Scalability
    •  Large freedom choosing a version
    •  More control to users
•  Gateways
    •  Make dynamic setup appear static
    •  Handle input/output and workflow management
    •  Walled garden vs. convenience
(Figure labels: Software repository, Gateway)
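How a gateway can make a dynamic setup appear static might be sketched as follows. This is a hypothetical illustration in plain Java, not the design of any platform named in this deck: `Repository`, `Gateway`, the `name:version` key scheme, and the toy "upper" component are all invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical sketch; not the architecture of any specific platform.
final class GatewaySketch {

    /** Stand-in for a software repository: maps "name:version" to a component factory. */
    static final class Repository {
        private final Map<String, Supplier<Function<String, String>>> factories = new HashMap<>();

        void publish(String name, String version, Supplier<Function<String, String>> factory) {
            factories.put(name + ":" + version, factory);
        }

        Function<String, String> instantiate(String name, String version) {
            Supplier<Function<String, String>> factory = factories.get(name + ":" + version);
            if (factory == null) {
                throw new IllegalArgumentException("unknown component " + name + ":" + version);
            }
            return factory.get();
        }
    }

    /** Gateway: one stable entry point that starts components on demand and reuses them. */
    static final class Gateway {
        private final Repository repository;
        private final Map<String, Function<String, String>> running = new HashMap<>();

        Gateway(Repository repository) {
            this.repository = repository;
        }

        String process(String name, String version, String input) {
            // Callers see a fixed interface; deployment happens lazily behind it.
            Function<String, String> component = running.computeIfAbsent(
                    name + ":" + version, key -> repository.instantiate(name, version));
            return component.apply(input);
        }
    }

    public static void main(String[] args) {
        Repository repo = new Repository();
        repo.publish("upper", "1.0", () -> s -> s.toUpperCase());
        repo.publish("upper", "2.0", () -> s -> s.toUpperCase().trim());
        Gateway gateway = new Gateway(repo);
        System.out.println(gateway.process("upper", "1.0", " this is a test "));
        System.out.println(gateway.process("upper", "2.0", " this is a test "));
    }
}
```

Keeping the version in the request reflects the "large freedom choosing a version" point: older versions stay reachable as long as the repository still holds them.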
“Openness”
•  Open
    •  Liberal licensing
    •  Freedom to choose deployment
    •  Integrate custom resources/analytics
    •  Control to the user
•  Not open/closed
    •  Copyleft/proprietary licensing
    •  Prescribed deployment
    •  Difficult to customize for the user
    •  Control to the provider
A peek at the landscape
Service-based
•  ARGO*
    •  Pipeline builder, annotation editor
    •  Online platform accessible through gateway
    •  Internally dynamic deployment (afaik)
    •  Closed source
•  WebLicht / Alveo / LAPPS
    •  Pipeline builder
    •  Online platform accessible through gateway
    •  Many services distributed over multiple locations/stakeholders
    •  Some offer access to non-public content/analytics
    •  Some are partially open source
Software-based
•  DKPro Core* / ClearTK
    •  Component collection
    •  Pipeline scripting / programming
    •  Repository-based
    •  Easy to deploy/embed anywhere
    •  Open source
•  GATE workbench*
    •  Pipeline builder, annotation editor, +++
    •  Desktop application
    •  GATE Cloud
    •  Open source
•  …
DKPro Core – Runnable example
#!/usr/bin/env groovy
// Fetches all required dependencies; no manual installation.
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.opennlp-asl',
      version='1.5.0')
import de.tudarmstadt.ukp.dkpro.core.opennlp.*;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.fit.pipeline.SimplePipeline;
import de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.*;
import de.tudarmstadt.ukp.dkpro.core.api.syntax.type.*;
import static org.apache.uima.fit.util.JCasUtil.*;
import static org.apache.uima.fit.factory.AnalysisEngineFactory.*;

// Input
def jcas = JCasFactory.createJCas();
jcas.documentText = "This is a test";
jcas.documentLanguage = "en";

// Analytics pipeline; language-specific resources are fetched automatically.
SimplePipeline.runPipeline(jcas,
    createEngineDescription(OpenNlpSegmenter),
    createEngineDescription(OpenNlpPosTagger),
    createEngineDescription(OpenNlpParser,
        OpenNlpParser.PARAM_WRITE_PENN_TREE, true));

// Output
select(jcas, Token).each { println "${it.coveredText} ${it.pos.posValue}" }
select(jcas, PennTree).each { println it.pennTree }
Why is this cool?
•  This is an actual running example!
•  Requires only JVM + Groovy (+ Internet connection)
•  Easy to parallelize / scale
•  Trivial to embed in applications
•  Trivial to wrap as a service
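The "easy to parallelize" point can be sketched with the JDK's executor framework. To keep the example free of external dependencies, the hypothetical `analyze` method below is only a placeholder that counts whitespace-separated tokens; in a real setup it would run the analysis pipeline on one document. `ParallelSketch` and `analyzeAll` are likewise names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; analyze() is a placeholder, not a DKPro call.
final class ParallelSketch {

    /** Placeholder for running a pipeline on one document; here it counts tokens. */
    static int analyze(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Runs analyze() on every document using a fixed thread pool; results keep input order. */
    static List<Integer> analyzeAll(List<String> documents, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (String doc : documents) {
                Callable<Integer> task = () -> analyze(doc);
                futures.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> f : futures) {
                results.add(f.get()); // blocks until this document is done
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("analysis failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(analyzeAll(List.of("This is a test", "Another document"), 2));
    }
}
```

Documents are independent of each other, so each one can be handed to its own worker; the same per-document function could equally be placed behind a service endpoint.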
Conclusion / Challenges
•  Data is growing / analytics get more complex
    •  Need more powerful systems to process it
•  Human in the loop
    •  Human interaction influences analytics and vice versa
•  Need to move data and analytics around
    •  Often conflicts with interest in protection of investment
•  Need interoperability
    •  To discover data, resources, and analytics
    •  To access data and resources
    •  To deploy analytics
    •  To retrieve and further use results
What comes next?
Tomorrow @ The Hague: interoperability
[Figure: interoperability diagram. Labels: data source, data conversion, analysis (automatic step / analysis component / nested workflow; human annotation / human correction), data sink, provenance (ID / version, new ID / version), resource repository (auxiliary data), software repository, analysis service, APIs, desktop / server, cloud resource, cluster, portability / scalability / sustainability, rights and restrictions aggregation, working groups WG1 to WG4.]
Thanks
References
•  Alveo http://alveo.edu.au/
•  Argo http://argo.nactem.ac.uk
•  ClearTK http://cleartk.github.io/cleartk/
•  DKPro https://dkpro.github.io
•  GATE https://gate.ac.uk
•  LAPPS http://www.lappsgrid.org
•  UIMA http://uima.apache.org
•  WebLicht https://weblicht.sfs.uni-tuebingen.de/

Contenu connexe

Tendances

FIBEP WMIC 2015 - How Infomedia upgraded their closed-source search engine to...
FIBEP WMIC 2015 - How Infomedia upgraded their closed-source search engine to...FIBEP WMIC 2015 - How Infomedia upgraded their closed-source search engine to...
FIBEP WMIC 2015 - How Infomedia upgraded their closed-source search engine to...Charlie Hull
 
The Open Annotation Collaboration (OAC) Model
The Open Annotation Collaboration (OAC) ModelThe Open Annotation Collaboration (OAC) Model
The Open Annotation Collaboration (OAC) ModelBernhard Haslhofer
 
Linked Data in Scholarly Communication
Linked Data in Scholarly CommunicationLinked Data in Scholarly Communication
Linked Data in Scholarly CommunicationBernhard Haslhofer
 
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLioDo it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLioOpen Knowledge Belgium
 
Enterprise Search Europe 2015: Fishing the big data streams - the future of ...
Enterprise Search Europe 2015:  Fishing the big data streams - the future of ...Enterprise Search Europe 2015:  Fishing the big data streams - the future of ...
Enterprise Search Europe 2015: Fishing the big data streams - the future of ...Charlie Hull
 
Knowledge discoverylaurahollink
Knowledge discoverylaurahollinkKnowledge discoverylaurahollink
Knowledge discoverylaurahollinkSSSW
 
Hybrid Enterprise Knowledge Graphs
Hybrid Enterprise Knowledge GraphsHybrid Enterprise Knowledge Graphs
Hybrid Enterprise Knowledge GraphsPeter Haase
 
Why is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz IncWhy is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz IncFranz Inc. - AllegroGraph
 
20181019 code.talks graph_analytics_k_patenge
20181019 code.talks graph_analytics_k_patenge20181019 code.talks graph_analytics_k_patenge
20181019 code.talks graph_analytics_k_patengeKarin Patenge
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfHeiko Paulheim
 
20181123 dn2018 graph_analytics_k_patenge
20181123 dn2018 graph_analytics_k_patenge20181123 dn2018 graph_analytics_k_patenge
20181123 dn2018 graph_analytics_k_patengeKarin Patenge
 
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczFOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczIoan Toma
 
Turning search upside down with powerful open source search software
Turning search upside down with powerful open source search softwareTurning search upside down with powerful open source search software
Turning search upside down with powerful open source search softwareCharlie Hull
 
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Jeff Z. Pan
 
Flax ovum search-across_the_enterprise
Flax ovum search-across_the_enterpriseFlax ovum search-across_the_enterprise
Flax ovum search-across_the_enterpriseCharlie Hull
 
GraphQL - The new "Lingua Franca" for API-Development
GraphQL - The new "Lingua Franca" for API-DevelopmentGraphQL - The new "Lingua Franca" for API-Development
GraphQL - The new "Lingua Franca" for API-Developmentjexp
 
Scalable and Automatic Machine Learning with H2O
Scalable and Automatic Machine Learning with H2OScalable and Automatic Machine Learning with H2O
Scalable and Automatic Machine Learning with H2OSri Ambati
 
ESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsPeter Haase
 
Intranet show and_tell_2010
Intranet show and_tell_2010Intranet show and_tell_2010
Intranet show and_tell_2010Charlie Hull
 

Tendances (20)

FIBEP WMIC 2015 - How Infomedia upgraded their closed-source search engine to...
FIBEP WMIC 2015 - How Infomedia upgraded their closed-source search engine to...FIBEP WMIC 2015 - How Infomedia upgraded their closed-source search engine to...
FIBEP WMIC 2015 - How Infomedia upgraded their closed-source search engine to...
 
The Open Annotation Collaboration (OAC) Model
The Open Annotation Collaboration (OAC) ModelThe Open Annotation Collaboration (OAC) Model
The Open Annotation Collaboration (OAC) Model
 
Linked Data in Scholarly Communication
Linked Data in Scholarly CommunicationLinked Data in Scholarly Communication
Linked Data in Scholarly Communication
 
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLioDo it on your own - From 3 to 5 Star Linked Open Data with RMLio
Do it on your own - From 3 to 5 Star Linked Open Data with RMLio
 
Enterprise Search Europe 2015: Fishing the big data streams - the future of ...
Enterprise Search Europe 2015:  Fishing the big data streams - the future of ...Enterprise Search Europe 2015:  Fishing the big data streams - the future of ...
Enterprise Search Europe 2015: Fishing the big data streams - the future of ...
 
Knowledge discoverylaurahollink
Knowledge discoverylaurahollinkKnowledge discoverylaurahollink
Knowledge discoverylaurahollink
 
Hybrid Enterprise Knowledge Graphs
Hybrid Enterprise Knowledge GraphsHybrid Enterprise Knowledge Graphs
Hybrid Enterprise Knowledge Graphs
 
Why is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz IncWhy is JSON-LD Important to Businesses - Franz Inc
Why is JSON-LD Important to Businesses - Franz Inc
 
20181019 code.talks graph_analytics_k_patenge
20181019 code.talks graph_analytics_k_patenge20181019 code.talks graph_analytics_k_patenge
20181019 code.talks graph_analytics_k_patenge
 
SSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow TutorialSSSW2015 Data Workflow Tutorial
SSSW2015 Data Workflow Tutorial
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdf
 
20181123 dn2018 graph_analytics_k_patenge
20181123 dn2018 graph_analytics_k_patenge20181123 dn2018 graph_analytics_k_patenge
20181123 dn2018 graph_analytics_k_patenge
 
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter BonczFOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
FOSDEM2014 - Social Network Benchmark (SNB) Graph Generator - Peter Boncz
 
Turning search upside down with powerful open source search software
Turning search upside down with powerful open source search softwareTurning search upside down with powerful open source search software
Turning search upside down with powerful open source search software
 
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
Linked Data and Knowledge Graphs -- Constructing and Understanding Knowledge ...
 
Flax ovum search-across_the_enterprise
Flax ovum search-across_the_enterpriseFlax ovum search-across_the_enterprise
Flax ovum search-across_the_enterprise
 
GraphQL - The new "Lingua Franca" for API-Development
GraphQL - The new "Lingua Franca" for API-DevelopmentGraphQL - The new "Lingua Franca" for API-Development
GraphQL - The new "Lingua Franca" for API-Development
 
Scalable and Automatic Machine Learning with H2O
Scalable and Automatic Machine Learning with H2OScalable and Automatic Machine Learning with H2O
Scalable and Automatic Machine Learning with H2O
 
ESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge GraphsESWC 2017 Tutorial Knowledge Graphs
ESWC 2017 Tutorial Knowledge Graphs
 
Intranet show and_tell_2010
Intranet show and_tell_2010Intranet show and_tell_2010
Intranet show and_tell_2010
 

En vedette

Text Mining: the next data frontier. Beyond Open Access
Text Mining: the next data frontier. Beyond Open AccessText Mining: the next data frontier. Beyond Open Access
Text Mining: the next data frontier. Beyond Open Accessopenminted_eu
 
Towards a European Research Information Infrastructure
Towards a European Research Information InfrastructureTowards a European Research Information Infrastructure
Towards a European Research Information InfrastructureOpenAIRE
 
How can repositories support the text mining of their content and why?
How can repositories support the text mining of their content and why?How can repositories support the text mining of their content and why?
How can repositories support the text mining of their content and why?openminted_eu
 
The Breakdown: What is OpenMinTeD?
The Breakdown: What is OpenMinTeD?The Breakdown: What is OpenMinTeD?
The Breakdown: What is OpenMinTeD?openminted_eu
 
Experiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspectiveExperiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspectiveopenminted_eu
 
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiquesOpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiquesopenminted_eu
 
OpenMinTeD - Repositories in the centre of new scientific knowledge
OpenMinTeD - Repositories in the centre of new scientific knowledgeOpenMinTeD - Repositories in the centre of new scientific knowledge
OpenMinTeD - Repositories in the centre of new scientific knowledgeopenminted_eu
 
OpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social SciencesOpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social Sciencesopenminted_eu
 
The Future is All Mine
The Future is All MineThe Future is All Mine
The Future is All Mineopenminted_eu
 
Tentative steps in mining UK theses
Tentative steps in mining UK thesesTentative steps in mining UK theses
Tentative steps in mining UK thesesopenminted_eu
 
Jisc Text Mining Capabilities
Jisc Text Mining CapabilitiesJisc Text Mining Capabilities
Jisc Text Mining Capabilitiesopenminted_eu
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Dataopenminted_eu
 
Legal issues Text and Data Mining
Legal issues Text and Data MiningLegal issues Text and Data Mining
Legal issues Text and Data Miningopenminted_eu
 

En vedette (13)

Text Mining: the next data frontier. Beyond Open Access
Text Mining: the next data frontier. Beyond Open AccessText Mining: the next data frontier. Beyond Open Access
Text Mining: the next data frontier. Beyond Open Access
 
Towards a European Research Information Infrastructure
Towards a European Research Information InfrastructureTowards a European Research Information Infrastructure
Towards a European Research Information Infrastructure
 
How can repositories support the text mining of their content and why?
How can repositories support the text mining of their content and why?How can repositories support the text mining of their content and why?
How can repositories support the text mining of their content and why?
 
The Breakdown: What is OpenMinTeD?
The Breakdown: What is OpenMinTeD?The Breakdown: What is OpenMinTeD?
The Breakdown: What is OpenMinTeD?
 
Experiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspectiveExperiences of Text Mining; the National Library of Austria perspective
Experiences of Text Mining; the National Library of Austria perspective
 
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiquesOpenMinTeD - Une infrastructure text-mining au service des scientifiques
OpenMinTeD - Une infrastructure text-mining au service des scientifiques
 
OpenMinTeD - Repositories in the centre of new scientific knowledge
OpenMinTeD - Repositories in the centre of new scientific knowledgeOpenMinTeD - Repositories in the centre of new scientific knowledge
OpenMinTeD - Repositories in the centre of new scientific knowledge
 
OpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social SciencesOpenMinted: It's Uses and Benefits for the Social Sciences
OpenMinted: It's Uses and Benefits for the Social Sciences
 
The Future is All Mine
The Future is All MineThe Future is All Mine
The Future is All Mine
 
Tentative steps in mining UK theses
Tentative steps in mining UK thesesTentative steps in mining UK theses
Tentative steps in mining UK theses
 
Jisc Text Mining Capabilities
Jisc Text Mining CapabilitiesJisc Text Mining Capabilities
Jisc Text Mining Capabilities
 
OpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of DataOpenMinTeD: Making Sense of Large Volumes of Data
OpenMinTeD: Making Sense of Large Volumes of Data
 
Legal issues Text and Data Mining
Legal issues Text and Data MiningLegal issues Text and Data Mining
Legal issues Text and Data Mining
 

Similaire à Text and Data Mining Challenges in Europe

Big Data Analytics course: Named Entities and Deep Learning for NLP
Big Data Analytics course: Named Entities and Deep Learning for NLPBig Data Analytics course: Named Entities and Deep Learning for NLP
Big Data Analytics course: Named Entities and Deep Learning for NLPChristian Morbidoni
 
Data Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RData Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RRadek Maciaszek
 
2016 05 sanger
2016 05 sanger2016 05 sanger
2016 05 sangerChris Dwan
 
Open Data - Principles and Techniques
Open Data - Principles and TechniquesOpen Data - Principles and Techniques
Open Data - Principles and TechniquesBernhard Haslhofer
 
Searching Chinese Patents Presentation at Enterprise Data World
Searching Chinese Patents Presentation at Enterprise Data WorldSearching Chinese Patents Presentation at Enterprise Data World
Searching Chinese Patents Presentation at Enterprise Data WorldOpenSource Connections
 
High Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OHigh Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OSri Ambati
 
Drupal and Apache Stanbol
Drupal and Apache StanbolDrupal and Apache Stanbol
Drupal and Apache StanbolAlkuvoima
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabSri Ambati
 
From a student to an apache committer practice of apache io tdb
From a student to an apache committer  practice of apache io tdbFrom a student to an apache committer  practice of apache io tdb
From a student to an apache committer practice of apache io tdbjixuan1989
 
Building a lightweight discovery interface for Chinese patents
Building a lightweight discovery interface for Chinese patentsBuilding a lightweight discovery interface for Chinese patents
Building a lightweight discovery interface for Chinese patentsOpenSource Connections
 
Intro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSIntro to Machine Learning with H2O and AWS
Intro to Machine Learning with H2O and AWSSri Ambati
 
Big data berlin
Big data berlinBig data berlin
Big data berlinkammeyer
 
Wednesday 6 May: Hand me the data! What you should know as a humanities resea...
Wednesday 6 May: Hand me the data! What you should know as a humanities resea...Wednesday 6 May: Hand me the data! What you should know as a humanities resea...
Wednesday 6 May: Hand me the data! What you should know as a humanities resea...WARCnet
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...Big Data Spain
 
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...
HPCC Systems Engineering Summit: Community Use Case: Because Who Has Time for...HPCC Systems
 
SCAPE Presentation at the Elag2013 conference in Gent/Belgium
SCAPE Presentation at the Elag2013 conference in Gent/BelgiumSCAPE Presentation at the Elag2013 conference in Gent/Belgium
SCAPE Presentation at the Elag2013 conference in Gent/BelgiumSven Schlarb
 
02-Lifecycle.pptx
02-Lifecycle.pptx02-Lifecycle.pptx
02-Lifecycle.pptxShree Shree
 
Nothing is created, nothing is lost, everything changes (ELAG, 2017)
Nothing is created, nothing is lost, everything changes (ELAG, 2017)Nothing is created, nothing is lost, everything changes (ELAG, 2017)
Nothing is created, nothing is lost, everything changes (ELAG, 2017)Péter Király
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...Ilkay Altintas, Ph.D.
 

Similaire à Text and Data Mining Challenges in Europe (20)

Big Data Analytics course: Named Entities and Deep Learning for NLP
Big Data Analytics course: Named Entities and Deep Learning for NLPBig Data Analytics course: Named Entities and Deep Learning for NLP
Big Data Analytics course: Named Entities and Deep Learning for NLP
 
Data Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RData Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and R
 
2016 05 sanger
2016 05 sanger2016 05 sanger
2016 05 sanger
 
Open Data - Principles and Techniques
Open Data - Principles and TechniquesOpen Data - Principles and Techniques
Open Data - Principles and Techniques
 
Searching Chinese Patents Presentation at Enterprise Data World
Searching Chinese Patents Presentation at Enterprise Data WorldSearching Chinese Patents Presentation at Enterprise Data World
Searching Chinese Patents Presentation at Enterprise Data World
 
High Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OHigh Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2O
 
Drupal and Apache Stanbol
Drupal and Apache StanbolDrupal and Apache Stanbol
Drupal and Apache Stanbol
 
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science LabScalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
Scalable Ensemble Machine Learning @ Harvard Health Policy Data Science Lab
 
From a student to an apache committer practice of apache io tdb
Text and Data Mining Challenges in Europe

  • 1. Text and Data Mining in Europe: Defining the Challenges and Actions @ The Hague
      Infrastructure crossroads ...and the way we walked them in DKPro
      Richard Eckart de Castilho, UKP Lab, Technische Universität Darmstadt
  • 2. Presenter: Dr. Richard Eckart de Castilho
      • Interoperability WP lead @ OpenMinTeD
      • Technical Lead @ UKP
      • Java developer
      • Open source guy
      • NLP software infrastructure researcher
      • Apache UIMA developer
      • DKPro person
      @i_am_rec
      https://github.com/reckart
      Ubiquitous Knowledge Processing Lab, Technische Universität Darmstadt
  • 3. Ubiquitous Knowledge Processing Lab
      • Argumentation Mining
      • Language Technology for Digital Humanities
      • Lexical-Semantic Resources & Algorithms
      • Text Mining & Analytics
      • Writing Assistance and Language Learning
      @UKPLab
      http://www.ukp.tu-darmstadt.de
      Prof. Dr. Iryna Gurevych, Technische Universität Darmstadt
  • 4. DKPro – reuse not reinvent
      • What?
          • Collection of open-source projects related to NLP
          • Community of communities
          • Interoperability between projects
          • Target group: programmers, researchers, application developers
      • Why?
          • Flexibility and control – liberal licensing and redistributable software
          • Sustainability – open community not bound to specific grants
          • Replicability – portable software distributed through repositories
          • Usability – take the edge off installation
      • Projects
          • DKPro Core – linguistic preprocessing, interoperable third-party tools
          • DKPro TC – text classification experimentation suite
          • UBY – unified semantic resource
          • CSniper – integrated search and annotation
          • …
      https://dkpro.github.io
  • 5. … but why like this? … how else could it be done?
  • 6. Analytics
      • Analytics layer
          • Analytics tools (tagger, parser, etc.)
      • Interoperability layer
          • Input/output conversion
          • Tool wrappers
          • Pivot data model
      • Workflow layer
          • Workflow descriptions
          • Workflow engines
      • UI layer
          • Workflow editors
          • Annotation editors
          • Exploration / visualization
      Diagram labels: Complete solution; Analytics stack
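To make the interoperability and workflow layers concrete, here is a minimal Java sketch of how tool wrappers can cooperate through a pivot data model. Every name in it (`PivotSketch`, `Document`, `Annotation`, `TOKENIZER`, `SHAPE_TAGGER`, `run`) is hypothetical and chosen for illustration only. The "tools" are toy stand-ins, and this is not how DKPro or UIMA implement their data model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PivotSketch {

    /** Stand-off annotation: a typed, optionally valued span over the document text. */
    static final class Annotation {
        final String type;
        final int begin;
        final int end;
        final String value;

        Annotation(String type, int begin, int end, String value) {
            this.type = type;
            this.begin = begin;
            this.end = end;
            this.value = value;
        }
    }

    /** Hypothetical pivot data model: the text plus every annotation added so far. */
    static final class Document {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();

        Document(String text) {
            this.text = text;
        }

        String coveredText(Annotation a) {
            return text.substring(a.begin, a.end);
        }

        /** Returns a copy, so callers may add annotations while iterating over it. */
        List<Annotation> select(String type) {
            List<Annotation> result = new ArrayList<>();
            for (Annotation a : annotations) {
                if (a.type.equals(type)) {
                    result.add(a);
                }
            }
            return result;
        }
    }

    /** Wrapper around a toy "tokenizer": each whitespace-delimited chunk becomes a Token. */
    static final Consumer<Document> TOKENIZER = doc -> {
        Matcher m = Pattern.compile("\\S+").matcher(doc.text);
        while (m.find()) {
            doc.annotations.add(new Annotation("Token", m.start(), m.end(), null));
        }
    };

    /** Wrapper around a toy "tagger": reads Token annotations, writes Shape annotations. */
    static final Consumer<Document> SHAPE_TAGGER = doc -> {
        for (Annotation t : doc.select("Token")) {
            boolean upper = Character.isUpperCase(doc.text.charAt(t.begin));
            doc.annotations.add(
                new Annotation("Shape", t.begin, t.end, upper ? "capitalized" : "other"));
        }
    };

    /** Minimal workflow engine: apply the steps in order to one document. */
    static Document run(String text, List<Consumer<Document>> steps) {
        Document doc = new Document(text);
        for (Consumer<Document> step : steps) {
            step.accept(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Document doc = run("This is a test", List.of(TOKENIZER, SHAPE_TAGGER));
        for (Annotation s : doc.select("Shape")) {
            System.out.println(doc.coveredText(s) + " " + s.value);
        }
    }
}
```

The point of the design is that the tagger never talks to the tokenizer directly. Both only read from and write to the shared model, so either one can be replaced without touching the other.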
  • 7. Automatic text analysis
      • Pragmatic
          • Gain insight about a particular field of interest
          • Investigate data
          • Use latest data available
          • Results relevant for the moment
          • No need for reproducibility
      • Principled
          • Interest in reproducibility
          • Investigate methods
          • Use a fixed data set
          • Results should be reproducible
  • 8. Manual text analysis
      • Pragmatic
          • Collaborative analysis
          • Get as much done as quickly as possible
          • All see/edit the same data / annotations
          • No means of measuring quality / single truth
      • Principled
          • Training data for supervised machine learning
          • Evaluation of automatic methods
          • Distributed analysis
          • Guideline-driven process
          • Multiple independent analyses/annotations
          • Inter-annotator agreement as quality indicator
      • Human in the loop
          • Analytics make suggestions / guide human
          • Human input guides analytics
      Diagram labels: Human; Machine
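The slide does not say which agreement coefficient is meant. As one common choice, Cohen's kappa for two annotators is κ = (p_o − p_e) / (1 − p_e), where p_o is the observed share of items on which both annotators agree and p_e is the agreement expected by chance from each annotator's label distribution. A minimal Java sketch follows. The class name `CohenKappa` and the sample labels are made up for illustration, and returning 1.0 when chance agreement is already 1 is a convention chosen here, not a standard.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CohenKappa {

    /** Cohen's kappa for two annotators who labelled the same items in the same order. */
    static double kappa(List<String> a, List<String> b) {
        if (a.isEmpty() || a.size() != b.size()) {
            throw new IllegalArgumentException("need two non-empty label lists of equal length");
        }
        int n = a.size();
        int agree = 0;
        Map<String, Integer> countA = new HashMap<>();
        Map<String, Integer> countB = new HashMap<>();
        for (int i = 0; i < n; i++) {
            if (a.get(i).equals(b.get(i))) {
                agree++;
            }
            countA.merge(a.get(i), 1, Integer::sum);
            countB.merge(b.get(i), 1, Integer::sum);
        }
        // Observed agreement: share of items with identical labels.
        double observed = (double) agree / n;
        // Chance agreement: sum over labels of P(A uses label) * P(B uses label).
        double expected = 0.0;
        for (Map.Entry<String, Integer> e : countA.entrySet()) {
            int inB = countB.getOrDefault(e.getKey(), 0);
            expected += ((double) e.getValue() / n) * ((double) inB / n);
        }
        if (expected == 1.0) {
            // Both annotators used one and the same label throughout; kappa is
            // undefined there, so this sketch reports full agreement (assumption).
            return 1.0;
        }
        return (observed - expected) / (1.0 - expected);
    }

    public static void main(String[] args) {
        // Made-up part-of-speech labels from two annotators for six tokens.
        List<String> first = List.of("NOUN", "VERB", "NOUN", "ADJ", "NOUN", "VERB");
        List<String> second = List.of("NOUN", "VERB", "ADJ", "ADJ", "NOUN", "NOUN");
        System.out.println("kappa = " + kappa(first, second));
    }
}
```

In the sample, the annotators agree on 4 of 6 items (p_o = 24/36) and chance agreement is p_e = 13/36, so kappa is 11/23, roughly 0.48. Raw agreement alone (0.67) would overstate the quality of the annotation.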
  • 9. Deployment
      • Distributed / static
          • Service oriented
          • High network traffic
          • Running cost
          • Risk of decay / limited availability of older versions
          • More control to providers
      • Localized / dynamic
          • Cloud computing
          • Reduced cost
          • Data locality
          • Scalability
          • Large freedom choosing a version
          • More control to users
      • Gateways
          • Make dynamic setup appear static
          • Handle input/output and workflow management
          • Walled garden vs. convenience
      Diagram labels: Software repository; Gateway; Gateway
  • 10. “Openness”
      • Open
          • Liberal licensing
          • Freedom to choose deployment
          • Integrate custom resources/analytics
          • Control to the user
      • Not open/closed
          • Copyleft/proprietary licensing
          • Prescribed deployment
          • Difficult to customize for the user
          • Control to the provider
  • 11. A peek at the landscape
      Service-based
      • ARGO*
          • Pipeline builder, annotation editor
          • Online platform accessible through gateway
          • Internally dynamic deployment (afaik)
          • Closed source
      • Weblicht / Alveo / LAPPS
          • Pipeline builder
          • Online platform accessible through gateway
          • Many services distributed over multiple locations/stakeholders
          • Some offer access to non-public content/analytics
          • Some are partially open source
      Software-based
      • DKPro Core* / ClearTK
          • Component collection
          • Pipeline scripting / programming
          • Repository-based
          • Easy to deploy/embed anywhere
          • Open source
      • GATE workbench*
          • Pipeline builder, annotation editor, +++
          • Desktop application
          • GATE Cloud
          • Open source
      • …
  • 12. DKPro Core – Runnable example

      #!/usr/bin/env groovy
      // Fetches all required dependencies. No manual installation!
      @Grab(group='de.tudarmstadt.ukp.dkpro.core',
            module='de.tudarmstadt.ukp.dkpro.core.opennlp-asl',
            version='1.5.0')
      import de.tudarmstadt.ukp.dkpro.core.opennlp.*;
      import org.apache.uima.fit.factory.JCasFactory;
      import org.apache.uima.fit.pipeline.SimplePipeline;
      import de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.*;
      import de.tudarmstadt.ukp.dkpro.core.api.syntax.type.*;
      import static org.apache.uima.fit.util.JCasUtil.*;
      import static org.apache.uima.fit.factory.AnalysisEngineFactory.*;

      // Input
      def jcas = JCasFactory.createJCas();
      jcas.documentText = "This is a test";
      jcas.documentLanguage = "en";

      // Analytics pipeline. Language-specific resources fetched automatically.
      SimplePipeline.runPipeline(jcas,
          createEngineDescription(OpenNlpSegmenter),
          createEngineDescription(OpenNlpPosTagger),
          createEngineDescription(OpenNlpParser,
              OpenNlpParser.PARAM_WRITE_PENN_TREE, true));

      // Output
      select(jcas, Token).each { println "${it.coveredText} ${it.pos.posValue}" }
      select(jcas, PennTree).each { println it.pennTree }
  • 13. DKPro Core – Runnable example (same script and callouts as on slide 12)
      Why is this cool?
      • This is an actual running example!
      • Requires only JVM + Groovy (+ Internet connection)
      • Easy to parallelize / scale
      • Trivial to embed in applications
      • Trivial to wrap as a service
  • 14. Conclusion / Challenges
      • Data is growing / analytics get more complex
          • Need more powerful systems to process it
      • Human in the loop
          • Human interaction influences analytics and vice versa
      • Need to move data and analytics around
          • Often conflicts with interest in protection of investment
      • Need interoperability
          • To discover data, resources, and analytics
          • To access data and resources
          • To deploy analytics
          • To retrieve and further use results
  • 15. What comes next?
  • 16. Tomorrow @ The Hague: interoperability
      [Diagram. Labels: Data Source; Data Conversion; Analysis (Automatic Step / Analysis Component / Nested Workflow; Human Annotation / Human Correction); Data Conversion; Data Sink; Provenance (ID / Version, New ID / Version); Resource repository (Auxiliary Data); Software repository; Analysis Service; API; Desktop / Server; Cloud resource; Cluster; Portability / Scalability / Sustainability; Rights and restrictions aggregation; WG1, WG2, WG3, WG4]
  • 17. Thanks
  • 18. References
      • Alveo: http://alveo.edu.au/
      • Argo: http://argo.nactem.ac.uk
      • ClearTK: http://cleartk.github.io/cleartk/
      • DKPro: https://dkpro.github.io
      • GATE: https://gate.ac.uk
      • LAPPS: http://www.lappsgrid.org
      • UIMA: http://uima.apache.org
      • Weblicht: https://weblicht.sfs.uni-tuebingen.de/