Text and Data Mining in Europe: Defining the Challenges and Actions @ The Hague
Infrastructure crossroads
Richard Eckart de Castilho
UKP Lab
Technische Universität Darmstadt
... and the way we walked them in DKPro
PRESENTER
Dr. Richard Eckart de Castilho
•  Interoperability WP lead @ OpenMinTeD
•  Technical Lead @ UKP
•  Java developer
•  Open source guy
•  NLP software infrastructure researcher
•  Apache UIMA developer
•  DKPro person
@i_am_rec
https://github.com/reckart
Ubiquitous Knowledge Processing Lab
Technische Universität Darmstadt
Ubiquitous Knowledge Processing Lab
•  Argumentation Mining
•  Language Technology for Digital Humanities
•  Lexical-Semantic Resources & Algorithms
•  Text Mining & Analytics
•  Writing Assistance and Language Learning
@UKPLab
http://www.ukp.tu-darmstadt.de
Prof. Dr. Iryna Gurevych
Technische Universität Darmstadt
DKPro – reuse not reinvent
•  What?
    •  Collection of open-source projects related to NLP
    •  Community of communities
    •  Interoperability between projects
    •  Target group: programmers, researchers, application developers
•  Why?
    •  Flexibility and control – liberal licensing and redistributable software
    •  Sustainability – open community not bound to specific grants
    •  Replicability – portable software distributed through repositories
    •  Usability – take the edge off installation
•  Projects
    •  DKPro Core – linguistic preprocessing, interoperable third-party tools
    •  DKPro TC – text classification experimentation suite
    •  UBY – unified semantic resource
    •  CSniper – integrated search and annotation
    •  … https://dkpro.github.io
… but why like this?
… how else could it be done?
Analytics
•  Analytics layer
    •  Analytics tools (tagger, parser, etc.)
•  Interoperability layer
    •  Input/output conversion
    •  Tool wrappers
    •  Pivot data model
•  Workflow layer
    •  Workflow descriptions
    •  Workflow engines
•  UI layer
    •  Workflow editors
    •  Annotation editors
    •  Exploration / visualization
Complete solution: the analytics stack
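As a rough illustration of how the interoperability and workflow layers could fit together, the following plain-Java sketch defines a tiny pivot data model, a tool-wrapper interface, and a sequential workflow runner. All names here (`Document`, `Annotation`, `Component`, `WhitespaceTokenizer`, `CapitalizationTagger`, `run`) are hypothetical and are not part of DKPro or UIMA; real frameworks provide far richer versions of each piece.

```java
import java.util.ArrayList;
import java.util.List;

class AnalyticsStackSketch {

    /** Pivot data model (hypothetical): text plus stand-off annotations shared by all tools. */
    static final class Document {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();
        Document(String text) { this.text = text; }
    }

    /** A typed span over the document text with an optional value. */
    static final class Annotation {
        final String type;
        final int begin;
        final int end;
        final String value;
        Annotation(String type, int begin, int end, String value) {
            this.type = type; this.begin = begin; this.end = end; this.value = value;
        }
    }

    /** Tool-wrapper contract: every analytics tool reads and writes the pivot model. */
    interface Component {
        void process(Document doc);
    }

    /** Analytics tool 1: adds a "Token" annotation for each whitespace-separated span. */
    static final class WhitespaceTokenizer implements Component {
        public void process(Document doc) {
            int i = 0;
            int n = doc.text.length();
            while (i < n) {
                while (i < n && Character.isWhitespace(doc.text.charAt(i))) i++;
                int begin = i;
                while (i < n && !Character.isWhitespace(doc.text.charAt(i))) i++;
                if (i > begin) doc.annotations.add(new Annotation("Token", begin, i, null));
            }
        }
    }

    /** Analytics tool 2: consumes the tokens and adds a "Shape" annotation per token. */
    static final class CapitalizationTagger implements Component {
        public void process(Document doc) {
            List<Annotation> added = new ArrayList<>();
            for (Annotation a : doc.annotations) {
                if (a.type.equals("Token")) {
                    boolean cap = Character.isUpperCase(doc.text.charAt(a.begin));
                    added.add(new Annotation("Shape", a.begin, a.end, cap ? "Capitalized" : "Other"));
                }
            }
            doc.annotations.addAll(added);
        }
    }

    /** Workflow engine in miniature: run the components in the order given. */
    static Document run(String text, List<Component> workflow) {
        Document doc = new Document(text);
        for (Component c : workflow) c.process(doc);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = run("This is a test",
                List.of(new WhitespaceTokenizer(), new CapitalizationTagger()));
        for (Annotation a : doc.annotations) {
            System.out.println(a.type + " [" + doc.text.substring(a.begin, a.end) + "] " + a.value);
        }
    }
}
```

Because both tools only see the shared `Document`, either one could be swapped for a wrapper around a third-party tool without touching the other; that is the role the pivot data model plays in the stack.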
Automatic text analysis
•  Pragmatic
    •  Gain insight about a particular field of interest
    •  Investigate data
    •  Use latest data available
    •  Results relevant for the moment
    •  No need for reproducibility
•  Principled
    •  Interest in reproducibility
    •  Investigate methods
    •  Use a fixed data set
    •  Results should be reproducible
Manual text analysis
•  Pragmatic
    •  Collaborative analysis
    •  Get as much done as quickly as possible
    •  All see/edit the same data / annotations
    •  No means of measuring quality / single truth
•  Principled
    •  Training data for supervised machine learning
    •  Evaluation of automatic methods
    •  Distributed analysis
    •  Guideline-driven process
    •  Multiple independent analyses/annotations
    •  Inter-annotator agreement as quality indicator
•  Human in the loop
    •  Analytics make suggestions / guide human
    •  Human input guides analytics
[Diagram labels: Human, Machine]
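The slide does not say which agreement measure is meant. One common choice for two annotators is Cohen's kappa, which corrects the observed agreement for the agreement expected by chance. The following is a minimal plain-Java sketch under that assumption; the class name `CohenKappa` and the sample labels are made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Cohen's kappa for two annotators who labelled the same items in the same order. */
class CohenKappa {

    static double kappa(List<String> a, List<String> b) {
        if (a.size() != b.size() || a.isEmpty()) {
            throw new IllegalArgumentException("need two non-empty label lists of equal length");
        }
        int n = a.size();
        int agreed = 0;
        Map<String, Integer> countA = new HashMap<>();
        Map<String, Integer> countB = new HashMap<>();
        for (int i = 0; i < n; i++) {
            if (a.get(i).equals(b.get(i))) agreed++;
            countA.merge(a.get(i), 1, Integer::sum);
            countB.merge(b.get(i), 1, Integer::sum);
        }
        // Observed agreement: share of items on which both annotators chose the same label.
        double observed = (double) agreed / n;
        // Chance agreement: sum over labels of the product of both annotators' label frequencies.
        double expected = 0.0;
        for (Map.Entry<String, Integer> e : countA.entrySet()) {
            int inB = countB.getOrDefault(e.getKey(), 0);
            expected += ((double) e.getValue() / n) * ((double) inB / n);
        }
        // Degenerate case: both annotators used one and the same label throughout.
        if (expected == 1.0) return 1.0;
        return (observed - expected) / (1.0 - expected);
    }

    public static void main(String[] args) {
        List<String> annotator1 = List.of("NOUN", "VERB", "NOUN", "ADJ", "NOUN", "VERB");
        List<String> annotator2 = List.of("NOUN", "VERB", "ADJ", "ADJ", "NOUN", "NOUN");
        System.out.println("kappa = " + kappa(annotator1, annotator2));
    }
}
```

In the sample data the annotators agree on 4 of 6 items, but part of that agreement would be expected by chance, so kappa comes out lower than the raw agreement rate. This only works in the principled setting above, where several annotators label the same items independently.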
Deployment
•  Distributed / static
    •  Service oriented
    •  High network traffic
    •  Running cost
    •  Risk of decay / limited availability of older versions
    •  More control to providers
•  Localized / dynamic
    •  Cloud computing
    •  Reduced cost
    •  Data locality
    •  Scalability
    •  Large freedom choosing a version
    •  More control to users
•  Gateways
    •  Make dynamic setup appear static
    •  Handle input/output and workflow management
    •  Walled garden vs. convenience
[Diagram labels: software repository, gateway]
“openness”
•  Open
    •  Liberal licensing
    •  Freedom to choose deployment
    •  Integrate custom resources/analytics
    •  Control to the user
•  Not open/closed
    •  Copyleft/proprietary licensing
    •  Prescribed deployment
    •  Difficult to customize for the user
    •  Control to the provider
A peek at the landscape
Service-based
•  ARGO*
    •  Pipeline builder, annotation editor
    •  Online platform accessible through gateway
    •  Internally dynamic deployment (afaik)
    •  Closed source
•  Weblicht / Alveo / LAPPS
    •  Pipeline builder
    •  Online platform accessible through gateway
    •  Many services distributed over multiple locations/stakeholders
    •  Some offer access to non-public content/analytics
    •  Some are partially open source
Software-based
•  DKPro Core* / ClearTK
    •  Component collection
    •  Pipeline scripting / programming
    •  Repository-based
    •  Easy to deploy/embed anywhere
    •  Open source
•  GATE workbench*
    •  Pipeline builder, annotation editor, +++
    •  Desktop application
    •  GATE Cloud
    •  Open source
•  …
DKPro Core – Runnable example
#!/usr/bin/env groovy

// Fetches all required dependencies; no manual installation
@Grab(group='de.tudarmstadt.ukp.dkpro.core',
      module='de.tudarmstadt.ukp.dkpro.core.opennlp-asl',
      version='1.5.0')
import de.tudarmstadt.ukp.dkpro.core.opennlp.*;
import org.apache.uima.fit.factory.JCasFactory;
import org.apache.uima.fit.pipeline.SimplePipeline;
import de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.*;
import de.tudarmstadt.ukp.dkpro.core.api.syntax.type.*;
import static org.apache.uima.fit.util.JCasUtil.*;
import static org.apache.uima.fit.factory.AnalysisEngineFactory.*;

// Input
def jcas = JCasFactory.createJCas();
jcas.documentText = "This is a test";
jcas.documentLanguage = "en";

// Analytics pipeline; language-specific resources are fetched automatically
SimplePipeline.runPipeline(jcas,
    createEngineDescription(OpenNlpSegmenter),
    createEngineDescription(OpenNlpPosTagger),
    createEngineDescription(OpenNlpParser,
        OpenNlpParser.PARAM_WRITE_PENN_TREE, true));

// Output
select(jcas, Token).each { println "${it.coveredText} ${it.pos.posValue}" }
select(jcas, PennTree).each { println it.pennTree }
Why is this cool?
•  This is an actual running example!
•  Requires only JVM + Groovy (+ Internet connection)
•  Easy to parallelize / scale
•  Trivial to embed in applications
•  Trivial to wrap as a service
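One way to read "easy to parallelize": each document is analyzed independently of the others, so a collection can be mapped over in parallel. The plain-Java sketch below shows only that pattern. The `analyze` method is a hypothetical stand-in (a whitespace token counter) for a real per-document pipeline run; it is not a DKPro API.

```java
import java.util.List;
import java.util.stream.Collectors;

class ParallelAnalysis {

    /** Hypothetical stand-in for one pipeline run: counts whitespace-separated tokens. */
    static int analyze(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Analyzes all documents in parallel; results come back in input order. */
    static List<Integer> analyzeAll(List<String> documents) {
        return documents.parallelStream()
                .map(ParallelAnalysis::analyze)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> documents = List.of("This is a test", "", "two words");
        System.out.println(analyzeAll(documents));
    }
}
```

The pattern relies on the per-document step sharing no mutable state. In a real setup each worker would need its own analysis engine instance, or one known to be safe for concurrent use.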
Conclusion / Challenges
•  Data is growing / analytics get more complex
    •  Need more powerful systems to process it
•  Human in the loop
    •  Human interaction influences analytics and vice versa
•  Need to move data and analytics around
    •  Often conflicts with interest in protection of investment
•  Need interoperability
    •  To discover data, resources, and analytics
    •  To access data and resources
    •  To deploy analytics
    •  To retrieve and further use results
What comes next?
Tomorrow @ The Hague: interoperability
[Diagram labels: Data Source; Data Conversion; Analysis (Automatic Step / Analysis Component / Nested workflow); Human Annotation / Human Correction; Data Sink; Provenance; ID / Version; New ID / Version; Resource repository (Auxiliary Data); Software repository; Analysis Service; API; Desktop / Server; Cluster; Cloud resource; Portability / Scalability / Sustainability; Rights and restrictions aggregation; WG1, WG2, WG3, WG4]
Thanks
References
•  Alveo http://alveo.edu.au/
•  Argo http://argo.nactem.ac.uk
•  ClearTK http://cleartk.github.io/cleartk/
•  DKPro https://dkpro.github.io
•  GATE https://gate.ac.uk
•  LAPPS http://www.lappsgrid.org
•  UIMA http://uima.apache.org
•  Weblicht https://weblicht.sfs.uni-tuebingen.de/

 
Text and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the NetherlandsText and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the Netherlandsopenminted_eu
 

Plus de openminted_eu (6)

Supporting the uptake of TDM
Supporting the uptake of TDMSupporting the uptake of TDM
Supporting the uptake of TDM
 
OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017OpenMinTeD, LIBER conference 2017
OpenMinTeD, LIBER conference 2017
 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...
 
Seamless access to the world's open access research papers via resources sync
Seamless access to the world's open access research papers via resources syncSeamless access to the world's open access research papers via resources sync
Seamless access to the world's open access research papers via resources sync
 
Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...Webinar slides: Interoperability between resources involved in TDM at the lev...
Webinar slides: Interoperability between resources involved in TDM at the lev...
 
Text and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the NetherlandsText and Data Mining at the Royal Library in the Netherlands
Text and Data Mining at the Royal Library in the Netherlands
 

Dernier

NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGIThomas Poetter
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024Timothy Spann
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...GQ Research
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxellehsormae
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 

Dernier (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
April 2024 - NLIT Cloudera Real-Time LLM Streaming 2024
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
Biometric Authentication: The Evolution, Applications, Benefits and Challenge...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptx
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 

Text and Data Mining Challenges in Europe

  • 1. Text and Data Mining in Europe: Defining the Challenges and Actions @ The Hague
      Infrastructure crossroads ... and the way we walked them in DKPro
      Richard Eckart de Castilho, UKP Lab, Technische Universität Darmstadt
  • 2. Presenter: Dr. Richard Eckart de Castilho
      •  Interoperability WP lead @ OpenMinTeD
      •  Technical Lead @ UKP
      •  Java developer
      •  Open source guy
      •  NLP software infrastructure researcher
      •  Apache UIMA developer
      •  DKPro person
      @i_am_rec, https://github.com/reckart
      Ubiquitous Knowledge Processing Lab, Technische Universität Darmstadt
  • 3. Ubiquitous Knowledge Processing Lab
      •  Argumentation Mining
      •  Language Technology for Digital Humanities
      •  Lexical-Semantic Resources & Algorithms
      •  Text Mining & Analytics
      •  Writing Assistance and Language Learning
      @UKPLab, http://www.ukp.tu-darmstadt.de
      Prof. Dr. Iryna Gurevych, Technische Universität Darmstadt
  • 4. DKPro – reuse not reinvent
      •  What?
          •  Collection of open-source projects related to NLP
          •  Community of communities
          •  Interoperability between projects
          •  Target group: programmers, researchers, application developers
      •  Why?
          •  Flexibility and control – liberal licensing and redistributable software
          •  Sustainability – open community not bound to specific grants
          •  Replicability – portable software distributed through repositories
          •  Usability – take the edge out of installation
      •  Projects
          •  DKPro Core – linguistic preprocessing, interoperable third-party tools
          •  DKPro TC – text classification experimentation suite
          •  UBY – unified semantic resource
          •  CSniper – integrated search and annotation
          •  …
      https://dkpro.github.io
  • 5. … but why like this? … how else could it be done?
  • 6. Analytics stack
      •  Analytics layer
          •  Analytics tools (tagger, parser, etc.)
      •  Interoperability layer
          •  Input/output conversion
          •  Tool wrappers
          •  Pivot data model
      •  Workflow layer
          •  Workflow descriptions
          •  Workflow engines
      •  UI layer
          •  Workflow editors
          •  Annotation editors
          •  Exploration / visualization
      [Diagram label: complete solution]
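The layering on slide 6 can be illustrated with a small, self-contained Java sketch. It is not DKPro or UIMA code: the names `Document`, `Annotation`, `Component`, `WhitespaceTokenizer`, `CapitalizedMarker`, and `run` are all hypothetical and chosen only for illustration. The sketch assumes a pivot data model made of a text plus stand-off annotations, tool wrappers that all read and write that model, and a workflow that is simply an ordered list of components.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch of the slide 6 layers; every name here is hypothetical. */
final class AnalyticsStackSketch {

    /** Pivot data model: the text plus stand-off annotations shared by all tools. */
    static final class Document {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();

        Document(String text) {
            this.text = text;
        }
    }

    /** A stand-off annotation: a layer name and a character span in the text. */
    static final class Annotation {
        final String layer;
        final int begin;
        final int end;

        Annotation(String layer, int begin, int end) {
            this.layer = layer;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Interoperability layer: every wrapped tool reads and writes the pivot model. */
    interface Component {
        void process(Document doc);
    }

    /** Analytics layer stand-in: a whitespace tokenizer. */
    static final class WhitespaceTokenizer implements Component {
        @Override
        public void process(Document doc) {
            int i = 0;
            int n = doc.text.length();
            while (i < n) {
                while (i < n && Character.isWhitespace(doc.text.charAt(i))) {
                    i++;
                }
                int start = i;
                while (i < n && !Character.isWhitespace(doc.text.charAt(i))) {
                    i++;
                }
                if (i > start) {
                    doc.annotations.add(new Annotation("Token", start, i));
                }
            }
        }
    }

    /** A second tool that builds on the tokens produced by an earlier one. */
    static final class CapitalizedMarker implements Component {
        @Override
        public void process(Document doc) {
            List<Annotation> added = new ArrayList<>();
            for (Annotation a : doc.annotations) {
                if (a.layer.equals("Token") && Character.isUpperCase(doc.text.charAt(a.begin))) {
                    added.add(new Annotation("Capitalized", a.begin, a.end));
                }
            }
            doc.annotations.addAll(added);
        }
    }

    /** Workflow layer: run the components in order over one document. */
    static Document run(String text, List<Component> workflow) {
        Document doc = new Document(text);
        for (Component component : workflow) {
            component.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Document doc = run("This is a test",
                Arrays.asList(new WhitespaceTokenizer(), new CapitalizedMarker()));
        for (Annotation a : doc.annotations) {
            System.out.println(a.layer + " " + doc.text.substring(a.begin, a.end));
        }
    }
}
```

Because both tools only depend on the shared model, they can be reordered, replaced, or combined with other wrapped tools without knowing about each other, which is the point of the interoperability layer.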
  • 7. Automatic text analysis
      •  Pragmatic
          •  Gain insight about a particular field of interest
          •  Investigate data
          •  Use latest data available
          •  Results relevant for the moment
          •  No need for reproducibility
      •  Principled
          •  Interest in reproducibility
          •  Investigate methods
          •  Use a fixed data set
          •  Results should be reproducible
  • 8. Manual text analysis
      •  Pragmatic
          •  Collaborative analysis
          •  Get as much done as quickly as possible
          •  All see/edit the same data / annotations
          •  No means of measuring quality / single truth
      •  Principled
          •  Training data for supervised machine learning
          •  Evaluation of automatic methods
          •  Distributed analysis
          •  Guideline-driven process
          •  Multiple independent analyses/annotations
          •  Inter-annotator agreement as quality indicator
      •  Human in the loop
          •  Analytics make suggestions / guide human
          •  Human input guides analytics
      [Diagram labels: Human, Machine]
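Slide 8 names inter-annotator agreement as the quality indicator for principled manual analysis but does not say which measure is meant. As one common choice, and purely as an assumption here, the sketch below computes Cohen's kappa for two annotators who labeled the same items: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each annotator's label distribution. The class name `CohenKappa` and the sample labels are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

/** Cohen's kappa for two annotators who labeled the same items (illustrative sketch). */
final class CohenKappa {

    /**
     * @param a labels assigned by annotator A, one per item
     * @param b labels assigned by annotator B, in the same item order
     * @return kappa = (observed - expected) / (1 - expected)
     */
    static double kappa(String[] a, String[] b) {
        if (a.length != b.length || a.length == 0) {
            throw new IllegalArgumentException("need two non-empty label arrays of equal length");
        }
        int n = a.length;
        int agreements = 0;
        Map<String, Integer> countA = new HashMap<>();
        Map<String, Integer> countB = new HashMap<>();
        for (int i = 0; i < n; i++) {
            if (a[i].equals(b[i])) {
                agreements++;
            }
            countA.merge(a[i], 1, Integer::sum);
            countB.merge(b[i], 1, Integer::sum);
        }

        // Observed agreement: share of items on which both annotators chose the same label.
        double observed = (double) agreements / n;

        // Chance agreement: sum over labels of P(A picks label) * P(B picks label).
        double expected = 0.0;
        for (Map.Entry<String, Integer> entry : countA.entrySet()) {
            int inB = countB.getOrDefault(entry.getKey(), 0);
            expected += ((double) entry.getValue() / n) * ((double) inB / n);
        }

        if (expected == 1.0) {
            // Both annotators used one and the same label throughout; kappa is
            // undefined in that case, and this sketch returns 1.0 by convention.
            return 1.0;
        }
        return (observed - expected) / (1.0 - expected);
    }

    public static void main(String[] args) {
        String[] annotatorA = {"NOUN", "VERB", "NOUN", "NOUN"};
        String[] annotatorB = {"NOUN", "VERB", "VERB", "NOUN"};
        System.out.println(kappa(annotatorA, annotatorB));
    }
}
```

A value near 1 indicates agreement well above chance, a value near 0 indicates agreement no better than chance. With more than two annotators or with missing annotations, other measures such as Fleiss' kappa or Krippendorff's alpha are usually preferred.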
  • 9. Deployment
      •  Distributed / static
          •  Service oriented
          •  High network traffic
          •  Running cost
          •  Risk of decay / limited availability of older versions
          •  More control to providers
      •  Localized / dynamic
          •  Cloud computing
          •  Reduced cost
          •  Data locality
          •  Scalability
          •  Large freedom choosing a version
          •  More control to users
      •  Gateways
          •  Make dynamic setup appear static
          •  Handle input/output and workflow management
          •  Walled garden vs. convenience
      [Diagram labels: Software Repository, Gateway, Gateway]
  • 10. “Openness”
      •  Open
          •  Liberal licensing
          •  Freedom to choose deployment
          •  Integrate custom resources/analytics
          •  Control to the user
      •  Not open/closed
          •  Copyleft/proprietary licensing
          •  Prescribed deployment
          •  Difficult to customize for the user
          •  Control to the provider
  • 11. A peek at the landscape
      Service-based
      •  ARGO*
          •  Pipeline builder, annotation editor
          •  Online platform accessible through gateway
          •  Internally dynamic deployment (afaik)
          •  Closed source
      •  Weblicht / Alveo / LAPPS
          •  Pipeline builder
          •  Online platform accessible through gateway
          •  Many services distributed over multiple locations/stakeholders
          •  Some offer access to non-public content/analytics
          •  Some are partially open source
      Software-based
      •  DKPro Core* / ClearTK
          •  Component collection
          •  Pipeline scripting / programming
          •  Repository-based
          •  Easy to deploy/embed anywhere
          •  Open source
      •  GATE workbench*
          •  Pipeline builder, annotation editor, +++
          •  Desktop application
          •  GATE Cloud
          •  Open source
      •  …
  • 12. DKPro Core – Runnable example

      #!/usr/bin/env groovy
      // Fetches all required dependencies. No manual installation!
      @Grab(group='de.tudarmstadt.ukp.dkpro.core',
            module='de.tudarmstadt.ukp.dkpro.core.opennlp-asl',
            version='1.5.0')
      import de.tudarmstadt.ukp.dkpro.core.opennlp.*;
      import org.apache.uima.fit.factory.JCasFactory;
      import org.apache.uima.fit.pipeline.SimplePipeline;
      import de.tudarmstadt.ukp.dkpro.core.api.segmentation.type.*;
      import de.tudarmstadt.ukp.dkpro.core.api.syntax.type.*;
      import static org.apache.uima.fit.util.JCasUtil.*;
      import static org.apache.uima.fit.factory.AnalysisEngineFactory.*;

      // Input
      def jcas = JCasFactory.createJCas();
      jcas.documentText = "This is a test";
      jcas.documentLanguage = "en";

      // Analytics pipeline. Language-specific resources are fetched automatically.
      SimplePipeline.runPipeline(jcas,
          createEngineDescription(OpenNlpSegmenter),
          createEngineDescription(OpenNlpPosTagger),
          createEngineDescription(OpenNlpParser,
              OpenNlpParser.PARAM_WRITE_PENN_TREE, true));

      // Output
      select(jcas, Token).each { println "${it.coveredText} ${it.pos.posValue}" }
      select(jcas, PennTree).each { println it.pennTree }
  • 13. DKPro Core – Runnable example (same script as on slide 12)
      Why is this cool?
      •  This is an actual running example!
      •  Requires only JVM + Groovy (+ Internet connection)
      •  Easy to parallelize / scale
      •  Trivial to embed in applications
      •  Trivial to wrap as a service
  • 14. Conclusion / Challenges
      •  Data is growing / analytics get more complex
          •  Need more powerful systems to process it
      •  Human in the loop
          •  Human interaction influences analytics and vice versa
      •  Need to move data and analytics around
          •  Often conflicts with interest in protection of investment
      •  Need interoperability
          •  To discover data, resources, and analytics
          •  To access data and resources
          •  To deploy analytics
          •  To retrieve and further use results
  • 15. What comes next?
  • 16. Tomorrow @ The Hague: interoperability
      [Diagram. Recoverable labels: Data Source; Data Conversion; Analysis (Automatic Step / Analysis Component / Nested workflow; Human Annotation / Human Correction); Data Conversion; Data Sink; Provenance; ID / Version and New ID / Version; API; Resource repository (Auxiliary Data); Software repository; Analysis Service; Desktop / Server, Cloud resource, Cluster; Portability / Scalability / Sustainability; Rights and restrictions aggregation; WG1, WG2, WG3, WG4]
  • 17. Thanks
  • 18. References
      •  Alveo: http://alveo.edu.au/
      •  Argo: http://argo.nactem.ac.uk
      •  ClearTK: http://cleartk.github.io/cleartk/
      •  DKPro: https://dkpro.github.io
      •  GATE: https://gate.ac.uk
      •  LAPPS: http://www.lappsgrid.org
      •  UIMA: http://uima.apache.org
      •  Weblicht: https://weblicht.sfs.uni-tuebingen.de/