An Ecosystem for Linked Humanities Data
Rinke Hoekstra

Vrije Universiteit Amsterdam/University of Amsterdam

rinke.hoekstra@vu.nl



Albert Meroño-Peñuela, Kathrin Dentler, Auke Rijpma, Richard Zijdeman and Ivo Zandhuis
datalegend
The Promise of Digital Humanities
http://schoolofherring.com
http://science-all.com/fishing.html
The Problem of Digital Humanities
Pacific Barreleye, http://imgur.com/gallery/Mzyb5
(can rotate its eyes forwards or upwards to look through the transparent head to prey above)
http://www.asergeev.com/pictures/archives/compress/2012/1034/24.htm
The Cost of Data Preparation
Common Motifs in Scientific Workflows:
An Empirical Analysis
Daniel Garijo*, Pinar Alper†, Khalid Belhajjame†, Oscar Corcho*, Yolanda Gil‡, Carole Goble†
*Ontology Engineering Group, Universidad Politécnica de Madrid. {dgarijo, ocorcho}@fi.upm.es
†School of Computer Science, University of Manchester. {alperp, khalidb, carole.goble}@cs.manchester.ac.uk
‡Information Sciences Institute, Department of Computer Science, University of Southern California. gil@isi.edu
Abstract—While workflow technology has gained momentum
in the last decade as a means for specifying and enacting
computational experiments in modern science, reusing and repurposing
existing workflows to build new scientific experiments is still a
daunting task. This is partly due to the difficulty that scientists
experience when attempting to understand existing workflows,
which contain several data preparation and adaptation steps in
addition to the scientifically significant analysis steps. One way
to tackle the understandability problem is through providing
abstractions that give a high-level view of activities undertaken
within workflows. As a first step towards abstractions, we report
in this paper on the results of a manual analysis performed over
a set of real-world scientific workflows from Taverna and Wings
systems. Our analysis has resulted in a set of scientific workflow
motifs that outline i) the kinds of data intensive activities that are
observed in workflows (data oriented motifs), and ii) the different
manners in which activities are implemented within workflows
(workflow oriented motifs). These motifs can be useful to inform
workflow designers on the good and bad practices for workflow
development, to inform the design of automated tools for the
generation of workflow abstractions, etc.
I. INTRODUCTION
Scientific workflows have been increasingly used in the last
decade as an instrument for data intensive scientific analysis.
In these settings, workflows serve a dual function: first as
detailed documentation of the method (i.e. the input sources
and processing steps taken for the derivation of a certain
data item) and second as re-usable, executable artifacts for
data-intensive analysis. Workflows stitch together a variety
of data manipulation activities such as data movement, data
transformation or data visualization to serve the goals of the
scientific study. The stitching is realized by the constructs
made available by the workflow system used and is largely
shaped by the environment in which the system operates and
the function undertaken by the workflow.
A variety of workflow systems are in use [10] [3] [7] [2]
serving several scientific disciplines. A workflow is a software
[14] and CrowdLabs [8] have made publishing and finding
workflows easier, but scientists still face the challenges of
re-use, which amounts to fully understanding and exploiting the
available workflows/fragments. One difficulty in understanding
workflows is their complex nature. A workflow may contain
several scientifically-significant analysis steps, combined with
various other data preparation activities, and in different
implementation styles depending on the environment and
context in which the workflow is executed. The difficulty in
understanding causes workflow developers to revert to starting
from scratch rather than re-using existing fragments.
Through an analysis of the current practices in scientific
workflow development, we could gain insights on the creation
of understandable and more effectively re-usable workflows.
Specifically, we propose an analysis with the following objectives:
1) To reverse-engineer the set of current practices in workflow
development through an analysis of empirical evidence.
2) To identify workflow abstractions that would facilitate
understandability and therefore effective re-use.
3) To detect potential information sources and heuristics
that can be used to inform the development of tools for
creating workflow abstractions.
In this paper we present the result of an empirical analysis
performed over 177 workflow descriptions from Taverna [10]
and Wings [3]. Based on this analysis, we propose a catalogue
of scientific workflow motifs. Motifs are provided through i)
a characterization of the kinds of data-oriented activities that
are carried out within workflows, which we refer to as
data-oriented motifs, and ii) a characterization of the different
manners in which those activity motifs are realized/implemented
within workflows, which we refer to as workflow-oriented
motifs. It is worth mentioning that, although important, motifs
Fig. 3. Distribution of Data-Oriented Motifs per domain
Fig. 5. Data Preparation Motifs in the Genomics Workflows
We do this repeatedly for the same datasets
Top Down: Big Micro Data(sets)
• North Atlantic Population Project (NAPP)
• Integrated Public Use Microdata Series (IPUMS)
• Mosaic

• Only data slices can be downloaded
• Standardisation leads to loss of detail
• Results are not mutually compatible
• Large scale efforts are very expensive
… and they do not solve the problem!
… the current workflow
Do adverse conditions (Great Depression) around birth or
early in life affect socioeconomic and health outcomes?
Thomasson and Fishback. 2014. “Hard Times in the Land of Plenty: The Effect on Income and
Disability Later in Life for People Born during the Great Depression.” Expl in Eco Hist 54: 64–78.
Dutch “Hunger-winter” studies (cf Lindeboom)
Does GDP per capita at birth year negatively
affect occupational status in later life?
… the current workflow
bryr  AGE  OCCHISCO  hiscocode  hiscam  gdppc
1870  21   98560     9-85.55    48.70   1694.525258
1870  21   99120     9-99.10    47.88   1694.525258
1873  18   53220     5-32.10    51.65   1841.878773
1870  21   13210     1-30.00    77.29   1694.525258
1873  18   54010     5-40.90    53.27   1841.878773
1874  17   61110     6-11.10    52.61   1853.715852
Link occupations in census micro data…
… to standardised occupations …
… to appropriate occupational status scores …
… to country level GDP at birth year
1. Gather and enter own data
2. Find data on multiple repositories
3. Download
4. Clean and reshape
5. Merge
6. Clean and reshape…
7. Analyse
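The linking chain on this slide (census occupation, standardised occupation, status score, GDP at birth year) can be pictured as a series of lookups. The following is a minimal Python sketch, not the authors' pipeline. The lookup tables hold only values that appear in the sample rows, the names `link_record`, `OCCHISCO_TO_HISCO`, `HISCO_TO_HISCAM` and `GDPPC_BY_BIRTH_YEAR` are hypothetical, and a single country is assumed so that GDP can be keyed on birth year alone.

```python
# Lookup tables filled only with values taken from the sample rows on the slide.
OCCHISCO_TO_HISCO = {
    98560: "9-85.55",
    99120: "9-99.10",
    53220: "5-32.10",
    13210: "1-30.00",
    54010: "5-40.90",
    61110: "6-11.10",
}
HISCO_TO_HISCAM = {
    "9-85.55": 48.70,
    "9-99.10": 47.88,
    "5-32.10": 51.65,
    "1-30.00": 77.29,
    "5-40.90": 53.27,
    "6-11.10": 52.61,
}
# Assumption: one country only, so GDP per capita is keyed on birth year.
GDPPC_BY_BIRTH_YEAR = {
    1870: 1694.525258,
    1873: 1841.878773,
    1874: 1853.715852,
}


def link_record(record):
    """Enrich one census record; return None if any link in the chain is missing."""
    hisco = OCCHISCO_TO_HISCO.get(record["OCCHISCO"])
    hiscam = HISCO_TO_HISCAM.get(hisco)
    gdppc = GDPPC_BY_BIRTH_YEAR.get(record["bryr"])
    if hisco is None or hiscam is None or gdppc is None:
        return None
    return {**record, "hiscocode": hisco, "hiscam": hiscam, "gdppc": gdppc}


census = [
    {"bryr": 1870, "AGE": 21, "OCCHISCO": 98560},
    {"bryr": 1873, "AGE": 18, "OCCHISCO": 53220},
    {"bryr": 1874, "AGE": 17, "OCCHISCO": 61110},
]
linked = [row for row in map(link_record, census) if row is not None]
for row in linked:
    print(row)
```

Every lookup table here stands for a separate download, clean and merge step in the manual workflow, which is where the repeated effort goes.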
… the current workflow
Not a very complicated research question…
… only one sample …
What if we want to answer more involved questions?
"Studies that have plotted data set size against the number of data sources reliably uncover a skewed
distribution. Well-organized big science efforts featuring homogenous, well-organized data represent only a
small proportion of the total data collected by scientists. A very large proportion of scientific data falls in the
long-tail of the distribution, with numerous small independent research efforts yielding a rich variety of specialty
research data sets. The extreme right portion of the long tail includes data that are unpublished; such as siloed
databases, null findings, laboratory notes, animal care records, etc. These dark data hold a potential wealth
of knowledge but are often inaccessible to the outside world."
In the fast moving data analysis industry, real-time traceability
could help identify supply chain, brand and reputational risks
Our Goals
• Empower individual researchers to
• Code and harmonize individual datasets according to best practices of the community
(e.g. HISCO, SDMX, World Bank, etc.) or against their colleagues
• Share their own code lists with fellow researchers
• Align code lists across datasets
• Publish their standards-compliant datasets
• Perform analyses across multiple datasets at the same time
• While tracking provenance of both data and analyses
A Linked Data Handbook for Historians? Nah…
Exists
Frequency Table
Variable does not yet exist
Variables
Mappings
Publish
Augment
Includes both external Linked Data and
standard vocabularies, e.g. World Bank
External (Meta) Data
Existing Variables
& Codes
Provenance tracking of all data
External Datasets
Structured Data Hub
Linked Statistical Dimensions
Dedicated Pipelines
NAPP
surname     age       occupation      sex
Fumes       20        cigar maker     female
Bridges     45        civil engineer  female
Moves       17        dancer          male

achternaam  leeftijd  beroep          geslacht
Fumes       20        sigarenmaker    v
Bridges     45        ingenieur       v
Moves       17        danser          m

achternaam  leeftijd  beroep          sdmx:Sex
Fumes       20        sigarenmaker    sdmx:F
Bridges     45        ingenieur       sdmx:F
Moves       17        danser          sdmx:M

surname     age       occupation      sdmx:Sex
Fumes       20        cigar maker     sdmx:F
Bridges     45        civil engineer  sdmx:F
Moves       17        dancer          sdmx:M
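The harmonisation shown in these tables amounts to mapping each dataset's local column name and local codes onto a shared code list. A minimal Python sketch of that idea follows. The identifiers `sdmx:Sex`, `sdmx:F` and `sdmx:M` are taken from the slides; the dictionaries and the `harmonise` function are hypothetical and only illustrate the principle.

```python
# Local column names that denote the shared dimension sdmx:Sex.
COLUMN_MAPPING = {"sex": "sdmx:Sex", "geslacht": "sdmx:Sex"}

# Local codes mapped to shared codes, per shared dimension (values as on the slides).
CODE_MAPPING = {
    "sdmx:Sex": {
        "female": "sdmx:F",
        "male": "sdmx:M",
        "v": "sdmx:F",
        "m": "sdmx:M",
    },
}


def harmonise(row):
    """Rename mapped columns and recode their values; leave everything else untouched."""
    out = {}
    for column, value in row.items():
        target = COLUMN_MAPPING.get(column, column)
        codes = CODE_MAPPING.get(target, {})
        out[target] = codes.get(value, value)
    return out


english = {"surname": "Fumes", "age": 20, "occupation": "cigar maker", "sex": "female"}
dutch = {"achternaam": "Fumes", "leeftijd": 20, "beroep": "sigarenmaker", "geslacht": "v"}

print(harmonise(english))
print(harmonise(dutch))
```

After this step both rows carry the same `sdmx:Sex` value, so a single filter on sex works across the English and the Dutch dataset. The other columns would need their own mappings before they become comparable.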
Utrecht 1829 Utrecht 1839
An ecosystem is a community of living organisms in conjunction
with the nonliving components of their environment (things like
air, water and mineral soil), interacting as a system.

- Wikipedia
… the current workflow
Does GDP per capita at birth year negatively
affect occupational status in later life?
[Figure: Canada. Two scatter plots of log(hiscam), one against age and one against log(gdppc).]

              log(hiscam)   log(hiscam)
(Intercept)   4.420***      3.616***
              (0.039)       (0.134)
log(gdppc)    -0.058***     0.036**
              (0.005)       (0.018)
I(age^2)                    -0.000***
                            (0.000)
age                         0.007***
                            (0.000)
R2            0.003         0.013
Adj. R2       0.003         0.012
Num. obs.     36201         36201
RMSE          0.142         0.142
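The first column of the regression table regresses log(hiscam) on log(gdppc) alone. As a rough illustration of how such a slope is estimated by ordinary least squares, here is a small Python sketch. It uses only the six sample rows shown earlier in the deck, so it cannot reproduce the reported coefficients, which are based on 36201 observations. The function name `ols_simple` is hypothetical.

```python
import math


def ols_simple(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# (gdppc, hiscam) pairs copied from the six sample rows of the merged table.
sample = [
    (1694.525258, 48.70),
    (1694.525258, 47.88),
    (1841.878773, 51.65),
    (1694.525258, 77.29),
    (1841.878773, 53.27),
    (1853.715852, 52.61),
]
xs = [math.log(gdppc) for gdppc, _ in sample]
ys = [math.log(hiscam) for _, hiscam in sample]

intercept, slope = ols_simple(xs, ys)
print(f"intercept={intercept:.3f} slope={slope:.3f}")
```

The second column of the table adds age and age squared as regressors, which requires a multiple regression and is not covered by this sketch.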
Identify locally, extrapolate globally?
… the new workflow
Does GDP per capita at birth year negatively
affect occupational status in later life?
1. Discover data on datalegend
2. Explore
3. Build or reuse a query
4. Analyse
http://data.socialhistory.org/resource/napp/OCCHISCO/54020
http://yasgui.org
http://grlc.clariah-sdh.eculture.labs.vu.nl/clariah/wp4-queries/api-docs
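Step 3, building or reusing a query, can be pictured as a SPARQL query with one parameter. The Python sketch below only assembles the query text and does not contact any endpoint. The slides do not show the actual queries or the datalegend schema, so the `ex:` prefix, the property names and the helper `build_query` are all hypothetical placeholders.

```python
from string import Template

# Only these values may be substituted into the query text.
ALLOWED_COUNTRIES = {"canada", "sweden"}

# Hypothetical vocabulary: the ex: prefix and property names are placeholders,
# not the real datalegend schema.
QUERY_TEMPLATE = Template("""\
PREFIX ex: <http://example.org/vocab/>
SELECT ?age ?hiscam ?gdppc WHERE {
  ?person ex:country "$country" ;
          ex:age ?age ;
          ex:hiscam ?hiscam ;
          ex:gdppcAtBirth ?gdppc .
}
""")


def build_query(country):
    """Return the query text for one country, rejecting unexpected input."""
    if country not in ALLOWED_COUNTRIES:
        raise ValueError(f"unsupported country: {country!r}")
    return QUERY_TEMPLATE.substitute(country=country)


print(build_query("sweden"))
```

Rerunning the analysis for another country then means changing one parameter rather than repeating the download, clean and merge steps.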
… the new workflow
              canada        sweden
(Intercept)   3.616***      4.430***
              (0.134)       (0.033)
log(gdppc)    0.036**       -0.070***
              (0.018)       (0.004)
I(age^2)      -0.000***     -0.000***
              (0.000)       (0.000)
age           0.007***      0.001***
              (0.000)       (0.000)
R2            0.013         0.021
Adj. R2       0.012         0.021
Num. obs.     36201         275127
RMSE          0.142         0.102
[Figure: scatter plots of log(hiscam) against age and against log(gdppc), for Canada and for Sweden.]
Does GDP per capita at birth year negatively
affect occupational status in later life?
Discussion
• Data-driven research in the humanities is too expensive and confined to single
datasets.
• Linked Data can be a solution, but historians cannot be expected to change
their current workflow, or craft RDF by hand.
• QBer allows historians to upload their data, connect it to earlier work by peers,
while preserving provenance of their steps.
• The inspector view gives instant feedback about the impact on the network.
• Standard SPARQL queries are converted to APIs through grlc.
• Research questions can thus be shared, replicated and applied to new data.
• This gives rise to different roles of researchers in our ecosystem.
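Preserving the provenance of each step, as mentioned in the list above, can be pictured as attaching a small record to every transformation. The Python sketch below is one possible way to do this with content hashes. It is not QBer's data model: the record fields and the names `ProvenanceRecord`, `fingerprint` and `record_step` are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


def fingerprint(rows):
    """Stable SHA-256 hash of a list of JSON-serialisable rows."""
    payload = json.dumps(rows, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    activity: str      # what was done
    agent: str         # who did it
    input_hash: str    # fingerprint of the data before the step
    output_hash: str   # fingerprint of the data after the step
    recorded_at: str   # UTC timestamp in ISO 8601 format


def record_step(activity, agent, before, after):
    """Describe one transformation from `before` to `after`."""
    return ProvenanceRecord(
        activity=activity,
        agent=agent,
        input_hash=fingerprint(before),
        output_hash=fingerprint(after),
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )


before = [{"geslacht": "v"}]
after = [{"sdmx:Sex": "sdmx:F"}]
step = record_step("map geslacht to sdmx:Sex", "researcher-1", before, after)
print(step)
```

Chaining such records, where the output hash of one step equals the input hash of the next, lets a later reader trace a published dataset back to the uploaded original.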
 
Linked (Open) Data - But what does it buy me?
Linked (Open) Data - But what does it buy me?Linked (Open) Data - But what does it buy me?
Linked (Open) Data - But what does it buy me?Rinke Hoekstra
 
Linked Science - Building a Web of Research Data
Linked Science - Building a Web of Research DataLinked Science - Building a Web of Research Data
Linked Science - Building a Web of Research DataRinke Hoekstra
 
Semantic Representations for Research
Semantic Representations for ResearchSemantic Representations for Research
Semantic Representations for ResearchRinke Hoekstra
 
A Slightly Different Web of Data
A Slightly Different Web of DataA Slightly Different Web of Data
A Slightly Different Web of DataRinke Hoekstra
 
The Knowledge Reengineering Bottleneck
The Knowledge Reengineering BottleneckThe Knowledge Reengineering Bottleneck
The Knowledge Reengineering BottleneckRinke Hoekstra
 
Concept- en Definitie Extractie
Concept- en Definitie ExtractieConcept- en Definitie Extractie
Concept- en Definitie ExtractieRinke Hoekstra
 
SIKS 2011 Semantic Web Languages
SIKS 2011 Semantic Web LanguagesSIKS 2011 Semantic Web Languages
SIKS 2011 Semantic Web LanguagesRinke Hoekstra
 
The MetaLex Document Server - Legal Documents as Versioned Linked Data
The MetaLex Document Server - Legal Documents as Versioned Linked DataThe MetaLex Document Server - Legal Documents as Versioned Linked Data
The MetaLex Document Server - Legal Documents as Versioned Linked DataRinke Hoekstra
 
Querying the Web of Data
Querying the Web of DataQuerying the Web of Data
Querying the Web of DataRinke Hoekstra
 
History of Knowledge Representation (SIKS Course 2010)
History of Knowledge Representation (SIKS Course 2010)History of Knowledge Representation (SIKS Course 2010)
History of Knowledge Representation (SIKS Course 2010)Rinke Hoekstra
 
Making Sense of Design Patterns
Making Sense of Design PatternsMaking Sense of Design Patterns
Making Sense of Design PatternsRinke Hoekstra
 
Publicatie van Linked Open Overheids Data
Publicatie van Linked Open Overheids DataPublicatie van Linked Open Overheids Data
Publicatie van Linked Open Overheids DataRinke Hoekstra
 
ODaF 2010 Linked Data in the Netherlands
ODaF 2010 Linked Data in the NetherlandsODaF 2010 Linked Data in the Netherlands
ODaF 2010 Linked Data in the NetherlandsRinke Hoekstra
 
Overzicht BEST Project - NWO Site Visit
Overzicht BEST Project - NWO Site VisitOverzicht BEST Project - NWO Site Visit
Overzicht BEST Project - NWO Site VisitRinke Hoekstra
 

Plus de Rinke Hoekstra (20)

QBer - Connect your data to the cloud
QBer - Connect your data to the cloudQBer - Connect your data to the cloud
QBer - Connect your data to the cloud
 
Jurix 2014 welcome presentation
Jurix 2014 welcome presentationJurix 2014 welcome presentation
Jurix 2014 welcome presentation
 
Linkitup: Link Discovery for Research Data
Linkitup: Link Discovery for Research DataLinkitup: Link Discovery for Research Data
Linkitup: Link Discovery for Research Data
 
A Network Analysis of Dutch Regulations - Using the Metalex Document Server
A Network Analysis of Dutch Regulations - Using the Metalex Document ServerA Network Analysis of Dutch Regulations - Using the Metalex Document Server
A Network Analysis of Dutch Regulations - Using the Metalex Document Server
 
Linked (Open) Data - But what does it buy me?
Linked (Open) Data - But what does it buy me?Linked (Open) Data - But what does it buy me?
Linked (Open) Data - But what does it buy me?
 
Linked Science - Building a Web of Research Data
Linked Science - Building a Web of Research DataLinked Science - Building a Web of Research Data
Linked Science - Building a Web of Research Data
 
COMMIT/VIVO
COMMIT/VIVOCOMMIT/VIVO
COMMIT/VIVO
 
Semantic Representations for Research
Semantic Representations for ResearchSemantic Representations for Research
Semantic Representations for Research
 
A Slightly Different Web of Data
A Slightly Different Web of DataA Slightly Different Web of Data
A Slightly Different Web of Data
 
The Knowledge Reengineering Bottleneck
The Knowledge Reengineering BottleneckThe Knowledge Reengineering Bottleneck
The Knowledge Reengineering Bottleneck
 
Linked Census Data
Linked Census DataLinked Census Data
Linked Census Data
 
Concept- en Definitie Extractie
Concept- en Definitie ExtractieConcept- en Definitie Extractie
Concept- en Definitie Extractie
 
SIKS 2011 Semantic Web Languages
SIKS 2011 Semantic Web LanguagesSIKS 2011 Semantic Web Languages
SIKS 2011 Semantic Web Languages
 
The MetaLex Document Server - Legal Documents as Versioned Linked Data
The MetaLex Document Server - Legal Documents as Versioned Linked DataThe MetaLex Document Server - Legal Documents as Versioned Linked Data
The MetaLex Document Server - Legal Documents as Versioned Linked Data
 
Querying the Web of Data
Querying the Web of DataQuerying the Web of Data
Querying the Web of Data
 
History of Knowledge Representation (SIKS Course 2010)
History of Knowledge Representation (SIKS Course 2010)History of Knowledge Representation (SIKS Course 2010)
History of Knowledge Representation (SIKS Course 2010)
 
Making Sense of Design Patterns
Making Sense of Design PatternsMaking Sense of Design Patterns
Making Sense of Design Patterns
 
Publicatie van Linked Open Overheids Data
Publicatie van Linked Open Overheids DataPublicatie van Linked Open Overheids Data
Publicatie van Linked Open Overheids Data
 
ODaF 2010 Linked Data in the Netherlands
ODaF 2010 Linked Data in the NetherlandsODaF 2010 Linked Data in the Netherlands
ODaF 2010 Linked Data in the Netherlands
 
Overzicht BEST Project - NWO Site Visit
Overzicht BEST Project - NWO Site VisitOverzicht BEST Project - NWO Site Visit
Overzicht BEST Project - NWO Site Visit
 

Dernier

Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfSumit Kumar yadav
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and ClassificationsAreesha Ahmad
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...ssuser79fe74
 
Creating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening DesignsCreating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening DesignsNurulAfiqah307317
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxRizalinePalanog2
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Monika Rani
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...chandars293
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...Lokesh Kothari
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 

Dernier (20)

Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
Creating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening DesignsCreating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening Designs
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
GUIDELINES ON SIMILAR BIOLOGICS Regulatory Requirements for Marketing Authori...
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 

An Ecosystem for Linked Humanities Data

  • 1. An Ecosystem for Linked Humanities Data
     Rinke Hoekstra
     Vrije Universiteit Amsterdam/University of Amsterdam
     rinke.hoekstra@vu.nl
     Albert Meroño-Peñuela, Kathrin Dentler, Auke Rijpma, Richard Zijdeman and Ivo Zandhuis
  • 2. The Promise of Digital Humanities
  • 7. The Problem of Digital Humanities
     Pacific Barreleye, http://imgur.com/gallery/Mzyb5 (can rotate its eyes forwards or upwards to look through the transparent head to prey above)
  • 11. The Cost of Data Preparation

     Common Motifs in Scientific Workflows: An Empirical Analysis
     Daniel Garijo*, Pinar Alper†, Khalid Belhajjame†, Oscar Corcho*, Yolanda Gil‡, Carole Goble†
     * Ontology Engineering Group, Universidad Politécnica de Madrid. {dgarijo, ocorcho}@fi.upm.es
     † School of Computer Science, University of Manchester. {alperp, khalidb, carole.goble}@cs.manchester.ac.uk
     ‡ Information Sciences Institute, Department of Computer Science, University of Southern California. gil@isi.edu

     Abstract: While workflow technology has gained momentum in the last decade as a means for specifying and enacting computational experiments in modern science, reusing and repurposing existing workflows to build new scientific experiments is still a daunting task. This is partly due to the difficulty that scientists experience when attempting to understand existing workflows, which contain several data preparation and adaptation steps in addition to the scientifically significant analysis steps. One way to tackle the understandability problem is through providing abstractions that give a high-level view of activities undertaken within workflows. As a first step towards abstractions, we report in this paper on the results of a manual analysis performed over a set of real-world scientific workflows from Taverna and Wings systems. Our analysis has resulted in a set of scientific workflow motifs that outline i) the kinds of data intensive activities that are observed in workflows (data oriented motifs), and ii) the different manners in which activities are implemented within workflows (workflow oriented motifs). These motifs can be useful to inform workflow designers on the good and bad practices for workflow development, to inform the design of automated tools for the generation of workflow abstractions, etc.

     I. INTRODUCTION
     Scientific workflows have been increasingly used in the last decade as an instrument for data intensive scientific analysis. In these settings, workflows serve a dual function: first as detailed documentation of the method (i.e. the input sources and processing steps taken for the derivation of a certain data item) and second as re-usable, executable artifacts for data-intensive analysis. Workflows stitch together a variety of data manipulation activities such as data movement, data transformation or data visualization to serve the goals of the scientific study. The stitching is realized by the constructs made available by the workflow system used and is largely shaped by the environment in which the system operates and the function undertaken by the workflow. A variety of workflow systems are in use [10] [3] [7] [2] serving several scientific disciplines. A workflow is a software […] [14] and CrowdLabs [8] have made publishing and finding workflows easier, but scientists still face the challenges of re-use, which amounts to fully understanding and exploiting the available workflows/fragments.

     One difficulty in understanding workflows is their complex nature. A workflow may contain several scientifically-significant analysis steps, combined with various other data preparation activities, and in different implementation styles depending on the environment and context in which the workflow is executed. The difficulty in understanding causes workflow developers to revert to starting from scratch rather than re-using existing fragments. Through an analysis of the current practices in scientific workflow development, we could gain insights on the creation of understandable and more effectively re-usable workflows. Specifically, we propose an analysis with the following objectives:
     1) To reverse-engineer the set of current practices in workflow development through an analysis of empirical evidence.
     2) To identify workflow abstractions that would facilitate understandability and therefore effective re-use.
     3) To detect potential information sources and heuristics that can be used to inform the development of tools for creating workflow abstractions.

     In this paper we present the result of an empirical analysis performed over 177 workflow descriptions from Taverna [10] and Wings [3]. Based on this analysis, we propose a catalogue of scientific workflow motifs. Motifs are provided through i) a characterization of the kinds of data-oriented activities that are carried out within workflows, which we refer to as data-oriented motifs, and ii) a characterization of the different manners in which those activity motifs are realized/implemented within workflows, which we refer to as workflow-oriented motifs. It is worth mentioning that, although important, motifs […]

     Fig. 3. Distribution of Data-Oriented Motifs per domain
     Fig. 5. Data Preparation Motifs in the Genomics Workflows
  • 12. We do this repeatedly for the same datasets
  • 15. Top Down: Big Micro Data(sets)
      • North Atlantic Population Project (NAPP)
      • Integrated Public Use Microdata Series (IPUMS)
      • Mosaic

      • Only data slices can be downloaded
      • Standardisation leads to loss of detail
      • Results are not mutually compatible
      • Large scale efforts are very expensive
     … and they do not solve the problem!
  • 21. … the current workflow
     Do adverse conditions (Great Depression) around birth or early in life affect socioeconomic and health outcomes?
      • Thomasson and Fishback. 2014. “Hard Times in the Land of Plenty: The Effect on Income and Disability Later in Life for People Born during the Great Depression.” Explorations in Economic History 54: 64–78.
      • Dutch “Hunger-winter” studies (cf. Lindeboom)
     Does GDP per capita at birth year negatively affect occupational status in later life?
  • 24. … the current workflow
     bryr  AGE  OCCHISCO  hiscocode  hiscam  gdppc
     1870  21   98560     9-85.55    48.70   1694.525258
     1870  21   99120     9-99.10    47.88   1694.525258
     1873  18   53220     5-32.10    51.65   1841.878773
     1870  21   13210     1-30.00    77.29   1694.525258
     1873  18   54010     5-40.90    53.27   1841.878773
     1874  17   61110     6-11.10    52.61   1853.715852

     Link occupations in census micro data …
      … to standardised occupations …
      … to appropriate occupational status scores …
      … to country level GDP at birth year

     1. Gather and enter own data
     2. Find data on multiple repositories
     3. Download
     4. Clean and reshape
     5. Merge
     6. Clean and reshape…
     7. Analyse
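The linking chain on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three lookup tables and the function name `link_rows` are hypothetical stand-ins for the separately downloaded sources that have to be merged, and the values are three of the sample rows from the slide's table, not a full dataset.

```python
# Sample census rows (birth year, age, NAPP OCCHISCO code), copied from the slide.
census_rows = [
    {"bryr": 1870, "AGE": 21, "OCCHISCO": 98560},
    {"bryr": 1873, "AGE": 18, "OCCHISCO": 53220},
    {"bryr": 1874, "AGE": 17, "OCCHISCO": 61110},
]

# Hypothetical lookup tables, one per source that has to be found and downloaded.
OCCHISCO_TO_HISCO = {98560: "9-85.55", 53220: "5-32.10", 61110: "6-11.10"}
HISCO_TO_HISCAM = {"9-85.55": 48.70, "5-32.10": 51.65, "6-11.10": 52.61}
GDPPC_BY_YEAR = {1870: 1694.525258, 1873: 1841.878773, 1874: 1853.715852}


def link_rows(rows):
    """Attach HISCO code, HISCAM score and birth-year GDP per capita to each row."""
    linked = []
    for row in rows:
        hisco = OCCHISCO_TO_HISCO.get(row["OCCHISCO"])
        linked.append({
            **row,
            "hiscocode": hisco,
            "hiscam": HISCO_TO_HISCAM.get(hisco),   # None when the code is unknown
            "gdppc": GDPPC_BY_YEAR.get(row["bryr"]),
        })
    return linked


linked = link_rows(census_rows)
for row in linked:
    print(row)
```

Every lookup here is a separate merge step in the manual workflow, so an unmatched code silently becomes `None`. That is one reason the "clean and reshape" steps recur.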
  • 31. … the current workflow
     Not a very complicated research question…
     … only one sample …
     What if we want to answer more involved questions?
  • 32. "Studies that have plotted data set size against the number of data sources reliably uncover a skewed distribution. Well-organized big science efforts featuring homogenous, well-organized data represent only a small proportion of the total data collected by scientists. A very large proportion of scientific data falls in the long-tail of the distribution, with numerous small independent research efforts yielding a rich variety of specialty research data sets. The extreme right portion of the long tail includes data that are unpublished; such as siloed databases, null findings, laboratory notes, animal care records, etc. These dark data hold a potential wealth of knowledge but are often inaccessible to the outside world."
  • 33. In the fast-moving data analysis industry, real-time traceability could help identify supply chain, brand and reputational risks
  • 34. Our Goals
      • Empower individual researchers to
          • Code and harmonize individual datasets according to best practices of the community (e.g. HISCO, SDMX, World Bank, etc.) or against their colleagues
          • Share their own code lists with fellow researchers
          • Align code lists across datasets
          • Publish their standards-compliant datasets
          • Perform analyses across multiple datasets at the same time
      • While tracking provenance of both data and analyses
  • 35. A Linked Data Handbook for Historians? Nah…
  • 37. [Diagram: Structured Data Hub (datalegend). Labels: Frequency Table; Variables ("Exists" / "Variable does not yet exist"); Mappings; Publish; Augment; External (Meta) Data, which includes both external Linked Data and standard vocabularies, e.g. World Bank; Existing Variables & Codes; External Datasets; Linked Statistical Dimensions; provenance tracking of all data.]
  • 45. surname     age       occupation      sex
     Fumes       20        cigar maker     female
     Bridges     45        civil engineer  female
     Moves       17        dancer          male

     achternaam  leeftijd  beroep          geslacht
     Fumes       20        sigarenmaker    v
     Bridges     45        ingenieur       v
     Moves       17        danser          m

     achternaam  leeftijd  beroep          sdmx:Sex
     Fumes       20        sigarenmaker    sdmx:F
     Bridges     45        ingenieur       sdmx:F
     Moves       17        danser          sdmx:M

     surname     age       occupation      sdmx:Sex
     Fumes       20        cigar maker     sdmx:F
     Bridges     45        civil engineer  sdmx:F
     Moves       17        dancer          sdmx:M
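A minimal sketch of the harmonisation step these tables illustrate, written in Python. The mapping dictionaries and the function name `harmonise` are hypothetical, and `sdmx:F` / `sdmx:M` are simply the shorthand labels from the slide rather than full vocabulary URIs.

```python
# Hypothetical mappings from each dataset's local codes to the shared code list.
EN_SEX = {"female": "sdmx:F", "male": "sdmx:M"}
NL_SEX = {"v": "sdmx:F", "m": "sdmx:M"}

# Rows copied from the slide: (surname, age, occupation, local sex code).
english = [
    ("Fumes", 20, "cigar maker", "female"),
    ("Bridges", 45, "civil engineer", "female"),
    ("Moves", 17, "dancer", "male"),
]
dutch = [
    ("Fumes", 20, "sigarenmaker", "v"),
    ("Bridges", 45, "ingenieur", "v"),
    ("Moves", 17, "danser", "m"),
]


def harmonise(rows, mapping):
    """Replace the last column (the local sex code) with the shared code."""
    return [row[:-1] + (mapping[row[-1]],) for row in rows]


en_coded = harmonise(english, EN_SEX)
nl_coded = harmonise(dutch, NL_SEX)

# After harmonisation the sex column is directly comparable across datasets.
same_sex_codes = [r[-1] for r in en_coded] == [r[-1] for r in nl_coded]
print(same_sex_codes)  # True
```

Only the sex column is aligned here. The occupation column would need its own code list (for instance HISCO) before it could be compared in the same way.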
  • 56. An ecosystem is a community of living organisms in conjunction with the nonliving components of their environment (things like air, water and mineral soil), interacting as a system. - Wikipedia
  • 58. … the current workflow
     Does GDP per capita at birth year negatively affect occupational status in later life?

     [Figure: two scatter plots for Canada, log(hiscam) against age and log(hiscam) against log(gdppc)]

                  log(hiscam)  log(hiscam)
     (Intercept)  4.420***     3.616***
                  (0.039)      (0.134)
     log(gdppc)   -0.058***    0.036**
                  (0.005)      (0.018)
     I(age^2)                  -0.000***
                               (0.000)
     age                       0.007***
                               (0.000)
     R2           0.003        0.013
     Adj. R2      0.003        0.012
     Num. obs.    36201        36201
     RMSE         0.142        0.142

     Identify locally, extrapolate globally?
  • 63. … the new workflow
      Does GDP per capita at birth year negatively affect occupational status in later life?
      1. Discover data on datalegend
      2. Explore
      3. Build or reuse a query
      4. Analyse
      http://data.socialhistory.org/resource/napp/OCCHISCO/54020
      http://yasgui.org
      http://grlc.clariah-sdh.eculture.labs.vu.nl/clariah/wp4-queries/api-docs
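The "build or reuse a query" step can be sketched as follows. This is a hedged illustration that relies only on the standard SPARQL protocol (a `query` parameter on a GET request) and the standard SPARQL JSON results layout. The endpoint URL, the function names and the sample response are assumptions made up for the example. No request is actually sent, and grlc's own API is not shown.

```python
from urllib.parse import urlencode

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint, not the real service

# Describe the resource shown on the slide: list its properties and values.
QUERY = """
SELECT ?p ?o WHERE {
  <http://data.socialhistory.org/resource/napp/OCCHISCO/54020> ?p ?o .
} LIMIT 10
"""


def build_request_url(endpoint, query):
    """Encode a SPARQL query as a GET request URL (SPARQL protocol)."""
    return endpoint + "?" + urlencode({"query": query})


def rows_from_results(document):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    variables = document["head"]["vars"]
    return [
        {name: binding[name]["value"] for name in variables if name in binding}
        for binding in document["results"]["bindings"]
    ]


# Made-up response in the standard results layout, for illustration only.
SAMPLE_RESPONSE = {
    "head": {"vars": ["p", "o"]},
    "results": {
        "bindings": [
            {
                "p": {"type": "uri", "value": "http://example.org/label"},
                "o": {"type": "literal", "value": "example label"},
            }
        ]
    },
}

url = build_request_url(ENDPOINT, QUERY)
rows = rows_from_results(SAMPLE_RESPONSE)
```

Keeping the query text separate from the code that runs it is what makes it shareable: the same query can be pasted into an editor such as YASGUI or published behind an API, and the analysis step only ever sees plain rows.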
  • 66. … the new workflow
      Does GDP per capita at birth year negatively affect occupational status in later life?
      Regression results, dependent variable log(hiscam) (standard errors in parentheses):
                     canada             sweden
      (Intercept)    3.616*** (0.134)   4.430*** (0.033)
      log(gdppc)     0.036** (0.018)    -0.070*** (0.004)
      I(age^2)       -0.000*** (0.000)  -0.000*** (0.000)
      age            0.007*** (0.000)   0.001*** (0.000)
      R2             0.013              0.021
      Adj. R2        0.012              0.021
      Num. obs.      36201              275127
      RMSE           0.142              0.102
      [Figure: four scatter plots, log(hiscam) against age and log(hiscam) against log(gdppc), for Canada and for Sweden]
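Applying one analysis to several countries can be sketched as a per-country least-squares fit. This is a simplified illustration under stated assumptions: it fits only log(hiscam) on log(gdppc) and leaves out the age terms from the table above, the function names are invented for the example, and the records are generated toy values for two placeholder countries "A" and "B", not census data.

```python
import math
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_by_country(records):
    """Fit log(hiscam) on log(gdppc) separately for each country."""
    grouped = defaultdict(lambda: ([], []))
    for country, gdppc, hiscam in records:
        xs, ys = grouped[country]
        xs.append(math.log(gdppc))
        ys.append(math.log(hiscam))
    return {country: fit_line(xs, ys) for country, (xs, ys) in grouped.items()}


def toy_records(country, intercept, slope, gdppc_values):
    """Generate noise-free toy records that lie exactly on a known line."""
    return [
        (country, g, math.exp(intercept + slope * math.log(g)))
        for g in gdppc_values
    ]


# Toy input: the coefficients are arbitrary and chosen only to check the fit.
records = (
    toy_records("A", 4.0, -0.05, [900, 1100, 1300, 1500])
    + toy_records("B", 3.5, 0.03, [1000, 1200, 1400, 1600])
)
fits = fit_by_country(records)  # {country: (intercept, slope)}
```

Because the fit is written once and grouped by country, adding another country needs only additional rows in the same shape, which is the point of harmonising the data first.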
  • 70. Discussion
      • Data-driven research in the humanities is too expensive and confined to single datasets.
      • Linked Data can be a solution, but historians cannot be expected to change their current workflow or craft RDF by hand.
      • QBer allows historians to upload their data and connect it to earlier work by peers, while preserving the provenance of their steps.
      • The inspector view gives instant feedback on the impact on the network.
      • Standard SPARQL queries are converted to APIs through grlc.
      • Research questions can thus be shared, replicated and applied to new data.
      • This gives rise to different roles for researchers in our ecosystem.