Data2Semantics
From Data to Semantics for Scientific Data Publishers

linkitup

Link Discovery for Research Data
Rinke Hoekstra and Paul Groth

Network Institute, VU University Amsterdam

Law Faculty, University of Amsterdam

Linkitup - Link Discovery for Research Data by Rinke Hoekstra

Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
How to share, publish, access, analyse, interpret and reuse data?

DATA
.. the fallacies (Kayur Patel)
DATA
Silver Bullet?

http://on.wsj.com/XCajtB
www.nature.com/nature

Data’s shameful neglect

Vol 461 | Issue no. 7261 | 10 September 2009

Research cannot flourish if data are not preserved and made accessible. All concerned must act accordingly.

More and more often these days, a research project’s success is
measured not just by the publications it produces, but also by
the data it makes available to the wider community. Pioneering archives such as GenBank have demonstrated just how powerful
such legacy data sets can be for generating new discoveries — especially when data are combined from many laboratories and analysed
in ways that the original researchers could not have anticipated.
All but a handful of disciplines still lack the technical, institutional
and cultural frameworks required to support such open data access
(see pages 168 and 171) — leading to a scandalous shortfall in the
sharing of data by researchers (see page 160). This deficiency urgently
needs to be addressed by funders, universities and the researchers
themselves.
Research funding agencies need to recognize that preservation of
and access to digital data are central to their mission, and need to
be supported accordingly. Organizations in the United Kingdom,
for instance, have made a good start. The Joint Information Systems
Committee, established by the seven UK research councils in 1993,
has made data-sharing a priority, and has helped to establish a Digital
Curation Centre, headquartered at the University of Edinburgh, to be
a national focus for research and development into data issues. Other
European agencies have also pursued initiatives.
The United States, by contrast, is playing catch-up. Since 2005, a
29-member Interagency Working Group on Digital Data has been
trying to get US funding agencies to develop plans for how they will
support data archiving — and just as importantly, to develop policies
on what data should and should not be preserved, and what exceptions should be made for reasons such as patient privacy. Some agencies have taken the lead in doing so; many more are hanging back.
They should all be moving forwards vigorously.
What is more, funding agencies and researchers alike must ensure
that they support not only the hardware needed to store the data, but
also the software that will help investigators to do this. One important facet is metadata management software: tools that streamline
the tedious process of annotating data with a description of what the
bits mean, which instrument collected them, which algorithms have
been used to process them and so on — information that is essential
if other scientists are to reuse the data effectively.
Also necessary, especially in an era when data can be mixed and
combined in unanticipated ways, is software that can keep track of
which pieces of data came from whom. Such systems are essential if
tenure and promotion committees are ever to give credit — as they
should — to candidates’ track-record of
data contribution.
Who should host these data? Agencies and the research community
need to create the digital equivalent of libraries: institutions that can take
responsibility for preserving digital data and making them accessible
over the long term. The university research libraries themselves are
obvious candidates to assume this role. But whoever takes it on, data
preservation will require robust, long-term funding. One potentially
helpful initiative is the US National Science Foundation’s DataNet
programme, in which researchers are exploring financial mechanisms such as subscription services and membership fees.
Finally, universities and individual disciplines need to undertake a
vigorous programme of education and outreach about data. Consider,
for example, that most university science students get a reasonably
good grounding in statistics. But their studies rarely include anything
about information management — a discipline that encompasses the
entire life cycle of data, from how they are acquired and stored to how
they are organized, retrieved and maintained over time. That needs
to change: data management should be woven into every course in
science, as one of the foundations of knowledge.
■

A step too far?

The Obama administration must fund human space
flight adequately, or stop speaking of ‘exploration’.

After the space shuttle Columbia burned up during re-entry
into Earth’s atmosphere in 2003, the board that was convened
to investigate the disaster looked beyond its technical causes
to NASA’s organizational malaise. For decades, the board pointed
out, the shuttle programme had been trying to do too much with
too little money. NASA desperately needed a clearer vision and a
better-defined mission for human space flight.
The next year, then-President George W. Bush attempted to supply
that vision with a new long-term goal: first send astronauts to build
a base on the Moon, then send them to Mars. This idea immediately
set off a debate that is still continuing, in which sceptics ask whether
there is any point in returning to the Moon nearly half a century
after the first landings. Why not go to Mars directly, or visit near-Earth
asteroids, or send people to service telescopes in the deep space
beyond Earth?
Yet that debate is both counter-productive (a new set of rockets
could go to all of these places) and moot, because Bush’s vision
never attracted the hoped-for budget increases. Indeed, a blue-riband
commission reporting to US President Barack Obama this week (see
page 153) finds the organizational malaise unchanged: NASA is still
doing too much with too little. Without more money, the agency won’t
be sending people anywhere beyond the International Space Station,
which resides in low Earth orbit only 350 kilometres up. And even the
ability to do that is in question: Ares I, the US rocket that would return

Repository Services
• Data is easy to upload
• Landing page for data
• Citable reference for data
• Default licensing options
• Guarantees for long term archival
Standard Metadata
• Provenance metadata: authors, title, publication date
• Content metadata: free text tags, categories, links
• Metadata is locked in
• Hard to interpret the data itself
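
The two kinds of standard metadata listed above can be pictured as a small record. This is only a sketch: the class names, field names and the `free_text_tags` helper are hypothetical and do not describe any particular repository's API. It shows why free-text tags are hard to interpret: they are plain strings, not links to a shared vocabulary.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class ProvenanceMetadata:
    """Who published the data and when (hypothetical field names)."""
    authors: List[str]
    title: str
    publication_date: date


@dataclass
class ContentMetadata:
    """What the data is about (hypothetical field names)."""
    tags: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    links: List[str] = field(default_factory=list)


@dataclass
class DatasetRecord:
    provenance: ProvenanceMetadata
    content: ContentMetadata


def free_text_tags(record: DatasetRecord) -> List[str]:
    """Return the tags that are plain strings rather than web identifiers."""
    return [
        tag for tag in record.content.tags
        if not tag.startswith(("http://", "https://"))
    ]
```

A record tagged with `"rna"` and `"http://example.org/concept/1"` (both made-up values) would report only `"rna"` as free text, that is, as a tag a machine cannot resolve to a definition.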
Data is the Bottleneck
Common Motifs in Scientific Workflows:
An Empirical Analysis
Daniel Garijo*, Pinar Alper†, Khalid Belhajjame†, Oscar Corcho*, Yolanda Gil‡, Carole Goble†
* Ontology Engineering Group, Universidad Politécnica de Madrid. {dgarijo, ocorcho}@fi.upm.es
† School of Computer Science, University of Manchester. {alperp, khalidb, carole.goble}@cs.manchester.ac.uk
‡ Information Sciences Institute, Department of Computer Science, University of Southern California. gil@isi.edu

Abstract—While workflow technology has gained momentum
in the last decade as a means for specifying and enacting computational experiments in modern science, reusing and repurposing
existing workflows to build new scientific experiments is still a
daunting task. This is partly due to the difficulty that scientists
experience when attempting to understand existing workflows,
which contain several data preparation and adaptation steps in
addition to the scientifically significant analysis steps. One way
to tackle the understandability problem is through providing
abstractions that give a high-level view of activities undertaken
within workflows. As a first step towards abstractions, we report
in this paper on the results of a manual analysis performed over
a set of real-world scientific workflows from Taverna and Wings
systems. Our analysis has resulted in a set of scientific workflow
motifs that outline i) the kinds of data intensive activities that are
observed in workflows (data oriented motifs), and ii) the different
manners in which activities are implemented within workflows
(workflow oriented motifs). These motifs can be useful to inform
workflow designers on the good and bad practices for workflow
development, to inform the design of automated tools for the
generation of workflow abstractions, etc.

I. INTRODUCTION
Scientific workflows have been increasingly used in the last
decade as an instrument for data intensive scientific analysis.
In these settings, workflows serve a dual function: first as
detailed documentation of the method (i.e., the input sources
and processing steps taken for the derivation of a certain
data item) and second as re-usable, executable artifacts for
data-intensive analysis. Workflows stitch together a variety
of data manipulation activities such as data movement, data
transformation or data visualization to serve the goals of the
scientific study. The stitching is realized by the constructs
made available by the workflow system used and is largely
shaped by the environment in which the system operates and
the function undertaken by the workflow.
A variety of workflow systems are in use [10] [3] [7] [2]
serving several scientific disciplines. A workflow is a software
artifact, and as such once developed and tested, it can be
shared and exchanged between scientists. Other scientists can
then reuse existing workflows in their experiments, e.g., as
sub-workflows [17]. Workflow reuse presents several advantages [4]. For example, it enables proper data citation and
improves quality through shared workflow development by
leveraging the expertise of previous users. Users can also
re-purpose existing workflows to adapt them to their needs
[4]. Emerging workflow repositories such as myExperiment
[14] and CrowdLabs [8] have made publishing and finding
workflows easier, but scientists still face the challenges of reuse, which amounts to fully understanding and exploiting the
available workflows/fragments. One difficulty in understanding
workflows is their complex nature. A workflow may contain
several scientifically-significant analysis steps, combined with
various other data preparation activities, and in different
implementation styles depending on the environment and
context in which the workflow is executed. The difficulty in
understanding causes workflow developers to revert to starting
from scratch rather than re-using existing fragments.
Through an analysis of the current practices in scientific
workflow development, we could gain insights on the creation
of understandable and more effectively re-usable workflows.
Specifically, we propose an analysis with the following objectives:
1) To reverse-engineer the set of current practices in workflow development through an analysis of empirical evidence.
2) To identify workflow abstractions that would facilitate
understandability and therefore effective re-use.
3) To detect potential information sources and heuristics
that can be used to inform the development of tools for
creating workflow abstractions.
In this paper we present the result of an empirical analysis
performed over 177 workflow descriptions from Taverna [10]
and Wings [3]. Based on this analysis, we propose a catalogue
of scientific workflow motifs. Motifs are provided through i)
a characterization of the kinds of data-oriented activities that
are carried out within workflows, which we refer to as dataoriented motifs, and ii) a characterization of the different manners in which those activity motifs are realized/implemented
within workflows, which we refer to as workflow-oriented
motifs. It is worth mentioning that, although important, motifs
that have to do with scheduling and mapping of workflows
onto distributed resources [12] are out the scope of this paper.
The paper is structured as follows. We begin by providing
related work in Section II, which is followed in Section III by
brief background information on Scientific Workflows, and the
two systems that were subject to our analysis. Afterwards we
describe the dataset and the general approach of our analysis.
We present the detected scientific workflow motifs in Section
IV and we highlight the main features of their distribution
Data is the Bottleneck
Common Motifs in Scientific Workflows:
An Empirical Analysis
Daniel Garijo*, Pinar Alper†, Khalid Belhajjame†, Oscar Corcho*, Yolanda Gil‡, Carole Goble†
* Ontology Engineering Group, Universidad Politécnica de Madrid. {dgarijo, ocorcho}@fi.upm.es
† School of Computer Science, University of Manchester. {alperp, khalidb, carole.goble}@cs.manchester.ac.uk
‡ Information Sciences Institute, Department of Computer Science, University of Southern California. gil@isi.edu

Abstract—While workflow technology has gained momentum
in the last decade as a means for specifying and enacting computational experiments in modern science, reusing and repurposing
existing workflows to build new scientific experiments is still a
daunting task. This is partly due to the difficulty that scientists
experience when attempting to understand existing workflows,
which contain several data preparation and adaptation steps in
addition to the scientifically significant analysis steps. One way
to tackle the understandability problem is through providing
abstractions that give a high-level view of activities undertaken
within workflows. As a first step towards abstractions, we report
in this paper on the results of a manual analysis performed over
a set of real-world scientific workflows from Taverna and Wings
systems. Our analysis has resulted in a set of scientific workflow
motifs that outline i) the kinds of data intensive activities that are
observed in workflows (data oriented motifs), and ii) the different
manners in which activities are implemented within workflows
(workflow oriented motifs). These motifs can be useful to inform
workflow designers on the good and bad practices for workflow
development, to inform the design of automated tools for the
generation of workflow abstractions, etc.

I. INTRODUCTION
Scientific workflows have been increasingly used in the last
decade as an instrument for data intensive scientific analysis.
In these settings, workflows serve a dual function: first as
detailed documentation of the method (i.e., the input sources
and processing steps taken for the derivation of a certain
data item) and second as re-usable, executable artifacts for
data-intensive analysis. Workflows stitch together a variety
of data manipulation activities such as data movement, data
transformation or data visualization to serve the goals of the
scientific study. The stitching is realized by the constructs
made available by the workflow system used and is largely
shaped by the environment in which the system operates and
the function undertaken by the workflow.
A variety of workflow systems are in use [10] [3] [7] [2]
serving several scientific disciplines. A workflow is a software
artifact, and as such once developed and tested, it can be
shared and exchanged between scientists. Other scientists can
then reuse existing workflows in their experiments, e.g., as
sub-workflows [17]. Workflow reuse presents several advantages [4]. For example, it enables proper data citation and
improves quality through shared workflow development by
leveraging the expertise of previous users. Users can also
re-purpose existing workflows to adapt them to their needs
[4]. Emerging workflow repositories such as myExperiment
[14] and CrowdLabs [8] have made publishing and finding
workflows easier, but scientists still face the challenges of reuse, which amounts to fully understanding and exploiting the
available workflows/fragments. One difficulty in understanding
workflows is their complex nature. A workflow may contain
several scientifically-significant analysis steps, combined with
various other data preparation activities, and in different
implementation styles depending on the environment and
context in which the workflow is executed. The difficulty in
understanding causes workflow developers to revert to starting
from scratch rather than re-using existing fragments.
Through an analysis of the current practices in scientific
workflow development, we could gain insights on the creation
of understandable and more effectively re-usable workflows.
Specifically, we propose an analysis with the following objectives:
1) To reverse-engineer the set of current practices in workflow development through an analysis of empirical evidence.
2) To identify workflow abstractions that would facilitate
understandability and therefore effective re-use.
3) To detect potential information sources and heuristics
that can be used to inform the development of tools for
creating workflow abstractions.
In this paper we present the result of an empirical analysis
performed over 177 workflow descriptions from Taverna [10]
and Wings [3]. Based on this analysis, we propose a catalogue
of scientific workflow motifs. Motifs are provided through i)
a characterization of the kinds of data-oriented activities that
are carried out within workflows, which we refer to as data-oriented motifs, and ii) a characterization of the different manners in which those activity motifs are realized/implemented
within workflows, which we refer to as workflow-oriented
motifs. It is worth mentioning that, although important, motifs
that have to do with scheduling and mapping of workflows
onto distributed resources [12] are out of the scope of this paper.
The paper is structured as follows. We begin by providing
related work in Section II, which is followed in Section III by
brief background information on Scientific Workflows, and the
two systems that were subject to our analysis. Afterwards we
describe the dataset and the general approach of our analysis.
We present the detected scientific workflow motifs in Section
IV and we highlight the main features of their distribution

Fig. 3. Distribution of Data-Oriented Motifs per domain
Data-Preparation Motifs per Domain
Fig. 5. Data Preparation Motifs in the Genomics Wo
Make Data Flourish
From data to information to knowledge

Global identification of data sets and data items
Data uses a common syntax
Papers explicitly link to data
Metadata expressed using shared vocabularies
Capture the processes by which data is manipulated
Track and publish explicit provenance information
"Someone who is not the person who collected the data can understand the experiment and data" - Shreejoy Tripathy
Linked Data
• Use existing Web infrastructure
• Everything gets a URI and usually a category
• Express typed relations between things (triples)
• Express sameness or difference
• Reuse identifiers as much as possible
Salah, Alkim Almila Akdag, Cheng Gao, Krzysztof Suchecki, and Andrea Scharnhorst. 2012. “Need to Categorize: A Comparative Look at the Categories of Universal
Decimal Classification System and Wikipedia.” Leonardo 45 (1) (February): 84-85. doi:10.1162/LEON_a_00344. (Preprint http://arxiv.org/abs/1105.5912v1)
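
The Linked Data principles listed on this slide can be illustrated with a minimal sketch. It assumes a hypothetical dataset and paper under example.org (placeholders, not real resources) and uses plain Python tuples rather than an RDF library. The vocabulary URIs (rdf:type, owl:sameAs, dcat:Dataset, dcterms:references) are existing Web vocabularies, chosen here only for illustration.

```python
from typing import NamedTuple

# Existing Web vocabularies, reused instead of inventing new identifiers.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
DCT_REFERENCES = "http://purl.org/dc/terms/references"


class Triple(NamedTuple):
    """A typed relation between two things, each named by a URI."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join(
        f"<{t.subject}> <{t.predicate}> <{t.obj}> ." for t in triples
    )


# Hypothetical identifiers: everything gets a URI.
dataset = "http://example.org/dataset/42"
paper = "http://example.org/paper/7"

triples = [
    # ... and usually a category.
    Triple(dataset, RDF_TYPE, DCAT_DATASET),
    # A typed relation between two things.
    Triple(paper, DCT_REFERENCES, dataset),
    # Sameness: the same dataset under another (hypothetical) identifier.
    Triple(dataset, OWL_SAME_AS, "http://example.org/mirror/dataset/42"),
]

print(to_ntriples(triples))
```

Each printed line is one subject, predicate, object statement. Difference could be stated the same way with owl:differentFrom in place of owl:sameAs.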
Linked Data for Science
Neuroscience Information Framework (Ontologies, Semantic Wiki, Catalog)
Nanopublications (small scientific assertions)
Workflow Systems (WINGS, Taverna, …)
Linked Science (tools)
BioPortal (ontologies)
Organic Data Publishing (Semantic Wiki)
Rightfield (systems biology)
Bio2RDF (big linked data)
…Claire Monteleoni
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Legend: Media, Geographic, Publications, User-generated content, Government, Cross-domain, Life sciences
As of September 2011
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
Timeline: 1 May 2007, 8 Oct 2007, 7 Nov 2007, 10 Nov 2007, 28 Feb 2008, 31 Mar 2008, 18 Sep 2008, 5 Mar 2009, 27 Mar 2009, 14 Jul 2009, 22 Sep 2010, 19 Sep 2011, 23 Feb 2012
Scale: 0, 100, 200, 300, 400
Legend: Media, Geographic, Publications, User-generated content, Government, Cross-domain, Life sciences
As of September 2011
62.224.812.703 Triples!
(1.75 Billion)
LODStats Analysis
http://stats.lod2.eu

Chart of error counts by category (Not RDF, Connection reset, Unknown response, XML, No URL provided, Other, HTTP), scale 0 to 140.
Values shown: 134 (45%), 84 (28%), 30 (10%), 22 (7%), 12 (4%), 11 (4%), 6 (2%)

Hoekstra, Rinke; Groth, Paul (2013): Distribution of Errors Reported by LOD2 LODStats Project. figshare.
http://dx.doi.org/10.6084/m9.figshare.695949

299 out of 639 datasets have errors
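
The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below takes the seven counts shown in the chart and the total of 639 datasets from the slide. It deliberately leaves the counts unlabeled, because the extracted text does not reliably say which count belongs to which error category.

```python
# Counts as they appear in the chart; category labels are omitted on purpose
# because the count-to-category mapping is not recoverable from the text.
error_counts = [134, 84, 30, 22, 12, 11, 6]
total_datasets = 639

# Datasets with at least one reported error.
datasets_with_errors = sum(error_counts)

# Share of each count among the erroneous datasets, rounded to whole percent.
shares = [round(100 * count / datasets_with_errors) for count in error_counts]

# Share of all analysed datasets that have errors.
error_rate = 100 * datasets_with_errors / total_datasets

print(datasets_with_errors)   # 299
print(shares)                 # [45, 28, 10, 7, 4, 4, 2]
print(f"{error_rate:.1f}%")   # 46.8%
```

The counts add up to the 299 datasets named on the slide, the rounded shares match the percentages in the chart, and just under half of the analysed datasets have errors.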
An Ambient Agent Model for Monitoring and
Analysing Dynamics of Complex Human
Behaviour
Tibor Bosse a,*, Mark Hoogendoorn a, Michel C.A. Klein a, and Jan Treur a
a Vrije Universiteit Amsterdam, Department of Artificial Intelligence, de Boelelaan 1081, 1081 HV Amsterdam,
The Netherlands

Abstract. In ambient intelligent systems, monitoring of a human could consist of more complex tasks than merely identifying
whether a certain value of a sensor is above a certain threshold. Instead, such tasks may involve monitoring of complex dynamic interactions between human and environment. In order to enable such more complex types of monitoring, this paper
presents a generic agent-based framework. The framework consists of support on various levels of system design, namely: (1)
the top level, including the interaction between agents, (2) the agent level, providing support on the design of individual agents,
and (3) the level of monitoring complex dynamic behaviour, allowing the specification of the aforementioned complex monitoring properties within the agents. The approach is exemplified by a large case study concerning the assessment of driving
behaviour, and is applied to two smaller cases as well (concerning fall detection of elderly, and assistance of naval operations,
respectively), which are briefly described. These case studies have illustrated that the presented framework enables developers
within ambient intelligence to build systems with more expressiveness regarding their monitoring focus. Moreover, they have
shown that the framework is easy to use and applicable in a wide variety of domains.
Keywords: ambient agent model, human behaviour, dynamics

Journal of Ambient Intelligence and Smart Environments
“Whoah! Cool, you should publish that stuff as Linked Data”

“Um, but doesn’t TTL have incompatible semantics?”
“Nah, silly, who cares? We’ll just start a new W3C WG!”
“Uh, ok, if we must. But, even then, we can’t just publish the model as is!”

An Ambient Agent Model
“Um, but doesn’t TTL have incompatible semantics?” for Monitoring and
Analysing Dynamics of Complex Human
“Nah, silly, who cares? We’ll just start a new W3C WG!”
Behaviour
“Uh, ok, if we must. But, Mark Hoogendoorn , Michel C.A. Klein , and Jan Treur
even then, we can’t just publish the model as is!”
Tibor Bosse
a*

a

a

a

a

Vrije Universiteit Amsterdam, Department of Artificial Intelligence, de Boelelaan 1081, 1081 HV Amsterdam,
The Netherlands

“”No worries, just add the provenance using PROV-O, annotate the PDF
Abstract. In ambient intelligent systems, monitoring of a human could consist of more link to other research using CITO.”
with OA, tasks may involvetasks than merelycomplex dyand complex monitoring of identifying
whether a certain value of a sensor is above a certain threshold. Instead, such
namic interactions between human and environment. In order to enable such more complex types of monitoring, this paper
presents a generic agent-based framework. The framework consists of support on various levels of system design, namely: (1)
the top level, including the interaction between agents, (2) the agent level, providing support on the design of individual agents,
and (3) the level of monitoring complex dynamic behaviour, allowing the specification of the aforementioned complex monitoring properties within the agents. The approach is exemplified by a large case study concerning the assessment of driving
behaviour, and is applied to two smaller cases as well (concerning fall detection of elderly, and assistance of naval operations,
respectively), which are briefly described. These case studies have illustrated that the presented framework enables developers
within ambient intelligence to build systems with more expressiveness regarding their monitoring focus. Moreover, they have
shown that the framework is easy to use and applicable in a wide variety of domains.
Keywords: ambient agent model, human behaviour, dynamics

Journal of Ambient Intelligence and Smart Environments
“Whoah! Cool, you should publish that stuff as Linked Data”

An Ambient Agent Model
“Um, but doesn’t TTL have incompatible semantics?” for Monitoring and
Analysing Dynamics of Complex Human
“Nah, silly, who cares? We’ll just start a new W3C WG!”
Behaviour
“Uh, ok, if we must. But, Mark Hoogendoorn , Michel C.A. Klein , and Jan Treur
even then, we can’t just publish the model as is!”
Tibor Bosse
a*

a

a

a

a

Vrije Universiteit Amsterdam, Department of Artificial Intelligence, de Boelelaan 1081, 1081 HV Amsterdam,
The Netherlands

“”No worries, just add the provenance using PROV-O, annotate the PDF
Abstract. In ambient intelligent systems, monitoring of a human could consist of more link to other research using CITO.”
with OA, tasks may involvetasks than merelycomplex dyand complex monitoring of identifying
whether a certain value of a sensor is above a certain threshold. Instead, such
namic interactions between human and environment. In order to enable such more complex types of monitoring, this paper
“And that’s it?” a generic agent-based framework. The framework consists of support on various levels of system design, namely: (1)
presents
the top level, including the interaction between agents, (2) the agent level, providing support on the design of individual agents,
and (3) the level of monitoring complex dynamic behaviour, allowing the specification of the aforementioned complex monitoring properties within the agents. The approach is exemplified by a large case study concerning the assessment of driving
behaviour, and is applied to two smaller cases as well (concerning fall detection of elderly, and assistance of naval operations,
respectively), which are briefly described. These case studies have illustrated that the presented framework enables developers
within ambient intelligence to build systems with more expressiveness regarding their monitoring focus. Moreover, they have
shown that the framework is easy to use and applicable in a wide variety of domains.
Keywords: ambient agent model, human behaviour, dynamics

Journal of Ambient Intelligence and Smart Environments
“Whoah! Cool, you should publish that stuff as Linked Data”

An Ambient Agent Model
“Um, but doesn’t TTL have incompatible semantics?” for Monitoring and
Analysing Dynamics of Complex Human
“Nah, silly, who cares? We’ll just start a new W3C WG!”
Behaviour
“Uh, ok, if we must. But, Mark Hoogendoorn , Michel C.A. Klein , and Jan Treur
even then, we can’t just publish the model as is!”
Tibor Bosse
a*

a

a

a

a

Vrije Universiteit Amsterdam, Department of Artificial Intelligence, de Boelelaan 1081, 1081 HV Amsterdam,
The Netherlands

“”No worries, just add the provenance using PROV-O, annotate the PDF
Abstract. In ambient intelligent systems, monitoring of a human could consist of more link to other research using CITO.”
with OA, tasks may involvetasks than merelycomplex dyand complex monitoring of identifying
whether a certain value of a sensor is above a certain threshold. Instead, such
namic interactions between human and environment. In order to enable such more complex types of monitoring, this paper
“And that’s it?” a generic agent-based framework. The framework consists of support on various levels of system design, namely: (1)
presents
the top level, including the interaction between agents, (2) the agent level, providing support on the design of individual agents,
and (3) the level of monitoring complex dynamic behaviour, allowing the specification of the aforementioned complex moni“Noo! You’ll need persistent Cool URI’s and publish your endpoint
toring properties within the agents. The approach is exemplified by a large case study concerning the assessment of driving
behaviour, and is applied to two smaller cases as well (concerning fall detection of elderly, and assistance of naval operations,
respectively), which are briefly described. These case studies have illustrated that the presented framework enables developers
for eternity of course. Duh.”
within ambient intelligence to build systems with more expressiveness regarding their monitoring focus. Moreover, they have
shown that the framework is easy to use and applicable in a wide variety of domains.
Keywords: ambient agent model, human behaviour, dynamics

Journal of Ambient Intelligence and Smart Environments
An Ambient Agent Model for Monitoring and Analysing Dynamics of Complex Human Behaviour
Tibor Bosse, Mark Hoogendoorn, Michel C.A. Klein, and Jan Treur

“Whoah! Cool, you should publish that stuff as Linked Data”
“Um, but doesn’t TTL have incompatible semantics?”
“Nah, silly, who cares? We’ll just start a new W3C WG!”
“Uh, ok, if we must. But, even then, we can’t just publish the model as is!”
“No worries, just add the provenance using PROV-O, annotate the PDF with OA, and link to other research using CITO.”
“And that’s it?”
“Noo! You’ll need persistent Cool URI’s and publish your endpoint for eternity of course. Duh.”
“Eh?”
“Oh... and don’t forget all data collected by the agents, in all runs, including the first experiments. Now THAT would be ultra cool.”
“Ngh!?”
Creating Linked Data

http://linkeddatabook.com

• Decide on resources to describe
• Mint cool URIs
• Decide on triples to include
• Describe the dataset
• Choose vocabularies
• Define terms
• Make links
• Publish to triple store/annotations/dump
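
To give a feel for what the checklist above involves in practice, here is a minimal sketch in Python that walks through the steps by hand and writes the result as an N-Triples dump. It uses only the standard library; the example.org namespace, the dataset name and the chosen link target are assumptions made up for illustration, not part of the slides.

```python
# Minimal sketch of the checklist using only the standard library.
# All URIs under example.org and the dataset details are hypothetical.

BASE = "http://example.org/dataset/"   # assumed namespace for minted URIs
DCT = "http://purl.org/dc/terms/"      # Dublin Core Terms vocabulary


def uri(value):
    """Format a URI as an N-Triples term."""
    return f"<{value}>"


def literal(value):
    """Format a plain string as an N-Triples literal (escapes backslashes and quotes only)."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples):
    """Serialise (subject, predicate, object) term tuples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Steps 1-2: decide on the resource to describe and mint a URI for it.
dataset = uri(BASE + "driving-behaviour-runs")

# Steps 3-6: decide on triples, describe the dataset, choose vocabularies and terms.
triples = [
    (dataset, uri(DCT + "title"), literal("Driving behaviour simulation runs")),
    (dataset, uri(DCT + "creator"), uri(BASE + "person/a-researcher")),
    # Step 7: make a link to an external resource (illustrative target).
    (dataset, uri(DCT + "subject"), uri("http://dbpedia.org/resource/Ambient_intelligence")),
]

# Step 8: publish; here simply a dump that could be written to a file.
dump = to_ntriples(triples)
print(dump)
```

Even this toy version needs a namespace decision, a vocabulary choice and a manually selected link target, which is the tedium the following slides refer to.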
If this already is tedious...

... can you expect researchers to publish Linked Research Data?
Conclusion?
We need to make publishing Linked Research Data...
...a lot easier...

... more persistent ...

... and more rewarding.

Linked Data is sóóóóó 2005

“People as frontier in computing” - Haym Hirsh, Pietro Michelucci

http://linkitup.data2semantics.org
• Lightweight web application
• Interface to API of existing data repositories
• Enrich metadata by linking to (linked) data resources
• Human in the Loop
• Track provenance
• Publish rich metadata as new data publication

Nanopublication + OA + PROV-O + DCTerms + FOAF
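
As an illustration of what "track provenance" could mean for a single discovered link, the sketch below builds a handful of PROV-O statements as plain Python tuples. The PROV-O terms (prov:Activity, prov:Entity, prov:used, prov:wasAssociatedWith, prov:wasGeneratedBy) are real; the URI patterns, the function name `record_link` and the use of rdfs:seeAlso for the link target are assumptions for this example and do not describe how Linkitup itself stores provenance.

```python
from datetime import datetime, timezone

PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def record_link(dataset_uri, target_uri, plugin_uri, user_uri, when=None):
    """Return (subject, predicate, object) triples describing how one link was produced.

    The link is modelled as an entity generated by a link-discovery activity
    that used the dataset and was associated with both a plugin and the user
    who approved the link (the human in the loop).
    """
    when = when or datetime.now(timezone.utc)
    stamp = when.strftime("%Y%m%dT%H%M%SZ")
    activity = f"{plugin_uri}/run/{stamp}"   # hypothetical URI pattern
    link = f"{dataset_uri}/link/{stamp}"     # hypothetical URI pattern
    return [
        (activity, RDF_TYPE, PROV + "Activity"),
        (activity, PROV + "used", dataset_uri),
        (activity, PROV + "wasAssociatedWith", plugin_uri),
        (activity, PROV + "wasAssociatedWith", user_uri),
        (link, RDF_TYPE, PROV + "Entity"),
        (link, PROV + "wasGeneratedBy", activity),
        (link, RDFS_SEE_ALSO, target_uri),   # assumed way to point at the link target
    ]


example = record_link(
    "http://example.org/ds/1",
    "http://dbpedia.org/resource/Linked_data",
    "http://example.org/plugin/dbpedia",
    "http://example.org/user/alice",
    when=datetime(2013, 10, 1, 12, 0, 0, tzinfo=timezone.utc),
)
for triple in example:
    print(triple)
```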
Use tags & categories to query the DBpedia endpoint
Use authors to query the DBLP endpoint
Use tags & categories to query the NeuroLex endpoint
Use author names to query the ORCID API
Extract references to resolve to CrossRef DOIs
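
The tag-based lookups above could be sketched roughly as follows. This example only builds a SPARQL query and a request URL for the public DBpedia endpoint; it does not send the request. The label-matching strategy, the function names and the result limit are assumptions for illustration, not the queries Linkitup actually issues.

```python
from urllib.parse import urlencode

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"


def escape_literal(text):
    """Escape backslashes and double quotes for use inside a SPARQL string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_label_query(tags, limit=10):
    """Build a query that finds resources whose English label equals one of the tags."""
    values = " ".join(f'"{escape_literal(t)}"@en' for t in tags)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?tag ?resource WHERE {\n"
        f"  VALUES ?tag {{ {values} }}\n"
        "  ?resource rdfs:label ?tag .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(tags, endpoint=DBPEDIA_ENDPOINT):
    """Return a GET URL following the SPARQL protocol (query passed as a parameter)."""
    return endpoint + "?" + urlencode({"query": build_label_query(tags)})


query = build_label_query(["Linked data", "Provenance"])
print(query)
print(build_request_url(["Linked data"]))
```

Exact label matching is deliberately naive; a tag such as "linked data" in lower case would not match, which is one reason candidate links are reviewed by a person before publication.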
Every operation is tracked automatically
http://semweb.cs.vu.nl/provoviz

Connection to PROV-O-Viz service
Review selected links, and publish to Figshare
Plugins

Name               Service  Source                          Links to
DBLP               SPARQL   Authors                         Author Identifiers
ORCID              REST     Authors                         Author Identifiers
LinkedLifeData     REST     Tags & Categories               Biomedical Entities
Crossref           Custom   Citations                       DOIs
Elsevier LDR       REST     Tags & Categories               Funding agencies
DANS EASY          Custom   Tags & Categories               General Datasets
SameAs             REST     Links                           General Entities
DBPedia Spotlight  REST     Description, Tags & Categories  General Entities
DBPedia/Wikipedia  SPARQL   Tags & Categories               General Entities
NeuroLex           SPARQL   Tags & Categories               Neuroscience Concepts
NIF Registry       REST     Tags & Categories               Neuroscience Datasets
your               data     set                             here
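
One way to picture the plugin table is as a small registry that maps each metadata field to the plugins able to use it. The sketch below is only an illustration of that idea: the `Plugin` class and `plugins_for` function are hypothetical names, and the rows are transcribed from the table as read from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plugin:
    name: str        # plugin name as listed on the slide
    service: str     # kind of service queried: SPARQL, REST or Custom
    source: tuple    # metadata fields the plugin reads
    links_to: str    # kind of resource the plugin links to


TAGS = ("Tags", "Categories")

PLUGINS = [
    Plugin("DBLP", "SPARQL", ("Authors",), "Author Identifiers"),
    Plugin("ORCID", "REST", ("Authors",), "Author Identifiers"),
    Plugin("LinkedLifeData", "REST", TAGS, "Biomedical Entities"),
    Plugin("Crossref", "Custom", ("Citations",), "DOIs"),
    Plugin("Elsevier LDR", "REST", TAGS, "Funding agencies"),
    Plugin("DANS EASY", "Custom", TAGS, "General Datasets"),
    Plugin("SameAs", "REST", ("Links",), "General Entities"),
    Plugin("DBPedia Spotlight", "REST", ("Description",) + TAGS, "General Entities"),
    Plugin("DBPedia/Wikipedia", "SPARQL", TAGS, "General Entities"),
    Plugin("NeuroLex", "SPARQL", TAGS, "Neuroscience Concepts"),
    Plugin("NIF Registry", "REST", TAGS, "Neuroscience Datasets"),
]


def plugins_for(source_field):
    """Return the names of all plugins that read the given metadata field."""
    return [p.name for p in PLUGINS if source_field in p.source]


print(plugins_for("Authors"))
print(plugins_for("Description"))
```

Adding "your data set here" would then amount to appending one more `Plugin` entry with its service type, source fields and link target.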
What does this solve?

http://linkeddatabook.com

• You decide on resources to describe
• We mint cool URIs
• We decide on triples to include
• We describe the dataset
• We choose vocabularies
• We define terms
• Together we make links
• We publish the dataset to a reliable repository
Coming up…
• Publish directly from Dropbox, Github, …
• Reconstruct provenance information (http://git2prov.org)
• Analyze, convert and enrich on the fly
• Generate a data report for advertisement purposes
• Measure for information content of datasets (“D-Index”)
• Integrate a data dashboard
[Chart: counts per response category (HTTP, Other, No URL provided, XML, Unknown response, Connection reset, Not RDF)]

linkitup
http://linkitup.data2semantics.org

… enhancing the data publication…
… increasing findability …
… boosting reusability …
… result is stored persistently
http://git2prov.org
http://semweb.cs.vu.nl/provoviz
http://yasgui.data2semantics.org

http://www.data2semantics.org

Linkitup: Link Discovery for Research Data

  • 1. Data2Semantics: From Data to Semantics for Scientific Data Publishers. linkitup
 Link Discovery for Research Data Rinke Hoekstra and Paul Groth
 Network Institute, VU University Amsterdam
 Law Faculty, University of Amsterdam ★ ★ Linkitup - Link Discovery for Research Data by Rinke Hoekstra
 Licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
  • 2. How to share, publish, access, analyse, interpret and reuse data?
  • 3. DATA (shown over a background of binary digits)
  • 4. DATA .. the fallacies (Kayur Patel) (shown over a background of binary digits)
  • 8. www.nature.com/nature, Vol 461 | Issue no. 7261 | 10 September 2009
 Data’s shameful neglect
 Research cannot flourish if data are not preserved and made accessible. All concerned must act accordingly.
 More and more often these days, a research project’s success is measured not just by the publications it produces, but also by the data it makes available to the wider community. Pioneering archives such as GenBank have demonstrated just how powerful such legacy data sets can be for generating new discoveries, especially when data are combined from many laboratories and analysed in ways that the original researchers could not have anticipated. All but a handful of disciplines still lack the technical, institutional and cultural frameworks required to support such open data access (see pages 168 and 171), leading to a scandalous shortfall in the sharing of data by researchers (see page 160). This deficiency urgently needs to be addressed by funders, universities and the researchers themselves.
 Research funding agencies need to recognize that preservation of and access to digital data are central to their mission, and need to be supported accordingly. Organizations in the United Kingdom, for instance, have made a good start. The Joint Information Systems Committee, established by the seven UK research councils in 1993, has made data-sharing a priority, and has helped to establish a Digital Curation Centre, headquartered at the University of Edinburgh, to be a national focus for research and development into data issues. Other European agencies have also pursued initiatives.
 The United States, by contrast, is playing catch-up. Since 2005, a 29-member Interagency Working Group on Digital Data has been trying to get US funding agencies to develop plans for how they will support data archiving and, just as importantly, to develop policies on what data should and should not be preserved, and what exceptions should be made for reasons such as patient privacy. Some agencies have taken the lead in doing so; many more are hanging back. They should all be moving forwards vigorously.
 What is more, funding agencies and researchers alike must ensure that they support not only the hardware needed to store the data, but also the software that will help investigators to do this. One important facet is metadata management software: tools that streamline the tedious process of annotating data with a description of what the bits mean, which instrument collected them, which algorithms have been used to process them and so on. This information is essential if other scientists are to reuse the data effectively. Also necessary, especially in an era when data can be mixed and combined in unanticipated ways, is software that can keep track of which pieces of data came from whom. Such systems are essential if tenure and promotion committees are ever to give credit, as they should, to candidates’ track-record of data contribution.
 Pull quote: “Data management should be woven into every course in science.”
 Who should host these data? Agencies and the research community together need to create the digital equivalent of libraries: institutions that can take responsibility for preserving digital data and making them accessible over the long term. The university research libraries themselves are obvious candidates to assume this role. But whoever takes it on, data preservation will require robust, long-term funding. One potentially helpful initiative is the US National Science Foundation’s DataNet programme, in which researchers are exploring financial mechanisms such as subscription services and membership fees.
 Finally, universities and individual disciplines need to undertake a vigorous programme of education and outreach about data. Consider, for example, that most university science students get a reasonably good grounding in statistics. But their studies rarely include anything about information management, a discipline that encompasses the entire life cycle of data, from how they are acquired and stored to how they are organized, retrieved and maintained over time. That needs to change: data management should be woven into every course in science, as one of the foundations of knowledge. ■
 A step too far?
 The Obama administration must fund human space flight adequately, or stop speaking of ‘exploration’.
 After the space shuttle Columbia burned up during re-entry into Earth’s atmosphere in 2003, the board that was convened to investigate the disaster looked beyond its technical causes to NASA’s organizational malaise. For decades, the board pointed out, the shuttle programme had been trying to do too much with too little money. NASA desperately needed a clearer vision and a better-defined mission for human space flight. The next year, then-President George W. Bush attempted to supply that vision with a new long-term goal: first send astronauts to build a base on the Moon, then send them to Mars.
 This idea immediately set off a debate that is still continuing, in which sceptics ask whether there is any point in returning to the Moon nearly half a century after the first landings. Why not go to Mars directly, or visit near-Earth asteroids, or send people to service telescopes in the deep space beyond Earth? Yet that debate is both counter-productive (a new set of rockets could go to all of these places) and moot, because Bush’s vision never attracted the hoped-for budget increases. Indeed, a blue-riband commission reporting to US President Barack Obama this week (see page 153) finds the organizational malaise unchanged: NASA is still doing too much with too little. Without more money, the agency won’t be sending people anywhere beyond the International Space Station, which resides in low Earth orbit only 350 kilometres up. And even the ability to do that is in question: Ares I, the US rocket that would return
 DATA
 Silver Bullet? http://on.wsj.com/XCajtB
  • 11. Repository Services
 • Data is easy to upload
 • Landing page for data
 • Citable reference for data
 • Default licensing options
 • Guarantees for long term archival
  • 12. Standard Metadata
 • Provenance metadata: authors, title, publication date
 • Content metadata: free text tags, categories, links
 • Metadata is locked in
 • Hard to interpret the data itself
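 The two kinds of standard metadata on slide 12 can be pictured as a small record type. The sketch below is only an illustration: the class names, field names and the `find_by_tag` helper are hypothetical and are not taken from Linkitup or from any repository's API. It shows why free text tags are hard to interpret: a lookup can only compare strings, so two spellings of the same concept do not match.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvenanceMetadata:
    """Who published the dataset and when (hypothetical field names)."""
    authors: List[str]
    title: str
    publication_date: str  # date kept as a plain string, e.g. "2013-01-01"


@dataclass
class ContentMetadata:
    """Loose, free-text descriptions of what the dataset is about."""
    tags: List[str] = field(default_factory=list)
    categories: List[str] = field(default_factory=list)
    links: List[str] = field(default_factory=list)


@dataclass
class DatasetRecord:
    provenance: ProvenanceMetadata
    content: ContentMetadata


def find_by_tag(records: List[DatasetRecord], tag: str) -> List[DatasetRecord]:
    """Return records carrying exactly this tag string (case-sensitive)."""
    return [r for r in records if tag in r.content.tags]


if __name__ == "__main__":
    # Invented example data, for illustration only.
    record = DatasetRecord(
        provenance=ProvenanceMetadata(["A. Author"], "Example dataset", "2013-01-01"),
        content=ContentMetadata(tags=["genomics"]),
    )
    print(len(find_by_tag([record], "genomics")))  # exact spelling matches
    print(len(find_by_tag([record], "Genomics")))  # different spelling does not
```

 Nothing in such a record says what a tag refers to, which is the gap that linking tags to shared identifiers is meant to close.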
  • 13. Data is the Bottleneck Common Motifs in Scientific Workflows: An Empirical Analysis Daniel Garijo⇤ , Pinar Alper † , Khalid Belhajjame† , Oscar Corcho⇤ , Yolanda Gil‡ , Carole Goble† ⇤ Ontology Engineering Group, Universidad Polit´ cnica de Madrid. {dgarijo, ocorcho}@fi.upm.es e of Computer Science, University of Manchester. {alperp, khalidb, carole.goble}@cs.manchester.ac.uk ‡ Information Sciences Institute, Department of Computer Science, University of Southern California. gil@isi.edu † School Abstract—While workflow technology has gained momentum in the last decade as a means for specifying and enacting computational experiments in modern science, reusing and repurposing existing workflows to build new scientific experiments is still a daunting task. This is partly due to the difficulty that scientists experience when attempting to understand existing workflows, which contain several data preparation and adaptation steps in addition to the scientifically significant analysis steps. One way to tackle the understandability problem is through providing abstractions that give a high-level view of activities undertaken within workflows. As a first step towards abstractions, we report in this paper on the results of a manual analysis performed over a set of real-world scientific workflows from Taverna and Wings systems. Our analysis has resulted in a set of scientific workflow motifs that outline i) the kinds of data intensive activities that are observed in workflows (data oriented motifs), and ii) the different manners in which activities are implemented within workflows (workflow oriented motifs). These motifs can be useful to inform workflow designers on the good and bad practices for workflow development, to inform the design of automated tools for the generation of workflow abstractions, etc. I. I NTRODUCTION Scientific workflows have been increasingly used in the last decade as an instrument for data intensive scientific analysis. 
In these settings, workflows serve a dual function: first as detailed documentation of the method (i. e. the input sources and processing steps taken for the derivation of a certain data item) and second as re-usable, executable artifacts for data-intensive analysis. Workflows stitch together a variety of data manipulation activities such as data movement, data transformation or data visualization to serve the goals of the scientific study. The stitching is realized by the constructs made available by the workflow system used and is largely shaped by the environment in which the system operates and the function undertaken by the workflow. A variety of workflow systems are in use [10] [3] [7] [2] serving several scientific disciplines. A workflow is a software artifact, and as such once developed and tested, it can be shared and exchanged between scientists. Other scientists can then reuse existing workflows in their experiments, e.g., as sub-workflows [17]. Workflow reuse presents several advantages [4]. For example, it enables proper data citation and improves quality through shared workflow development by leveraging the expertise of previous users. Users can also re-purpose existing workflows to adapt them to their needs [4]. Emerging workflow repositories such as myExperiment [14] and CrowdLabs [8] have made publishing and finding workflows easier, but scientists still face the challenges of reuse, which amounts to fully understanding and exploiting the available workflows/fragments. One difficulty in understanding workflows is their complex nature. A workflow may contain several scientifically-significant analysis steps, combined with various other data preparation activities, and in different implementation styles depending on the environment and context in which the workflow is executed. The difficulty in understanding causes workflow developers to revert to starting from scratch rather than re-using existing fragments. 
Through an analysis of the current practices in scientific workflow development, we could gain insights on the creation of understandable and more effectively re-usable workflows. Specifically, we propose an analysis with the following objectives: 1) To reverse-engineer the set of current practices in workflow development through an analysis of empirical evidence. 2) To identify workflow abstractions that would facilitate understandability and therefore effective re-use. 3) To detect potential information sources and heuristics that can be used to inform the development of tools for creating workflow abstractions. In this paper we present the result of an empirical analysis performed over 177 workflow descriptions from Taverna [10] and Wings [3]. Based on this analysis, we propose a catalogue of scientific workflow motifs. Motifs are provided through i) a characterization of the kinds of data-oriented activities that are carried out within workflows, which we refer to as data-oriented motifs, and ii) a characterization of the different manners in which those activity motifs are realized/implemented within workflows, which we refer to as workflow-oriented motifs. It is worth mentioning that, although important, motifs that have to do with scheduling and mapping of workflows onto distributed resources [12] are out of the scope of this paper. The paper is structured as follows. We begin by providing related work in Section II, which is followed in Section III by brief background information on Scientific Workflows, and the two systems that were subject to our analysis. Afterwards we describe the dataset and the general approach of our analysis. We present the detected scientific workflow motifs in Section IV and we highlight the main features of their distribution
  • 14. Data is the Bottleneck Common Motifs in Scientific Workflows: An Empirical Analysis (Garijo et al.) Data-Oriented Motifs per Domain Fig. 3. Distribution of Data-Oriented Motifs per domain
  • 15. Data is the Bottleneck Common Motifs in Scientific Workflows: An Empirical Analysis (Garijo et al.) Data-Oriented Motifs per Domain Fig. 3. Distribution of Data-Oriented Motifs per domain Data-Preparation Motifs per Domain Fig. 5. Data Preparation Motifs in the Genomics Wo
  • 16. Make Data Flourish From data to information to knowledge
  • 17. Make Data Flourish From data to information to knowledge Global identification of 
 data sets and data items Data uses a common syntax Papers explicitly 
 link to data Metadata expressed using
 shared vocabularies Capture the processes by which data is manipulated Track and publish explicit provenance information
  • 18. Make Data Flourish From data to information to knowledge Global identification of data sets and data items Data uses a common syntax Papers explicitly link to data Metadata expressed using shared vocabularies Capture the processes by which data is manipulated Track and publish explicit provenance information "Someone who is not the person who collected the data can understand the experiment and data" - Shreejoy Tripathy
  • 19. Linked Data • Use existing Web infrastructure • Everything gets a URI and usually a category • Express typed relations between things (triples) • Express sameness or difference • Reuse identifiers as much as possible
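The principles on this slide can be illustrated with a minimal sketch in plain Python. It is not taken from the talk: the `example.org` and `other.example.net` URIs, the `Triple` type, and the helper functions are hypothetical names chosen for illustration. Only the RDF, OWL, Dublin Core Terms, and DCAT term URIs are existing vocabulary identifiers, reused here in the spirit of "reuse identifiers as much as possible".

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One typed relation between two things: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Existing vocabulary terms, reused rather than reinvented.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DCT_CREATOR = "http://purl.org/dc/terms/creator"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"

# Hypothetical URIs for a data set and its creator (assumptions for this sketch).
dataset = "http://example.org/dataset/42"
author = "http://example.org/person/jane"

triples = [
    Triple(dataset, RDF_TYPE, DCAT_DATASET),   # everything gets a URI and a category
    Triple(dataset, DCT_CREATOR, author),      # a typed relation between two things
    Triple(author, OWL_SAME_AS, "http://other.example.net/people/jane-doe"),  # sameness
]


def to_ntriples(ts):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{t.subject}> <{t.predicate}> <{t.obj}> ." for t in ts)


def same_as(ts, uri):
    """Return URIs directly declared owl:sameAs the given URI, in either direction."""
    found = set()
    for t in ts:
        if t.predicate == OWL_SAME_AS:
            if t.subject == uri:
                found.add(t.obj)
            elif t.obj == uri:
                found.add(t.subject)
    return found


print(to_ntriples(triples))
print(same_as(triples, author))
```

Difference can be stated the same way with `http://www.w3.org/2002/07/owl#differentFrom` as the predicate. A real publisher would normally use an RDF library rather than hand-built strings; the tuples here only show the shape of the data.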
  • 20. Salah, Alkim Almila Akdag, Cheng Gao, Krzysztof Suchecki, and Andrea Scharnhorst. 2012. “Need to Categorize: A Comparative Look at the Categories of Universal Decimal Classification System and Wikipedia.” Leonardo 45 (1) (February): 84-85. doi:10.1162/LEON_a_00344. (Preprint http://arxiv.org/abs/1105.5912v1)
  • 21. Linked Data for Science Neuroscience Information Framework (Ontologies, Semantic Wiki, Catalog) Nanopublications (small scientific assertions) Workflow Systems (WINGS, Taverna, …) Linked Science (tools) BioPortal (ontologies) Organic Data Publishing (Semantic Wiki) Rightfield (systems biology) Bio2RDF (big linked data)
  • 23. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/ As of September 2011. Legend: Media, Geographic, Publications, User-generated content, Government, Cross-domain, Life sciences
  • 24. Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/ As of September 2011. Legend: Media, Geographic, Publications, User-generated content, Government, Cross-domain, Life sciences. Overlaid chart: axis from 0 to 400; dates 1 May 2007, 8 Oct 2007, 7 Nov 2007, 10 Nov 2007, 28 Feb 2008, 31 Mar 2008, 18 Sep 2008, 5 Mar 2009, 27 Mar 2009, 14 Jul 2009, 22 Sep 2010, 19 Sep 2011, 23 Feb 2012
  • 27. LODStats Analysis http://stats.lod2.eu Bar and pie chart of error categories: Not RDF, Connection reset, Unknown response, XML, No URL provided, Other, HTTP. Counts in descending order, not matched to categories: 134, 84, 30, 22, 12, 11, 6 (45%, 28%, 10%, 7%, 4%, 4%, 2%). Hoekstra, Rinke; Groth, Paul (2013): Distribution of Errors Reported by LOD2 LODStats Project. figshare. http://dx.doi.org/10.6084/m9.figshare.695949 299 out of 639 datasets have errors
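As a quick sanity check on the figures on this slide, the arithmetic can be sketched in a few lines of Python. The seven counts are the bar values that appear in the chart; which error category each one belongs to is not assumed here, and the variable names are illustrative only.

```python
# Bar values as they appear in the chart, largest first (categories not assigned).
error_counts = [134, 84, 30, 22, 12, 11, 6]
datasets_total = 639  # "299 out of 639 datasets have errors"

datasets_with_errors = sum(error_counts)

# Share of each count among all data sets with errors, as whole percentages.
shares = [round(100 * c / datasets_with_errors) for c in error_counts]

# Fraction of all analysed data sets that reported an error.
error_rate = datasets_with_errors / datasets_total

print(datasets_with_errors)
print(shares)
print(f"{error_rate:.1%}")
```

Under these numbers, a little under half of the analysed data sets could not be processed without an error, which is the point the slide makes about the state of published Linked Data.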
  • 28. An Ambient Agent Model for Monitoring and Analysing Dynamics of Complex Human Behaviour Tibor Bosse*, Mark Hoogendoorn, Michel C.A. Klein, and Jan Treur. Vrije Universiteit Amsterdam, Department of Artificial Intelligence, de Boelelaan 1081, 1081 HV Amsterdam, The Netherlands Abstract. In ambient intelligent systems, monitoring of a human could consist of more complex tasks than merely identifying whether a certain value of a sensor is above a certain threshold. Instead, such tasks may involve monitoring of complex dynamic interactions between human and environment. In order to enable such more complex types of monitoring, this paper presents a generic agent-based framework. The framework consists of support on various levels of system design, namely: (1) the top level, including the interaction between agents, (2) the agent level, providing support on the design of individual agents, and (3) the level of monitoring complex dynamic behaviour, allowing the specification of the aforementioned complex monitoring properties within the agents. The approach is exemplified by a large case study concerning the assessment of driving behaviour, and is applied to two smaller cases as well (concerning fall detection of elderly, and assistance of naval operations, respectively), which are briefly described. These case studies have illustrated that the presented framework enables developers within ambient intelligence to build systems with more expressiveness regarding their monitoring focus. Moreover, they have shown that the framework is easy to use and applicable in a wide variety of domains. Keywords: ambient agent model, human behaviour, dynamics Journal of Ambient Intelligence and Smart Environments
  • 29. “Whoah! Cool, you should publish that stuff as Linked Data” An Ambient Agent Model for Monitoring and Analysing Dynamics of Complex Human Behaviour Tibor Bossea*, Mark Hoogendoorna, Michel C.A. Kleina, and Jan Treura a Vrije Universiteit Amsterdam, Department of Artificial Intelligence, de Boelelaan 1081, 1081 HV Amsterdam, The Netherlands Abstract. In ambient intelligent systems, monitoring of a human could consist of more complex tasks than merely identifying whether a certain value of a sensor is above a certain threshold. Instead, such tasks may involve monitoring of complex dynamic interactions between human and environment. In order to enable such more complex types of monitoring, this paper presents a generic agent-based framework. The framework consists of support on various levels of system design, namely: (1) the top level, including the interaction between agents, (2) the agent level, providing support on the design of individual agents, and (3) the level of monitoring complex dynamic behaviour, allowing the specification of the aforementioned complex monitoring properties within the agents. The approach is exemplified by a large case study concerning the assessment of driving behaviour, and is applied to two smaller cases as well (concerning fall detection of elderly, and assistance of naval operations, respectively), which are briefly described. These case studies have illustrated that the presented framework enables developers within ambient intelligence to build systems with more expressiveness regarding their monitoring focus. Moreover, they have shown that the framework is easy to use and applicable in a wide variety of domains. Keywords: ambient agent model, human behaviour, dynamics Journal of Ambient Intelligence and Smart Environments
  • 30. “Whoah! Cool, you should publish that stuff as Linked Data” An Ambient Agent Model “Um, but doesn’t TTL have incompatible semantics?” for Monitoring and Analysing Dynamics of Complex Human Behaviour Tibor Bossea*, Mark Hoogendoorna, Michel C.A. Kleina, and Jan Treura a Vrije Universiteit Amsterdam, Department of Artificial Intelligence, de Boelelaan 1081, 1081 HV Amsterdam, The Netherlands Abstract. In ambient intelligent systems, monitoring of a human could consist of more complex tasks than merely identifying whether a certain value of a sensor is above a certain threshold. Instead, such tasks may involve monitoring of complex dynamic interactions between human and environment. In order to enable such more complex types of monitoring, this paper presents a generic agent-based framework. The framework consists of support on various levels of system design, namely: (1) the top level, including the interaction between agents, (2) the agent level, providing support on the design of individual agents, and (3) the level of monitoring complex dynamic behaviour, allowing the specification of the aforementioned complex monitoring properties within the agents. The approach is exemplified by a large case study concerning the assessment of driving behaviour, and is applied to two smaller cases as well (concerning fall detection of elderly, and assistance of naval operations, respectively), which are briefly described. These case studies have illustrated that the presented framework enables developers within ambient intelligence to build systems with more expressiveness regarding their monitoring focus. Moreover, they have shown that the framework is easy to use and applicable in a wide variety of domains. Keywords: ambient agent model, human behaviour, dynamics Journal of Ambient Intelligence and Smart Environments
  • 31. “Whoah! Cool, you should publish that stuff as Linked Data” An Ambient Agent Model “Um, but doesn’t TTL have incompatible semantics?” for Monitoring and Analysing Dynamics of Complex Human “Nah, silly, who cares? We’ll just start a new W3C WG!” Behaviour Tibor Bossea*, Mark Hoogendoorna, Michel C.A. Kleina, and Jan Treura a Vrije Universiteit Amsterdam, Department of Artificial Intelligence, de Boelelaan 1081, 1081 HV Amsterdam, The Netherlands Abstract. In ambient intelligent systems, monitoring of a human could consist of more complex tasks than merely identifying whether a certain value of a sensor is above a certain threshold. Instead, such tasks may involve monitoring of complex dynamic interactions between human and environment. In order to enable such more complex types of monitoring, this paper presents a generic agent-based framework. The framework consists of support on various levels of system design, namely: (1) the top level, including the interaction between agents, (2) the agent level, providing support on the design of individual agents, and (3) the level of monitoring complex dynamic behaviour, allowing the specification of the aforementioned complex monitoring properties within the agents. The approach is exemplified by a large case study concerning the assessment of driving behaviour, and is applied to two smaller cases as well (concerning fall detection of elderly, and assistance of naval operations, respectively), which are briefly described. These case studies have illustrated that the presented framework enables developers within ambient intelligence to build systems with more expressiveness regarding their monitoring focus. Moreover, they have shown that the framework is easy to use and applicable in a wide variety of domains. Keywords: ambient agent model, human behaviour, dynamics Journal of Ambient Intelligence and Smart Environments
  • 32. “Whoah! Cool, you should publish that stuff as Linked Data” An Ambient Agent Model “Um, but doesn’t TTL have incompatible semantics?” for Monitoring and Analysing Dynamics of Complex Human “Nah, silly, who cares? We’ll just start a new W3C WG!” Behaviour “Uh, ok, if we must. But, Mark Hoogendoorn , Michel C.A. Klein , and Jan Treur even then, we can’t just publish the model as is!” Tibor Bosse a* a a a a Vrije Universiteit Amsterdam, Department of Artificial Intelligence, de Boelelaan 1081, 1081 HV Amsterdam, The Netherlands Abstract. In ambient intelligent systems, monitoring of a human could consist of more complex tasks than merely identifying whether a certain value of a sensor is above a certain threshold. Instead, such tasks may involve monitoring of complex dynamic interactions between human and environment. In order to enable such more complex types of monitoring, this paper presents a generic agent-based framework. The framework consists of support on various levels of system design, namely: (1) the top level, including the interaction between agents, (2) the agent level, providing support on the design of individual agents, and (3) the level of monitoring complex dynamic behaviour, allowing the specification of the aforementioned complex monitoring properties within the agents. The approach is exemplified by a large case study concerning the assessment of driving behaviour, and is applied to two smaller cases as well (concerning fall detection of elderly, and assistance of naval operations, respectively), which are briefly described. These case studies have illustrated that the presented framework enables developers within ambient intelligence to build systems with more expressiveness regarding their monitoring focus. Moreover, they have shown that the framework is easy to use and applicable in a wide variety of domains. Keywords: ambient agent model, human behaviour, dynamics Journal of Ambient Intelligence and Smart Environments
  • 33. “Whoah! Cool, you should publish that stuff as Linked Data” “Um, but doesn’t TTL have incompatible semantics?” “Nah, silly, who cares? We’ll just start a new W3C WG!” “Uh, ok, if we must. But, even then, we can’t just publish the model as is!” “No worries, just add the provenance using PROV-O, annotate the PDF with OA, and link to other research using CITO.”
  • 34. “Whoah! Cool, you should publish that stuff as Linked Data” “Um, but doesn’t TTL have incompatible semantics?” “Nah, silly, who cares? We’ll just start a new W3C WG!” “Uh, ok, if we must. But, even then, we can’t just publish the model as is!” “No worries, just add the provenance using PROV-O, annotate the PDF with OA, and link to other research using CITO.” “And that’s it?”
  • 35. “Whoah! Cool, you should publish that stuff as Linked Data” “Um, but doesn’t TTL have incompatible semantics?” “Nah, silly, who cares? We’ll just start a new W3C WG!” “Uh, ok, if we must. But, even then, we can’t just publish the model as is!” “No worries, just add the provenance using PROV-O, annotate the PDF with OA, and link to other research using CITO.” “And that’s it?” “Noo! You’ll need persistent Cool URIs and publish your endpoint for eternity of course. Duh.”
  • 36. “Whoah! Cool, you should publish that stuff as Linked Data” “Um, but doesn’t TTL have incompatible semantics?” “Nah, silly, who cares? We’ll just start a new W3C WG!” “Uh, ok, if we must. But, even then, we can’t just publish the model as is!” “No worries, just add the provenance using PROV-O, annotate the PDF with OA, and link to other research using CITO.” “And that’s it?” “Noo! You’ll need persistent Cool URIs and publish your endpoint for eternity of course. Duh.” “Eh?” “Oh... and don’t forget all data collected by the agents, in all runs, including the first experiments. Now THAT would be ultra cool.” “Ngh!?”
  • 37. Creating Linked Data (http://linkeddatabook.com)
      • Decide on resources to describe
      • Mint cool URIs
      • Decide on triples to include
      • Describe the dataset
      • Choose vocabularies
      • Define terms
      • Make links
      • Publish to triple store/annotations/dump
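The checklist above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the `http://example.org/` namespace, the helper names `mint_uri`, `describe` and `to_ntriples`, and the choice of DCTerms and DCAT terms are not taken from the slides. It mints a URI from a title, picks a few triples, links to an external resource, and serialises the result as N-Triples using only the standard library.

```python
import re

BASE = "http://example.org/dataset/"  # hypothetical namespace, not from the slides
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"


def mint_uri(title):
    """Mint a readable URI by turning the title into a lowercase slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return BASE + slug


def describe(title, creator_uri, related_uri):
    """Choose the triples to include: type, title, creator and one outgoing link."""
    subject = mint_uri(title)
    # Simplified literal escaping; assumes a single-line title.
    literal = '"' + title.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return [
        (subject, RDF_TYPE, f"<{DCAT_DATASET}>"),
        (subject, DCT + "title", literal),
        (subject, DCT + "creator", f"<{creator_uri}>"),
        (subject, DCT + "relation", f"<{related_uri}>"),
    ]


def to_ntriples(triples):
    """Serialise (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> {o} ." for s, p, o in triples)


triples = describe(
    "Driving Behaviour Runs",
    creator_uri="http://example.org/person/jane-doe",  # made-up creator URI
    related_uri="http://dbpedia.org/resource/Ambient_intelligence",  # illustrative link target
)
print(to_ntriples(triples))
```

Even this toy version needs decisions about namespaces, vocabularies and link targets, which is the manual effort the following slides refer to.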
  • 38. If this is already tedious... can you expect researchers to publish Linked Research Data?
  • 41. We need to make publishing Linked Research Data... a lot easier... more persistent... and more rewarding. Linked Data is sóóóóó 2005
  • 42. We need to make publishing Linked Research Data... a lot easier... more persistent... and more rewarding. “People as frontier in computing” (Haym Hirsh, Pietro Michelucci)
  • 43. We need to make publishing Linked Research Data... a lot easier... more persistent... and more rewarding. http://linkitup.data2semantics.org
  • 44. We need to make publishing Linked Research Data... a lot easier... more persistent... and more rewarding. http://linkitup.data2semantics.org
      • Lightweight web application
      • Interface to API of existing data repositories
      • Enrich metadata by linking to (linked) data resources
      • Human in the Loop
      • Track provenance
      • Publish rich metadata as new data publication (Nanopublication + OA + PROV-O + DCTerms + FOAF)
  • 49. Use tags & categories to query the DBpedia endpoint
  • 50. Use authors to query the DBLP endpoint
  • 51. Use tags & categories to query the NeuroLex endpoint
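The endpoint lookups on the three slides above share one pattern: turn a tag or category into a SPARQL query and send it to an endpoint. The sketch below shows that pattern for DBpedia under stated assumptions: the function names `label_query` and `request_url` are hypothetical, and exact matching on `rdfs:label` is a simplification chosen for illustration, not necessarily the query linkitup issues. It only builds the request URL and performs no network call.

```python
from urllib.parse import urlencode

DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"  # public DBpedia SPARQL endpoint


def label_query(tag, lang="en", limit=5):
    """Build a SPARQL query for resources whose rdfs:label equals the tag."""
    # Escape backslashes and double quotes so the tag is a valid string literal.
    escaped = tag.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?resource WHERE {\n"
        f'  ?resource rdfs:label "{escaped}"@{lang} .\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def request_url(endpoint, query):
    """Encode the query as a GET request in the style of the SPARQL protocol."""
    return endpoint + "?" + urlencode({"query": query})


for tag in ["Semantic Web", "Provenance"]:
    print(request_url(DBPEDIA_ENDPOINT, label_query(tag)))
```

A client would fetch each URL with an `Accept: application/sparql-results+json` header and present the returned resources to the user as candidate links.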
  • 52. Use author names to query the ORCID API
  • 53. Extract references to resolve to CrossRef DOIs
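Resolving an extracted reference string to a DOI could look roughly like the sketch below. It assumes the present-day public Crossref REST API (`/works` with the `query.bibliographic` parameter), which may differ from the custom integration the tool uses; the helper names and the sample response are made up for illustration, and no network call is performed.

```python
from urllib.parse import urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"  # public Crossref REST API


def crossref_lookup_url(reference, rows=1):
    """Build a bibliographic search URL for a free-text reference string."""
    return CROSSREF_WORKS + "?" + urlencode(
        {"query.bibliographic": reference, "rows": rows}
    )


def best_doi(response):
    """Return the DOI of the top hit in a decoded JSON response, or None."""
    items = response.get("message", {}).get("items", [])
    return items[0].get("DOI") if items else None


print(crossref_lookup_url("Bosse et al. 2012"))

# Hand-written stand-in for a decoded response; the DOI is a placeholder,
# not a real lookup result.
sample = {"message": {"items": [{"DOI": "10.5555/12345678"}]}}
print(best_doi(sample))
```

Because the top hit can be a false match, the candidate DOI would still go to the user for confirmation, in line with the human-in-the-loop design.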
  • 54. Every operation is tracked automatically
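One way to track operations automatically is to wrap every link-discovery function in a decorator that records a PROV-O style activity. The sketch below is a minimal illustration under stated assumptions: the `tracked` decorator, the in-memory `PROV_LOG` and the `http://example.org/prov/` namespace are hypothetical, and inputs and outputs are recorded as plain strings rather than minted `prov:Entity` URIs.

```python
import functools
import uuid
from datetime import datetime, timezone

PROV_LOG = []  # in-memory list of (subject, predicate, object) triples
BASE = "http://example.org/prov/"  # hypothetical namespace


def tracked(func):
    """Record a PROV-O style activity for every call to the wrapped function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        activity = f"{BASE}activity/{func.__name__}-{uuid.uuid4().hex[:8]}"
        started = datetime.now(timezone.utc).isoformat()
        result = func(*args, **kwargs)
        ended = datetime.now(timezone.utc).isoformat()
        PROV_LOG.append((activity, "rdf:type", "prov:Activity"))
        PROV_LOG.append((activity, "prov:startedAtTime", started))
        PROV_LOG.append((activity, "prov:endedAtTime", ended))
        # Only positional arguments are logged in this simplified version.
        for arg in args:
            PROV_LOG.append((activity, "prov:used", repr(arg)))
        PROV_LOG.append((repr(result), "prov:wasGeneratedBy", activity))
        return result
    return wrapper


@tracked
def match_tag(tag):
    # Stand-in for a real link-discovery call; returns a made-up URI.
    return "http://example.org/resource/" + tag.replace(" ", "_")


link = match_tag("semantic web")
for triple in PROV_LOG:
    print(triple)
```

The caller never touches the log, so provenance accumulates as a side effect of normal use.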
  • 56. Review selected links, and publish to Figshare
  • 58. Plugins
      Name                 Service   Source                            Links to
      DBLP                 SPARQL    Authors                           Author Identifiers
      ORCID                REST      Authors                           Author Identifiers
      LinkedLifeData       REST      Tags & Categories                 Biomedical Entities
      Crossref             Custom    Citations                         DOIs
      Elsevier LDR         REST      Tags & Categories                 Funding agencies
      DANS EASY            Custom    Tags & Categories                 General Datasets
      SameAs               REST      Links                             General Entities
      DBpedia Spotlight    REST      Description, Tags & Categories    General Entities
      DBpedia/Wikipedia    SPARQL    Tags & Categories                 General Entities
      NeuroLex             SPARQL    Tags & Categories                 Neuroscience Concepts
      NIF Registry         REST      Tags & Categories                 Neuroscience Datasets
      (your data set here)
  • 59. What does this solve? (http://linkeddatabook.com)
      • Decide on resources to describe
      • Mint cool URIs
      • Decide on triples to include
      • Describe the dataset
      • Choose vocabularies
      • Define terms
      • Make links
      • Publish to triple store/annotations/dump
  • 60. What does this solve? (http://linkeddatabook.com)
      • You decide on resources to describe
      • We mint cool URIs
      • We decide on triples to include
      • We describe the dataset
      • We choose vocabularies
      • We define terms
      • Together we make links
      • We publish the dataset to a reliable repository
  • 61. Coming up…
      • Publish directly from Dropbox, Github, …
      • Reconstruct provenance information (http://git2prov.org)
      • Analyze, convert and enrich on the fly
      • Generate a data report for advertisement purposes
      • Measure for information content of datasets (“D-Index”)
      • Integrate a data dashboard
  • 62. linkitup (http://linkitup.data2semantics.org): … enhancing the data publication … increasing findability … boosting reusability … result is stored persistently
      [Bar chart, axis 0 to 140; categories: HTTP, Other, No URL provided, XML, Unknown response, Connection reset, Not RDF]
      http://git2prov.org http://semweb.cs.vu.nl/provoviz http://yasgui.data2semantics.org http://www.data2semantics.org