The document summarizes the evolution of e-research over three generations from 1981 to the present. The first generation saw early adopters using tools within their disciplines with some reuse. The second generation was characterized by increased reuse of tools, data and methods across areas. The third generation is defined by radical sharing of resources globally across any discipline through social networks and reusable research objects. The document also discusses several specific projects and tools that exemplify each generation of e-research including myExperiment, Galaxy, and SALAMI.
2. [Figure: timeline from 1981 to 2010. Labels include: Maths, Physics, Medical electronics, PhD in distributed declarative programming language design, Hypermedia, Large scale Distributed Systems, Semantic Sensor Networks, Web Science, Devices, Amorphous Computing, Digital Social Research, Equator, e-Science, Music, Electronics, Programming, Transputers, Temporal Media, Computational Musicology, Advanced Knowledge Technologies, Semantic Web, Process Networks, myExperiment, Web 2, Statistics, Grid, Linked Data, Environmental sensing, Networks, VREs, Agents, Semantic Grid, e-Laboratories, Workflows, QBH]
3. Overview
Generation 1: Early adopters
Generation 2: Embedding
Generation 3: Radical sharing
SALAMI
A case study in 3rd generation e-Research
4. e-Science
• e-Science was defined by John Taylor (Director
General of the UK Research Councils) as
global collaboration in key areas of science
and the next generation of infrastructure that
will enable it
• e-Science was the name of the destination
• It became the name of the journey
• When we arrive, the destination is just called
science
7. ...the imminent flood of
scientific data expected
from the next generation of
experiments, simulations,
sensors and satellites
Tony Hey and Anne Trefethen
Source: CERN, CERN-EX-0712023, http://cdsweb.cern.ch/record/1203203
9. • Workflows are the new rock
and roll
• Machinery for coordinating
the execution of (scientific)
services and linking together
(scientific) resources
• The era of Service Oriented
Applications
• Repetitive and mundane
boring stuff made easier
Carole Goble
E. Science laboris
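As a minimal sketch of what "machinery for coordinating the execution of services" can look like, the Python below chains a few stand-in services into a workflow and keeps a trace of every intermediate result. The step names and the toy data are hypothetical, and this is not a description of how Taverna or any other workflow system is implemented.

```python
from typing import Any, Callable, List, Tuple

# A step is a name plus a callable standing in for a (scientific) service.
Step = Tuple[str, Callable[[Any], Any]]


def run_workflow(steps: List[Step], data: Any) -> Tuple[Any, List[Tuple[str, Any]]]:
    """Feed each step the previous step's output and record a trace."""
    trace = []
    for name, service in steps:
        data = service(data)
        trace.append((name, data))
    return data, trace


# Hypothetical stand-ins for remote services.
def fetch_sequences(gene_ids):
    return [f"SEQ-{g}" for g in gene_ids]


def filter_short(seqs):
    return [s for s in seqs if len(s) > 5]


def count(seqs):
    return len(seqs)


workflow = [
    ("fetch", fetch_sequences),
    ("filter", filter_short),
    ("count", count),
]

# The same workflow object can be rerun, unchanged, on different inputs.
result, trace = run_workflow(workflow, ["a", "bb", "ccc"])
```

Because the workflow is just data (an ordered list of named steps), it can be stored, shared and rerun by someone else on their own inputs, which is the property the reuse stories in this talk rely on.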
15. empower
to equip or supply with an ability;
enable
service
the performance of duties or the
duties performed as or by a
waiter or servant
16. Early adopters of tools.
Characterised by researchers using tools within their
particular problem area, with some re-use of tools, data
and methods within the discipline.
Traditional publishing is supplemented by publication of
some digital artefacts like workflows and links to data.
Science is accelerated and practice beginning to shift to
emphasise in silico work.
1st Generation Summary
Thanks to Iain Buchan
and the chipmunks
18. • Paul writes workflows for identifying biological
pathways implicated in resistance to
Trypanosomiasis in cattle
• Paul meets Jo. Jo is investigating Whipworm in
mouse.
• Jo reuses one of Paul’s workflows without change.
• Jo identifies the biological pathways involved in
sex dependence in the mouse model, believed to
be involved in the ability of mice to expel the
parasite.
• Previously a manual two year study by Jo had
failed to do this.
Reuse, Recycling, Repurposing
Carole Goble
19. Carole Goble “e-Science
is me-Science: What do
Scientists want?”, EGEE
2006
“There are these great
collaboration tools that
12-year-olds are using.
It’s all back to front.”
Robert Stevens
20. “A biologist would rather share their
toothbrush than their gene name”
Mike Ashburner and others
Professor in Dept of Genetics,
University of Cambridge, UK
25. “Facebook for Scientists”
...but different to Facebook!
A repository of research
methods
A community social network
of people and things
A Social Virtual Research
Environment
A probe into researcher
behaviour
Open source (BSD) Ruby on
Rails app
REST and SPARQL interfaces,
supports Linked Data
Inspiration for: BioCatalogue,
MethodBox and SysMO-SEEK
myExperiment currently has 4400 members, 236 groups, 1336
workflows, 351 files and 141 packs
27. Visits to www.myexperiment.org (Oct 2010)
Global collaboration
in key areas of
science and the next
generation of
infrastructure that
will enable it
http://wiki.myexperiment.org
29. Methods should be first class citizens
Celebrate the flux! Let the data flow
through the pipelines. Nail down the
methods not the data!
Towards “Linked Open Methods”
Though this be madness, yet there is method in it
* Polonius in Hamlet ** Sean Bechhofer in Manchester *** Not the e-Science Envoy
Data bonanza => Methods bonanza!
30. It’s not just the data
And what other people do with it
...that you never thought of
It’s what you do with it that counts
32. Research Objects enable data-intensive research to be:
1. Replayable – go back and see what happened
2. Repeatable – run the experiment again
3. Reproducible – independent expt to reproduce
4. Reusable – use as part of new experiments
5. Repurposeable – reuse the pieces in new expt
6. Reliable – robust under automation
7. Referenceable – citable and traceable
The Six Rs of Research Object Behaviours
http://blog.openwetware.org/deroure/?p=56
36. Projects delivering now.
Some institutional embedding.
Key characteristic is re-use – of the increasing pool of
tools, data and methods across areas/disciplines.
Contain some freestanding, recombinant, reproducible
research objects.
New scientific practices are established and opportunities
arise for completely new scientific investigations.
Some expert curation.
2nd Generation Summary
38. 4th Paradigm
The Fourth Paradigm:
Data-Intensive
Scientific Discovery
Presenting the first
broad look at the rapidly
emerging field of data-
intensive science
http://research.microsoft.com/en-us/collaboration/fourthparadigm/
43. “…to discover proteins that interact with transmembrane
proteins, particularly those that can be related to neuro-
degenerative diseases in which amyloids play a significant role”
1) Taverna provenance exposed as RDF
2) myExperiment RDF document for a protein discovery workflow
3) Mocked-up BioCatalogue document using myExperiment RDF
data as example
4) Provisional RDF documents obtained from the ConceptWiki
(conceptwiki.org) development server
5) An RDF document for an example protein, obtained from the RDF
interface of the UniProt web site
A Bioinformatics Experiment Scott Marshall
Marco Roos
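To illustrate why assembling resources from several RDF sources is comparatively easy, here is a small sketch that models each source as a set of subject-predicate-object triples, merges them, and follows links across them. Every identifier (the `ex:` names, the predicates, the helper functions) is hypothetical and chosen only for illustration; a real experiment would use the vocabularies published by the actual sources and an RDF library.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical triples standing in for RDF documents from different sources.
workflow_doc: Set[Triple] = {
    ("ex:workflow1", "ex:produces", "ex:proteinP1"),
    ("ex:workflow1", "ex:title", "Protein discovery workflow"),
}
protein_doc: Set[Triple] = {
    ("ex:proteinP1", "ex:label", "Example protein"),
    ("ex:proteinP1", "ex:interactsWith", "ex:proteinP2"),
}
concept_doc: Set[Triple] = {
    ("ex:proteinP2", "ex:relatedTo", "ex:amyloid"),
}

# Assembly is a plain set union because every source shares one data model.
graph = workflow_doc | protein_doc | concept_doc


def objects(g: Set[Triple], subject: str, predicate: str) -> Set[str]:
    """Return all objects of triples matching (subject, predicate, ?)."""
    return {o for s, p, o in g if s == subject and p == predicate}


def proteins_linked_to(g: Set[Triple], workflow: str, concept: str) -> Set[str]:
    """Follow produces -> interactsWith -> relatedTo across the merged graph."""
    hits = set()
    for produced in objects(g, workflow, "ex:produces"):
        for partner in objects(g, produced, "ex:interactsWith"):
            if concept in objects(g, partner, "ex:relatedTo"):
                hits.add(partner)
    return hits


found = proteins_linked_to(graph, "ex:workflow1", "ex:amyloid")
```

The query only succeeds on the merged graph; no single source holds the whole chain of links.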
48. The solutions we'll be delivering in 5 years
Characterised by global reuse of tools, data and methods
across any discipline, and surfacing the right levels of
complexity for the researcher.
Routine use.
Key characteristic is radical sharing.
Research is significantly data driven – plundering the
backlog of data, results and methods.
Publishing by the social network
Increasing automation and decision-support for the
researcher – the VRE becomes assistive.
Curation is autonomic and social.
3rd Generation Summary
50. Easy and low risk to start
Progress to advanced skills
For researchers
No obligation
Go as far as you want
Find a service & relax
Intellectual ramps
Malcolm Atkinson
59. The SALAMI collaboration
• DDeR (e-Research South), J. Stephen Downie (Illinois) and
Ichiro Fujinaga (McGill)
• NCSA donating 250,000 supercomputer hours
• 350,000 pieces of music (23,000 hours)
– Internet Archive, DRAM, IMIRSEL, McGill
• Feature analysis and structural analysis
• Music Ontology by Yves Raimond (BBC)
• Musicologists from McGill and Southampton
• Sharing of analyses
http://salami.music.mcgill.ca
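A quick back-of-envelope calculation puts the figures above in proportion. It uses only the three numbers quoted on the slide; how a "supercomputer hour" is counted (per core, per node) is not stated, so the ratios are indicative only.

```python
# Figures quoted on the slide.
pieces = 350_000          # pieces of music
audio_hours = 23_000      # total hours of audio
compute_hours = 250_000   # donated supercomputer hours

# Average length of a piece, in minutes.
avg_minutes_per_piece = audio_hours * 60 / pieces

# Compute budget relative to the amount of audio.
compute_hours_per_audio_hour = compute_hours / audio_hours
compute_minutes_per_piece = compute_hours * 60 / pieces

print(f"{avg_minutes_per_piece:.1f} minutes of audio per piece")
print(f"{compute_hours_per_audio_hour:.1f} compute hours per hour of audio")
print(f"{compute_minutes_per_piece:.1f} compute minutes per piece")
```

On these numbers the average piece is just under four minutes long, and the donation amounts to roughly eleven compute hours for every hour of audio.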
62. MIREX Overview
• Began in 2005
• Tasks defined by community debate
• Data sets collected and/or donated
• Participants submit code to IMIRSEL
• Code rarely works first try
• Huge labour consumption getting
programs to work
• Meet at ISMIR to discuss results
Stephen Downie
http://www.music-ir.org/mirex
63. MIREX TASKS
• Audio Artist Identification
• Audio Beat Tracking
• Audio Chord Detection
• Audio Classical Composer ID
• Audio Cover Song Identification
• Audio Drum Detection
• Audio Genre Classification
• Audio Key Finding
• Audio Melody Extraction
• Audio Mood Classification
• Audio Music Similarity
• Audio Onset Detection
• Audio Tag Classification
• Audio Tempo Extraction
• Multiple F0 Estimation
• Multiple F0 Note Detection
• Query-by-Singing/Humming
• Query-by-Tapping
• Score Following
• Symbolic Genre Classification
• Symbolic Key Finding
• Symbolic Melodic Similarity
68. “Again, it [the Analytical
Engine] might act upon
other things besides
number, were objects
found whose mutual
fundamental relations
could be expressed by
those of the abstract
science of operations,
and which should be
also susceptible of
adaptations to the action
of the operating notation
and mechanism of the
engine...”
69. “Supposing, for instance,
that the fundamental
relations of pitched
sounds in the science of
harmony and of musical
composition were
susceptible of such
expression and
adaptations, the engine
might compose elaborate
and scientific pieces of
music of any degree of
complexity or extent.”
Ada, The Enchantress of
Numbers: Poetical Science
by Betty Alexandra Toole
http://www.well.com/user/adatoole/
Betty Alexandra Toole
70. I can write a workflow that creates
workflows based on those of others, and
automatically modify it – think genetic
mutation and crossovers. Who owns it?
I can register a query over an increasing
number and diversity of “linked data”
sources to ask new research questions.
http://eresearch-ethics.org/
The computer can learn from the activities of 1,000,000
scientists – and be indistinguishable from them?
What about the ethics of Citizen Social Science? Of citizens
designing experiments?
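The "genetic mutation and crossovers" idea on this slide can be sketched in a few lines if a workflow is reduced to an ordered list of step names. That reduction, the step names, and the single-point crossover are all assumptions made for illustration; real workflows have typed ports and dependencies that a serious implementation would have to respect.

```python
import random
from typing import List

Workflow = List[str]  # a workflow reduced to an ordered list of step names


def crossover(a: Workflow, b: Workflow, rng: random.Random) -> Workflow:
    """Single-point crossover: the head of one parent, the tail of the other."""
    cut_a = rng.randint(0, len(a))
    cut_b = rng.randint(0, len(b))
    return a[:cut_a] + b[cut_b:]


def mutate(w: Workflow, step_pool: List[str], rng: random.Random) -> Workflow:
    """Return a copy with one randomly chosen step swapped for a pool step."""
    child = list(w)
    if child:
        child[rng.randrange(len(child))] = rng.choice(step_pool)
    return child


rng = random.Random(0)  # seeded so that runs are repeatable

# Two hypothetical workflows written by different people.
parent_a = ["fetch_genes", "map_pathways", "rank_pathways"]
parent_b = ["load_expression", "normalise", "cluster", "report"]
pool = sorted(set(parent_a + parent_b))

child = mutate(crossover(parent_a, parent_b, rng), pool, rng)
```

The child is built entirely from steps written by other people, which is what makes the ownership question on the slide hard to answer.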
72. david.deroure@oerc.ox.ac.uk
Thanks to: Jeremy Frey & CombeChem; Carole Goble, myGrid and
myExperiment; Iain Buchan & Obesity e-Lab; Sean Bechhofer; Doug Kell;
Marco Roos; Lucy Yardley; Arfon Smith; Malcolm Atkinson; Stephen
Downie, Kevin Page, Ben Fields, Ashley Burgoyne and NEMA/SALAMI;
Betty Toole.
http://www.myexperiment.org/packs/153
Editor's Notes
Today I’m going to talk about the trajectory of e-Science – from its conception through examples of 3 generations, and I’ll reflect on how we are moving from generation 2 to generation 3. Different disciplines and especially communities may be in different stages of evolution.
First something about words. This definition of e-Science is important – it reminds us that it isn’t just about technology but about people working together and being empowered by technology – and the emphasis on “science” reminds us that ultimately success is measured by new scientific outcome. At the turn of the decade this was a vision of the future. A programme was created called e-Science. The projects doing the innovation were labelled as “e-Science”. By the time we arrive, it’s just “science”. So “e-Science” has become the name of the journey rather than the destination. Note that the innovation that takes us to the destination isn’t solely in the custody of e-Science projects – there’s a lot of relevant work going on that doesn’t carry that label. Note also that when we say “e-Science” we actually mean “e-Research”! We sometimes forget to say that.
CERN teams up with Leaders in Information Technology to build giant Data Grid. Data accumulation rate: 10 Petabytes per year (equivalent to about 20 million CD-ROMs). http://public.web.cern.ch/press/pressreleases/Releases2001/PR11.01ECERNopenlab.html
Scientific workflow systems are a key automation technique for systematically handling the data deluge and giving us the “workflow” as a new sharable artefact of digital science – to record, repeat, reproduce and repurpose an experiment. This is an iconic slide by Carole Goble which is much repeated, reproduced and repurposed!
What we didn’t see much in phase 1 was sharing and reuse, but this is essential to harnessing the new technology. The story on this slide involves sharing in a corridor and we will go on to see how we do it digitally! But it’s an important motivation. It led to new science.
myExperiment in one slide! It’s a “boutique” Web site with the largest public collection of scientific workflows. For lots more information see the myExperiment wiki http://wiki.myexperiment.org/
BioCatalogue is a registry of Web Services in the life sciences and is directly based on the myExperiment experience. SysMO-SEEK and MethodBox grew from the myExperiment codebase – MethodBox is an e-Social Science e-Laboratory for sharing and analysing data, and SysMO-SEEK is customised to the systems biology domain. See http://www.biocatalogue.org/ http://www.methodbox.org/ http://www.sysmo-db.org/
This is reflected in a third distinctive – the pack. This is Paul Fisher’s pack from the Tryps example. Some packs contain example input and output data so workflows can be checked for “decay” (they don’t actually rot, but the world changes round them). While others are looking at semantically enhanced publication, we are asking “what is the shared artefact of future research?” We come at the same problem from the other side. We have it surrounded! Our approach relieves us of the paper mindset – so, for example, a Research Object could contain information for many audiences and purposes, with a commonly interpreted core (social scientists will recognise the idea of a “boundary object”).
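The decay check mentioned above can be sketched as follows: a pack bundles a workflow with recorded example inputs and outputs, and the check reruns the workflow to see whether it still reproduces them. The `Pack` class, the `check_decay` function and the toy GC-content workflow are hypothetical names invented for this sketch, not part of myExperiment.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class Pack:
    """A hypothetical pack: one workflow plus recorded example runs."""
    name: str
    workflow: Callable[[Any], Any]
    # Each example is (input, expected output) captured when the pack was made.
    examples: List[Tuple[Any, Any]] = field(default_factory=list)


def check_decay(pack: Pack) -> List[Tuple[Any, Any, Any]]:
    """Rerun the workflow on every example input.

    Returns the mismatches as (input, expected, actual). A raised
    exception is recorded as the actual value and counts as a mismatch.
    """
    failures = []
    for example_input, expected in pack.examples:
        try:
            actual = pack.workflow(example_input)
        except Exception as exc:
            actual = exc
        if actual != expected:
            failures.append((example_input, expected, actual))
    return failures


# Stand-in for an external service that the workflow depends on.
upstream = {"normalise": str.upper}


def gc_content(seq: str) -> float:
    """Toy workflow: fraction of G and C bases after upstream normalisation."""
    seq = upstream["normalise"](seq)
    return round((seq.count("G") + seq.count("C")) / len(seq), 2)


pack = Pack("gc-demo", gc_content, [("GGCC", 1.0), ("ATGC", 0.5)])
healthy = check_decay(pack)

# The world changes round the workflow: the service now returns lower case.
upstream["normalise"] = str.lower
decayed = check_decay(pack)
```

The workflow code itself is identical in both runs; only the service it depends on has changed, which is the sense of "decay" used here.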
Now we look at myExperiment as a probe into the future behaviour of researchers. For example, these workflows by Francois Belleau show what could be described as another level of working – building on the new tooling.
Here we see bioinformaticians assembling the resources they need to answer a research question – and also demonstrating what the methods section of the future paper needs to look like. They are using Linked Data. We see the power – ease of assembly. This could be where the new computer science challenges lie in e-Research.
Image credits. From The Galileo Project web site: http://galileo.rice.edu/sci/instruments/telescope.html
- The earliest known illustration of a telescope. Giovanbattista della Porta included this sketch in a letter written in August 1609 (porta-sketch).
- Johannes Hevelius (Poland, 1611-1687) observing with one of his telescopes (Source: Selenographia, 1647).
- Hubble_earth_horz and hubble: from http://hubble.nasa.gov/.
- Very Large Array: from http://images.nrao.edu/Telescopes. Copyright requirement: include "NRAO/AUI/NSF" on slide.
That example comes from a Digging into Data project with the best project acronym ever. The project is conducting a massive structural analysis of music in the Internet Archive, to support musicologists. It illustrates many of the things we are now seeing in e-Research – crowdsourcing, annotation, community software development, high performance computation, data publication. This project involves UIUC, McGill and Oxford – and the supercomputer time is donated by NCSA.