The Challenges of Making Data Travel, by Sabina Leonelli

Exeter Centre for the Study of Life Sciences (Egenis)
& Department of Sociology, Philosophy and Anthropology
University of Exeter
@sabinaleonelli
www.datastudies.eu

Outline
• The Potential of Open Data
• Data Journeys:
– Challenges of collection
– Challenges of re-use
– Challenges of openness
– The Open Data divide
• Conclusions

Openness in Science
Long history of openness as a key norm for science: public scrutiny,
transparency and reproducibility of results define what science is,
how it works, what counts as a research output
Equally long history of reasons why it does not work in practice:
• Trust system where scrutiny is delegated to specialists
• Long paths from data generation to discovery
• Strong incentives provided by commercialisation and competition,
with associated intellectual property regimes around research
results (and conflicting interests of research sponsors and
institutions)
• Practical difficulties in disseminating and reproducing data,
software, techniques and materials, vis-à-vis research articles
• Publication regime itself increasingly commercialised

What makes Open Data valuable now?
• Potential to improve
– pathways to and quality of discoveries
– uptake of new technologies
– collaborative efforts across disciplines, nations and areas of expertise
– research evaluation, debate and transparency
– appropriate valuation of research components beyond papers and patents
– fight against fraud, low quality and duplication of efforts
– legitimacy of science and public trust
– public understanding and participation
• Open Data as a platform to debate what counts as science, scientific
infrastructures and scientific governance, and how results should be
credited and disseminated
• Making data open means making data mobile and useful across sites,
contexts, uses: major challenges to realising that potential
• My concern: examining conditions under which the potential of data as
evidence for scientific claims can be realised sustainably in the long term

Researching Data Journeys
Investigating the conceptual/material/institutional labor involved in
making data travel from sites of production to sites of (re-)use
• Digital data infrastructures as sites for data movements and
integration across a wide variety of sources and perspectives
• Situations of data uptake and re-use in developed and developing
world (ongoing studies in UK, USA, Kenya, South Africa)
• Methods: history, philosophy and social studies of science
– Archival research
– Ethnographies and interviews on attitudes to openness, curation
practices and re-use
– Collaboration with researchers
• Policy involvement:
– Lead for Open Science working group of the Global Young Academy
(e.g. Access to Open Software Survey – Nigeria, Ghana, Bangladesh)
– Chair of ongoing Open Data consultation across European Young Academies

Research Data Management Across Disciplines
Scientific realms under investigation:
• model organism research: data on different aspects of same organism
• plant science: environmental, phenotypic and omics data
• biomedicine: clinical, crowdsourced, biological data
• oceanography: geological, geographical, meteorological, biological data
• archaeology, particle physics, climate science, economics
Parameters of comparison:
• Subject matter (complex objects versus simplified models)
• Data source (one or multiple disciplines)
• Data production mode (centralised vs dispersed; highly automated vs
system-specific)
• Data types (ease of dissemination and analysis, size, relation to software)
• Publication cultures and collaborative ethos
• Geographical locations, types and sources of funding involved
• Availability of relevant data (and other) infrastructures
• Ethical concerns and regulation

Challenges of Collection
Data sharing needs to be extensive, comprehensive, global
and long-term. This requires:
• Habitual data donation: challenge to current credit systems
and research practices, given considerable labor involved (NB:
when adopted as community ethos, huge boost to research)
• Adequate standards & guidelines for data formatting:
problematic given large diversity of methods & terminologies
• Well-organised databases: intelligent and labor-intensive
curation to avoid ‘data dumps’
• Sharing of related materials: requires reliable stock centres and
collections, which are rarely available or well-coordinated with databases
• Diversity of data types: current emphasis is on cheap and easy
quantitative measurements
• Sustainability in time:
– commitment to data infrastructures beyond short term
– continuous updates of data standards and classification to
keep up with shifts in technology and knowledge

Challenges of Re-Use
• Qualitative results: very limited re-use*. Why?
• Misalignment between IT solutions and research
questions/needs/situations; problems with access to related
software
• Substantive disagreement over data management:
– methods, terminologies, standards involved in data production
and interpretation
– what counts as data in the first place (data as a relational
category)
• Re-use often linked to participation in developing data
infrastructures → rarely the case for busy practitioners; also a
gap in skills
• Conflation of epistemic and economic value of data → wish
to capitalise on past investments risks encouraging
conservatism (building on old data instead of pursuing new

Challenges of Openness
• Semantic ambiguity: Openness means different things to different
people, even in same discipline (e.g. free of license, free of
ownership, under CC-BY license, common good, good enough to
share, unrestricted access and/or use, accessible without payment,
unclear/open to interpretation, etc.) – explicit debate is key
• Problematic implementation: research ethos, career structures &
incentives lag behind; strong disincentives in competitive fields;
publication pressure leads to information control
• IP: confusion around which modes of intellectual property apply,
and to whom (individual researchers, labs, projects, networks,
universities, funders)
• Social & ethical concerns: data as tokens of personal identity
• Universities and the state: confusion around Open Data policies
and perceived tensions with metrics of excellence and
impact (e.g. UK)

The Open Data Divide
High-resource bias: even richer labs struggle to comply; poorer labs are left
behind and/or choose not to participate
• databases mostly display outputs of top English-speaking labs, which
have funds to curate contents, visibility to determine dissemination
formats/procedures, resources and confidence to build on data
donated by others
• involvement of poor/unfashionable labs, scientists in low- and
middle-income countries, and non-scientists remains low & at the ‘receiving’ end
• few provisions for situations of systematic disadvantage (e.g. lack of
infrastructures and online access, funding, governmental support,
expertise, materials; teaching demands; power cuts and transport
delays) and vulnerability (e.g. where access to a resource/location is
what gives competitive edge, as in archaeology, botany)
• low-resourced researchers are reluctant to contribute, fear it will
undermine rather than increase international credibility

Conclusions
1. OD is Neither Quick Nor Cheap
2. Open to What and When?
3. Link between OD and Access to Software
4. Estimating Prospective Value vs Preserving Open-Endedness
Meanings of openness in Oxford English Dictionary:
1. ‘free’ (of..)
2. ‘accessible, exposed, unrestricted’
3. ‘available, reusable’
4. ‘flexible, unpredictable, uncertain, unsettled’
Policy and scientific discourse centres on meanings 1-3, and yet 4 is crucial
to science

Steps Forward: Researchers, Institutions, Funders and Learned Societies
• Current data collections are very limited in scope and difficult to
re-use by outsiders
• Careful consideration needs to be given to what is disseminated,
why, how and with which priority and time-line
• Need to promote
– data curation as integral part of research, since being involved in
developing databases is key to effective data re-use
– critical discussions about what counts as data and openness in each
research community / centre / project, taking account of specific ethical,
legal and political concerns
• Crucial role of learned societies and funders in informing
researchers as well as policy-makers of shifting needs, resources
and constraints for each field
• Beware of the term “sharing”: it suggests, but does not entail,
reciprocity and common ground

With thanks to the Exeter Data Studies Group:
Brian Rappert
Louise Bezuidenhout
Ann Kelly
Niccolo Tempini
Gregor Halfmann
Rachel Ankeny
Main reference: Leonelli, Sabina (2016, in press) Data-Centric Biology: A
Philosophical Study. Chicago, IL: The University of Chicago Press.
For other relevant publications, see www.datastudies.eu, @DataScienceFeed
This research was funded by the European Research Council under the European
Union's Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement
n° 335925; the UK Economic and Social Research Council (ESRC), grant number
ES/F028180/1; and the Leverhulme Trust, grant award RPG-2013-153.