Stories of "Glocality"—Nations in a Global Infrastructure

Presentation at the Weaving the Internet of Data Conference in Amsterdam

  1. Stories of "Glocality"—Nations in a Global Infrastructure. Mark A. Parsons, Secretary General. Weaving the Internet of Data, Amsterdam, The Netherlands, 6 April 2016
  2. research infrastructure vs. e-infrastructure
  3. research infrastructure vs. e-infrastructure: a false dichotomy
  4. e-Infrastructure is research infrastructure. Modern research infrastructure is (or at least requires) e-Infrastructure. It’s about the data.
  5. Infrastructure is hard to conceive and describe because when it works, it’s transparent, ubiquitous, and embedded in our daily work.
  6. Research Data Alliance. Vision: Researchers and innovators openly share data across technologies, disciplines, and countries to address the grand challenges of society. Mission: RDA builds the social and technical bridges that enable open sharing of data.
  7. Glocality—Bridging across scales. Glocalization “means the simultaneity—the co-presence—of both universalizing and particularizing tendencies.” (Roland Robertson) Glocalism is playing at multiple scales at once.
  8. The RDA Community: 3700+ members from 110 countries (February 2016); 65+ Working and Interest Groups (rd-alliance.org)
      ▪ Members by region: South America 1%, North America 34%, Europe 49%, Australasia 4%, Asia 9%, Africa 3%
      ▪ Members by organisational type (Feb 2016): Press & Media 22, Policy/Funding Agency 58, Large Enterprise 85, IT Consultancy/Development 119, Small and Medium Enterprise 212, Other 198, Government/Public Services 583, Academia/Research 2447, TOTAL 3724
      ▪ Membership growth by quarter: May-July 392, Aug-Oct 991, Nov-Jan 1274, Feb-Apr 1656, May-July 2048, Aug-Oct 2404, Nov-Jan 2636, Feb-Apr 2881, May-July 3126, Aug-Oct 3434, Nov-Jan 3698, Feb-Apr 3724
  9. RDA Organisational & Affiliate members: represent the interests of RDA’s organisational members and ensure that their input and needs play a role in guiding the programs and activities of the RDA. Logo panels: RDA Organisational Members; RDA Affiliate Members. https://rd-alliance.org/organisation/rda-organisation-affiliate-members.html
  10. “Create - Adopt - Use” (in 12-18 months). Diagram labels: Systems Interoperability; Adopted Policy; Sustainable Economics; Common Types, Standards, Metadata; Adopted Community Practice; Training, Education, Workforce. Traffic image: Mike Gonzalez. Slide credit: Fran Berman, Research Data Alliance
  11. RDA: Accelerate Data Sharing and Interoperability Across Cultures, Communities, Scales, Technologies (Fran Berman, Research Data Alliance)
      ▪ Technical parts of the data engine:
        ▪ Data type registries reference model
        ▪ Wheat data interoperability framework
      ▪ Rules of the road:
        ▪ Common agreement on data citation
        ▪ Common practice for data repositories
        ▪ Principles of legal interoperability
      ▪ Better drivers:
        • Summer schools in data science and cloud computing in the developing world (with CODATA)
        • Active data management plan development and monitoring
      Diagram labels: Policy and Practice; Systems Interoperability; Sustainable Economics; Common Types, Standards, Metadata; Training, Education, Workforce
  12. An Area of Convergence and Agreement
      ▪ Internet Domain: nodes with IP numbers; packages being exchanged; standardized protocols
      ▪ Data Domain: objects with PID numbers; objects being exchanged; standardized protocols
      Slide courtesy P. Wittenberg from L. Lannom from D. Clark
  13. How to feed the world: RDA Agriculture Interest Group (image courtesy National Farmers Union)
  14. The Wheat Data Interoperability WG
      Active members: Alaux Michael (INRA, France), Aubin Sophie (INRA, France), Arnaud Elizabeth (Bioversity, France), Baumann Ute (Adelaide Uni, Australia), Buche Patrice (INRA, France), Cooper Laurel (Planteome, USA), Fulss Richard (CIMMYT, Mexico), Hologne Odile (INRA, France), Laporte Marie-Angélique (Bioversity, France), Larmand Pierre (IRD, France), Letellier Thomas (INRA, France), Lucas Hélène (INRA, France), Pommier Cyril (INRA, France), Protonotarios Vassilis (Agro-Know, Greece), Quesneville Hadi (INRA, France), Shrestha Rosemary (INRA, France), Subirats Imma (FAO of the United Nations, Italy), Aravind Venkatesan (IBC, France), Whan Alex (CSIRO, Australia)
      Co-chairs: Esther Dzalé Yeumo Kaboré (INRA, France), Richard Allan Fulss (CIMMYT, Mexico)
      Aims: contribute to the improvement of wheat-related data interoperability by
        ▪ building a common interoperability framework (metadata, data formats and vocabularies)
        ▪ providing guidelines for describing, representing and linking wheat-related data
      Logo panels: Contributors; Sponsors
      Slide courtesy Esther Dzalé
  15. The deliverables
      ▪ Guidelines (http://wheatis.org/DataStandards.php)
        • Data exchange formats. Example: VCF (Variant Call Format) for sequence variation data, GFF3 for genome annotation data, etc.
        • Data description best practices: consistent use of ontologies, consistent use of external database cross references
        • Data sharing best practices: share data matrices along with relevant metadata (example: trait along with method, units and scales, or environmental ones)
        • Useful tools and use cases that highlight data formats and vocabularies issues
      ▪ A portal of wheat-related ontologies and vocabularies (http://agroportal.lirmm.fr/ontologies?filter=WHEAT), which allows access to the ontologies and vocabularies through APIs
      ▪ A prototype implementation of use cases of wheat data integration within the AgroLD (Agronomic Linked Data) tool: http://volvestre.cirad.fr:8080/agrold/
      Slide courtesy Esther Dzalé
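The data sharing practice on slide 15 (ship a data matrix together with trait, method, unit and scale metadata) can be illustrated with a small check. This is only a sketch: the `TraitMetadata` schema, the function name and the sample values are hypothetical and are not taken from the WG guidelines.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TraitMetadata:
    """Descriptive metadata for one column of a data matrix (hypothetical schema)."""
    trait: str
    method: str
    unit: str
    scale: str


def columns_missing_metadata(columns, metadata):
    """Return the matrix columns that lack complete trait metadata."""
    missing = []
    for name in columns:
        entry = metadata.get(name)
        # A column is incomplete if it has no entry or any field is empty.
        if entry is None or not all((entry.trait, entry.method, entry.unit, entry.scale)):
            missing.append(name)
    return missing


# Illustrative values only; they are assumptions, not WG content.
columns = ["plant_height", "grain_yield"]
metadata = {
    "plant_height": TraitMetadata("plant height", "ruler, soil to spike tip", "cm", "numerical"),
}
print(columns_missing_metadata(columns, metadata))  # columns that still need metadata
```

A check like this could run before a dataset is submitted to a repository, so that undocumented columns are caught early.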
  16. Benefits for many target users
      ▪ For data managers, data providers
        • One-stop shop for relevant information related to data management: raise awareness, avoid duplicated efforts, foster adoption of common practices
        • Facilitate the use of common data exchange formats: easy data sharing/submission to international repositories
        • Foster a standardized description of datasets with consistent use of ontologies and metadata: increase the identification, the findability and the usability of the datasets
      ▪ For data scientists, bioinformaticians
        • Facilitate the access, integration and analysis of data from various sources
        • Access to data of higher quality
      ▪ For top management, researchers
        • Increase the chance to answer complex questions
      Slide courtesy Esther Dzalé
  17. Crisis of Confidence in Research Data Citation
  18. Joint Declaration of Data Citation Principles (Overview). The Noble Eight-Fold Path to Citing Data: (1) Importance, (2) Credit and attribution, (3) Evidence, (4) Unique Identification, (5) Access, (6) Persistence, (7) Specificity and verifiability, (8) Interoperability and flexibility. Principles are supplemented with a glossary, references and examples. http://force11.org/datacitation
  19. Citing Dynamic Data
      Data Citation: Data + Means-of-access
      ▪ Data → time-stamped & versioned (aka history)
      Researcher creates working-set via some interface:
      ▪ Access → assign PID to QUERY, enhanced with
        − Time-stamping for re-execution against versioned DB
        − Re-writing for normalization, unique-sort, mapping to history
        − Hashing result-set: verifying identity/correctness
      leading to landing page
      S. Pröll, A. Rauber. Scalable Data Citation in Dynamic Large Databases: Model and Reference Implementation. In IEEE Intl. Conf. on Big Data 2013 (IEEE BigData2013), 2013
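The idea on slide 19 (version the data, then give a PID to a time-stamped, normalized query whose result set is hashed) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the reference implementation from the cited paper: the in-memory `VersionedTable`, the logical clock, the `min_value` query shape and the `hypothetical-pid:` prefix are all invented for the example.

```python
import hashlib
import json
import uuid


class VersionedTable:
    """Toy versioned store: rows are never overwritten, only closed and superseded."""

    def __init__(self):
        self.rows = []
        self.clock = 0  # logical timestamp, incremented on every change

    def _tick(self):
        self.clock += 1
        return self.clock

    def insert(self, key, value):
        now = self._tick()
        self.rows.append({"key": key, "value": value, "valid_from": now, "valid_to": None})

    def update(self, key, value):
        now = self._tick()
        for row in self.rows:
            if row["key"] == key and row["valid_to"] is None:
                row["valid_to"] = now  # close the old version, keep it as history
        self.rows.append({"key": key, "value": value, "valid_from": now, "valid_to": None})

    def as_of(self, timestamp, query):
        """Re-execute a query against the state that was valid at `timestamp`."""
        hits = [
            (row["key"], row["value"])
            for row in self.rows
            if row["valid_from"] <= timestamp
            and (row["valid_to"] is None or timestamp < row["valid_to"])
            and row["value"] >= query["min_value"]
        ]
        return sorted(set(hits))  # unique, deterministic order


def result_hash(result):
    """Hash the result set so a later re-execution can be verified."""
    return hashlib.sha256(json.dumps(result).encode("utf-8")).hexdigest()


class QueryStore:
    """Stores normalized query, timestamp and result hash under a PID."""

    def __init__(self, table):
        self.table = table
        self.entries = {}

    def cite(self, query):
        timestamp = self.table.clock
        result = self.table.as_of(timestamp, query)
        pid = "hypothetical-pid:" + uuid.uuid4().hex  # stand-in for a real PID service
        self.entries[pid] = {
            "query": json.dumps(query, sort_keys=True),  # normalized query text
            "timestamp": timestamp,
            "hash": result_hash(result),
        }
        return pid

    def resolve(self, pid):
        entry = self.entries[pid]
        result = self.table.as_of(entry["timestamp"], json.loads(entry["query"]))
        if result_hash(result) != entry["hash"]:
            raise ValueError("result set no longer matches the cited data")
        return result


table = VersionedTable()
table.insert("a", 5)
table.insert("b", 20)
store = QueryStore(table)
pid = store.cite({"min_value": 10})
table.update("b", 1)   # later changes do not affect the cited subset
table.insert("c", 30)
print(store.resolve(pid))
```

Storing the query rather than a copy of the data is what keeps this approach small: only the timestamp, the normalized query and a hash are kept per citation.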
  20. Output / Results
      ▪ 14 Recommendations grouped into 4 phases:
        - Preparing data and query store
        - Persistently identifying specific data sets
        - Resolving PIDs
        - Upon modifications to the data infrastructure
      ▪ 2-page flyer
      ▪ Technical Report to follow
      ▪ Reference implementations (SQL, CSV, XML)
      ▪ Pilots
  21. WG Pilots
      ▪ Pilots and implementations by
        ▪ LNEC: Critical Infrastructure Monitoring System
        ▪ Virtual Atomic and Molecular Data Centre
        ▪ NERC (UK Natural Environment Research Council Data Centres)
        ▪ ARGO Buoy Network
        ▪ River Flow Dataset
        ▪ ESIP (Earth Science Information Partners)
        ▪ BCO-DMO
        ▪ DEXHELPP – Social Security Data
        ▪ ENVRIplus: Carbon Observation System
        ▪ Million Song Database, IR Benchmark DBs
      ▪ Several others under discussion…
  22. More stories of interconnection and glocality in RDA
      • Unplanned interconnection: Data Registries, Data Publishing and OpenAire
      • Two for the price of one in WUSTL Air Quality portal
      • Data types for carbon geology across continent (plus materials science, hydrology, more)
      • EOSC?
  23. Moving forward
      • Future Directions: communication, engagement, and coordination
      • Bridging the bridges: ever more interconnection, reference implementations, engagement with EOSC and national data services
      • Diversifying the funding portfolio
  24. RDA Funders Forum
      • The group of government and non-profit science funding organisations that support the data and science communities to participate in RDA activities:
        • European Commission (DG CNECT)
        • US Government (NSF and NIST)
        • Australian Government
        • Japanese Government (JST)
        • UK Government (Jisc)
        • Sloan Foundation
        • MacArthur Foundation
      • Allows agencies the opportunity to share policies and funding program plans that support data sharing across the globe, and thereby amplify their impact.
      • Explore actual policy impacts.
      • Related to but distinct from RDA. A parallel informal organisation.
  25. The Yin and Yang of national and international support
  26. How the national can help the global (and thereby the national)
      • personnel: encourage your staff and grantees to participate in RDA
      • projects and use cases
      • make local concerns global concerns
      • establish niches or areas of excellence
  27. How the global can help the national (and thereby the global)
      • networks and connections to organisations, countries, resources
      • amplification of local effort
      • access to diverse experts
      • efficiency and cost savings
  28. 12-16 September 2016 in Denver, Colorado, USA
  29. Info: enquiries@rd-alliance.org @resdatall