Big Data Initiatives for Agroecosystems


  1. Big Data Initiatives for Agroecosystems. Cynthia Parr, Knowledge Services Division, National Agricultural Library. Ecological Society of America, 2015.
  2. Outline
     • Data management at the National Agricultural Library
     • Four examples: (1) Insects 5K – i5K Workspace; (2) Life Cycle Assessment; (3) Long-Term Agroecosystem Research; (4) Ag Data Commons
     • General principles
     (Slide caption: 8.1 million items, Agricola, PubAg)
  3. Raw data; citable publication. Source: http://blog.thingarage.com/
  4. Raw data collection; cleaning, enrichment, analysis; registration, preservation. Temporary data; referable data; citable data; citable publication. Modified from Peter Wittenburg, Research Data Alliance, https://rd-alliance.org/group/data-fabric-ig.html
  5. i5k.nal.usda.gov
  6. Genome project hosting at the i5k Workspace
     • 27 pilot genomes hosted; 45 total
       – Storage and dissemination of a genome assembly and anything mapped to it
       – BLAST, JBrowse Genome Browser
     • Manual curation: Web Apollo
     • Post-curation maintenance
       – Quality control
       – Official Gene Set generation
     Genome project trajectory: Research plan; Generate material; Sequencing; Assembly; Automated annotation; Manual curation; Official gene set generation; Genome project maintenance; Biological insights/Publication
  7. Life Cycle Assessment Commons: www.lcacommons.gov
  8. LCA Commons Concept. Diagram labels: Unformatted, non-standard; LCA Community; Open LCA Framework; Common computing environment, application, data standards, and development; NAL LCADC; NREL USLCI; XYZ LCI DB; ABC LCI DB; Distributed computing environment & application; Common data standards; Distributed computing environment; DEF LCI DB; Common application & data standards; Interoperability Tools; Ag Data Commons Catalog and Repository
  9. Long-Term Agroecosystem Research (LTAR)
  10. LTAR Data
      • Common Observatory: Meteorology; Hydrology; Eddy flux CO2; Non-CO2 gases; Soil; Biological
      • Common Experiment Approach: Business as usual; Aspirational
      • Will include data about: Management practices; Results
  11. LTAR Data Loss. N=194 of ~500 citations in 2011 LTAR site proposals. 80% of papers provide no way to obtain data. Chart categories: Bad links to data; No data available; Data are accessible; Refers to general data source
  12. LTAR information management
      • Support for download of files, web services
      • Metadata in FGDC CSDGM, ISO 19115, EML, Project Open Data
      • Catalog of instrument specs using SensorML 2
      • Data dictionaries in ISO 19110
      • Weather data to be converted to other formats
      • Field names could be converted to match different conventions (AgMIP, etc.)
  13. Ag Data Commons
  14. data.nal.usda.gov: Enhanced DKAN
  15. Diagram labels: Distributed repositories; Search & Knowledge Discovery; Thesaurus & Indexing; Ag Data Commons Repository; Organization & Curation; Grant management systems; INGESTION; DISSEMINATION; PubAg; Dataset Submission; Analytics & Tools; Data.gov; Forest Service; NCBI; Ag Data Commons Catalog; LCA Commons. Color legend: Building; Adapt/Re-use; Existing
  16. Guiding principle 1: a distributed network … of networks … Diagram labels: Geospatial Catalog; Geospatial Repository; STEWARDS; Ag Data Commons (catalog); Ag Data Commons (repository); USDA Enterprise Inventory; National Weather Service; Data.gov; Ecosystems.data.gov
  17. Guiding principle 2: big data AND long tail. Public access to open, machine-readable data enables larger-scale, integrative, and innovative data science. Chart label: The long tail
  18. Guiding principle 3: curation adds value
      • Data dictionaries
      • Standards & templates
      • Linkages
      • Semantics
      • Preservation
  19. Thanks!
      • National Agricultural Library Knowledge Services Division: Susan McCarthy
      • LTAR: Jeffrey Campbell, Charles Lockwood
      • i5K: Monica Poelchau, Chris Childers
      • LCA Commons: Peter Arbuckle, Ezra Kahn
      • Ag Data Commons: Ursula Pieper, Jocelyn McNamara, Qing Qu, Erin Antognoli, Melissa Lowrey, Jaylen Nathwani, NuCivic
      • … and collaborators and testers
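
As a rough illustration of the field-name conversion mentioned on slide 12, the sketch below renames weather-record fields from one convention to another. The site-specific column names, the target short names (modeled loosely on AgMIP-style identifiers), and the function names are all assumptions made for this example; they are not taken from LTAR or AgMIP specifications.

```python
# Hypothetical mapping from site-specific weather column names to
# AgMIP-style short names. Both sides are illustrative assumptions,
# not an official LTAR or AgMIP vocabulary.
FIELD_MAP = {
    "air_temp_max_c": "TMAX",
    "air_temp_min_c": "TMIN",
    "precip_mm": "RAIN",
    "solar_rad_mj_m2": "SRAD",
}


def rename_fields(record, field_map):
    """Return a copy of `record` with keys renamed per `field_map`.

    Keys that have no entry in the mapping are kept unchanged.
    """
    return {field_map.get(key, key): value for key, value in record.items()}


def convert_rows(rows, field_map=FIELD_MAP):
    """Apply the field-name mapping to every record in a list of dicts."""
    return [rename_fields(row, field_map) for row in rows]


# Made-up sample record, used only to exercise the functions.
sample = [
    {"date": "2015-08-10", "air_temp_max_c": 31.2,
     "air_temp_min_c": 19.5, "precip_mm": 0.0},
]
converted = convert_rows(sample)
print(converted)
```

Keeping the mapping in a plain dictionary means a second convention could be supported by supplying a different dictionary, without changing the conversion functions.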

Editor's notes

  • The National Ag Library is providing tools to assist with both the top part of this diagram and the bottom part.
  • i5K (Insect 5000 genomes)
    Sequence and annotate the 5000 genomes of arthropod species known to be important to worldwide agriculture, food safety, energy production, and medicine

    Most of them are not well-funded model organism communities, so rather than build a website for each of these 5000 organisms, my colleagues Monica Poelchau and Chris Childers have built a general workspace where communities' tools and data can be hosted and shared. Here’s a list of some of the organisms already in the i5K Workspace.

  • We take over all of the infrastructural challenges, but a community coordinator to identify curation priorities and organize curators is still a necessity.
    Collaborations between NAL, Baylor College of Medicine, Lawrence Berkeley, and National Taiwan University.
  • My colleagues Peter Arbuckle and Ezra Kahn have been working with their colleagues on the LCA Commons.
    Life Cycle Assessment is a set of methodologies for doing complex accounting of end-to-end inputs and outputs for any kind of production or manufacturing process. For agriculture this means tracking things like energy, pesticide, and water inputs and carbon emissions and yield outputs, and seeking ways to make the processes more sustainable. The LCA Commons will provide three things:
    1) Life Cycle Inventory Database for agriculture
    2) ADC collection for related tools and unformatted data
    3) Federal network of databases and resources – still under development

    Life Cycle Assessment (LCA) Commons
    open access to LCA datasets and tools for researchers studying sustainable methods in crop and livestock production

    1 and 2 have recently been released together in a new web site, and 3 is still under development. We are bringing partners together and discussing terms and a business model.

  • The vision for the LCA Commons is a distributed network of databases, applications, and tools that support LCA with interoperable data to the extent possible. At this point this model represents what we envision for the US Federal LCA Commons.
  • Jeff Campbell at the Library is working with Mark Walbridge and his team, who are building the LTAR network. LTAR is a set of 18 sites designed to help determine how managed systems behave within their ecosystems under regional and continental conditions, so that we can better predict what the impacts might be on agriculture and the environment as the world changes, for example under conditions of climate change.

    You can’t read these, but some of these sites have been intentionally chosen to overlap with existing LTER and NEON sites. They all have long legacies of agroecology research, so there will be both deep background data and expertise for future research.
  • Some Observatory data may be real-time and open: generic measurements that provide the context needed to interpret all kinds of experimental data.
    Common Experiment: a research approach shared across all sites.
  • Ag Data Commons
    A general catalog and repository for agricultural data, which can promote effective discovery of, and add value to, often widely distributed and seemingly disparate datasets.
  • Dark blue: developed as part of Ag Data Commons.
    Light blue: enhance existing systems.
    Gray: already exists.