The emerging biodiversity data ecosystem
Cynthia Parr, Katja Schulz, Jennifer Hammock (Smithsonian Institution); Nathan Wilson, Patrick Leary (Marine Biological Laboratory); Richard Allen (Environmental Protection Agency)
Today’s story: what is EOL; core questions; network analysis; hot list development; page richness algorithm; conclusion: improving the health and richness of our knowledge network advances understanding
What is EOL (http://www.eol.org): all species; freely accessible & reusable (open access, open source); available from a single portal in a common format; quality; always growing
EOL is a content curation community: content providers (databases, journals, LifeDesks, public contributions); curating, aggregation, commenting, tagging. http://www.eol.org
Core questions: Where is our knowledge about biodiversity? Where are the gaps? What are the most effective ways to fill gaps given our limited resources?
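Network analysis, with Anne Bowser (University of Maryland): EOL, GBIF (Global Biodiversity Information Facility), and NCBI (National Center for Biotechnology Information); EOL connects hubs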
The GBIF hub has subnetworks
Key individuals seek out hubs (labeled: TOLWeb, the Tree of Life Web Project)
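The network slides and the speaker notes describe node size as degree (how many links a project has) and label nodes with a degree of 15 or higher as hubs. Below is a minimal Python sketch of that hub test, assuming the network is available as a plain list of undirected project-to-project links. The function names and the toy network are hypothetical, not the tooling used in the original analysis, which also applied Clauset-Newman-Moore clustering (not shown here).

```python
from collections import defaultdict


def degree_map(edges):
    """Undirected degree: how many links touch each project."""
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return dict(degree)


def hubs(edges, min_degree=15):
    """Projects at or above the labeling cutoff, highest degree first."""
    degree = degree_map(edges)
    selected = [node for node, d in degree.items() if d >= min_degree]
    return sorted(selected, key=lambda node: (-degree[node], node))


if __name__ == "__main__":
    # Hypothetical toy network: one hub linked to 16 projects, plus one side link.
    toy = [("HubA", f"project{i}") for i in range(16)] + [("project0", "project1")]
    print(hubs(toy))  # ['HubA']
```

Lowering `min_degree` shows how the labeled set grows; the cutoff of 15 is the only value here taken from the notes.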
Implications and next steps: need more data; identify isolated projects & mechanisms for connecting them to the network; improve resilience & redundancy; distribute annotation & quality control; model data flow quantity and impact
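Identifying isolated projects amounts to finding small connected components in the same link network. Here is a sketch using breadth-first search, assuming undirected links and a hypothetical size cutoff (`max_size`) for what counts as isolated; projects with no recorded links can be passed in separately through `nodes`. The names and toy data are illustrative only.

```python
from collections import defaultdict, deque


def components(edges, nodes=()):
    """Group projects into connected components with breadth-first search."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    for node in nodes:  # projects with no recorded links still form a component
        adjacency.setdefault(node, set())
    seen, result = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), set()
        while queue:
            node = queue.popleft()
            group.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        result.append(group)
    return result


def isolated_groups(edges, nodes=(), max_size=2):
    """Components small enough to be candidates for connecting to the network."""
    return [g for g in components(edges, nodes) if len(g) <= max_size]


if __name__ == "__main__":
    links = [("EOL", "GBIF"), ("GBIF", "NCBI"), ("ProjX", "ProjY")]
    print(isolated_groups(links, nodes=["Lonely"]))
```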
Viewer of Life on EOL – Kris Urie
Low percentage of descendants with text in arthropods
Within arthropods, coverage varies... perhaps as expected. http://synthesis.eol.org/media/treemap/
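The treemap colors each taxon by the percentage of its descendants that have text. Per the speaker notes, it sizes each square by the number of descendants, square-root scaled. The sketch below computes those two numbers for one taxon, assuming a hypothetical child-list representation of the classification and a set of taxa known to have text. Reading "square-root scaled" as the square root of the descendant count is an assumption.

```python
import math


def descendants(tree, taxon):
    """All taxa below `taxon` in a child-list tree (the taxon itself excluded)."""
    found = []
    stack = list(tree.get(taxon, []))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(tree.get(current, []))
    return found


def treemap_cell(tree, has_text, taxon):
    """(percent of descendants with text, square-root-scaled descendant count)."""
    below = descendants(tree, taxon)
    if not below:
        return 0.0, 0.0
    with_text = sum(1 for t in below if t in has_text)
    return 100.0 * with_text / len(below), math.sqrt(len(below))


if __name__ == "__main__":
    # Hypothetical miniature classification; not real EOL coverage data.
    tree = {
        "Arthropoda": ["Insecta", "Arachnida"],
        "Insecta": ["Apis mellifera", "Bombus terrestris"],
        "Arachnida": ["Araneus diadematus"],
    }
    has_text = {"Insecta", "Apis mellifera"}
    print(treemap_cell(tree, has_text, "Arthropoda"))
```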
Developing the EOL hot list: consultation with taxonomic experts; development of criteria; assembly of critical lists; establishing targets for rich taxon pages, lesser-known pages
EOL’s hot lists
Hot List (70,000 taxa): conservation concern, invasives, model organisms, ecologically important, pests, charismatics, data availability
Red Hot List (2,800 taxa): most searched, top 100 invasives, crops (food), zoos & aquaria, high traffic, higher taxa
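Taxon page richness algorithm
Breadth (weight a = 60%): images, topics of text objects, references, maps, videos, sounds, conservation status
Depth (weight b = 30%): number of words per text object, total number of words
Diversity (weight c = 10%): sources (partners)
$R = a \cdot \mathrm{Breadth} + b \cdot \mathrm{Depth} + c \cdot \mathrm{Diversity}$, with $a = 0.6$, $b = 0.3$, $c = 0.1$; $R$ ranges from 0 to 1, and the threshold for a rich page is 0.4.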
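The richness index is a weighted sum of three component scores. The slide gives the weights (60% breadth, 30% depth, 10% diversity) and the 0.4 threshold. The speaker notes add that unvetted material counts for about 75% of vetted material and that inputs are capped. The sketch below wires those pieces together. Every cap value, the averaging inside each component, and the `>=` comparison at the threshold are hypothetical choices, not EOL's actual parameters.

```python
# Weights and threshold as listed on the slide.
WEIGHTS = {"breadth": 0.6, "depth": 0.3, "diversity": 0.1}
RICH_THRESHOLD = 0.4
# Speaker notes: unvetted material is worth about 75% of vetted material.
UNVETTED_FACTOR = 0.75

# Hypothetical caps (not EOL's values): counts beyond a cap add nothing,
# so 200 images score the same as 25.
BREADTH_CAPS = {
    "images": 25, "text_topics": 10, "references": 10, "maps": 1,
    "videos": 1, "sounds": 1, "conservation_status": 1,
}
WORDS_PER_OBJECT_CAP = 500
WORDS_TOTAL_CAP = 2000
SOURCES_CAP = 5


def capped(value, cap):
    """Scale a count into [0, 1], saturating at the cap."""
    return min(value, cap) / cap


def breadth(vetted, unvetted):
    """Mean capped coverage over object types; unvetted items are discounted."""
    scores = []
    for kind, cap in BREADTH_CAPS.items():
        effective = vetted.get(kind, 0) + UNVETTED_FACTOR * unvetted.get(kind, 0)
        scores.append(capped(effective, cap))
    return sum(scores) / len(scores)


def depth(words_per_object, words_total):
    """Average of the two depth measures named on the slide."""
    return (capped(words_per_object, WORDS_PER_OBJECT_CAP)
            + capped(words_total, WORDS_TOTAL_CAP)) / 2


def richness(vetted, unvetted, words_per_object, words_total, n_sources):
    """Weighted sum a*Breadth + b*Depth + c*Diversity, in [0, 1]."""
    return (WEIGHTS["breadth"] * breadth(vetted, unvetted)
            + WEIGHTS["depth"] * depth(words_per_object, words_total)
            + WEIGHTS["diversity"] * capped(n_sources, SOURCES_CAP))


def is_rich(score):
    """Assumed: a score exactly at the threshold counts as rich."""
    return score >= RICH_THRESHOLD
```

Because each component is capped and averaged into [0, 1] and the weights sum to 1, the index stays in [0, 1], which matches the range stated on the slide.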
Summary of EOL page richness
Overall: 640,000 pages have content; 2% are rich; 25% have only links to literature
Hot List: 28% of 75K are rich; average richness = 0.30
Red Hot List: 56% of 3K are rich; average richness = 0.43
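The summary figures combine two statistics over a set of page scores: the share of pages at or above the richness threshold and the mean score. Here is a sketch, assuming scores already lie in [0, 1] and that a score exactly at 0.4 counts as rich. The list names and scores in the usage are placeholders; no EOL data is reproduced.

```python
from statistics import mean

RICH_THRESHOLD = 0.4  # slide threshold; the ">=" comparison is an assumption


def summarize(scores, threshold=RICH_THRESHOLD):
    """Return (fraction of pages that are rich, average richness)."""
    if not scores:
        return 0.0, 0.0
    rich = sum(1 for s in scores if s >= threshold)
    return rich / len(scores), mean(scores)


def summarize_lists(scores_by_list, threshold=RICH_THRESHOLD):
    """Apply summarize() to each named page list, such as a hot list."""
    return {name: summarize(s, threshold) for name, s in scores_by_list.items()}


if __name__ == "__main__":
    # Placeholder scores, not EOL data.
    print(summarize_lists({"example list": [0.1, 0.4, 0.7, 0.2]}))
```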
Strategies for improving richness: crowd-sourcing; leveraging collections and communities; mobile apps; enabling platforms; enabling journals; data mining (BHL, etc.). Version 2 coming in fall 2011!

The page richness index: helps fill gaps with existing knowledge; helps prioritize funding and training so that they have maximum impact on closing true gaps; will be available via API. Computing and storing the richness index on EOL is a step towards storing and serving computable data.
Dynamic data summaries = new knowledge. Summarize data within a partner, then across partners. For example, compute an average value for one taxon (x specimens) and compare it to the range of values across all taxa (621,393 samples). Example taxon: Atlantic cod (Gadus morhua). Jen Hammock (EOL), Edward van den Berge (OBIS)
Richness assessment: large-scale data summaries can foster gap-filling and standing, dynamic knowledge analyses.
Thank you. http://www.eol.org: 160+ content partners, 2,000 Flickr contributors, 1000s of Wikipedia contributors, 43,000 EOL members. Funding: John D. and Catherine T. MacArthur Foundation, Alfred P. Sloan Foundation, Cornerstone Institutions, private donors. See the demo and a Version 2 sneak peek in the Software Bazaar. Leadership: Erick Mata, Bob Corrigan, Mark Westneat, Marie Studer, Tom Garnett, Jim Edwards, David Patterson. Developers: Peter Mangiafico, Jeremy Rice, Dimitri Mozzherin, David Shorthouse, Lisa Whalley, and others. Biologists: Tanya Dewey, Audrey Aronowsky, Leo Shapiro.
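The cod example summarizes one taxon against all taxa, and the speaker notes suggest alerting on new values that fall outside an expected range. Here is a sketch of both steps, assuming measurements arrive as a mapping from taxon name to a list of numbers. The measured variable, the function names, and the sample values are hypothetical. Which range to watch (all taxa, or the taxon's own values) is a design choice the slide leaves open.

```python
from statistics import mean


def taxon_summary(values_by_taxon, taxon):
    """Average for one taxon alongside the min-max range across all taxa."""
    own = values_by_taxon[taxon]
    everything = [v for values in values_by_taxon.values() for v in values]
    return {
        "taxon_mean": mean(own),
        "taxon_n": len(own),
        "all_min": min(everything),
        "all_max": max(everything),
        "all_n": len(everything),
    }


def outside_range(new_values, low, high):
    """New values below `low` or above `high`: candidates for an alert."""
    return [v for v in new_values if v < low or v > high]


if __name__ == "__main__":
    # Hypothetical measurements of one environmental variable per taxon.
    data = {"Gadus morhua": [4.0, 6.0, 8.0], "Other taxon": [10.0, 20.0]}
    summary = taxon_summary(data, "Gadus morhua")
    print(summary["taxon_mean"], summary["all_min"], summary["all_max"])
    print(outside_range([3.0, 5.0, 25.0], summary["all_min"], summary["all_max"]))
```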

Speaker notes

  1. The conclusion is that there is value in treating all the biodiversity information systems as parts of an interconnected ecosystem. We can study the connections, and we can assess the depth of information in the network. I’ll focus on EOL’s role in the system, but I hope to make observations that will be generally useful too.
  2. Objects such as these are essentially chunks of text sorted by topic. They span biology from physiology to ecology to evolution. Each of these credits the source and can receive comments or ratings, or can be marked trusted or untrusted by curators.
  3. So, EOL’s approach is rather different from that of many other sites. EOL is a giant mashup that creates pages, which are then available for curators (mostly credentialed scientists) to assess and rate, or for anybody to comment on or tag. 160+ partner databases; 700 curators, 1000s of contributors, 46,000 members; 2.8 million pages; 600 thousand pages with Creative Commons content; over 2 million data objects and more than 1 million pages with links to research literature. Traffic in the past year: 1.7 million unique users, 6.2 million page views.
  4. Represents about 1,600 projects and 1,700 instances of data flow or hyperlinks between them. The size of each vertex, or node, reflects its degree, that is, how many links the node has. We used the Clauset-Newman-Moore algorithm to determine which vertices grouped together, then gave each group a color code. Nodes with a degree of 15 or higher are labeled, and their edges are shown thicker than the others. These are the hubs of this network, and they are reasonably well connected to each other. (Go through and expand the acronyms.)
  5. Daphne Fautin’s Hexacorallians of the world
  6. With this as a baseline, how connected and resilient is the network? Over time we want it to become more connected and resilient, both to enable discovery and to enable recovery in case of catastrophic problems. We can also use this to develop effective mechanisms to annotate data and improve data quality. If the same data appear on different parts of the network and someone reports an error, the repair of that data needs to propagate effectively. What are the factors that influence data flow quantity and effectiveness…
  7. Brighter green indicates a higher percentage of descendants with text; the size of each square is the number of descendants, square-root scaled.
  8. Ecologically important – keystone species, indicator species
  9. Inspired by community ecology and measures of species diversity, which were originally inspired by information theory, although we haven’t used those measures. Instead, we combined these factors so that we could assign each a weight based on how well it captures “a rich page.” We sampled dozens of pages and had team members assess them for their gestalt “richness” based on their own criteria. Then we compared those scores with the ones generated by the algorithm and iteratively changed the weights until we reached a set that appeared to reflect human perception of “richness.” Note that there is a penalty: unvetted material is worth only about 75% of vetted material. There are also maximums for many of the input values; having 200 images may not make a page much richer than having 25 images. We reserve the right to change the algorithm to ensure that the index is as useful as possible. As with Google PageRank, we want to ensure that nobody can game the system.
  10. Also note the implication that a “rich page” is a “high-quality page”; that is not necessarily true, but often it is. As EOL moves forward with Version 2, we’ll be gathering other inputs that can tell us whether a page is successful, for example ratings of its objects.
  11. Here’s what we are already doing for the OBIS specimens, which have rich environmental data associated with them. We could add similar values from other partners, for example from GenBank, where some sequenced samples are collected from known environments, or from ecological studies that aren’t part of the specimen-based system. One could subscribe to this value and get alerts if new values come in that are outside this range. One could set up a model for this taxon and its relatives that predicts expected values; then, if new values aggregated from any of EOL’s partners violate the model, the scientist who published the model gets a notification. That could point to a flaw in the data integration or a violated assumption about the measurement workflow. Or it could be that there’s something we truly didn’t understand before. This truly leverages the scientific output of many researchers: better use of resources and more rapid advances in understanding of biological systems.
  12. Analogous to the study of ecosystems, where we seek to build an understanding of entire systems with many kinds of inputs, both biotic and abiotic.
  13. In addition to the authors…
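Note 9 describes tuning the richness weights by hand: team members rated sample pages, and the weights were adjusted until the algorithm's scores matched human perception. One way to automate that loop is an exhaustive grid search over weight triples that sum to 1, minimizing mean squared error against the human ratings. This is a swapped-in technique for illustration, not the procedure the EOL team reports using. The component scores and ratings in the usage are made up.

```python
from itertools import product


def predict(weights, components):
    """a*Breadth + b*Depth + c*Diversity for one page."""
    return sum(w * x for w, x in zip(weights, components))


def mse(weights, pages, ratings):
    """Mean squared error between predicted scores and human ratings."""
    errors = [(predict(weights, p) - r) ** 2 for p, r in zip(pages, ratings)]
    return sum(errors) / len(errors)


def calibrate(pages, ratings, steps=10):
    """Try every (a, b, c) on a grid with a + b + c = 1; return the best fit."""
    best = None
    for i, j in product(range(steps + 1), repeat=2):
        if i + j > steps:
            continue
        weights = (i / steps, j / steps, (steps - i - j) / steps)
        err = mse(weights, pages, ratings)
        if best is None or err < best[1]:
            best = (weights, err)
    return best


if __name__ == "__main__":
    # Made-up component scores (breadth, depth, diversity) and human ratings.
    pages = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.5, 0.5, 0.5)]
    ratings = [0.6, 0.3, 0.1, 0.5]
    weights, err = calibrate(pages, ratings)
    print(weights)  # (0.6, 0.3, 0.1)
```

A finer grid (a larger `steps`) trades run time for resolution; with only dozens of rated pages, as in the note, even a coarse grid may overfit, so holding out some ratings for checking is worth considering.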