Community content building for evolutionary biology: Lessons learned from LepTree and Encyclopedia of Life
1. Community content building for evolutionary biology: Lessons learned from LepTree and Encyclopedia of Life. Cynthia Parr, Smithsonian Institution, University of Maryland
2. Today’s story. LepTree and Encyclopedia of Life built a couple of websites. LepTree: slow for social content-building, but highly useful content. EOL: quick for content aggregation, but now need to atomize and semanticize. Conclusion: best of both worlds.
30. EOL Cornerstone Institutions: The Biodiversity Heritage Library; The Field Museum of Natural History; Harvard University; The Marine Biological Laboratory; The Missouri Botanical Garden; The Smithsonian Institution. Sample Content Partners: AmphibiaWeb; Animal Diversity Web; AntWeb; Catalogue of Life; FishBase; Global Biodiversity Information Facility (GBIF); International Union for the Conservation of Nature; Tree of Life Web Project.
Editor's notes
I’m going to give a compare-and-contrast talk, so I have two projects to introduce you to. I apologize in advance if I go a bit quickly. Please feel free to catch me anytime in the next two days to get a demonstration of either of these projects.
The conclusion is that these are complementary approaches that can be pursued in parallel. Focus on community-driven databases that can be customized for the needs of the users of the data; these result in highly atomized specialist data. Then allow that information to be aggregated on EOL, where it might find broader reuse and reinterpretation.
LepTree is an Assembling the Tree of Life project whose major goal is to use nuclear gene sequences to resolve deep nodes at the family and superfamily level in the Lepidoptera. This tree on the left shows our initial published findings, which are not the point of this talk. I’ll just note that our analysis suggests that the macrolepidoptera, shown by these orange bars (the very large moths and all butterflies), are clearly not a monophyletic group. The subject of today’s talk is the website tools we’ve created at leptree.net. These include features such as an interactive matrix visualization of the project's sequencing status, where the columns are the genes being sequenced, the rows are the hundreds of samples being used by the project, and the colors show our progress for each gene. We also have a fossil project and a morphology project that are represented on our pages.
The LepTree website is built on a core of the open-source Drupal platform and includes a number of the out-of-the-box community features: a blog, a discussion forum, commenting, and the ability to create private working areas. In addition, we have added new modules that allow community members to add information about their own projects and to post the protocols they are using, so that they can link to them and other people can use the same protocols. Finally, we have a references module that lists about 800 articles on lepidopteran systematics. Rather than using the relational database that is the backend of Drupal, these modules store their data semantically, as RDF triples linked to rich ontologies.
And finally, we also set up a custom module that presents a user with a complex template for describing taxa. The checkboxes and data fields are the result of months of consultation with lepidopterists and are intended to cover the kinds of morphological and ecological variation across the group. Like the projects, protocols, and references modules, the data are stored in a Sesame triple store repository. We can use this semantic representation to link our knowledge to that generated by other projects and use machine reasoning to come up with new results. This is the kind of data that would be appropriate to “decorate” a phylogenetic tree to look for patterns. The goal is to produce about 150 of these taxon pages, but we designed the system to be expandable.
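To make the idea of checkbox data stored as triples a little more concrete, here is a minimal sketch in plain Python. It is not the LepTree module and does not use Sesame or any RDF library. The taxon names, field names, values, and the helper functions `to_triples`, `match`, and `decorate` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical form submissions: taxon -> {field: list of checked values}.
# Field names and values are illustrative, not the real LepTree template.
form_data = {
    "TaxonA": {"larvalFeedingMode": ["leaf miner"], "adultActivity": ["nocturnal"]},
    "TaxonB": {"larvalFeedingMode": ["external feeder"],
               "adultActivity": ["diurnal", "nocturnal"]},
    "TaxonC": {"larvalFeedingMode": ["leaf miner"], "adultActivity": ["diurnal"]},
}

def to_triples(data):
    """Flatten form entries into (subject, predicate, object) triples."""
    triples = []
    for taxon, fields in data.items():
        for field_name, values in fields.items():
            for value in values:
                triples.append((taxon, field_name, value))
    return triples

def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def decorate(tips, triples, predicate):
    """Map each tree tip to its sorted values for one predicate."""
    values = defaultdict(list)
    for subject, _, obj in match(triples, p=predicate):
        values[subject].append(obj)
    return {tip: sorted(values.get(tip, [])) for tip in tips}

triples = to_triples(form_data)
# Which taxa were ticked as leaf miners?
leaf_miners = sorted(t[0] for t in match(triples, p="larvalFeedingMode", o="leaf miner"))
# Trait values to display next to each tip of a (hypothetical) tree.
tip_traits = decorate(["TaxonA", "TaxonB", "TaxonC"], triples, "adultActivity")
print(leaf_miners)
print(tip_traits)
```

Because every ticked checkbox becomes its own statement, a new field only adds new triples and does not require a schema change. A real triple store would additionally link predicates and values to ontology terms, which this sketch leaves out.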
So to summarize, LepTree built some semantics-enabled tools and combined these with data and links from a couple of other projects to create the taxonomic information pages you can see on LepTree.net under “Knowledge project”. In addition, the taxon information is now being exported as text objects and also appears on the Encyclopedia of Life taxon pages.
Objects such as these are essentially chunks of text sorted by topic. Each of these credits the source and can receive comments or ratings, or can be marked trusted or untrusted by curators.
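As a rough illustration of what such a text object might carry, the sketch below models one in Python. The class name `TextObject`, its field names, the 1 to 5 rating scale, and the `by_topic` helper are assumptions made for this example, not EOL's actual data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TextObject:
    """One chunk of topic-sorted text on a taxon page (hypothetical model)."""
    taxon: str
    topic: str
    text: str
    source: str                      # credited content provider
    trusted: Optional[bool] = None   # None means no curator has reviewed it yet
    ratings: List[int] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)

    def rate(self, stars: int) -> None:
        # The 1-5 scale is an assumption for this sketch.
        if not 1 <= stars <= 5:
            raise ValueError("rating must be between 1 and 5")
        self.ratings.append(stars)

    def average_rating(self) -> Optional[float]:
        return sum(self.ratings) / len(self.ratings) if self.ratings else None

def by_topic(objects):
    """Group text objects by topic, preserving first-seen topic order."""
    grouped = {}
    for obj in objects:
        grouped.setdefault(obj.topic, []).append(obj)
    return grouped

morphology = TextObject("TaxonA", "Morphology", "Placeholder description.", "LepTree")
ecology = TextObject("TaxonA", "Ecology", "Placeholder description.", "LepTree")
morphology.rate(4)
morphology.rate(5)
morphology.trusted = True            # a curator marks the object as trusted
morphology.comments.append("Placeholder comment from a reader.")
page = by_topic([morphology, ecology])
print(list(page), morphology.average_rating())
```

Keeping the source, trust status, ratings, and comments on the object itself means each chunk can be credited and curated independently of the page it appears on.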
So, the approach of EOL is rather different. EOL is a giant mashup that creates pages that are then available for curators to assess and rate, or for anybody to provide comments or tags. LepTree has focused on data entry tools while EOL has not, though I should note that we have also developed a Drupal-based system called LifeDesks, which is one of the many ways that data flow to the central EOL.
On LepTree, the burden is on users to learn a new system. On EOL, the burden is on programming staff, not on users.
The effort we went to in LepTree to add semantics to the tools likely just slowed us down and distracted us from developing a community effort. But once we had tools with lots of checkboxes, we were able to accumulate a lot of potentially useful atomized data. By divide and conquer I mean that it should be possible to continue to promote community databases; these can be tailored to the specific needs of a scientific community and its audiences, with data as structured as possible. The data from these projects can then be aggregated, essentially cross-indexed, so that they are accessible from a common portal, EOL. If EOL had tried to structure or semanticize from the beginning, we never would have achieved the growth we have.
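The "aggregate, essentially cross-index" step can be sketched as follows. This is only an illustration under stated assumptions: the project names, record fields, and the `normalize` and `build_index` helpers are hypothetical, and real name matching across providers is far more involved than the whitespace and case folding used here.

```python
from collections import defaultdict

def normalize(name):
    """Crude name key: collapse whitespace and ignore case (an assumption)."""
    return " ".join(name.split()).casefold()

# Hypothetical exports from two independent community databases.
community_exports = {
    "ProjectOne": [
        {"taxon": "Taxon A", "topic": "Morphology", "text": "Placeholder text."},
        {"taxon": "Taxon B", "topic": "Ecology", "text": "Placeholder text."},
    ],
    "ProjectTwo": [
        {"taxon": "taxon  a", "topic": "Distribution", "text": "Placeholder text."},
    ],
}

def build_index(exports):
    """Cross-index records by taxon, tagging each with its source."""
    index = defaultdict(list)
    for source, records in exports.items():
        for record in records:
            # Copy the record so the provider's own data stay untouched.
            entry = dict(record, source=source)
            index[normalize(record["taxon"])].append(entry)
    return dict(index)

index = build_index(community_exports)
sources_for_a = sorted(entry["source"] for entry in index["taxon a"])
print(sorted(index), sources_for_a)
```

The design choice this is meant to highlight is that each community database keeps its own structure, while the portal only adds a shared key and a source credit on top.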