TOPSAN is a database that provides extensive annotations for thousands of protein structures solved by structural genomics centers. It combines automated and human-edited annotations for each protein structure. TOPSAN aims to make its content accessible for computational analysis by encoding it with semantic web technologies. Annotators enter structured, machine-readable data through the TOPSAN Protein Syntax, so that the content can be edited, searched, linked, and queried programmatically.
Zmasek bosc2010 topsan
1. Connecting TOPSAN to Computational Analysis. Christian M Zmasek, Kyle Ellrott, Dana Weekes, Constantina Bakolitsa, John Wooley, Adam Godzik. Joint Center for Structural Genomics; Sanford-Burnham Medical Research Institute, La Jolla, California, USA; University of California, San Diego, La Jolla, California, USA; Joint Center for Molecular Modeling.
2. Overview. What is TOPSAN? (TOPSAN: The Open Protein Structure Annotation Network; community-based annotation of protein structures.) “Semantic” TOPSAN: how to enter machine-readable, structured data; example: editor -> entry -> semantic web; different ways to download information; SPARQL example. Availability and licenses. Acknowledgements.
3. What is TOPSAN? TOPSAN: The Open Protein Structure Annotation Network. Tens of thousands of protein structures have been determined by structural genomics (SG) centers, and many more are expected. While these structures are available in the PDB (Protein Data Bank), annotations for most of them are limited to one-line PDB titles. TOPSAN is the first database that specifically focuses on providing extensive annotations for the thousands of structures solved by the SG centers.
4. What is TOPSAN? TOPSAN’s main content consists of collaboratively (“open”) written articles/annotations for each solved protein structure. TOPSAN combines automated with human-edited elements. TOPSAN spans the range from analysis of single proteins, through characterization of protein families, to reconstruction of entire genomes. Articles are created by structural genomics (SG) center staff and over 400 external users, so far covering 7,250 proteins. TOPSAN is collaborating with PFAM to use JCSG structures to refine and create new PFAM families.
6. “Semantic” TOPSAN. Use the principles of the semantic web to turn TOPSAN into a database that can be edited, searched, and linked. TOPSAN content is being made accessible to computational query and analysis via semantic web technologies.
7. Entering machine-readable, structured data with the TOPSAN Protein Syntax (TPS). Each statement takes the form subject, predicate, object. Subject: the protein in question. Predicate, examples: homologous, encoded_by, citation, member_of. Object: a “direct value” or a link to another database. Example: {{ note.link( 'pfam_family_member', 'PFAM:PF07980' ) }}. More information: http://topsan.wordpress.com/2010/06/01/96/
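The subject, predicate, object form described on this slide can be sketched in a few lines of Python. This is only an illustration, not TOPSAN's own parser: the regular expression, the `Triple` type, the `parse_tps_link` helper, and the placeholder subject `protein_X` are all assumptions made for the example. Only the `note.link(...)` markup itself comes from the slide.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    """One annotation statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical pattern for the note.link('predicate', 'object') form shown on the slide.
TPS_LINK = re.compile(
    r"\{\{\s*note\.link\(\s*'([^']+)'\s*,\s*'([^']+)'\s*\)\s*\}\}"
)


def parse_tps_link(subject: str, markup: str) -> Triple:
    """Turn one note.link(...) annotation into a (subject, predicate, object) triple."""
    match = TPS_LINK.search(markup)
    if match is None:
        raise ValueError(f"no note.link annotation found in: {markup!r}")
    predicate, obj = match.groups()
    return Triple(subject, predicate, obj)


# "protein_X" is a placeholder: the subject is the protein whose entry is being edited.
triple = parse_tps_link(
    "protein_X",
    "{{ note.link( 'pfam_family_member', 'PFAM:PF07980' ) }}",
)
print(triple)
```

The subject is passed in separately because, as the slide states, it is always the protein in question and does not appear in the markup.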
8. Example: in the Editor
18. All unique semantic triples are stored in a single N3-formatted file.
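As a rough illustration of what "unique triples in one N3 file" could look like, the sketch below removes duplicate triples and writes them as N3 statements. It makes several assumptions: the `to_n3` helper is hypothetical, the subject URIs under `http://example.org/` and the weight values are made-up placeholders, and every object is written as a plain literal (the slides also allow links to other databases). Only the `tps:` namespace and the `molecular_weight` predicate appear in the deck.

```python
TPS_NS = "http://purl.org/topsan/tps#"  # namespace used in the deck's SPARQL example


def to_n3(triples):
    """Serialize (subject_uri, predicate_name, literal) tuples as N3, dropping duplicates."""
    lines = [f"@prefix tps: <{TPS_NS}> ."]
    for subject, predicate, value in sorted(set(triples)):
        # Escape backslashes and double quotes so the literal stays valid.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{subject}> tps:{predicate} "{escaped}" .')
    return "\n".join(lines) + "\n"


# Placeholder data: the subject URIs and weights are invented for illustration.
sample = [
    ("http://example.org/protein/A", "molecular_weight", "12345.6"),
    ("http://example.org/protein/A", "molecular_weight", "12345.6"),  # duplicate
    ("http://example.org/protein/B", "molecular_weight", "23456.7"),
]

n3_text = to_n3(sample)
print(n3_text)
```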
19. Simple SPARQL: PREFIX tps: <http://purl.org/topsan/tps#> SELECT ?id ?weight WHERE { ?id tps:molecular_weight ?weight }
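The query above contains a single triple pattern: it binds `?id` and `?weight` for every triple whose predicate is `tps:molecular_weight`. The sketch below emulates that one pattern in plain Python so the selection logic is visible. It is not a SPARQL engine, the `select_id_weight` helper is hypothetical, and the triples are invented placeholders. In practice the N3 file would more likely be loaded into an RDF library and queried with the SPARQL text directly.

```python
TPS = "http://purl.org/topsan/tps#"


def select_id_weight(triples):
    """Return (id, weight) pairs for every triple whose predicate is tps:molecular_weight."""
    wanted = TPS + "molecular_weight"
    return [(s, o) for s, p, o in triples if p == wanted]


# Placeholder triples: subjects, the second predicate's value, and weights are invented.
triples = [
    ("http://example.org/protein/A", TPS + "molecular_weight", "12345.6"),
    ("http://example.org/protein/A", TPS + "pfam_family_member", "PFAM:PF07980"),
]

rows = select_id_weight(triples)
for protein_id, weight in rows:
    print(protein_id, weight)
```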
20. Availability and Licenses. Project site: http://www.topsan.org Software: http://www.topsan.org/Tools Open-source licenses: data under the Creative Commons Attribution 3.0 License; software under the GNU General Public License.
21. Summary. Structural genomics centers produce a large number of protein structures, most of which never get a publication. TOPSAN provides a means for community annotation of such protein structures. The TOPSAN Protein Syntax (TPS) allows annotators to easily enter machine-readable, structured data. TOPSAN content is being made accessible to computational query and analysis via semantic web technologies. Many aspects of TOPSAN are still under development and are planned to evolve with user needs.
22. Acknowledgements. Inspiration for the TOPSAN/semantic web connection: DBCLS BioHackathon 2010. Developers: Krishna Subramanian, Kyle Ellrott, Dana Weekes. All contributors and users.