BioGPS and mygene.info:
Consuming and Providing Cloud
    Computing Resources
        Molecular Med Tri-Con
         February 20, 2012

         Andrew Su, Ph.D.
           http://sulab.org
            @andrewsu
            +Andrew Su
          asu@scripps.edu
2
High-throughput molecular profiling is powerful

[Diagram labels: m/z; Gene/protein list; Testable hypothesis]
3

20 million papers
900,000 new papers / year
4
Gene databases are numerous and overlapping




… and hundreds more …
5
Community extensibility and user customizability




                   http://biogps.org
6
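Crowdsourcing depends on positive feedback

[Diagram: feedback loop among Utility, Contributors, and Users, with the values 1 and 2 on the Contributors side and 100 and 200 on the Users side]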
7
Utility: A simple and universal plugin interface
         Utility




Contributors       Users
12
Utility: A simple and universal plugin interface
         Utility




Contributors         Users




Total of 389 gene-centric online databases registered as BioGPS plugins
13
Users: BioGPS has critical mass
[Diagram labels: Utility; Contributors; Users. Chart: Daily pageviews]

   • > 4100 registered users
   • 4000 unique visitors per week
   • 40,000 page views per week

Top 10 organizations
   1. Harvard
   2. NIH
   3. UCSD
   4. Scripps
   5. MIT
   6. Cambridge
   7. U Penn
   8. Stanford
   9. Wash U
   10. UNC
14
Contributors: Explicit and implicit knowledge
         Utility




Contributors       Users




389 plugins registered (65% publicly shared) by over 75 users, spanning 150+ domains
15
BioGPS architecture




      http://mygene.info
16
mygene.info architecture

http://mygene.info



                     NGINX
17
BioGPS as a cloud computing consumer
[Diagram: NGINX, one EC2 Small instance, and three EC2 Micro instances]

Total monthly cost: ~$100
18
BioGPS as a cloud computing provider
Use case: Create web application to display custom Affymetrix data

Gene Annotation as a Service (GAaaS)

[Diagram labels: Users; Developers; “204252_at”; “CDK2”; chart of 204252_at expression across data set samples]
19
Gene query web service
http://mygene.info/query?q=204252_at
http://mygene.info/query?q=P24941
http://mygene.info/query?q=GO:0000307
http://mygene.info/query?q=cdk?
http://mygene.info/query?q=cdk2
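
The query URLs above all follow one pattern, so a client can build them from a search term. The following is a minimal Python sketch, not the service's official client: the function names are hypothetical, the JSON response format is an assumption, and the endpoint is the one shown on this 2012 slide, which may have changed since.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://mygene.info"  # host as shown on the slide


def build_query_url(term, base_url=BASE_URL):
    """Return the /query URL for one search term, in the slide's format."""
    # Keep ':', '*' and '?' literal so GO IDs and wildcards match the slide.
    return f"{base_url}/query?q={quote(term, safe=':*?')}"


def run_query(term, timeout=10):
    """Fetch one query. Assumption: the service answers with a JSON document."""
    with urlopen(build_query_url(term), timeout=timeout) as response:
        return json.load(response)


# The five example terms from the slide: an Affymetrix probe set ID,
# a UniProt accession, a GO term, and two symbol searches.
EXAMPLE_TERMS = ["204252_at", "P24941", "GO:0000307", "cdk?", "cdk2"]

if __name__ == "__main__":
    for term in EXAMPLE_TERMS:
        print(build_query_url(term))
```

Running the script only prints the five URLs; `run_query` is the part that would contact the service.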
20
Gene annotation web service
http://mygene.info/query?q=cdk*
http://mygene.info/gene/1017
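
The two URLs on this slide suggest a two-step workflow: search for genes, then fetch the annotation for each gene ID. A rough Python sketch of the second step follows. The function names are hypothetical, and the layout of a query hit (a dict with an `"id"` key) is an assumption made for illustration, since the slide does not show the response format.

```python
from urllib.parse import quote

BASE_URL = "http://mygene.info"  # host as shown on the slide


def build_gene_url(gene_id, base_url=BASE_URL):
    """Return the /gene/<id> annotation URL, e.g. for gene ID 1017 (CDK2)."""
    return f"{base_url}/gene/{quote(str(gene_id), safe='')}"


def gene_urls_from_hits(hits, id_key="id"):
    """Map query hits to annotation URLs.

    Assumption: each hit is a dict and the gene ID sits under `id_key`.
    Hits without that key are skipped.
    """
    return [build_gene_url(hit[id_key]) for hit in hits if id_key in hit]


if __name__ == "__main__":
    print(build_gene_url(1017))
```

Keeping the key name as a parameter means the sketch does not have to guess the real field name; a caller can pass whatever the actual response uses.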
21
Optimized for performance in web apps



[Figure: log-scale plot of response time in seconds (0.01 to 10) against the number of query terms and the number of hits (10 to 100,000)]


    More documentation (paging, sorting, filtering, etc.)
        plus code snippets at http://mygene.info.
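
A timing curve like the one on this slide could be gathered with a small harness that measures one lookup call per batch size. This is only a sketch of the measurement idea, not how the original numbers were produced: `time_lookups`, `fake_lookup`, and `make_terms` are hypothetical names, and the fake lookup stands in for a real call to the web service.

```python
import time


def time_lookups(lookup, sizes, make_terms):
    """Time lookup(terms) once per batch size; return (size, seconds) pairs."""
    timings = []
    for size in sizes:
        terms = make_terms(size)
        start = time.perf_counter()
        lookup(terms)
        timings.append((size, time.perf_counter() - start))
    return timings


def fake_lookup(terms):
    """Stand-in for a real service call: returns an empty result per term."""
    return {term: None for term in terms}


def make_terms(count):
    """Generate `count` placeholder query terms."""
    return [f"term_{i}" for i in range(count)]


if __name__ == "__main__":
    for size, seconds in time_lookups(fake_lookup, [10, 100, 1000], make_terms):
        print(f"{size:>6} terms: {seconds:.6f} s")
```

Passing the lookup as a callable lets the same harness time a local stub or a real client without changes.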
22
The future of BioGPS




Third party content providers
23
The future of BioGPS




Semantic interpretation, change detection, etc.

Third party content providers
24

Group members
   • Erik Clarke
   • Ben Good
   • Salvatore Loguercio
   • Ian Macleod
   • Chunlei Wu

Contact
   • http://sulab.org
   • asu@scripps.edu
   • @andrewsu
   • +Andrew Su

Funding and Support
   • BioGPS: GM83924
   • Gene Wiki: GM089820

20120220 Tri-Con Cloud Computing Symposium


Editor's Notes

  1. Next-gen sequencing identifies candidate genes. Also microarray data, proteomics, GWAS, methylation, post-translational modifications, translocation detection, etc. What do these genes do?
  2. MODs and portals
  3. Genetics resources
  4. Literature resources
  5. Protein resources
  6. Pathway and expression databases
  7. Pathway and expression databases
  8. Nginx: load balancing and reverse proxy. Tornado: application server (Python).
  9. Nginx: load balancing and reverse proxy. Tornado: application server (Python).