IPRStats: a Visualization Tool for InterProScan

Iddo Friedberg
Microbiology and Computer Science & Software Engineering
Miami University
http://github.com/devrkel/IPRStats.git
Microbes are Everywhere
●   10³⁰ prokaryotic cells on Earth
    (give or take a couple)
●   Dominate the biosphere
    ●   90% of the cells in your body
        are prokaryotic (10¹⁴)
    ●   Found in the most hostile
        environments
Microbes do (almost) Everything
●   Nutrient reservoir:
    ●   4×10¹⁰ tons carbon (rivaling plants)
    ●   1×10¹⁰ tons nitrogen
    ●   1×10⁹ tons phosphorus
Of course there is health...
●   Communicable
    diseases
●   Heart disease
●   Gastric cancer
●   Irritable Bowel
    Syndrome
...and Wellness
Microbial Genomics

    Phage phi-X174 1978: 5.4 kbp
    H. influenzae 1995: 1.8 Mbp
Classic microbial genomics
Microbes live in Communities
 & only 1% can be cultured
What is Metagenomics?
• Culture-independent approach to study
  microbial communities
  – < 1% of microbes can be cultured
  – DNA directly isolated from environmental sample
    and sequenced
• Examining genomic content of organisms in
  community/environment to better understand:
  – Diversity of organisms
  – Their roles and interactions in the ecosystem
Metagenomics is the Application
 of Genomics to Communities
Some things we can learn using Metagenomics

 ●   Taxonomic content: taxon diversity in a habitat (using taxonomic
     markers)
 ●   Functional content: biological functions, qualitative and quantitative
     profiles
 ●   Coping with the environment: differences in functional content
     between habitats
 ●   Decompose the biotic / abiotic elements in a habitat: metadata
     analysis
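A quantitative functional profile can be as simple as the relative abundance of each functional label in a sample, and "differences in functional content between habitats" can then be read off as per-function differences. The sketch below is a minimal illustration under that assumption; the function names and the habitat labels in the example are hypothetical and not taken from IPRStats.

```python
from collections import Counter


def relative_profile(annotations):
    """Turn a list of functional labels (e.g. family names) into fractions."""
    counts = Counter(annotations)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def profile_difference(profile_a, profile_b):
    """Per-function difference in relative abundance (A minus B).

    Functions missing from one profile are treated as abundance 0.
    """
    labels = set(profile_a) | set(profile_b)
    return {
        label: profile_a.get(label, 0.0) - profile_b.get(label, 0.0)
        for label in sorted(labels)
    }


if __name__ == "__main__":
    # Made-up annotation lists for two hypothetical habitats.
    soil = relative_profile(["nitrogenase", "nitrogenase", "cellulase", "transporter"])
    ocean = relative_profile(["transporter", "transporter", "photosystem", "nitrogenase"])
    print(profile_difference(soil, ocean))
```

Normalizing to fractions first keeps samples of different sequencing depth comparable; raw counts would mostly reflect how much was sequenced.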
A Metagenomic project
●   Sequencing
●   Assembly
●   Diversity analysis
●   Annotation
    ●   Gene finding
    ●   Function prediction
●   Diversity analysis
●   Comparative
    analysis
A Metagenomic project

●   Sequencing
●   Assembly
●   Annotation
    ●   Gene finding
    ●   Function prediction
●   Diversity analysis
●   Comparative
    analysis
(Callout on slide: Population analysis tools)
InterProScan
●   Signature search against an
    integrated resource of domains
    and functional sites
●   Easy to install, cluster-enabled
    (pleasantly parallel)
●   Maintained by EBI
●   Can annotate whole genomes
●   PIR, Pfam, TIGRFAMs, PANTHER,
    ProDom, PRINTS, ...
●   Needs a visualization tool for
    population / metagenomic
    annotation
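"Pleasantly parallel" means each protein is annotated independently, so a large input can be cut into batches that run as separate cluster jobs and whose outputs are simply concatenated. The sketch below only prepares such batches from protein FASTA text; it does not call InterProScan, and `read_fasta`, `split_into_jobs` and `to_fasta` are hypothetical helpers, not part of InterProScan or IPRStats.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_into_jobs(records, n_jobs):
    """Distribute records round-robin into n_jobs independent batches."""
    batches = [[] for _ in range(n_jobs)]
    for i, record in enumerate(records):
        batches[i % n_jobs].append(record)
    return batches


def to_fasta(records):
    """Serialize (header, sequence) tuples back to FASTA text."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)


if __name__ == "__main__":
    fasta = ">p1\nMKT\nAAV\n>p2\nMSS\n>p3\nMLL\n>p4\nMQQ\n>p5\nMEE\n"
    for i, batch in enumerate(split_into_jobs(read_fasta(fasta), 2)):
        print(f"job {i}:\n{to_fasta(batch)}")
```

In practice each batch would be written to its own file and submitted as one job; because no batch depends on another, the per-batch results can be merged in any order.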
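IPRStats
Screenshot of the IPRStats main window (File and Help menus), with callouts:
●   Open XML file: Python SAX parser
●   Full databases
●   Aggregate queries
●   Resulting tables
●   Charting
●   GUI: wxPython
●   Excel export: xlwt
●   Member databases listed in the window: PFAM, PIR, GENE3D, HAMAP,
    PANTHER, PRINTS, PRODOM, PROFILE, PROSITE, SMART, SUPERFAMILY, TIGRFAMs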
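A SAX parser streams the XML as events instead of building a tree, which keeps memory use flat for whole-genome or metagenome result files. The sketch below shows that pattern with the standard-library `xml.sax` module to count proteins and hits per member database; it is not IPRStats's own parser. The element and attribute names (`EBIInterProScanResults`, `protein`, `interpro`, `match`, `dbname`) are assumptions about the InterProScan XML layout and should be checked against real output, and the accessions in the sample are placeholders.

```python
import xml.sax
from collections import Counter


class MatchCounter(xml.sax.ContentHandler):
    """Tally proteins and <match> hits per member database.

    Assumption: each hit is a <match> element with a 'dbname' attribute,
    nested inside a <protein> element.
    """

    def __init__(self):
        super().__init__()
        self.n_proteins = 0
        self.hits_per_db = Counter()

    def startElement(self, name, attrs):
        # SAX delivers one event per opening tag; no document tree is kept.
        if name == "protein":
            self.n_proteins += 1
        elif name == "match":
            self.hits_per_db[attrs.get("dbname", "UNKNOWN")] += 1


def summarize(xml_bytes):
    """Return (number of proteins, Counter of hits per database)."""
    handler = MatchCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.n_proteins, handler.hits_per_db


# Made-up example input; accessions are placeholders, not real entries.
SAMPLE = b"""<?xml version="1.0"?>
<EBIInterProScanResults>
  <protein id="seq1" length="120">
    <interpro id="IPR_A" name="example family">
      <match id="PF_A" name="domain A" dbname="PFAM"/>
      <match id="SM_A" name="domain A" dbname="SMART"/>
    </interpro>
  </protein>
  <protein id="seq2" length="300">
    <interpro id="noIPR" name="unintegrated">
      <match id="PF_B" name="domain B" dbname="PFAM"/>
    </interpro>
  </protein>
</EBIInterProScanResults>"""

if __name__ == "__main__":
    n, hits = summarize(SAMPLE)
    print(n, dict(hits))
```

For a file on disk, `xml.sax.parse(path, handler)` takes the same handler and streams the file without reading it all into memory first.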
IPRStats Architecture
●   Importers: XML, IPS
●   IPRStats (wx.Frame), standalone
    ●   Menu (wx.MenuBar)
    ●   PropertiesDlg (wx.Dialog)
    ●   Chart (wx.StaticBitmap)
    ●   Table (wx.PyGridTableBase)
●   Settings
●   StatsData
●   Results (sqlite or pytables)
●   Exporters: HTML, XLS (using xlwt), IPS
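With sqlite as the results store, the "aggregate queries" behind the per-database tables reduce to SQL `GROUP BY` statements. The sketch below uses Python's built-in `sqlite3` module and assumes a hypothetical single table `matches(protein, dbname, accession)`; the slides do not show IPRStats's actual schema, and the helper names are illustrative.

```python
import sqlite3


def build_results(rows):
    """Load (protein, dbname, accession) rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE matches (protein TEXT, dbname TEXT, accession TEXT)")
    conn.executemany("INSERT INTO matches VALUES (?, ?, ?)", rows)
    return conn


def top_accessions(conn, dbname):
    """Number of distinct proteins hit by each accession of one database."""
    return conn.execute(
        """
        SELECT accession, COUNT(DISTINCT protein) AS n_proteins
        FROM matches
        WHERE dbname = ?
        GROUP BY accession
        ORDER BY n_proteins DESC, accession
        """,
        (dbname,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_results([
        ("seq1", "PFAM", "PF_A"),
        ("seq1", "PFAM", "PF_A"),  # repeated domain in the same protein
        ("seq2", "PFAM", "PF_A"),
        ("seq2", "PFAM", "PF_B"),
        ("seq3", "SMART", "SM_A"),
    ])
    print(top_accessions(conn, "PFAM"))
```

`COUNT(DISTINCT protein)` counts proteins rather than hits, so a domain repeated within one protein is not counted twice; whether a table should report proteins or hits is a design choice.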
What is PyTables?
   - A package for creating data structures that can handle large amounts of data
   - Uses NumPy (for in-memory) and HDF5 (for on-disk storage) structures
   - Uses Numexpr (a fast numerical expression evaluator) for evaluating
     expressions such as queries
   - In the context of IPRStats, it provides a way to access a huge table
     of data without requiring that all the data be in memory

Pros
- HDF5 provides very fast, compact and efficient indexing
- NumPy provides efficient in-memory storage
- Minimizes disk and memory usage
- Very fast read times compared to SQLite and MySQL

Cons
- Large memory overhead (particularly relative to smaller datasets)
- Many large, complex dependencies, including HDF5, NumPy, Numexpr and Cython
- Slow write times (particularly important since writing is the IPRStats bottleneck)
Multiple graph formats
(Screenshots: pie charts and bar graphs)
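Pie charts of annotation counts become unreadable when there are hundreds of small categories. One common approach, assumed here and not taken from IPRStats, is to keep the largest categories and lump the remainder into an "Other" slice. The `pie_slices` helper below is hypothetical; its (label, fraction) output could be fed to any plotting library, for example `matplotlib.pyplot.pie`.

```python
def pie_slices(counts, max_slices=5):
    """Return (label, fraction) pairs for a pie chart.

    Keeps the (max_slices - 1) largest categories and merges the rest into
    a single "Other" slice when there are more than max_slices categories.
    Ties are broken alphabetically so the output is deterministic.
    """
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    total = sum(counts.values())
    if len(ordered) > max_slices:
        head = ordered[: max_slices - 1]
        other = sum(n for _, n in ordered[max_slices - 1:])
        ordered = head + [("Other", other)]
    return [(label, n / total) for label, n in ordered]


if __name__ == "__main__":
    # Made-up hit counts per accession.
    counts = {"PF_A": 50, "PF_B": 25, "PF_C": 10, "PF_D": 10, "PF_E": 5}
    for label, fraction in pie_slices(counts, max_slices=3):
        print(f"{label}: {fraction:.0%}")
```

The same ordered counts can drive a bar graph directly; the "Other" bucket matters mainly for pie charts, where thin slices cannot be labeled.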
Conclusions & Future
●   A lightweight, machine-independent
    visualization tool for InterProScan annotations
●   License: Academic Free License (AFL)
●   Todo:
    ●   Comparative population analysis
    ●   Large dataset handling
    ●   More graphic options
    ●   Anything else you like...
        –   http://github.com/devrkel/IPRStats.git
Thanks
●   David Ream
●   Han Wang
●   Ian Fleming
●   David Vincent
●   Ryan Kelly
●   EBI
●   Miami University startup funding
●   Miami University Undergraduate Summer Scholars
    Program
The Friedberg Lab is Recruiting
●   Graduate students
●   Postdocs
●   Catch me later, email me, or look at
    iddo-friedberg.net to learn more

Friedberg bosc2010 iprstats

  • 1. IPRStats: a Visualization Tool for InterProScan Iddo Friedberg Microbiology and Computer Science & Software Engineering Miami University http://github.com/devrkel/IPRStats.git
  • 2. Microbes are Everywhere ● 10^30 prokaryotic cells on Earth (give or take a couple) ● Dominate the biosphere ● 90% of the cells in your body are prokaryotic (10^14) ● Found in the most hostile environments
  • 3. Microbes do (almost) Everything ● Nutrient reservoir: ● 4x10^10 tons carbon (rivaling plants) ● 1x10^10 tons nitrogen ● 1x10^9 tons phosphorus
  • 4. Of course there is health... ● Communicable diseases ● Heart disease ● Gastric cancer ● Irritable Bowel Syndrome
  • 6. Microbial Genomics ● Phage phi-X174 (1978): 5.5 kbp ● H. influenzae (1995): 1.7 Mbp
  • 10. Microbes live in Communities & only 1% can be cultured
  • 11. What is Metagenomics? • A culture-independent approach to studying microbial communities – fewer than 1% of microbes can be cultured – DNA is isolated directly from an environmental sample and sequenced • Examining the genomic content of the organisms in a community/environment to better understand: – the diversity of organisms – their roles and interactions in the ecosystem
  • 12. Metagenomics is the Application of Genomics to Communities
  • 13. Some things we can learn using Metagenomics ● Taxonomic content: taxon diversity in a habitat (using taxonomic markers) ● Functional content: biological functions, qualitative and quantitative profiles ● Coping with the environment: differences in functional content between habitats ● Decomposing the biotic/abiotic elements in a habitat: metadata analysis
  • 14. A Metagenomic project ● Sequencing ● Assembly ● Diversity analysis ● Annotation ● Gene finding ● Function prediction ● Diversity analysis ● Comparative analysis
  • 16. A Metagenomic project ● Sequencing ● Assembly ● Annotation ● Gene finding ● Function prediction ● Diversity analysis ● Comparative analysis (slide label: Population analysis tools)
  • 17. InterProScan ● Signature search against an integrated resource of protein domains and functional sites ● Easy to install, cluster-enabled (pleasantly parallel) ● Maintained by EBI ● Can annotate whole genomes ● Member databases: PIR, Pfam, TIGRFAMs, PANTHER, ProDom, PRINTS, ... ● Needs a visualization tool for population/metagenomic annotation
  • 18. IPRStats screenshot, annotated: ● Open XML file ● Python SAX parser ● GUI: wxPython ● Charting ● Excel export: xlwt ● Full databases ● Aggregate queries ● Resulting tables. Menu: File, Help. Database tabs: PFAM, PIR, GENE3D, HAMAP, PANTHER, PRINTS, PRODOM, PROFILE, PROSITE, SMART, SUPERFAMILY, TIGRFAMs
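Slide 18 lists a Python SAX parser as the XML importer. The sketch below illustrates the general idea: stream an InterProScan XML file with the standard-library `xml.sax` module and tally matches per member database without holding the whole document in memory. It is not IPRStats' own code. The element and attribute names (`protein`, `interpro`, `match`, `dbname`) are assumed from the InterProScan 4 XML layout, and the class name `MatchCounter` and the embedded sample are hypothetical.

```python
import xml.sax
from collections import Counter

# Hypothetical sample modelled on InterProScan 4-style XML output.
# The element/attribute names used here are assumptions, not a spec.
SAMPLE_XML = b"""<?xml version="1.0"?>
<interpro_matches>
  <protein id="P1" length="300">
    <interpro id="IPR000001" name="Kringle" type="Domain">
      <match id="PF00051" name="Kringle" dbname="PFAM">
        <location start="10" end="90" score="1e-20" status="T"/>
      </match>
      <match id="SM00130" name="KR" dbname="SMART">
        <location start="8" end="92" score="2e-25" status="T"/>
      </match>
    </interpro>
  </protein>
  <protein id="P2" length="150">
    <interpro id="IPR000001" name="Kringle" type="Domain">
      <match id="PF00051" name="Kringle" dbname="PFAM">
        <location start="5" end="80" score="3e-18" status="T"/>
      </match>
    </interpro>
  </protein>
</interpro_matches>
"""


class MatchCounter(xml.sax.ContentHandler):
    """Hypothetical handler: streams the XML and tallies matches, no DOM is built."""

    def __init__(self):
        super().__init__()
        self.current_protein = None   # protein whose children are being read
        self.current_ipr = None       # InterPro entry whose children are being read
        self.db_counts = Counter()    # member database -> number of matches
        self.ipr_proteins = {}        # InterPro ID -> set of protein IDs

    def startElement(self, name, attrs):
        if name == "protein":
            self.current_protein = attrs.get("id")
        elif name == "interpro":
            self.current_ipr = attrs.get("id")
            self.ipr_proteins.setdefault(self.current_ipr, set()).add(self.current_protein)
        elif name == "match":
            self.db_counts[attrs.get("dbname")] += 1

    def endElement(self, name):
        # Reset context when leaving a container element.
        if name == "protein":
            self.current_protein = None
        elif name == "interpro":
            self.current_ipr = None


handler = MatchCounter()
xml.sax.parseString(SAMPLE_XML, handler)
print(dict(handler.db_counts))
print({ipr: len(prots) for ipr, prots in handler.ipr_proteins.items()})
```

Because SAX delivers events one element at a time, memory use stays roughly proportional to the summary being built rather than to the size of the input file.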
  • 19. IPRStats Architecture ● IPRStats standalone (wx.Frame): Menu (wx.MenuBar), PropertiesDlg (wx.Dialog), Chart (wx.StaticBitmap), Table (wx.PyGridTableBase) ● StatsData ● Results (sqlite or pytables) ● importers: XML, IPS ● Settings ● exporters: HTML, XLS (using xlwt), IPS
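Slide 19 shows results held in sqlite (or PyTables), and slide 18 mentions aggregate queries. A minimal sketch with Python's built-in `sqlite3` module might compute per-entry and per-database summaries as below. The flat `matches` table, its column names and the toy rows are hypothetical illustrations, not IPRStats' actual storage layout.

```python
import sqlite3

# Hypothetical flat schema: one row per (protein, InterPro entry, member signature).
# Toy rows for illustration only.
rows = [
    ("P1", "IPR000001", "Kringle", "PFAM", "PF00051"),
    ("P1", "IPR000001", "Kringle", "SMART", "SM00130"),
    ("P2", "IPR000001", "Kringle", "PFAM", "PF00051"),
    ("P2", "IPR000719", "Protein kinase domain", "PFAM", "PF00069"),
    ("P3", "IPR000719", "Protein kinase domain", "PROSITE", "PS50011"),
]

conn = sqlite3.connect(":memory:")  # an on-disk file path would persist results
conn.execute(
    """CREATE TABLE matches (
           protein_id TEXT, ipr_id TEXT, ipr_name TEXT, db TEXT, sig_id TEXT)"""
)
conn.executemany("INSERT INTO matches VALUES (?, ?, ?, ?, ?)", rows)
# Index the grouping key so GROUP BY on large result sets stays fast.
conn.execute("CREATE INDEX idx_ipr ON matches (ipr_id)")

# Aggregate query: how many distinct proteins carry each InterPro entry?
ipr_summary = conn.execute(
    """SELECT ipr_id, ipr_name, COUNT(DISTINCT protein_id) AS n_proteins
       FROM matches
       GROUP BY ipr_id, ipr_name
       ORDER BY n_proteins DESC, ipr_id"""
).fetchall()

# Per-database table: how many matches each member database contributed.
db_summary = conn.execute(
    "SELECT db, COUNT(*) FROM matches GROUP BY db ORDER BY db"
).fetchall()

for row in ipr_summary:
    print(row)
print(db_summary)
conn.close()
```

Counting distinct proteins rather than raw rows keeps an entry that is hit by several member databases (here, Kringle via PFAM and SMART) from being counted twice for the same protein.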
  • 20. What is PyTables? - a package for creating data structures that can handle large amounts of data - uses NumPy (in memory) and HDF5 (on-disk storage) structures - uses Numexpr (a JIT compiler) to evaluate expressions such as queries - for InterProScan results, it provides a way of accessing a huge table of data without requiring all of it to be in memory. Pros: - HDF5 provides very fast, compact and efficient indexing - NumPy provides efficient in-memory storage - minimizes disk and memory usage - very fast read times compared to SQLite and MySQL. Cons: - large memory overhead (particularly for smaller datasets) - many large, complex dependencies, including HDF5, NumPy, Numexpr and Cython - slow write times (particularly important because writing is the IPRStats bottleneck)
  • 21. Multiple graph formats ● Pie charts ● Bar graphs
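Before a pie chart or bar graph can be drawn, the per-database counts have to be turned into slices. The helper below is a hypothetical sketch: the function name `pie_slices` and the choice to fold small categories into an "Other" slice are assumptions, not documented IPRStats behaviour. It uses only the standard library and leaves the drawing itself to a charting library.

```python
from collections import Counter


def pie_slices(counts, top_n=5):
    """Hypothetical helper: return (label, count, percent) slices, folding every
    category beyond the top_n largest into one 'Other' slice for readability."""
    total = sum(counts.values())
    if total == 0:
        return []
    # Largest first; ties broken alphabetically so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    slices = ranked[:top_n]
    other = sum(c for _, c in ranked[top_n:])
    if other:
        slices.append(("Other", other))
    return [(label, c, round(100.0 * c / total, 1)) for label, c in slices]


# Toy counts for illustration only.
counts = Counter({"PFAM": 40, "SMART": 25, "PROSITE": 15, "PANTHER": 10,
                  "PRINTS": 6, "HAMAP": 4})
for label, n, pct in pie_slices(counts, top_n=3):
    print(f"{label:8s} {n:4d} {pct:5.1f}%")
```

With `top_n=3`, the toy counts yield PFAM 40.0%, SMART 25.0%, PROSITE 15.0% and Other 20.0%. A bar graph can use the same ranked counts without folding, since long tails are easier to read as bars than as thin pie wedges.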
  • 24. Conclusions & Future ● A lightweight, machine-independent visualization tool for InterProScan annotations ● License: AFL (Academic Free License) ● Todo: ● Comparative population analysis ● Large dataset handling ● More graphic options ● Anything else you like... – http://github.com/devrkel/IPRStats.git
  • 25. Thanks ● David Ream ● Han Wang ● Ian Fleming ● David Vincent ● Ryan Kelly ● EBI ● Miami University startup funding ● Miami University Undergraduate Summer Scholars Program
  • 26. The Friedberg Lab is Recruiting ● Graduate students ● Postdocs ● Catch me later, email me, or look at iddo-friedberg.net to learn more