Important Protein Databases
&
Proteomics Software
MOHD KYUM
Punjab Agricultural University
Introduction to Protein Databases
• Protein databases have become a crucial part of modern biology.
• Searching databases is often the first step in the study of a new protein.
• Huge amounts of data for protein structures, functions, and particularly sequences are being
generated which cannot be handled without using computer databases.
• Without the prior knowledge obtained from such searches, known information about the
protein could be missed, or an experiment could be repeated unnecessarily.
• Comparison between proteins and protein classification provide information about the
relationship between proteins within a genome or across different species, and hence offer
much more information than can be obtained by studying only an isolated protein.
Protein Databases
• The databases can be classified into the following categories:
➢ Sequence Databases
➢ 2D Gel Databases
➢ 3D Structure Databases
➢ Polymorphism and Mutation Databases
➢ Chemistry Databases
➢ Enzyme and Pathway Databases
➢ Ontologies, Specialized Protein Databases
➢ Family and Domain Databases
➢ Gene Expression Databases
➢ Genome Annotation Databases
➢ Organism Specific Databases
➢ Phylogenomic Databases
➢ Protein-Protein Interaction Databases
➢ Proteomic Databases
➢ PTM Databases
➢ Other Miscellaneous Databases
Protein Sequence Databases
https://www.ncbi.nlm.nih.gov/refseq/
https://www.uniprot.org/
UniProt
• UniProt provides more annotations than any other sequence database with a minimal level
of redundancy. It has the following three components:
1. UniProt Knowledgebase (UniProtKB): Swiss-Prot (manually annotated and reviewed) and
TrEMBL (automatically annotated).
2. UniRef: sequence clusters for fast sequence similarity searches.
3. UniParc: sequence archive for keeping track of sequences and their identifiers.
• UniProt, as a curated protein sequence database, offers a portal to a wide range of
annotations, covering areas such as function, family, domain parsing, post-translational
modifications, and variants.
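The Swiss-Prot / TrEMBL split is visible in the FASTA headers that UniProtKB distributes: the first field is `sp` for reviewed Swiss-Prot entries and `tr` for unreviewed TrEMBL entries. The following is a minimal sketch of reading that field, assuming the standard `>db|accession|entry_name description` header layout. The function and class names are illustrative and not part of any UniProt library.

```python
import re
from typing import NamedTuple


class UniProtHeader(NamedTuple):
    reviewed: bool     # True for Swiss-Prot ("sp"), False for TrEMBL ("tr")
    accession: str     # primary accession, e.g. "P69905"
    entry_name: str    # entry name, e.g. "HBA_HUMAN"
    description: str   # remainder of the header (protein name, OS=, OX=, ...)


# ">db|accession|entry_name description", where db is "sp" or "tr".
_HEADER_RE = re.compile(r"^>(sp|tr)\|([^|]+)\|(\S+)\s*(.*)$")


def parse_uniprot_header(line: str) -> UniProtHeader:
    """Split a UniProtKB FASTA header line into its main fields."""
    match = _HEADER_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a UniProtKB FASTA header: {line!r}")
    db, accession, entry_name, description = match.groups()
    return UniProtHeader(db == "sp", accession, entry_name, description)


if __name__ == "__main__":
    example = (">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha "
               "OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2")
    print(parse_uniprot_header(example))
```

A header starting with `>tr|` would be parsed the same way but reported with `reviewed=False`.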
RefSeq-NCBI
• The National Center for Biotechnology Information Reference Sequence (NCBI RefSeq) database provides
curated non-redundant sequences of genomic regions, transcripts and proteins for taxonomically diverse
organisms including Archaea, Bacteria, Eukaryotes, and Viruses.
• The RefSeq database is derived from the sequence data available in the redundant archival database GenBank.
• RefSeq records include coding regions, conserved domains, variations, etc., and enhanced annotations such
as publications, names, symbols, aliases, Gene IDs, and database cross-references.
• The sequences and annotations are generated using a combined approach of collaboration, automated
prediction, and manual curation.
• RefSeq records can be accessed directly from NCBI web sites by searching the Nucleotide or Protein
databases, by running BLAST searches against selected databases, and through FTP downloads.
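RefSeq accessions carry a two-letter prefix followed by an underscore, and that prefix indicates whether a record is genomic, a transcript, or a protein, and whether it is curated or a computed model. The sketch below classifies an accession by prefix. The table is a deliberately small subset of the prefixes NCBI documents, and the function name is hypothetical.

```python
from typing import Optional, Tuple

# Subset of RefSeq accession prefixes (not exhaustive).
REFSEQ_PREFIXES = {
    "NC_": ("genomic", "complete genomic molecule"),
    "NG_": ("genomic", "incomplete genomic region"),
    "NM_": ("transcript", "protein-coding mRNA"),
    "NR_": ("transcript", "non-protein-coding RNA"),
    "XM_": ("transcript", "predicted mRNA model"),
    "XR_": ("transcript", "predicted non-coding RNA model"),
    "NP_": ("protein", "protein product of a curated record"),
    "XP_": ("protein", "predicted protein model"),
    "WP_": ("protein", "non-redundant protein across prokaryotic genomes"),
}


def classify_refseq(accession: str) -> Optional[Tuple[str, str]]:
    """Return (molecule type, description) for a RefSeq accession.

    Returns None when the prefix is not in the subset table above,
    which includes identifiers that are not RefSeq accessions at all.
    """
    return REFSEQ_PREFIXES.get(accession.strip()[:3].upper())


if __name__ == "__main__":
    for acc in ("NP_000509.1", "XM_000001.1", "P69905"):
        print(acc, classify_refseq(acc))
```

The accessions in the demo loop are only used for their prefixes; nothing is looked up at NCBI.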
Protein Structure Databases
http://www.wwpdb.org/ https://scop.mrc-lmb.cam.ac.uk/
wwPDB
• The Worldwide Protein Data Bank (wwPDB) was established in 2003 as an
international collaboration to maintain a single, publicly available PDB Archive of
protein structural data.
• The “PDB Archive” is a collection of flat files in three different formats:
(A) Legacy PDB format (B) PDBx/mmCIF format (C) Protein Data Bank Markup Language
(PDBML) format.
• Each member site serves as a deposition, data processing and distribution site for the PDB
Archive, and each provides its own view of the primary data and a variety of tools and
resources.
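Of the three formats, the legacy PDB format is a fixed-column text format: each field of an ATOM record sits at a set column range rather than being delimiter-separated. The sketch below reads the main fields of one ATOM or HETATM line by column slicing. The sample record is built piece by piece so the column layout is visible, its coordinate values are made up for illustration, and the names `Atom` and `parse_atom_record` are hypothetical.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    element: str


def parse_atom_record(line: str) -> Atom:
    """Parse one ATOM/HETATM line of a legacy-format PDB file."""
    if line[0:6].strip() not in ("ATOM", "HETATM"):
        raise ValueError("not an ATOM/HETATM record")
    return Atom(
        serial=int(line[6:11]),          # columns 7-11
        name=line[12:16].strip(),        # columns 13-16
        res_name=line[17:20].strip(),    # columns 18-20
        chain_id=line[21],               # column 22
        res_seq=int(line[22:26]),        # columns 23-26
        x=float(line[30:38]),            # columns 31-38
        y=float(line[38:46]),            # columns 39-46
        z=float(line[46:54]),            # columns 47-54
        element=line[76:78].strip(),     # columns 77-78
    )


# Illustrative record assembled field by field (values are made up).
SAMPLE = (
    "ATOM  "        # 1-6   record name
    "    1"         # 7-11  atom serial number
    " "             # 12    blank
    " N  "          # 13-16 atom name
    " "             # 17    alternate location indicator
    "MET"           # 18-20 residue name
    " "             # 21    blank
    "A"             # 22    chain identifier
    "   1"          # 23-26 residue sequence number
    " "             # 27    insertion code
    "   "           # 28-30 blank
    "  27.340"      # 31-38 x coordinate
    "  24.430"      # 39-46 y coordinate
    "   2.614"      # 47-54 z coordinate
    "  1.00"        # 55-60 occupancy
    "  9.67"        # 61-66 temperature factor
    "          "    # 67-76 blank
    " N"            # 77-78 element symbol
)

if __name__ == "__main__":
    print(parse_atom_record(SAMPLE))
```

PDBx/mmCIF and PDBML are tagged formats and would need a different reader; this sketch covers only the legacy layout.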
SCOP
• SCOP (Structural Classification of Proteins) contains information about the classification
of protein structures along with their sequence information.
• It classifies proteins into the following hierarchical levels, each with a defining feature:
1. Class - Global characteristics
2. Fold - Similar “topology”
3. Superfamily - Clear structural homology
4. Family - Clear sequence homology
5. Protein - Functionally identical
6. Species - Unique sequences
• It aims to provide an accurate, detailed, and comprehensive description of the structural and
evolutionary relationships amongst all proteins of known structure.
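The first four levels of this hierarchy are often written as a compact dotted string of the form `class.fold.superfamily.family`, for example `a.1.1.2` (the "sccs" identifier used in classic SCOP releases; this notation is not described in the slides and is an assumption here). The sketch below expands such a string into its levels and finds the deepest level two domains share. The function names are hypothetical.

```python
from typing import Dict, Optional

LEVELS = ("class", "fold", "superfamily", "family")


def parse_sccs(sccs: str) -> Dict[str, str]:
    """Expand 'a.1.1.2' into one identifier per hierarchy level."""
    parts = sccs.strip().split(".")
    if (len(parts) != 4 or not parts[0].isalpha()
            or not all(p.isdigit() for p in parts[1:])):
        raise ValueError(f"unexpected classification string: {sccs!r}")
    # Each level's identifier is the prefix of the string up to that level.
    return {level: ".".join(parts[: i + 1]) for i, level in enumerate(LEVELS)}


def deepest_shared_level(a: str, b: str) -> Optional[str]:
    """Name the most specific level at which two domains are grouped together."""
    lineage_a, lineage_b = parse_sccs(a), parse_sccs(b)
    shared = None
    for level in LEVELS:
        if lineage_a[level] != lineage_b[level]:
            break
        shared = level
    return shared


if __name__ == "__main__":
    print(parse_sccs("a.1.1.2"))
    print(deepest_shared_level("a.1.1.2", "a.1.1.1"))
```

Two domains that share only a fold are expected to have similar topology, while a shared superfamily implies structural homology, matching the level definitions listed above.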
Protein Family Databases
http://pfam.xfam.org/ http://pantherdb.org/ https://prosite.expasy.org/
Pfam
• Pfam is a database of protein families represented as multiple sequence alignments and Hidden
Markov Models (HMMs).
• Pfam entries are classified as Family (related protein regions), Domain (protein structural unit),
Repeat (multiple short protein structural units), or Motif (short protein structural unit outside global
domains).
• Related Pfam entries are grouped into clans based on sequence, structure or profile HMM
similarity.
• The Pfam web site provides search interfaces for querying by sequence, keyword, domain
architecture, and taxonomy, and browse interfaces for analyzing protein sequences for Pfam matches
and viewing Pfam annotations in domain architectures, sequence alignments, interactions, species
and protein structures in PDB.
PANTHER
• PANTHER is a database of gene families, including a phylogenetic tree for each family in which nodes of the
tree are annotated with gene attributes.
• The main goal of PANTHER is the accurate inference (and practical application) of gene and protein function
over large sequence databases, using phylogenetic trees to extrapolate from the relatively sparse experimental
information from a few model organisms.
• The three types of gene attribute currently annotated in PANTHER are:
(A) Subfamily membership (B) Protein class and (C) Gene function
• The PANTHER website provides tools for functional analysis of lists of genes or proteins.
• PANTHER now includes stable database identifiers for inferred ancestral genes, which are used to associate
inferred gene attributes with particular genes in the common ancestral genomes of extant species.
PROSITE
• PROSITE is a database of documentation entries describing protein domains, families and
functional sites as well as associated patterns and profiles to identify them.
• The entries are derived from multiple alignments of homologous sequences and have the
advantage of identifying distant relationships between sequences.
• PROSITE includes a collection of ProRules based on profiles and patterns of functionally
and/or structurally critical amino acids that can be used to increase PROSITE’s
discriminatory power.
• The PROSITE web site provides keyword-based search and allows browsing by
documentation entry, ProRule description, taxonomic scope and number of positive hits.
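PROSITE patterns use a compact syntax that maps closely onto regular expressions: elements are separated by `-`, `x` matches any residue, `[ST]` lists allowed residues, `{P}` lists forbidden residues, `(n)` or `(n,m)` gives a repeat count, and `<` / `>` anchor the pattern to the N- or C-terminus. The sketch below converts this core subset to a Python regular expression and scans a sequence. It is a simplified translation, not PROSITE's own scanner, it ignores rarer syntax such as `>` inside square brackets, and the test sequence is made up.

```python
import re
from typing import List, Tuple

# One pattern element, optionally followed by a repeat count: x, x(2), x(2,4).
_ELEMENT_RE = re.compile(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate the core PROSITE pattern syntax into a regex string."""
    body = pattern.strip().rstrip(".")
    pieces = []
    for element in body.split("-"):
        prefix = suffix = ""
        if element.startswith("<"):        # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):          # C-terminal anchor
            suffix, element = "$", element[:-1]
        match = _ELEMENT_RE.fullmatch(element)
        if match is None:
            raise ValueError(f"cannot parse pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":                    # any residue
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"  # forbidden residues
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        pieces.append(prefix + core + repeat + suffix)
    return "".join(pieces)


def find_motifs(pattern: str, sequence: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    # N-glycosylation site pattern (PROSITE entry PS00001).
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(find_motifs("N-{P}-[ST]-{P}.", "ANGSAA"))
```

Because `finditer` reports non-overlapping matches only, overlapping sites would need a lookahead-based scan; profiles and ProRules are not covered by this sketch.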
Proteomics – An Introduction
• Proteomics is a relatively recent branch of molecular biology concerned with the study of
the proteome.
• The term proteomics was introduced in 1994.
• Its roles in molecular biology include the study of the structure and function of
proteins, the determination of 3D protein structures, and the qualitative and quantitative
analysis of proteins.
• It has many applications, including clinical research, drug discovery, biomarkers, and
neurology.
Proteomics Software
http://www.funrich.org/ http://prohitsms.com/Prohits_download/list.php http://proteowizard.sourceforge.net/
FunRich
• FunRich is open-access software that facilitates the analysis of
proteomics data, providing tools for functional enrichment and interaction
network analysis of genes and proteins.
• FunRich is a standalone tool that combines ease of use with customizable
databases, free access, and graphical representations.
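Functional enrichment asks whether an annotation term appears in a gene or protein list more often than chance would predict. A common way to score this is a one-sided hypergeometric test, sketched below with the standard library only. Whether a given tool uses exactly this statistic, and how it corrects for multiple testing, is an assumption to check against that tool's documentation. The function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int,
                       sample: int, hits: int) -> float:
    """One-sided hypergeometric p-value, P(X >= hits).

    population: number of genes/proteins in the background set
    annotated:  background members carrying the annotation term
    sample:     size of the submitted list
    hits:       list members carrying the annotation term
    """
    if not (0 <= annotated <= population and 0 <= sample <= population):
        raise ValueError("annotated and sample must lie within the population")
    upper = min(annotated, sample)
    if not 0 <= hits <= upper:
        raise ValueError("hits cannot exceed the annotated or sample counts")
    # Sum the probability of observing `hits` or more annotated members.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(population, sample)


if __name__ == "__main__":
    # Toy numbers: 10 background proteins, 5 annotated, list of 5, all 5 hit.
    print(enrichment_p_value(10, 5, 5, 5))
```

In practice one p-value is computed per annotation term, so a multiple-testing correction would normally follow.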
ProHits
• ProHits is a complete open-source software solution for mass spectrometry (MS)-based
interaction proteomics that manages the entire pipeline from raw MS data files to fully
annotated protein-protein interaction data sets.
• It was designed to provide an intuitive user interface from the biologist’s perspective and
can accommodate multiple instruments within a facility, multiple user groups, multiple
laboratory locations and any number of parallel projects.
• ProHits can manage all project scales and supports common experimental pipelines,
including those using gel-based separation, gel-free analysis and multidimensional protein
or peptide separation.
ProteoWizard
• ProteoWizard provides a modular and extensible set of open-source, cross-platform tools
and libraries.
• The tools perform proteomics data analyses; the libraries enable rapid tool creation by
providing a robust, pluggable development framework that simplifies and unifies data file
access, and performs standard chemistry and LCMS dataset computations.
• The primary goal of ProteoWizard is to eliminate the existing barriers to proteomic
software development so that researchers can focus on the development of new analytic
approaches, rather than having to dedicate significant resources to mundane (if important)
tasks, like reading data files.
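As an illustration of the kind of "standard chemistry" computation mentioned above, the sketch below derives a peptide's monoisotopic mass and the m/z of its protonated ion from a residue mass table. This is plain Python and does not use or imitate the ProteoWizard API. The function names are hypothetical, modifications are not handled, and the mass values are the commonly tabulated monoisotopic residue masses rounded to five decimals.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added once for the free N- and C-termini
PROTON = 1.007276   # mass of the charge carrier


def peptide_mass(sequence: str) -> float:
    """Monoisotopic mass of an unmodified peptide."""
    try:
        return sum(RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER
    except KeyError as err:
        raise ValueError(f"unknown residue: {err.args[0]!r}") from None


def peptide_mz(sequence: str, charge: int) -> float:
    """m/z of the [M + zH]z+ ion for a positive charge state z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (peptide_mass(sequence) + charge * PROTON) / charge


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(z, round(peptide_mz("PEPTIDE", z), 4))
```

The same peptide appears at a different m/z for each charge state, which is why search tools consider several charge states when matching spectra.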
Proteomics Databases
https://www.proteomicsdb.org/
http://ppdb.tc.cornell.edu https://www.ebi.ac.uk/pride/
PPDB
• PPDB is a Plant Proteome DataBase for Arabidopsis thaliana and maize (Zea
mays).
• Initially PPDB was dedicated to plant plastids, but has now expanded to the
whole plant proteome – hence it was renamed from Plastid PDB to Plant PDB
in November 2007.
• The PPDB stores experimental data from in-house proteome and mass
spectrometry analysis, curated information about protein function, protein
properties and subcellular localization.
PRIDE
• The PRoteomics IDEntifications database (PRIDE) is a repository for mass spectrometry
based proteomics data including identifications of proteins, peptides and post-translational
modifications that have been described in the scientific literature, together with supporting
mass spectra and related technical and biological metadata.
• PRIDE supports tandem MS (MS/MS) and peptide fingerprinting datasets, with the
search/analysis workflows as originally run by the submitters.
• PRIDE provides several services such as the Protein Identifier Cross-Reference (PICR),
the Ontology Lookup Service (OLS) and Database on Demand.
ProteomicsDB
• ProteomicsDB is an effort of the Technische Universität
München (TUM).
• It is dedicated to expediting the identification of the human proteome
and its use across the scientific community.