Basic bioinformatics concepts,
                      databases and tools
                                                       Module 4
                                       Beyond the sequences

                                                    Dr. Joachim Jacob
                                                http://www.bits.vib.be

Updated Nov 2011
http://dl.dropbox.com/u/18352887/BITS_training_material/Link%20to%20mod4-intro_H1_2011_otherRelevantData.pdf
Module 4 broadens our view
To understand life, we need not only
sequences, but many other concepts
      
          Bioinformatics is also storing and analyzing
             −   gene information: variations, isoforms,...
             −   Expression data
             −   3D protein structure data
             −   Interaction data
             −   Pathways and networks


                     “Storing all relevant biological data”
Schematic view II
GeneA                sequence     annotations – gene expr – pathway – struct,...

GeneB                sequence     annotations – gene expr – pathway – struct,...

GeneC                sequence     annotations – gene expr – pathway – struct,...


                       analysis                  Additional information
                                                        sources
                   results   results
Primary database
Other sequence
databases
The indispensable databases
      
          Gene Ontology – structuring
      
          KEGG – biochemical pathways
      
          PDB – Structure of proteins
      
          Intact – Interaction data
      
          dbSNP – database of genomic variation
      
          Expression sources – Microarray data
Gene Ontology structures the way we
communicate about life




Gene translation                  Protein production                 Protein synthesis



                                            http://www.arabidopsis.org/help/tutorials/go1.jsp
  http://www.geneontology.org/teaching_resources/tutorials/2005-09_BiB-journal-tutorial_jlomax
Gene Ontology structures life
               http://www.geneontology.org/
               Agreement on standardized keywords (often referred to as
                 'controlled vocabularies'), describing all natural processes in a
                 hierarchical way (ontology).
               Keywords are assigned to genes based on different kinds of evidence
               Keywords are ordered in a hierarchical, tree-like structure (a 'directed
                 acyclic graph')
               Three GO 'trees' exist, describing:
                                 "Biological Process"
                                 "Cellular Component"
                                 "Molecular Function"
A gene can be given
different GO terms

 Example, cytochrome c:

     molecular function: oxidoreductase activity,

     biological process: oxidative phosphorylation and
 induction of cell death,

     cellular component: mitochondrial matrix and
 mitochondrial inner membrane.

 In each tree, the terms are organised in a directed acyclic
 graph: a network consisting of parents and child-terms (as
 nodes) and lines between them as relationships.
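
 The parent/child structure described above can be sketched in a few
 lines of Python. This is only an illustration: the term names in the
 toy graph are invented, and a real GO graph would be loaded from the
 ontology files rather than typed in by hand.

```python
# Toy parent map: each term lists its direct parents.
# The term names are invented for illustration only.
PARENTS = {
    "root": [],
    "binding": ["root"],
    "catalytic activity": ["root"],
    # Two parents: this is what makes the structure a DAG and not a tree.
    "term X": ["binding", "catalytic activity"],
    "term Y": ["term X"],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


print(sorted(ancestors("term Y")))
# ['binding', 'catalytic activity', 'root', 'term X']
```

 A gene annotated with "term Y" is therefore implicitly annotated with
 all of its ancestors as well.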
Different evidence codes can assign a
degree of confidence to the assignment
         http://www.geneontology.org/GO.evidence.shtml

         Evidence codes can be grouped by:
         
             Experimental (e.g. IDA – inferred from direct assay)
         
             Computational analysis
         
             Author statement
         
             Curator statement
         
             Inferred from electronic annotation (IEA)
         If available, each annotation also has a reference
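
         As a rough sketch, the grouping above can be used to filter
         annotations by how much you trust them. The code table below
         lists only a subset of the evidence codes, and the gene names
         and annotations are made up for the example.

```python
# Subset of GO evidence codes, grouped as on the slide.
EVIDENCE_GROUPS = {
    "experimental": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
    "computational": {"ISS", "ISO", "ISA", "ISM", "IGC", "RCA"},
    "author statement": {"TAS", "NAS"},
    "curator statement": {"IC", "ND"},
    "electronic": {"IEA"},
}


def evidence_group(code):
    """Return the group name for an evidence code, or 'unknown'."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code in codes:
            return group
    return "unknown"


def filter_annotations(annotations, allowed_groups):
    """Keep only (gene, term, code) tuples whose code is in an allowed group."""
    return [a for a in annotations if evidence_group(a[2]) in allowed_groups]


# Hypothetical annotations, for illustration only.
annotations = [
    ("geneA", "oxidoreductase activity", "IDA"),
    ("geneB", "oxidoreductase activity", "IEA"),
    ("geneC", "oxidative phosphorylation", "TAS"),
]

trusted = filter_annotations(annotations, {"experimental", "author statement"})
print([gene for gene, _, _ in trusted])
```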
Gene Ontology structures all genes
according to their biological significance
         The GO structure and the terms can be browsed with a browser
           called AmiGO.
         QuickGO from EBI offers some nice visualisations
         Excellent GO wiki for all your questions
GO can be used to retrieve all gene
(products) related to one specific term
         You can search broadly, e.g. an AmiGO search for 'diabetes'
           leads to the following GO term
         http://amigo.geneontology.org/
GO is also useful to analyze and compare
different gene lists
          A lot of tools for GO are available on the GO website.




                                http://www.geneontology.org/GO.tools.shtml
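
          One common way such tools compare a gene list against a GO
          term is an enrichment test. The sketch below uses a one-sided
          hypergeometric test; this is an assumption about a typical
          approach, not a description of any particular tool, and the
          numbers in the usage line are invented.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in the sample.

    population: number of genes in the background
    annotated:  background genes carrying the GO term
    sample:     size of the gene list under study
    hits:       genes in the list carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 8 of 100 listed genes carry a term that
# 200 of 20000 background genes carry.
print(enrichment_p_value(20000, 200, 100, 8))
```

          In practice one test is run per GO term, so the resulting
          p-values need a multiple-testing correction.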
Some things to know about GO
         For analyses, one can make use of reduced GO sets,
           the so-called GO slims
                –   GO slims are cut-down subsets of GO terms that give
                    a broad overview (available per species)
                –   GO ontologies can be downloaded in .obo
                    format.
         Not all information is captured by GO; some needs to be
           retrieved from other databases
                Metabolic pathways: KEGG, …
                Phenotype/diseases
                       •   Mapping files exist, e.g. kegg2go
                              http://www.geneontology.org/GO.slims.shtml
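
         The .obo format mentioned above is plain text and easy to read
         with a small script. The sketch below handles only the id, name,
         namespace and is_a tags of [Term] stanzas; the three-term sample
         is a hand-written excerpt in OBO style, not a full download, and
         a dedicated OBO library is a better choice for real work.

```python
SAMPLE_OBO = """\
[Term]
id: GO:0003674
name: molecular_function
namespace: molecular_function

[Term]
id: GO:0003824
name: catalytic activity
namespace: molecular_function
is_a: GO:0003674 ! molecular_function

[Term]
id: GO:0016491
name: oxidoreductase activity
namespace: molecular_function
is_a: GO:0003824 ! catalytic activity
"""


def parse_obo_terms(text):
    """Return {term_id: {'name': ..., 'namespace': ..., 'is_a': [...]}}."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "[Term]":
            current = {"is_a": []}
        elif line.startswith("["):
            current = None  # skip other stanza types such as [Typedef]
        elif current is not None and ": " in line:
            key, value = line.split(": ", 1)
            value = value.split(" ! ")[0]  # drop the trailing comment
            if key == "is_a":
                current["is_a"].append(value)
            else:
                current[key] = value
            if key == "id":
                terms[value] = current
    return terms


terms = parse_obo_terms(SAMPLE_OBO)
print(terms["GO:0016491"]["name"], terms["GO:0016491"]["is_a"])
```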
Biological pathways databases organise
genes by molecular reactions
        3 important databases on biological pathways
        
            http://www.kegg.jp/




           http://www.reactome.org/ - EBI
           http://metacyc.org
Proteins with enzymatic function receive
an Enzyme Commission (EC) number
        http://www.chem.qmul.ac.uk/iubmb/enzyme/
        EC 6   Ligases
        EC 5   Isomerases
        EC 4   Lyases
        EC 3   Hydrolases
        EC 2   Transferases
        EC 1   Oxidoreductases
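
        Because the first digit of an EC number encodes the enzyme class,
        a lookup is straightforward. A minimal sketch, covering only the
        six classes listed above (the function name is hypothetical):

```python
# Top-level EC classes as listed on the slide.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_top_level_class(ec_number):
    """Map an EC number such as 'EC 1.1.1.1' or '3.4.21.4' to its class."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    first_field = text.split(".")[0]
    return EC_CLASSES[int(first_field)]


print(ec_top_level_class("EC 1.1.1.1"))  # Oxidoreductases
print(ec_top_level_class("3.4.21.4"))    # Hydrolases
```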
IntAct database contains interaction
information of proteins
         http://www.ebi.ac.uk/intact
         Three types of interactions stored
            
                Protein-protein
            
                Protein-DNA
            
                Protein-small molecule
IntAct database represents all
interactions as binary: caution!
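
The reason for caution is that a complex of several proteins has to be
turned into pairs before it fits a binary representation, and the result
depends on the convention used. The sketch below contrasts two widely
used conventions, spoke and matrix expansion, on a made-up complex; it
does not claim to reproduce how any specific database does this.

```python
from itertools import combinations


def spoke_expansion(bait, preys):
    """Pair the bait with each prey; preys are not linked to each other."""
    return [(bait, prey) for prey in preys]


def matrix_expansion(members):
    """Pair every member of the complex with every other member."""
    return list(combinations(members, 2))


# Hypothetical pull-down: bait A co-purifies with B, C and D.
print(spoke_expansion("A", ["B", "C", "D"]))
print(matrix_expansion(["A", "B", "C", "D"]))
```

Spoke expansion yields 3 pairs here and matrix expansion 6, so some
"interactions" in a binary table may never have been observed directly.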
Interaction networks can be analysed on
your computer using Cytoscape




                    Cytoscape training material on the BITS website
PDB hosts 3-dimensional
structural data on molecules

         PDB = Protein DataBank
             http://www.pdb.org/pdb/home/home.do
         Only structures resolved through NMR and X-ray
           (or other accurate techniques)
         
             Proteins
         
             DNA
         
             RNA
         
             Ligands

         Understanding PDB data: tutorial
PDB files can be read by a lot of different
  tools to display the structure
                       Every entry in PDB has its own PDB accession
                         number (four characters: one digit followed by
                         three letters or digits)
                       The PDB file contains the 3D coordinates of every
                         single atom in the structure, together with the
                         variability of that position (the last two numeric
                         columns: occupancy and B-factor)
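
                       To make the layout concrete, here is a minimal
                       sketch that reads one ATOM record using the
                       fixed-width columns of the classic PDB format.
                       The sample record is assembled piece by piece so
                       the column boundaries are visible; its values are
                       invented, not taken from a real entry.

```python
# One ATOM record built from its fixed-width fields (values are made up).
SAMPLE_ATOM_LINE = (
    "ATOM  "      # columns 1-6:   record name
    "    1"       # columns 7-11:  atom serial number
    " "           # column 12:     blank
    " N  "        # columns 13-16: atom name
    " "           # column 17:     alternate location indicator
    "MET"         # columns 18-20: residue name
    " "           # column 21:     blank
    "A"           # column 22:     chain identifier
    "   1"        # columns 23-26: residue sequence number
    " "           # column 27:     insertion code
    "   "         # columns 28-30: blank
    "  27.340"    # columns 31-38: x coordinate
    "  24.430"    # columns 39-46: y coordinate
    "   2.614"    # columns 47-54: z coordinate
    "  1.00"      # columns 55-60: occupancy
    "  9.67"      # columns 61-66: B-factor (temperature factor)
    "          "  # columns 67-76: blank
    " N"          # columns 77-78: element symbol
)


def parse_atom_line(line):
    """Extract the main fields of an ATOM/HETATM record into a dict."""
    return {
        "serial": int(line[6:11]),
        "atom": line[12:16].strip(),
        "residue": line[17:20].strip(),
        "chain": line[21],
        "residue_number": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
        "element": line[76:78].strip(),
    }


atom = parse_atom_line(SAMPLE_ATOM_LINE)
print(atom["residue"], atom["atom"], atom["x"], atom["y"], atom["z"])
```

                       Slicing by column, not splitting on whitespace,
                       matters because adjacent fields can touch when
                       values are wide.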




http://www.bits.vib.be/index.php?option=com_content&view=article&id=17203817:protein-structure-
PDB files can be read by a lot of different
tools to display the structure
         Tools to visualize (and some to analyze
           structures) (see BITS wiki)




                      http://www.bits.vib.be/wiki/index.php/Protein_structure
Finding a structure for your protein
  sequence means searching for similarity
               Homology modeling
               Similarity on sequence level projected to a structure
                    BLAST your query against the PDB database with CBLAST, or at ExPASy
                    PSI-BLAST - can detect sequences with similar structures
                     (twilight zone!)
                    If still no success: 3D-jury (a meta approach, including fold
                     recognition and local structure prediction)
               Similarity on structural level: aligning structures
                    VAST (structure)
                    DALI (Distance mAtrix aLIgnment)

                                             BITS training on protein structure analysis
                http://www.ii.uib.no/~slars/bioinfocourse/PDFs/structpred_tutorial.pdf
Tools at EBI                           http://consurf.tau.ac.il/pe/protexpl/psbiores.htm
Structural information is used to classify
proteins
                      Database cross-references in PDB entry




             
                 SCOP
             Groups proteins based on evolutionary, domain
               architecture and structural information.
             
                 CATH
             Manually curated classification of protein domains

                                           http://scop.mrc-lmb.cam.ac.uk/scop/
                                                        http://www.cathdb.info/
dbSNP is a public-domain archive for
simple genetic polymorphisms
      
          Single Nucleotide Polymorphism database (NCBI)
      
          Each dbSNP entry has a code rsxx (RefSNP) or ssxx
          (submitted SNP)
          
              single-base nucleotide substitutions (also known as
              single nucleotide polymorphisms or SNPs),
          
              small-scale multi-base deletions or insertions (also
              called deletion insertion polymorphisms or DIPs)
          
              retroposable element insertions and microsatellite
              repeat variations (also called short tandem repeats or
              STRs).
      
          Synchronized with new genome builds
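
          The two identifier types can be told apart by their prefix. A
          small sketch, assuming identifiers of the form rs or ss followed
          by digits; the example IDs are placeholders, not references to
          specific variants.

```python
import re

_DBSNP_ID = re.compile(r"^(rs|ss)(\d+)$")


def classify_dbsnp_id(identifier):
    """Return ('RefSNP' or 'submitted SNP', numeric part) for a dbSNP ID."""
    match = _DBSNP_ID.match(identifier.strip().lower())
    if match is None:
        raise ValueError(f"not a dbSNP identifier: {identifier!r}")
    prefix, number = match.groups()
    kind = "RefSNP" if prefix == "rs" else "submitted SNP"
    return kind, int(number)


print(classify_dbsnp_id("rs12345"))
print(classify_dbsnp_id("ss67890"))
```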
Expression data can be sequence-based
or hybridisation-based
      Sequence-based (ESTs - RNA seq - SAGE)
        
            Digital gene expression/northern
      Microarray databases – hybridisation based:
        
            GEO: gene expression omnibus (NCBI)
             −   Platform: GPLxxxxxxx
             −   Experiment: GSExxxxxx (= several samples)
             −   Sample: GSMxxxxxxxx
             −   Some experiments are curated: GDSxxxxx (online
                 analysis possible)
        
            ArrayExpress (EBI)
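
      The GEO accession prefixes listed above can be recognised with a
      short helper. This is a sketch: the function name and the example
      accessions are placeholders chosen for illustration.

```python
import re

# Prefix meanings as listed on the slide.
GEO_RECORD_TYPES = {
    "GPL": "platform",
    "GSE": "experiment (several samples)",
    "GSM": "sample",
    "GDS": "curated dataset",
}


def classify_geo_accession(accession):
    """Return the GEO record type for an accession such as 'GSE123'."""
    match = re.fullmatch(r"(GPL|GSE|GSM|GDS)(\d+)", accession.strip().upper())
    if match is None:
        raise ValueError(f"not a GEO accession: {accession!r}")
    return GEO_RECORD_TYPES[match.group(1)]


for accession in ["GPL1", "GSE123", "GSM4567", "GDS89"]:
    print(accession, "->", classify_geo_accession(accession))
```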
Example of expression data at GEO
Example at ArrayExpress
Entrez interconnects the databases at
NCBI for easy querying
        
            UniGene : sequences grouped by gene
        
            PopSet : sequence alignments for population
            studies and phylogeny
        
            Structure : 3D structures (PDB)
        
            Genome : genomic maps of chromosomes and
            plasmids
        
            UniSTS (Sequence Tagged Sites)
        
            PubMed : literature abstracts (MEDLINE,…)
        
            OMIM (Online Mendelian Inheritance in Man) :
            literature reviews,
        
            MeSH (Medical Subject Headings) : keywords
        
            Taxonomy
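
        Besides the web interface, the Entrez databases can be queried
        programmatically through NCBI's E-utilities. The sketch below only
        builds an ESearch URL and sends nothing; the helper name and the
        search term are illustrative, and NCBI's usage guidelines apply
        to real requests.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(database, term, max_results=20):
    """Build an ESearch URL for one Entrez database (no request is made)."""
    params = {
        "db": database,         # e.g. "gene", "pubmed", "structure", "taxonomy"
        "term": term,           # free-text query
        "retmax": max_results,  # maximum number of IDs to return
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


url = build_esearch_url("gene", "cytochrome c")
print(url)
```

        The same query can be pointed at another database by changing only
        the db parameter, which is what makes Entrez convenient for
        cross-database searching.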
Finding relevant data
Summarizing most important links to
discover everything you need ...
             Protein data
               Interpro (heavily integrated with EBI resources)
               http://www.interpro.org

             Gene data
               Entrez at NCBI : 'Entrez Gene'
               http://www.ncbi.nlm.nih.gov/Entrez/
               EB-eye Search at EBI : excellent for cross-species
               http://www.ebi.ac.uk/ebisearch/
Hold back your horses!

            Phew, where do I place this all?
Bioinformatics is all about different data,
as versatile as life itself
            Due to the strong cross-references between
              different databases, new databases and
              relevant info are rapidly integrated in existing
              databases.
            You can discover them by taking time to read the
              entries.
New tools are emerging every day to
enable you to browse all data sources...
         BioGPS, all in one window!
Integrative resources are increasingly
being organised on a species basis
        
            EMAGE database of in situ gene expression in mouse
        
            OMIM Database of diseases in man
        
            Websites providing an interface to integrate all
            this data are increasingly important
        
            Often organized on a species basis
             −   TAIR
             −   Flybase
             −   Wormbase
Organizing biological data
by species

                     By species, why?
  There is one biological information resource which stays
           more or less unchanged per species ...

 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptxPoojaSen20
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting DataJhengPantaleon
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 

Dernier (20)

How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptx
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 

BITS: Overview of important biological databases beyond sequences

  • 1. Basic bioinformatics concepts, databases and tools Module 4 Beyond the sequences Dr. Joachim Jacob http://www.bits.vib.be Updated Nov 2011 http://dl.dropbox.com/u/18352887/BITS_training_material/Link%20to%20mod4-intro_H1_2011_otherRelevantData.pdf
  • 2. Module 4 broadens our view
  • 3. To understand life, we need not only sequences but many other concepts. Bioinformatics is also about storing and analyzing: gene information (variations, isoforms, ...); expression data; 3D protein structure data; interaction data; pathways and networks. “Storing all relevant biological data”
  • 4. Schematic view II (diagram): GeneA, GeneB and GeneC each have a sequence (primary database, other sequence databases) plus annotations, gene expression, pathway, structure, ... (additional information sources); analysis produces results.
  • 5. The indispensable databases: Gene Ontology (structuring); KEGG (biochemical pathways); PDB (structure of proteins); IntAct (interaction data); dbSNP (database of genomic variation); expression sources (microarray data).
  • 6. Gene Ontology structures the way we communicate about life. The same concept goes by different names: “gene translation”, “protein production”, “protein synthesis”. http://www.arabidopsis.org/help/tutorials/go1.jsp http://www.geneontology.org/teaching_resources/tutorials/2005-09_BiB-journal-tutorial_jlomax
  • 7. Gene Ontology structures life http://www.geneontology.org/ Agreement on standardized keywords (often referred to as 'controlled vocabularies'), describing all natural processes in a hierarchical way (ontology). Keywords are assigned to genes based on different kinds of evidence. Keywords are ordered in a hierarchical, tree-like structure ('directed acyclic graphs'). Three GO 'trees' exist, describing: "Biological Process", "Cellular Component", "Molecular Function". http://www.arabidopsis.org/help/tutorials/go1.jsp http://www.geneontology.org/teaching_resources/tutorials/2005-09_BiB-journal-tutorial_jlomax
  • 8. A gene can be given different GO terms. Example, cytochrome c: molecular function: oxidoreductase activity; biological process: oxidative phosphorylation and induction of cell death; cellular component: mitochondrial matrix and mitochondrial inner membrane. In each tree, the terms are organised in a directed acyclic graph: a network with parent and child terms as nodes and the relationships between them as edges.
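A minimal sketch of what "directed acyclic graph" means in practice: a term may have more than one parent, so collecting all ancestors requires a graph walk rather than a single climb up a tree. The `PARENTS` mapping below is a toy example written for illustration; it is not extracted from the real GO graph, and the function name `ancestors` is hypothetical.

```python
from collections import deque

# Toy child -> parents mapping (illustrative only, NOT the real GO graph).
# "mitochondrial inner membrane" has two parents, which a tree would not allow.
PARENTS = {
    "mitochondrial inner membrane": ["organelle inner membrane", "mitochondrial membrane"],
    "organelle inner membrane": ["organelle membrane"],
    "mitochondrial membrane": ["organelle membrane"],
    "organelle membrane": ["cellular component"],
    "cellular component": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    queue = deque(parents.get(term, []))
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents.get(current, []))
    return seen


print(sorted(ancestors("mitochondrial inner membrane")))
```

A gene annotated with a child term is implicitly annotated with all ancestors of that term, which is why tools that work with GO usually perform a walk of this kind.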
  • 9.
  • 10. Different evidence codes can assign a degree of confidence to the assignment http://www.geneontology.org/GO.evidence.shtml Evidence codes can be grouped by: experimental (e.g. IDA, inferred from direct assay); computational analysis; author statement; curator statement; inferred from electronic annotation (IEA). If available, each annotation also has a reference.
  • 11. Different evidence codes can assign a degree of confidence to the assignment
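As a sketch of how the evidence groups can be used, the snippet below maps a few well-known GO evidence codes to the groups named on the slide and keeps only experimentally supported annotations. The code list is a subset, and the annotation tuples (`geneA`, `term X`, ...) are invented placeholders, not real annotations.

```python
# Subset of GO evidence codes, grouped as on the slide (not exhaustive).
EVIDENCE_GROUPS = {
    "experimental": {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"},
    "computational": {"ISS", "ISO", "ISA", "ISM", "IGC", "RCA"},
    "author statement": {"TAS", "NAS"},
    "curator statement": {"IC", "ND"},
    "electronic": {"IEA"},
}


def evidence_group(code):
    """Return the group name for an evidence code, or None if unknown."""
    for group, codes in EVIDENCE_GROUPS.items():
        if code.upper() in codes:
            return group
    return None


def keep_experimental(annotations):
    """Keep (gene, term, code) tuples backed by experimental evidence."""
    return [a for a in annotations if evidence_group(a[2]) == "experimental"]


# Invented placeholder annotations: (gene, GO term, evidence code).
ANNOTATIONS = [
    ("geneA", "term X", "IDA"),
    ("geneB", "term X", "IEA"),
    ("geneC", "term Y", "TAS"),
]

print(keep_experimental(ANNOTATIONS))
```

Filtering out IEA annotations in this way is a common, if conservative, choice when only curated evidence is wanted.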
  • 12. Gene Ontology structures all genes according to their biological significance. The GO structure and the terms can be browsed with a browser called AmiGO. QuickGO from EBI has some nice visualisations. There is an excellent GO wiki for all your questions.
  • 13. GO can be used to retrieve all genes (gene products) related to one specific term. You can search broadly, e.g. an AmiGO search for Diabetes leads to the following GO term http://amigo.geneontology.org/
  • 14. GO can be used to retrieve all genes (gene products) related to one specific term: AmiGO search for Diabetes
  • 15. GO can be used to retrieve all genes (gene products) related to one specific term: AmiGO search for Diabetes
  • 16. GO is also useful to analyze and compare different gene lists. Many tools that work with GO are listed on the GO website. http://www.geneontology.org/GO.tools.shtml
  • 17. Some things to know about GO. For analyses, one can use reduced GO sets, the so-called GO slims: subsets of biologically more relevant GO terms (available per species). GO ontologies can be downloaded in .obo format. Not all information is captured by GO; some needs to be retrieved from other databases, e.g. metabolic pathways (KEGG, …) and phenotypes/diseases. Mapping files exist, e.g. kegg2go. http://www.geneontology.org/GO.slims.shtml
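To give an idea of what a downloaded .obo file looks like, here is a minimal sketch of a reader for `[Term]` stanzas. It handles only the `id`, `name` and `is_a` tags; the embedded sample uses made-up `EX:` identifiers rather than real GO terms, and `parse_obo` is a hypothetical helper, not part of any GO library.

```python
# Made-up sample in OBO layout; the EX: identifiers are not real GO terms.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0000001
name: root term

[Term]
id: EX:0000002
name: child term
is_a: EX:0000001 ! root term

[Typedef]
id: part_of
name: part of
"""


def parse_obo(text):
    """Parse [Term] stanzas into {id: {"name": ..., "is_a": [...]}}."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Start of a stanza; only [Term] stanzas are collected.
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if not line or current is None or ": " not in line:
            continue
        tag, value = line.split(": ", 1)
        value = value.split(" ! ", 1)[0]  # drop the trailing "! comment"
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
    return terms


for term_id, term in parse_obo(SAMPLE_OBO).items():
    print(term_id, term["name"], term["is_a"])
```

For real work an established OBO parser is the safer choice, since the full format has more tags (relationships, synonyms, obsolete flags) than this sketch covers.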
  • 18. Biological pathway databases organise genes by molecular reactions. Three important databases on biological pathways: http://www.kegg.jp/ (KEGG), http://www.reactome.org/ (Reactome, EBI), http://metacyc.org (MetaCyc)
  • 19. Proteins with enzymatic function receive an Enzyme Commission (EC) number http://www.chem.qmul.ac.uk/iubmb/enzyme/ The top-level classes are: EC 1 Oxidoreductases; EC 2 Transferases; EC 3 Hydrolases; EC 4 Lyases; EC 5 Isomerases; EC 6 Ligases.
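An EC number is hierarchical: the first of its four dot-separated fields gives the top-level class. The sketch below reads that first field and looks it up in a table that mirrors the six classes on the slide; the function name `ec_class` is hypothetical.

```python
# Top-level EC classes as listed on the slide.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_class(ec_number):
    """Return the top-level class name for an EC number such as 'EC 1.1.1.1'."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    first_field = text.split(".")[0]
    return EC_CLASSES.get(int(first_field))


print(ec_class("EC 1.1.1.1"))  # alcohol dehydrogenase
print(ec_class("3.4.21.4"))    # trypsin
```

Partially specified numbers such as `6.-.-.-` work as well, because only the first field is inspected.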
  • 20. The IntAct database contains interaction information of proteins http://www.ebi.ac.uk/intact Three types of interactions are stored: protein-protein, protein-DNA, protein-small molecule.
  • 21. IntAct database represents all interactions as binary: caution!
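One likely reason for caution, stated here as an assumption about what the slide refers to: when an experiment detects a complex of several proteins, turning it into binary pairs requires an expansion model, and the choice of model changes the network. The sketch below contrasts the two usual models on an invented complex (bait `A`, preys `B`, `C`, `D`); the function names are hypothetical.

```python
from itertools import combinations


def spoke_expansion(bait, preys):
    """Spoke model: pair the bait with each prey only."""
    return [(bait, prey) for prey in preys]


def matrix_expansion(members):
    """Matrix model: pair every member with every other member."""
    return list(combinations(members, 2))


# Invented complex: bait A pulled down together with B, C and D.
print(spoke_expansion("A", ["B", "C", "D"]))         # 3 pairs
print(matrix_expansion(["A", "B", "C", "D"]))        # 6 pairs
```

Neither list proves direct physical contact between the two proteins in a pair, so binary records derived from complexes are best read as "found together" rather than "touching".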
  • 22. Interaction networks can be analysed on your computer using Cytoscape Cytoscape training material on the BITS website
  • 24. PDB hosts 3-dimensional structural data on molecules. PDB = Protein Data Bank http://www.pdb.org/pdb/home/home.do Only structures resolved through NMR and X-ray (or other accurate techniques): proteins, DNA, RNA, ligands. Understanding PDB data: tutorial
  • 25. PDB files can be read by a lot of different tools to display the structure. Every entry in PDB has its own PDB accession number (four characters: one digit followed by three letters or digits). The PDB file contains the 3D coordinates of every single atom in the structure, together with the variability of that position (the last two numeric columns: occupancy and temperature factor) http://www.bits.vib.be/index.php?option=com_content&view=article&id=17203817:protein-structure-
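The classic PDB text format is fixed-width: each `ATOM` record stores its fields at set column positions. The sketch below builds one sample record piece by piece, so the column widths are visible, and slices the coordinates, occupancy and temperature factor back out. The atom values are invented for illustration and do not come from a real entry; `parse_atom` is a hypothetical helper.

```python
# One ATOM record assembled field by field (invented values, 78 columns).
SAMPLE_ATOM = (
    "ATOM  "      # cols 1-6   record name
    "    1"       # cols 7-11  atom serial number
    " "           # col  12    blank
    " N  "        # cols 13-16 atom name
    " "           # col  17    alternate location
    "MET"         # cols 18-20 residue name
    " "           # col  21    blank
    "A"           # col  22    chain identifier
    "   1"        # cols 23-26 residue sequence number
    "    "        # cols 27-30 insertion code and blanks
    "  27.340"    # cols 31-38 x coordinate
    "  24.430"    # cols 39-46 y coordinate
    "   2.614"    # cols 47-54 z coordinate
    "  1.00"      # cols 55-60 occupancy
    "  9.67"      # cols 61-66 temperature factor (B-factor)
    + " " * 10    # cols 67-76 blanks
    + " N"        # cols 77-78 element symbol
)


def parse_atom(line):
    """Slice the main fields out of a fixed-width ATOM record."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
    }


print(parse_atom(SAMPLE_ATOM))
```

Splitting on whitespace instead of slicing looks simpler but fails on real files, where adjacent fields can run together without a separating space.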
  • 26. PDB files can be read by a lot of different tools to display the structure Tools to visualize (and some to analyze structures) (see BITS wiki) http://www.bits.vib.be/wiki/index.php/Protein_structure
  • 27. To find a structure for your protein sequence is to search for similarity. Homology modeling: similarity on the sequence level projected onto a structure. BLAST your query against the PDB database with cblast, or at ExPASy; PSI-BLAST can detect sequences with similar structures (twilight zone!); if still no success, try 3D-Jury (a meta approach, including fold recognition and local structure prediction). Similarity on the structural level (aligning structures): VAST; DALI (Distance mAtrix aLIgnment). BITS training on protein structure analysis http://www.ii.uib.no/~slars/bioinfocourse/PDFs/structpred_tutorial.pdf Tools at EBI http://consurf.tau.ac.il/pe/protexpl/psbiores.htm
  • 28. Structural information is used to classify proteins. Database cross-references in a PDB entry: SCOP groups proteins based on evolutionary, domain architecture and structural information; CATH is a manually curated classification of protein domains. http://scop.mrc-lmb.cam.ac.uk/scop/ http://www.cathdb.info/
  • 29. dbSNP is a public-domain archive for simple genetic polymorphisms: the Single Nucleotide Polymorphism database (NCBI). Each dbSNP entry has a code rsxx (RefSNP) or ssxx (submitted SNP). It contains: single-base nucleotide substitutions (also known as single nucleotide polymorphisms or SNPs); small-scale multi-base deletions or insertions (also called deletion insertion polymorphisms or DIPs); retroposable element insertions and microsatellite repeat variations (also called short tandem repeats or STRs). It is synchronized with new genome builds.
  • 30. Expression data can be sequence-based or hybridisation-based. Sequence-based (ESTs, RNA-seq, SAGE): digital gene expression/northern. Hybridisation-based microarray databases: GEO, Gene Expression Omnibus (NCBI), with platform GPLxxxxxxx, experiment GSExxxxxx (= several samples), sample GSMxxxxxxxx, and curated experiments GDSxxxxx (online analysis possible); ArrayExpress (EBI).
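Because the GEO record type is encoded in the accession prefix, a short pattern match is enough to tell the record types apart. The sketch below does this for the four prefixes on the slide; the accession numbers in the example are arbitrary, and `classify_geo` is a hypothetical helper.

```python
import re

# GEO accession prefixes as listed on the slide.
GEO_KINDS = {
    "GPL": "platform",
    "GSE": "experiment (series of samples)",
    "GSM": "sample",
    "GDS": "curated dataset",
}

_GEO_PATTERN = re.compile(r"^(GPL|GSE|GSM|GDS)(\d+)$")


def classify_geo(accession):
    """Return the record type for a GEO accession, or None if it does not match."""
    match = _GEO_PATTERN.match(accession.strip().upper())
    if match is None:
        return None
    return GEO_KINDS[match.group(1)]


for accession in ["GSE12345", "gsm678", "GPL1", "rs12345"]:
    print(accession, "->", classify_geo(accession))
```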
  • 31. Example of expression data at GEO
  • 32. Example of expression data at GEO
  • 33. Example of expression data at GEO
  • 36. Entrez interconnects the databases at NCBI for easy querying: UniGene (sequences grouped by gene); PopSet (sequence alignments for population studies and phylogeny); Structure (3D structures, PDB); Genome (genomic maps of chromosomes and plasmids); UniSTS (Sequence Tagged Sites); PubMed (literature abstracts: MEDLINE, …); OMIM (Online Mendelian Inheritance in Man: literature reviews); MeSH (Medical Subject Headings: keywords); Taxonomy.
  • 38. Summarizing the most important links to discover everything you need ... Protein data: InterPro (heavily integrated with EBI resources) http://www.interpro.org Gene data: Entrez at NCBI ('Entrez Gene') http://www.ncbi.nlm.nih.gov/Entrez/ and EB-eye Search at EBI (excellent for cross-species searches) http://www.ebi.ac.uk/ebisearch/
  • 39. Hold back your horses! Phew, where do I place this all?
  • 40. Bioinformatics is all about different data, as versatile as life itself Due to the strong cross-references between different databases, new databases and relevant info are rapidly integrated in existing databases. You can discover them by taking time to read the entries.
  • 41. New tools are emerging every day to enable you to browse all data sources... BioGPS, all in one window!
  • 42. New tools are emerging every day to enable you to browse all data sources...
  • 43. Integrative resources are increasingly being organised on a species basis. EMAGE: database of in situ gene expression in mouse. OMIM: database of diseases in man. Websites providing an interface to integrate all these data are increasingly important, and are often organized on a species basis: TAIR, FlyBase, WormBase.
  • 44. Organizing biological information by species. By species, why? There is one biological information resource which stays more or less unchanged per species ...

Editor's notes

  1. 'translation', whereas another uses the phrase 'protein synthesis',
  2. GO hierarchy can be downloaded (obo format). GO Slim: selection of categories.
  3. Different types: ribbon, cartoon, ball and stick, space filling.