Understanding Genome
  -Biological Database Overview
               Part-1


            DAY-2, SESSION-1
                (25-10-2010)




                  Rajendra K. Labala
 Biomedical Informatics Centre, NICED, ICMR, Kolkata
Major Challenges with Genomes

• The scientific challenge of decoding a genome from its
  nucleotide sequence into a set of functional elements
• Developing software capable of storing, manipulating,
  and evaluating genomes
• Providing comprehensive, informative, and user-friendly
  access to a large amount of data
The Genome Problem

• The problem with a genome (particularly the human genome)
  is that it is “large, complicated, and opaque to
  analysis”
• Genome features to identify include:
   - Genes: protein-coding, RNA, pseudogenes
   - Regulatory elements
   - SNPs, repeats, etc.
Solutions

• Ensembl
• NCBI
• PATRIC

   You will learn:
      - A detailed overview of each resource
      - How to mine sequence-related information/data
The Ensembl Project

• Ensembl is a joint project between three organizations to
  develop a software system that produces and maintains
  automatic annotation of selected eukaryotic genomes:
   - EMBL - European Molecular Biology Laboratory
   - EBI - European Bioinformatics Institute
   - WTSI - Wellcome Trust Sanger Institute
What is Ensembl

• Ensembl is one of the three main systems currently available
  for annotating and displaying genomic information:
   - Ensembl
      - http://www.ensembl.org
   - UCSC Genome Browser
      - http://genome.ucsc.edu
   - NCBI Genome Browser
      - http://www.ncbi.nlm.nih.gov
• Ensembl provides public annotation of mammalian and other genomes
• It is open source software
• It is built on a relational database system
Genomes and Annotation

• Ensembl does not assemble any genome directly
   - It works with the sequencing centers that generate
     the genome assemblies

• Ensembl provides high-quality annotation for genomes
  that have no existing annotation
   - For genomes that already have high-quality annotation,
     it works with that existing annotation
Ensembl Genome Annotation

• Utilizes raw DNA sequence data from public sources
• Creates a tracking database (the “Ensembl database”)
• Joins the sequences based on a sequence scaffold, or “Golden Path”
• Automatically finds genes and other features of the sequence
• Associates the sequence and features with data from other sources
• Provides a publicly accessible web-based interface to the database
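The “Golden Path” step can be pictured as laying sequence pieces end to end along a tiling path. The sketch below is a simplified illustration under stated assumptions, not Ensembl's pipeline: each hypothetical component is given as a 1-based start position on the scaffold, a sequence, and a strand; reverse-strand pieces are reverse-complemented; uncovered positions are filled with N to mark gaps. The names `join_golden_path` and `reverse_complement` are our own.

```python
# Simplified illustration of joining sequence pieces along a tiling
# ("golden") path. Not Ensembl's actual assembly code.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def join_golden_path(length: int, components: list[tuple[int, str, str]]) -> str:
    """Lay placed components onto a scaffold of `length` bases.

    Each component is (start, sequence, strand): `start` is the 1-based
    scaffold position of its first base and `strand` is "+" or "-".
    Reverse-strand components are reverse-complemented before placement.
    Positions not covered by any component stay "N" (gaps).
    """
    scaffold = ["N"] * length
    for start, seq, strand in components:
        piece = seq if strand == "+" else reverse_complement(seq)
        end = start + len(piece) - 1
        if start < 1 or end > length:
            raise ValueError(f"component at {start}..{end} falls outside 1..{length}")
        # Convert the 1-based start to a 0-based slice and copy the bases in.
        scaffold[start - 1:end] = piece
    return "".join(scaffold)


if __name__ == "__main__":
    # Hypothetical components: two contigs separated by a 3-base gap.
    print(join_golden_path(12, [(1, "ACGTA", "+"), (9, "GGCA", "-")]))
    # ACGTANNNTGCC
```

Real assemblies record this kind of placement in AGP (“A Golden Path”) files, which also describe gap types and component coordinates; the sketch keeps only the core idea.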
Ensembl genomes

[Figure: species tree of the Ensembl genomes (slide label: 57)]
Ensembl Software System

• Makes extensive use of BioPerl (www.bioperl.org)
• Uses the free MySQL database
• The entire Ensembl code base is freely available under
  an Apache open source license
• Written mainly in Perl, with extensions in C; some
  viewers (e.g. Apollo) are written in Java
• The software can be downloaded via FTP
• It is possible to set up a mirror of the entire Ensembl
  system
Ensembl Databases

• Four main databases
   - Ensembl Core Database
   - Ensembl EST Database
   - Ensembl Compara Database
   - Ensembl Variation Database
• Ensembl uses MySQL to store information in relational
  databases
• Ensembl also provides APIs (Application Programming
  Interfaces)
   - These serve as the connection between the databases and
     specific application programs
   - Ensembl has a Perl API and a Java API
      - The Perl API is more “complete” than the Java API
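Because each of these is a separate MySQL database, their names already tell you the species, database type, and release. The sketch below assumes the naming pattern `<species>_<group>_<release>_<assembly>` for species-specific databases (a name of the form `homo_sapiens_core_57_37b`) and `ensembl_compara_<release>` for the multi-species Compara database; treat the pattern as an assumption and check it against the server's actual `SHOW DATABASES` output. `parse_db_name` and `EnsemblDb` are hypothetical names.

```python
from typing import NamedTuple, Optional


class EnsemblDb(NamedTuple):
    species: Optional[str]   # None for the multi-species Compara database
    group: str               # e.g. "core", "variation", "compara"
    release: int             # Ensembl release number
    assembly: Optional[str]  # assembly/gene-build suffix, if present


def parse_db_name(name: str) -> EnsemblDb:
    """Split an Ensembl-style MySQL database name into its parts.

    Assumed patterns (check against the server's SHOW DATABASES output):
      <species>_<group>_<release>_<assembly>   species-specific databases
      ensembl_compara_<release>                the Compara database
    """
    parts = name.split("_")
    if parts[:2] == ["ensembl", "compara"] and len(parts) == 3:
        return EnsemblDb(None, "compara", int(parts[2]), None)
    if len(parts) < 4:
        raise ValueError(f"unrecognised database name: {name}")
    # Read from the right, because species names can contain underscores.
    *species, group, release, assembly = parts
    return EnsemblDb("_".join(species), group, int(release), assembly)


if __name__ == "__main__":
    for db in ("homo_sapiens_core_57_37b", "ensembl_compara_57"):
        print(parse_db_name(db))
```

Parsing from the right is the design choice that matters here: the group, release, and assembly are always the last three fields, while the species part may be two or more words.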
Ensembl Databases

• Ensembl Core Databases
   - Species-specific core databases store genome sequence
     and annotation information, including the gene,
     transcript, and protein models annotated by the
     Ensembl automated genome analysis
   - The databases also store information about cDNA and
     protein alignments, as well as external references
      - e.g. NCBI accession numbers such as AB012211
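The gene, transcript, and protein models in the core databases are identified by Ensembl stable IDs such as ENSMUSG00000042351, which also appears in the exercises. The sketch decodes such an ID, assuming the common layout: the prefix `ENS`, an optional species code (empty for human, `MUS` for mouse), one letter for the feature type (G gene, T transcript, P protein, E exon), eleven digits, and an optional `.version` suffix. Other feature types and ID schemes exist, so the regular expression and the `parse_stable_id` helper are illustrative, not an official validator.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: "ENS" + optional species code + feature letter + 11 digits,
# optionally followed by ".<version>". The lazy species group expands only
# as far as needed for the feature letter and the digits to match.
STABLE_ID = re.compile(r"^ENS([A-Z]*?)([GTPE])(\d{11})(?:\.(\d+))?$")

FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "protein", "E": "exon"}


class StableId(NamedTuple):
    species_code: str        # "" for human, "MUS" for mouse
    feature: str             # "gene", "transcript", "protein" or "exon"
    number: str              # the 11-digit numeric part, kept as text
    version: Optional[int]   # None when no ".<version>" suffix is given


def parse_stable_id(identifier: str) -> StableId:
    """Decode an Ensembl-style stable ID such as ENSMUSG00000042351."""
    match = STABLE_ID.match(identifier)
    if match is None:
        raise ValueError(f"not an Ensembl-style stable ID: {identifier}")
    code, letter, number, version = match.groups()
    return StableId(code, FEATURE_TYPES[letter], number,
                    int(version) if version is not None else None)


if __name__ == "__main__":
    print(parse_stable_id("ENSMUSG00000042351"))
```

External references such as AB012211 do not match this pattern and are rejected, which is one quick way to tell Ensembl IDs apart from cross-referenced accessions.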
Ensembl Databases

• Ensembl Compara Database
   - A multi-species database that stores the results of
     genome-wide species comparisons
   - The comparative genomics dataset provides pairwise
     whole-genome alignments
   - The comparative proteomics dataset provides orthologue
     predictions and protein family clusters
• Ensembl EST
   - Species-specific Ensembl EST databases hold an independent
     EST-based gene set, provided for all well-characterised species
     with a sufficient amount of biological evidence. The Ensembl EST
     databases use the same schema as the Ensembl Core databases, so
     the same schema descriptions and API access apply
• Variation
   - Genetic variation data are organised in a set of
     species-specific Ensembl Variation databases
Data Mining with Ensembl

• BioMart
   - A generic data management system, originally developed
     for use with Ensembl
   - Gives users the ability to run fast and powerful
     searches
   - Simplifies the task of integrating external data sets
     (provided by the user) with the Ensembl databases

• Help & Documentation Link
   - http://asia.ensembl.org/info/index.html
Data mining through BioMart

• Choose a dataset
• Choose the data to be retrieved (attributes)
• Narrow your dataset (filters)
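The same three choices can be written down as a BioMart XML query, which the `martservice` endpoint accepts in place of the web form. The sketch builds such a query with Python's standard library and stops short of the network call. The endpoint URL, the dataset name `hsapiens_gene_ensembl`, and the filter and attribute names (`chromosome_name`, `ensembl_gene_id`, `start_position`, `end_position`) are assumptions for illustration; the names actually available depend on the mart and release, and the web interface can typically export the XML for a query you have built by hand, which shows the correct ones.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

MARTSERVICE = "http://www.ensembl.org/biomart/martservice"  # assumed endpoint


def build_biomart_query(dataset: str, filters: dict[str, str],
                        attributes: list[str]) -> str:
    """Return a BioMart XML query: one dataset, its filters, its attributes."""
    query = ET.Element("Query", virtualSchemaName="default", formatter="TSV",
                       header="1", uniqueRows="1")
    ds = ET.SubElement(query, "Dataset", name=dataset, interface="default")
    for name, value in filters.items():      # step: narrow the dataset
        ET.SubElement(ds, "Filter", name=name, value=value)
    for name in attributes:                   # step: choose the output columns
        ET.SubElement(ds, "Attribute", name=name)
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n<!DOCTYPE Query>\n' + body


def martservice_url(xml_query: str, base: str = MARTSERVICE) -> str:
    """URL-encode the XML query as the `query` parameter of a GET request."""
    return base + "?" + urlencode({"query": xml_query})


if __name__ == "__main__":
    xml = build_biomart_query(
        "hsapiens_gene_ensembl",                                 # assumed dataset
        {"chromosome_name": "2"},                                # assumed filter
        ["ensembl_gene_id", "start_position", "end_position"],   # assumed attributes
    )
    print(xml)
    print(martservice_url(xml))
```

Fetching the resulting URL (for example with `urllib.request.urlopen`) should return tab-separated rows, one per matching gene.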
BioMart

[Figure: BioMart interface]
• Dataset: select your dataset from the drop-down list
• Filters: filter your query using the given options
• Attributes: choose the data to retrieve using these attributes
Try Yourself

• Retrieve all SNPs for ‘novel’ human G-protein coupled receptor genes
  (GPCRs, IPR000276) on chromosome 2.
• Retrieve the sequences of the exons of the human MEFV gene in FASTA format.
• Retrieve the gene structure (i.e. the start and end coordinates of the exons)
  of the mouse gene ENSMUSG00000042351.
• Retrieve all human disease genes containing transmembrane domains located
  between p11.2 and q22.
• The file contains a list of probeset IDs from a microarray experiment using
  the Affymetrix HG-U133 Plus 2.0 (human) array. Retrieve the 500 bp upstream
  of the transcripts matching these probeset IDs.
• Retrieve the sequences 5 kb upstream of all human ‘known’ genes between
  D1S2806 and D1S464.
• Retrieve all human SNPs that have an ID from The SNP Consortium (TSC) on
  chromosome 6 between 15 Mb and 15.2 Mb, with 200 bases of flanking sequence.
• Retrieve the mouse homologues of the Homo sapiens genes CASP1, CASP2, CASP3,
  and CASP4.
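Several of these exercises ask for sequence upstream of a gene or transcript (500 bp, 5 kb) or for flanking sequence. BioMart computes these regions for you; the sketch only shows the coordinate arithmetic it has to get right, namely that “upstream” follows the direction of transcription. It assumes 1-based inclusive coordinates and the Ensembl-style strand values 1 and -1; `upstream_region` is a hypothetical helper.

```python
def upstream_region(start: int, end: int, strand: int, flank: int) -> tuple[int, int]:
    """Coordinates (1-based, inclusive) of the `flank` bases upstream of a
    feature spanning start..end on strand 1 or -1.

    "Upstream" follows the direction of transcription: on the forward
    strand it lies before `start`, on the reverse strand after `end`.
    The forward-strand region is clipped at position 1; the reverse-strand
    region is not clipped, because the chromosome length is not known here.
    """
    if flank < 1:
        raise ValueError("flank must be at least 1")
    if strand == 1:
        return max(1, start - flank), start - 1
    if strand == -1:
        return end + 1, end + flank
    raise ValueError("strand must be 1 or -1")


if __name__ == "__main__":
    # Hypothetical transcript at 10,000..12,000; 500 bp upstream on each strand.
    print(upstream_region(10_000, 12_000, 1, 500))   # (9500, 9999)
    print(upstream_region(10_000, 12_000, -1, 500))  # (12001, 12500)
```

For reverse-strand features the retrieved bases must also be reverse-complemented to read in the direction of transcription.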
NCBI

• Genome projects
   - After DNA sequencing, the resulting contigs are
     submitted to NCBI as WGS submissions
• Whole Genome Shotgun (WGS) sequences
• WGS list
• Download (GenBank format → WGS → FASTA)
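A downloaded GenBank record can also be converted to FASTA locally. The sketch handles one plain GenBank record: it takes the header from the ACCESSION line and the bases from the ORIGIN section up to the `//` terminator, ignoring features and multi-record files. `genbank_to_fasta` is a hypothetical name and the toy record is made up.

```python
def genbank_to_fasta(record: str, width: int = 60) -> str:
    """Convert one GenBank flat-file record to a FASTA entry.

    The header is taken from the ACCESSION line; the sequence comes from
    the ORIGIN section, up to the "//" record terminator. Features and
    other annotation are ignored.
    """
    accession = "unknown"
    bases: list[str] = []
    in_origin = False
    for line in record.splitlines():
        if line.startswith("ACCESSION"):
            accession = line.split()[1]
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            break
        elif in_origin:
            # ORIGIN lines look like "        1 acgtacgtac gtacgtacgt ...":
            # drop the leading position number, keep the 10-base blocks.
            bases.extend(line.split()[1:])
    sequence = "".join(bases).upper()
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([">" + accession] + lines) + "\n"


if __name__ == "__main__":
    # A made-up, heavily trimmed record for illustration only.
    toy = (
        "LOCUS       TOY0001                  20 bp    DNA     linear\n"
        "ACCESSION   TOY0001\n"
        "ORIGIN\n"
        "        1 acgtacgtac gtacgtacgt\n"
        "//\n"
    )
    print(genbank_to_fasta(toy, width=8), end="")
```

For real files, Biopython's `Bio.SeqIO.convert(in_file, "genbank", out_file, "fasta")` handles the full format, including multi-record files.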
[Figure: NCBI Genome Project page. Go to “WGS Sequences”.]
[Figure: WGS home page, where you can find the WGS project lists.]
[Figure: GenBank format file for the WGS project. Click on the link for a detailed view of the data.]
[Figure: WGS project page. Check out the FASTA format.]
NCBI FTP

• For downloading sequences/genomes in the required
  formats:
   - FAA (amino acid sequences in FASTA format)
   - FNA (nucleic acid sequences in FASTA format)
   - FFN (coding sequences in FASTA format)
   - GBK (GenBank format)
   - PTT (CDS table in tab-delimited format)
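FAA, FNA, and FFN files all share the FASTA layout: a header line starting with `>`, followed by one or more sequence lines. A minimal reader for that layout is sketched below; `read_fasta` is a hypothetical helper and the sample records are made up.

```python
import io
from typing import Iterator, Optional, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text.

    The same layout is used by .faa (protein), .fna (nucleotide) and
    .ffn (coding sequence) files, so one reader covers all three.
    """
    header: Optional[str] = None
    chunks: list[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:        # emit the previous record
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:          # ignore sequence lines before any header
            chunks.append(line)
    if header is not None:                # emit the last record
        yield header, "".join(chunks)


if __name__ == "__main__":
    # Made-up records standing in for an .faa file.
    sample = io.StringIO(">prot1 hypothetical protein\nMKVL\nLAG\n>prot2\nMSTN\n")
    for name, seq in read_fasta(sample):
        print(name, len(seq))
```

Joining the wrapped lines per record is what makes multi-line sequences come out whole. For production use, Biopython's `Bio.SeqIO.parse(handle, "fasta")` offers the same iteration with fuller format handling.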
NCBI FTP

[Figure: genome files in different formats on the NCBI FTP site: FAA, FNA, FFN, GBK, and PTT]
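PTT files are plain tab-delimited tables, so Python's `csv` module can read them. The sketch assumes the layout seen in NCBI's legacy bacterial genome directories: a couple of descriptive lines, then a column-header row beginning with `Location`, then one row per CDS with a `start..end` location. Rather than hard-coding the columns, it keys each row by whatever header row the file contains. `read_ptt` is a hypothetical helper and the sample table is made up.

```python
import csv
import io
from typing import Iterator, TextIO


def read_ptt(handle: TextIO) -> Iterator[dict]:
    """Yield one dict per CDS row of a .ptt protein table.

    Lines before the column-header row (the row whose first field is
    "Location") are treated as descriptive text and skipped. Each data
    row is keyed by that header, and "start..end" is split into integers.
    """
    # QUOTE_NONE keeps any quote characters in product names literally.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    columns = None
    for row in reader:
        if columns is None:
            if row and row[0] == "Location":
                columns = row
            continue
        if not row:
            continue
        record = dict(zip(columns, row))
        start, _, end = record["Location"].partition("..")
        record["start"], record["end"] = int(start), int(end)
        yield record


if __name__ == "__main__":
    # Made-up table in the assumed PTT layout (tab-separated).
    sample = io.StringIO(
        "Example organism, complete genome - 1..1000\n"
        "2 proteins\n"
        "Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct\n"
        "10..99\t+\t29\t0001\tabcA\tEX_0001\t-\t-\thypothetical protein\n"
        "200..499\t-\t99\t0002\t-\tEX_0002\t-\t-\thypothetical protein\n"
    )
    for cds in read_ptt(sample):
        print(cds["Synonym"], cds["Strand"], cds["start"], cds["end"], cds["Product"])
```

When reading a real file, open it with `newline=""`, as the `csv` module documentation recommends.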
PATRIC

• WGS annotations download
• For details, visit the website and the FAQ page

• http://www.patricbrc.org/portal/portal/patric/Home
[Figure: PATRIC home/search page, http://www.patricbrc.org/portal/portal/patric/Home]
[Figure: CDS links. Check out the CDS links for the searched organism.]
[Figure: Downloading. Check out the different download options.]
Exercise

• Explore all the databases thoroughly, following the problems given
  in the “part-1.doc” file in the “day-2” folder (on the desktop).

More Related Content

What's hot

Quantum pharmacology. Basics
Quantum pharmacology. BasicsQuantum pharmacology. Basics
Quantum pharmacology. BasicsMobiliuz
 
Distributed approach for Peptide Identification
Distributed approach for Peptide IdentificationDistributed approach for Peptide Identification
Distributed approach for Peptide Identificationabhinav vedanbhatla
 
Macromolecular interaction
Macromolecular interactionMacromolecular interaction
Macromolecular interactionCharthaGaglani
 
Conformational analysis
Conformational analysisConformational analysis
Conformational analysisPinky Vincent
 
Basics Of Molecular Docking
Basics Of Molecular DockingBasics Of Molecular Docking
Basics Of Molecular DockingSatarupa Deb
 
Poster presentat a les jornades doctorals de la UAB
Poster presentat a les jornades doctorals de la UABPoster presentat a les jornades doctorals de la UAB
Poster presentat a les jornades doctorals de la UABElisabeth Ortega
 
Molecular Docking Using Autodock Tools
Molecular Docking Using Autodock ToolsMolecular Docking Using Autodock Tools
Molecular Docking Using Autodock ToolsVikram Aditya
 
Protein 3 d structure prediction
Protein 3 d structure predictionProtein 3 d structure prediction
Protein 3 d structure predictionSamvartika Majumdar
 
protein sequence analysis
protein sequence analysisprotein sequence analysis
protein sequence analysisRamikaSingla
 
Project report: Investigating the effect of cellular objectives on genome-sca...
Project report: Investigating the effect of cellular objectives on genome-sca...Project report: Investigating the effect of cellular objectives on genome-sca...
Project report: Investigating the effect of cellular objectives on genome-sca...Jarle Pahr
 
Protein Structure Determination
Protein Structure DeterminationProtein Structure Determination
Protein Structure DeterminationAmjad Ibrahim
 
Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...
Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...
Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...Keiji Takamoto
 
2015 New trans-stilbene derivatives with large TPA values
2015 New trans-stilbene derivatives with large TPA values2015 New trans-stilbene derivatives with large TPA values
2015 New trans-stilbene derivatives with large TPA valuesvarun Kundi
 

What's hot (19)

MD Simulation
MD SimulationMD Simulation
MD Simulation
 
Quantum pharmacology. Basics
Quantum pharmacology. BasicsQuantum pharmacology. Basics
Quantum pharmacology. Basics
 
Swaati pro sa web
Swaati pro sa webSwaati pro sa web
Swaati pro sa web
 
Protein structure prediction with a focus on Rosetta
Protein structure prediction with a focus on RosettaProtein structure prediction with a focus on Rosetta
Protein structure prediction with a focus on Rosetta
 
Distributed approach for Peptide Identification
Distributed approach for Peptide IdentificationDistributed approach for Peptide Identification
Distributed approach for Peptide Identification
 
qring_study
qring_studyqring_study
qring_study
 
Macromolecular interaction
Macromolecular interactionMacromolecular interaction
Macromolecular interaction
 
Conformational analysis
Conformational analysisConformational analysis
Conformational analysis
 
Basics Of Molecular Docking
Basics Of Molecular DockingBasics Of Molecular Docking
Basics Of Molecular Docking
 
Poster presentat a les jornades doctorals de la UAB
Poster presentat a les jornades doctorals de la UABPoster presentat a les jornades doctorals de la UAB
Poster presentat a les jornades doctorals de la UAB
 
Molecular Docking Using Autodock Tools
Molecular Docking Using Autodock ToolsMolecular Docking Using Autodock Tools
Molecular Docking Using Autodock Tools
 
Protein 3 d structure prediction
Protein 3 d structure predictionProtein 3 d structure prediction
Protein 3 d structure prediction
 
MOLECULAR DOCKING
MOLECULAR DOCKINGMOLECULAR DOCKING
MOLECULAR DOCKING
 
Introduction to Bayesian phylogenetics and BEAST
Introduction to Bayesian phylogenetics and BEASTIntroduction to Bayesian phylogenetics and BEAST
Introduction to Bayesian phylogenetics and BEAST
 
protein sequence analysis
protein sequence analysisprotein sequence analysis
protein sequence analysis
 
Project report: Investigating the effect of cellular objectives on genome-sca...
Project report: Investigating the effect of cellular objectives on genome-sca...Project report: Investigating the effect of cellular objectives on genome-sca...
Project report: Investigating the effect of cellular objectives on genome-sca...
 
Protein Structure Determination
Protein Structure DeterminationProtein Structure Determination
Protein Structure Determination
 
Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...
Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...
Theoretical evaluation of shotgun proteomic analysis strategies; Peptide obse...
 
2015 New trans-stilbene derivatives with large TPA values
2015 New trans-stilbene derivatives with large TPA values2015 New trans-stilbene derivatives with large TPA values
2015 New trans-stilbene derivatives with large TPA values
 

Viewers also liked

Sistema nervioso-2
Sistema nervioso-2Sistema nervioso-2
Sistema nervioso-2sergimancera
 
La ciudad de los niños y de las niñas - Unidad Didáctica de Movilidad Segura ...
La ciudad de los niños y de las niñas - Unidad Didáctica de Movilidad Segura ...La ciudad de los niños y de las niñas - Unidad Didáctica de Movilidad Segura ...
La ciudad de los niños y de las niñas - Unidad Didáctica de Movilidad Segura ...Irekia - EJGV
 
Similarity and difference factors of dissolution
Similarity and difference factors of dissolutionSimilarity and difference factors of dissolution
Similarity and difference factors of dissolutionJessica Fernandes
 
Sponsorship Research & ROI
Sponsorship Research & ROISponsorship Research & ROI
Sponsorship Research & ROINicholas Cameron
 
CTI101 笔记:修辞手法
CTI101 笔记:修辞手法CTI101 笔记:修辞手法
CTI101 笔记:修辞手法Cindy Ong
 
20130305 GB les 5
20130305 GB les 520130305 GB les 5
20130305 GB les 5mleeuwen
 
Liceo consejo Municipal Gestor
Liceo consejo Municipal GestorLiceo consejo Municipal Gestor
Liceo consejo Municipal GestorI.E. Santo Domingo
 
Curacao The Ultimate Guide To The World's Favourite Liqueur Flavour
Curacao The Ultimate Guide To The World's Favourite Liqueur FlavourCuracao The Ultimate Guide To The World's Favourite Liqueur Flavour
Curacao The Ultimate Guide To The World's Favourite Liqueur FlavourPhilip Duff
 
κωνσταντινα τσαφου αμ 1049271_ειρηνη_σταυροπουλου_αμ 1049238
κωνσταντινα τσαφου αμ 1049271_ειρηνη_σταυροπουλου_αμ 1049238κωνσταντινα τσαφου αμ 1049271_ειρηνη_σταυροπουλου_αμ 1049238
κωνσταντινα τσαφου αμ 1049271_ειρηνη_σταυροπουλου_αμ 1049238Eirini Stauropoulou
 
Estudo estatístico de produção e venda dos produtos agropecuários no período ...
Estudo estatístico de produção e venda dos produtos agropecuários no período ...Estudo estatístico de produção e venda dos produtos agropecuários no período ...
Estudo estatístico de produção e venda dos produtos agropecuários no período ...Pedro Kangombe
 

Viewers also liked (20)

Unidad 4 módulo 1
Unidad 4 módulo 1Unidad 4 módulo 1
Unidad 4 módulo 1
 
Expand your time margins
Expand your time marginsExpand your time margins
Expand your time margins
 
Sistema nervioso-2
Sistema nervioso-2Sistema nervioso-2
Sistema nervioso-2
 
Diapositivas tics
Diapositivas ticsDiapositivas tics
Diapositivas tics
 
La ciudad de los niños y de las niñas - Unidad Didáctica de Movilidad Segura ...
La ciudad de los niños y de las niñas - Unidad Didáctica de Movilidad Segura ...La ciudad de los niños y de las niñas - Unidad Didáctica de Movilidad Segura ...
La ciudad de los niños y de las niñas - Unidad Didáctica de Movilidad Segura ...
 
Dispositivos de red
Dispositivos de redDispositivos de red
Dispositivos de red
 
Subredes (Subneting)
Subredes (Subneting)Subredes (Subneting)
Subredes (Subneting)
 
Facial fractures the upper face
Facial fractures   the upper faceFacial fractures   the upper face
Facial fractures the upper face
 
Similarity and difference factors of dissolution
Similarity and difference factors of dissolutionSimilarity and difference factors of dissolution
Similarity and difference factors of dissolution
 
Sponsorship Research & ROI
Sponsorship Research & ROISponsorship Research & ROI
Sponsorship Research & ROI
 
CTI101 笔记:修辞手法
CTI101 笔记:修辞手法CTI101 笔记:修辞手法
CTI101 笔记:修辞手法
 
Programas Delegacionales de Desarrollo 02
Programas Delegacionales de Desarrollo 02Programas Delegacionales de Desarrollo 02
Programas Delegacionales de Desarrollo 02
 
Insertar
InsertarInsertar
Insertar
 
20130305 GB les 5
20130305 GB les 520130305 GB les 5
20130305 GB les 5
 
El aborto
El abortoEl aborto
El aborto
 
Liceo consejo Municipal Gestor
Liceo consejo Municipal GestorLiceo consejo Municipal Gestor
Liceo consejo Municipal Gestor
 
Geogebra
GeogebraGeogebra
Geogebra
 
Curacao The Ultimate Guide To The World's Favourite Liqueur Flavour
Curacao The Ultimate Guide To The World's Favourite Liqueur FlavourCuracao The Ultimate Guide To The World's Favourite Liqueur Flavour
Curacao The Ultimate Guide To The World's Favourite Liqueur Flavour
 
κωνσταντινα τσαφου αμ 1049271_ειρηνη_σταυροπουλου_αμ 1049238
κωνσταντινα τσαφου αμ 1049271_ειρηνη_σταυροπουλου_αμ 1049238κωνσταντινα τσαφου αμ 1049271_ειρηνη_σταυροπουλου_αμ 1049238
κωνσταντινα τσαφου αμ 1049271_ειρηνη_σταυροπουλου_αμ 1049238
 
Estudo estatístico de produção e venda dos produtos agropecuários no período ...
Estudo estatístico de produção e venda dos produtos agropecuários no período ...Estudo estatístico de produção e venda dos produtos agropecuários no período ...
Estudo estatístico de produção e venda dos produtos agropecuários no período ...
 

Similar to Understanding Genome

Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformaticsAtai Rabby
 
Database in bioinformatics
Database in bioinformaticsDatabase in bioinformatics
Database in bioinformaticsVinaKhan1
 
Bioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuBioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuKAUSHAL SAHU
 
Whole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisWhole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisdrelamuruganvet
 
Sequencedatabases
SequencedatabasesSequencedatabases
SequencedatabasesAbhik Seal
 
Introduction to Bioinformatics: Part 3
Introduction to Bioinformatics: Part 3Introduction to Bioinformatics: Part 3
Introduction to Bioinformatics: Part 3AhmedAbdElMoniem35
 
Biodatabases 101220022654-phpapp02
Biodatabases 101220022654-phpapp02Biodatabases 101220022654-phpapp02
Biodatabases 101220022654-phpapp02Sreekanth Gali
 
Bioinformatics data mining
Bioinformatics data miningBioinformatics data mining
Bioinformatics data miningSangeeta Das
 
Presentation on Biological database By Elufer Akram @ University Of Science ...
Presentation on Biological database  By Elufer Akram @ University Of Science ...Presentation on Biological database  By Elufer Akram @ University Of Science ...
Presentation on Biological database By Elufer Akram @ University Of Science ...Elufer Akram
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchAnshika Bansal
 

Similar to Understanding Genome (20)

Ensembl genome
Ensembl genomeEnsembl genome
Ensembl genome
 
Ncbi
NcbiNcbi
Ncbi
 
NCBI
NCBINCBI
NCBI
 
Introduction to databases.pptx
Introduction to databases.pptxIntroduction to databases.pptx
Introduction to databases.pptx
 
Informal presentation on bioinformatics
Informal presentation on bioinformaticsInformal presentation on bioinformatics
Informal presentation on bioinformatics
 
Database in bioinformatics
Database in bioinformaticsDatabase in bioinformatics
Database in bioinformatics
 
Bioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahuBioinformatic, and tools by kk sahu
Bioinformatic, and tools by kk sahu
 
Whole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysisWhole genome sequencing of bacteria & analysis
Whole genome sequencing of bacteria & analysis
 
Major biological nucleotide databases
Major biological nucleotide databasesMajor biological nucleotide databases
Major biological nucleotide databases
 
Genome comparision
Genome comparisionGenome comparision
Genome comparision
 
Sequencedatabases
SequencedatabasesSequencedatabases
Sequencedatabases
 
Introduction to Bioinformatics: Part 3
Introduction to Bioinformatics: Part 3Introduction to Bioinformatics: Part 3
Introduction to Bioinformatics: Part 3
 
Intro to databases
Intro to databasesIntro to databases
Intro to databases
 
Proteome databases
Proteome databasesProteome databases
Proteome databases
 
Protein databases
Protein databasesProtein databases
Protein databases
 
Biodatabases 101220022654-phpapp02
Biodatabases 101220022654-phpapp02Biodatabases 101220022654-phpapp02
Biodatabases 101220022654-phpapp02
 
Bioinformatics data mining
Bioinformatics data miningBioinformatics data mining
Bioinformatics data mining
 
Article
ArticleArticle
Article
 
Presentation on Biological database By Elufer Akram @ University Of Science ...
Presentation on Biological database  By Elufer Akram @ University Of Science ...Presentation on Biological database  By Elufer Akram @ University Of Science ...
Presentation on Biological database By Elufer Akram @ University Of Science ...
 
Role of bioinformatics in life sciences research
Role of bioinformatics in life sciences researchRole of bioinformatics in life sciences research
Role of bioinformatics in life sciences research
 

Recently uploaded

ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONHumphrey A Beña
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...Nguyen Thanh Tu Collection
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
 

Recently uploaded (20)

ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATIONTHEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
THEORIES OF ORGANIZATION-PUBLIC ADMINISTRATION
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
HỌC TỐT TIẾNG ANH 11 THEO CHƯƠNG TRÌNH GLOBAL SUCCESS ĐÁP ÁN CHI TIẾT - CẢ NĂ...
 
Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
 

Understanding Genome

  • 1. Understanding Genome -Biological Database Overview Part-1 DAY-2, SESSION-1 (25-10-2010) Rajendra K. Labala Biomedical Informatics Centre, NICED, ICMR, Kolkata
  • 2. Major Challenges with Genomes  Scientific challenge of decoding a genome from its nucleotides to a set of functional elements  Development of software which is capable of storing, manipulating, and evaluating genomes  Challenge of providing comprehensive and informative access to a large amount of data in a user friendly way
  • 3. The Genome Problem  The problem with the genome (particularly human) is that it is “large, complicated, and opaque to analysis”  Genome features to identify include:  Genes: protein coding, RNA, pseudogenes  Regulatory elements  SNPs, repeats, etc….
  • 4. Solutions  Ensembl  NCBI  PATRIC  You will learn  Detailed overview  Sequence related information/data mining!
  • 5. The Ensembl Project  Ensembl is a joint project between 3 organizations to develop a software system which produces and maintains automatic annotation on selected eukaryotic genomes  EMBL- European Molecular Biology Laboratory  EBI- European Bioinformatics Institute  WTSI – Wellcome Trust Sanger Institute
  • 6. What is Ensembl  Ensembl is one of 3 main systems that are currently available that annotate and display genomic information  Ensembl  http://www.ensembl.org  UCSC Genome Browser  http://genome.ucsc.edu  NCBI Genome Browser  http://www.ncbi.nlm.nih.gov  Public annotation of mammalian and other genomes  Open source software  Relational database system
  • 7. Genomes and Annotation: Ensembl does not assemble any genome directly; it works with the sequencing centers that generate the genome assemblies. Ensembl provides high-quality annotation for genomes that have no existing annotation, and for genomes that already have high-quality annotation it works with that existing annotation.
  • 8. Ensembl Genome Annotation: Ensembl utilizes raw DNA sequence data from public sources, creates a tracking database (the “Ensembl database”), joins the sequences based on a sequence scaffold or “Golden Path”, automatically finds genes and other features of the sequence, associates the sequence and features with data from other sources, and provides a publicly accessible web-based interface to the database.
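The “Golden Path” step can be illustrated with a minimal Python sketch. It assumes a hypothetical placement table (chromosome start, contig ID, orientation) loosely modelled on AGP files; it is not Ensembl's own assembly-loading code or schema. Contigs are laid out in chromosome order, reverse-complemented when placed on the minus strand, and the gaps between them are padded with N.

```python
from typing import Dict, List, Tuple

# Complement table for DNA (N stays N; lower case is handled too).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_golden_path(placements: List[Tuple[int, str, str]],
                      contigs: Dict[str, str]) -> str:
    """Join contigs along a hypothetical golden path.

    placements: (chromosome start, 1-based; contig ID; strand "+" or "-").
    Gaps between consecutive contigs are filled with "N".
    """
    pieces: List[str] = []
    next_pos = 1  # first chromosome position not yet filled
    for start, contig_id, strand in sorted(placements):
        if strand not in ("+", "-"):
            raise ValueError(f"bad strand {strand!r} for {contig_id}")
        if start < next_pos:
            raise ValueError(f"contig {contig_id} overlaps the previous one")
        pieces.append("N" * (start - next_pos))  # gap before this contig
        seq = contigs[contig_id]
        pieces.append(seq if strand == "+" else revcomp(seq))
        next_pos = start + len(seq)
    return "".join(pieces)


if __name__ == "__main__":
    demo_contigs = {"ctgA": "ACGT", "ctgB": "GGTA"}  # toy data
    print(build_golden_path([(1, "ctgA", "+"), (8, "ctgB", "-")], demo_contigs))
    # prints ACGTNNNTACC
```

Sorting the placements first means the table can be given in any order; overlapping placements are rejected rather than silently merged.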
  • 11. Ensembl Software System: Ensembl makes extensive use of BioPerl (www.bioperl.org) and the free MySQL database. The entire Ensembl code base is freely available under an Apache open-source license. It is mainly written in Perl, with extensions in C; some viewers have been written in Java (e.g. Apollo). The software can be downloaded by FTP, and it is possible to set up a mirror of the entire Ensembl system.
  • 12. Ensembl Databases: there are four main databases: the Ensembl Core, Ensembl EST, Ensembl Compara, and Ensembl Variation databases. Ensembl uses MySQL to store information in relational databases. Ensembl also provides APIs (Application Programming Interfaces), which serve as the connection between the databases and specific application programs. Ensembl has a Perl API and a Java API; the Perl API is more “complete” than the Java API.
  • 13. Ensembl Databases: the Ensembl Core databases are species-specific databases that store genome sequence and annotation information, including the gene, transcript, and protein models annotated by the Ensembl automated genome analysis. These databases also store information about cDNA and protein alignments, as well as external references (e.g. NCBI accession numbers such as AB012211).
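To show how a relational layout and an API layer fit together, here is a hedged Python sketch that uses the standard-library sqlite3 module in place of MySQL. The four tables, their columns, the TinyGeneAdaptor class and all IDs are hypothetical simplifications: the real Ensembl core schema is far richer and, for example, links exons to transcripts through a separate join table. The class name only echoes the adaptor style of the Ensembl Perl API.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical, heavily simplified tables; the real Ensembl core schema differs.
SCHEMA = """
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, stable_id TEXT, name TEXT);
CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY,
                         gene_id INTEGER REFERENCES gene(gene_id),
                         stable_id TEXT);
CREATE TABLE exon (exon_id INTEGER PRIMARY KEY,
                   transcript_id INTEGER REFERENCES transcript(transcript_id),
                   exon_rank INTEGER, seq_start INTEGER, seq_end INTEGER);
CREATE TABLE xref (gene_id INTEGER REFERENCES gene(gene_id),
                   db_name TEXT, accession TEXT);
"""


class TinyGeneAdaptor:
    """Thin API layer: programs ask about genes instead of writing SQL."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def exons_for_gene(self, name: str) -> List[Tuple[str, int, int, int]]:
        """(transcript stable ID, exon rank, start, end) for every exon of a gene."""
        sql = """SELECT t.stable_id, e.exon_rank, e.seq_start, e.seq_end
                 FROM gene g
                 JOIN transcript t ON t.gene_id = g.gene_id
                 JOIN exon e ON e.transcript_id = t.transcript_id
                 WHERE g.name = ?
                 ORDER BY t.stable_id, e.exon_rank"""
        return self.conn.execute(sql, (name,)).fetchall()

    def xrefs_for_gene(self, name: str) -> List[Tuple[str, str]]:
        """(external database, accession) pairs linked to a gene."""
        sql = """SELECT x.db_name, x.accession
                 FROM xref x JOIN gene g ON g.gene_id = x.gene_id
                 WHERE g.name = ?
                 ORDER BY x.db_name, x.accession"""
        return self.conn.execute(sql, (name,)).fetchall()


def demo_connection() -> sqlite3.Connection:
    """In-memory database filled with toy records (all IDs are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO gene VALUES (1, 'DEMO_G1', 'demoA')")
    conn.executemany("INSERT INTO transcript VALUES (?, ?, ?)",
                     [(10, 1, "DEMO_T1"), (11, 1, "DEMO_T2")])
    # Exon 102 repeats exon 100's coordinates because this toy layout has
    # no join table for sharing one exon between transcripts.
    conn.executemany("INSERT INTO exon VALUES (?, ?, ?, ?, ?)",
                     [(100, 10, 1, 1000, 1100), (101, 10, 2, 1500, 1600),
                      (102, 11, 1, 1000, 1100)])
    conn.execute("INSERT INTO xref VALUES (1, 'DemoDB', 'DEMO_ACC1')")
    return conn


if __name__ == "__main__":
    adaptor = TinyGeneAdaptor(demo_connection())
    for row in adaptor.exons_for_gene("demoA"):
        print(row)
    print(adaptor.xrefs_for_gene("demoA"))
```

Callers only see exons_for_gene and xrefs_for_gene; if the table layout changed, only the SQL inside the adaptor would need to change, which is the connecting role the slides give to the APIs.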
  • 14. Ensembl Databases: the Ensembl Compara database is a multi-species database that stores the results of genome-wide species comparisons. Its comparative genomics dataset provides pairwise whole-genome alignments, and its comparative proteomics dataset provides orthologue predictions and protein family clusters. The species-specific Ensembl EST databases hold an independent EST-based gene set, provided for all well-characterised species with a suitable amount of biological evidence. The Ensembl EST databases have the same layout as the Ensembl Core database schema, so schema descriptions and API access apply equally to both. Ensembl Variation: the large amount of genetic variation information is organised in a set of species-specific Ensembl Variation databases.
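Orthologue prediction is the least obvious of these datasets, so here is a toy Python sketch of one simple approach, reciprocal best hits (RBH): two genes from different species are paired when each is the other's highest-scoring match. RBH is a stand-in chosen for brevity, not the method used by the Ensembl Compara pipeline, and the gene names and scores are made up.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, str, float]  # (query gene, target gene, similarity score)


def best_hits(hits: List[Hit]) -> Dict[str, str]:
    """Map each query gene to its highest-scoring target (first one wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, target, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b: List[Hit],
                         b_vs_a: List[Hit]) -> List[Tuple[str, str]]:
    """Return (a, b) pairs where each gene is the other's best hit."""
    ab = best_hits(a_vs_b)
    ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


if __name__ == "__main__":
    # Toy scores between two species' genes (made-up names and values).
    human_vs_mouse = [("hsA", "mmA", 95.0), ("hsA", "mmB", 40.0),
                      ("hsB", "mmA", 60.0), ("hsB", "mmB", 55.0)]
    mouse_vs_human = [("mmA", "hsA", 95.0), ("mmA", "hsB", 60.0),
                      ("mmB", "hsA", 40.0), ("mmB", "hsB", 55.0)]
    print(reciprocal_best_hits(human_vs_mouse, mouse_vs_human))
    # prints [('hsA', 'mmA')]
```

In the toy data hsB's best hit is mmA, but mmA's best hit is hsA, so hsB is left without a predicted orthologue; requiring the match in both directions is what filters out such one-sided pairs.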
  • 15. Data Mining with Ensembl: BioMart is a generic data management system used by Ensembl for data mining. It gives users the ability to run fast and powerful searches, and it simplifies the task of integrating external data sets (provided by the user) with the Ensembl databases. Help & Documentation link: http://asia.ensembl.org/info/index.html
  • 16. Data mining through BioMart: choose a dataset; choose the data to be retrieved (attributes); narrow your dataset (filters).
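The three BioMart steps (dataset, attributes, filters) map directly onto BioMart's XML query format, in which a Dataset element contains Filter and Attribute elements. The Python sketch below builds such a query with the standard library. The martservice URL, the dataset name hsapiens_gene_ensembl and the attribute and filter names are assumptions to be checked against the current mart, and no request is actually sent.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from urllib.parse import urlencode

# Assumed endpoint of the public Ensembl BioMart web service.
MARTSERVICE = "http://www.ensembl.org/biomart/martservice"
# Prolog as seen in XML exported from the BioMart web interface (assumption).
PROLOG = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'


def biomart_query_xml(dataset: str, attributes: List[str],
                      filters: Dict[str, str]) -> str:
    """Build the <Query> element: one dataset, its filters and attributes."""
    query = ET.Element("Query", {"virtualSchemaName": "default",
                                 "formatter": "TSV", "header": "0",
                                 "uniqueRows": "1"})
    dataset_el = ET.SubElement(query, "Dataset",
                               {"name": dataset, "interface": "default"})
    for name, value in filters.items():  # narrow the dataset
        ET.SubElement(dataset_el, "Filter", {"name": name, "value": value})
    for name in attributes:  # columns to retrieve
        ET.SubElement(dataset_el, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


def martservice_url(query_xml: str) -> str:
    """GET URL that would submit the query (no request is sent here)."""
    return MARTSERVICE + "?" + urlencode({"query": PROLOG + query_xml})


if __name__ == "__main__":
    # Hypothetical example: gene IDs and names of human genes on chromosome 2.
    xml = biomart_query_xml("hsapiens_gene_ensembl",
                            ["ensembl_gene_id", "external_gene_name"],
                            {"chromosome_name": "2"})
    print(xml)
    print(martservice_url(xml))
```

Keeping query construction separate from submission makes it easy to inspect or save the XML, much like the XML export button in the BioMart web interface.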
  • 18. Filters: filter your query by the given options.
  • 20. Try Yourself: Retrieve all SNPs for ‘novel’ human G-protein coupled receptor (GPCR) genes (InterPro IPR000276) on chromosome 2. Retrieve the sequences of the exons of the human MEFV gene in FASTA format. Retrieve the gene structure (i.e. the start and end coordinates of the exons) of the mouse gene ENSMUSG00000042351. Retrieve all human disease genes containing transmembrane domains located between p11.2 and q22. The file contains a list of probeset IDs from a microarray experiment using the Affymetrix HG-U133 Plus 2.0 (human) array; retrieve the 500 bp upstream of the transcripts matching these probeset IDs. Retrieve the sequences 5 kb upstream of all human ‘known’ genes between the markers D1S2806 and D1S464. Retrieve all human SNPs that have an ID from The SNP Consortium (TSC), from chromosome 6 between 15 Mb and 15.2 Mb, with 200 bases of flanking sequence. Retrieve the mouse homologues of the Homo sapiens genes CASP1, CASP2, CASP3, and CASP4.
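Several of these exercises ask for sequence upstream of a transcript, for example the 500 bp upstream of matched transcripts. A detail that is easy to get wrong is strand: on the reverse strand a transcript starts at its higher coordinate, so “upstream” means higher coordinates. The Python sketch below computes 1-based, inclusive upstream coordinates and extracts the sequence; the function names are hypothetical, and BioMart's own flank options remain the intended way to answer the exercises.

```python
from typing import Optional, Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def upstream_region(tx_start: int, tx_end: int, strand: int, flank: int,
                    chrom_length: Optional[int] = None) -> Tuple[int, int]:
    """1-based, inclusive coordinates of `flank` bases upstream of a transcript.

    On the reverse strand (-1) the transcript begins at tx_end, so the
    upstream region lies at higher coordinates. The region is clipped to the
    chromosome and can be empty (end < start) at a chromosome edge.
    """
    if flank <= 0:
        raise ValueError("flank must be positive")
    if strand == 1:
        start, end = tx_start - flank, tx_start - 1
    elif strand == -1:
        start, end = tx_end + 1, tx_end + flank
    else:
        raise ValueError("strand must be 1 or -1")
    start = max(start, 1)
    if chrom_length is not None:
        end = min(end, chrom_length)
    return start, end


def upstream_sequence(chrom_seq: str, tx_start: int, tx_end: int,
                      strand: int, flank: int) -> str:
    """Upstream sequence written 5'->3' on the transcript's own strand."""
    start, end = upstream_region(tx_start, tx_end, strand, flank, len(chrom_seq))
    sub = chrom_seq[start - 1:end]  # 1-based inclusive -> Python slice
    return sub if strand == 1 else sub.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(upstream_region(1000, 5000, 1, 500))   # prints (500, 999)
    print(upstream_region(1000, 5000, -1, 500))  # prints (5001, 5500)
```

For a reverse-strand transcript the extracted slice is reverse-complemented, so the result reads in the same direction as the transcript itself.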
  • 21. NCBI: Genome projects. After DNA sequencing, the resulting contigs are submitted to NCBI as Whole Genome Shotgun (WGS) submissions. From the WGS list, the sequences can be downloaded (GenBank format, WGS, FASTA).
  • 22. NCBI Genome Project: go to the WGS sequences.
  • 23. WGS: the WGS home page, where you can find the list of WGS projects.
  • 24. GenBank format file for the WGS project: click on the link for a detailed view of the data.
  • 25. WGS project page: check out the FASTA format.
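Because the FAA, FNA and FFN downloads are all FASTA files, a small reader covers all three. This is a minimal sketch in plain Python with hypothetical function names; established parsers such as Biopython's SeqIO do the same job with more validation.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines.

    Works for nucleotide (FNA/FFN) and protein (FAA) files alike; the
    header is returned without the leading '>'.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a nucleotide sequence."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return (upper.count("G") + upper.count("C")) / len(seq)


if __name__ == "__main__":
    demo = [">contig_1 toy example", "ACGTAC", "GG", ">contig_2", "ATAT"]
    for name, seq in read_fasta(demo):
        print(name, len(seq), round(gc_content(seq), 2))
```

To read a downloaded file, pass an open file handle instead of a list, e.g. `with open(path) as handle: records = list(read_fasta(handle))`, where `path` is whichever file you saved.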
  • 26. NCBI FTP: used for downloading sequences/genomes in the required formats: FAA (amino acid sequences in FASTA format), FNA (nucleic acid sequences in FASTA format), FFN (coding sequences in FASTA format), GBK (GenBank format), and PTT (CDS table in tab-delimited format).
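The PTT file is a plain tab-delimited table, so it can be read without special libraries. The sketch below assumes the layout commonly seen in NCBI .ptt files (a few descriptive lines, a header row beginning with Location, then one row per CDS with coordinates written as start..end); the parser name is hypothetical and the sample rows are made up.

```python
from typing import Dict, Iterable, List, Union

Record = Dict[str, Union[str, int]]


def parse_ptt(lines: Iterable[str]) -> List[Record]:
    """Parse a PTT-style table into one dict per CDS (assumed layout)."""
    rows = iter(lines)
    header: List[str] = []
    for line in rows:  # skip descriptive lines until the header row
        if line.startswith("Location\t"):
            header = line.rstrip("\n").split("\t")
            break
    if not header:
        raise ValueError("no header row starting with 'Location' found")
    records: List[Record] = []
    for line in rows:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        record: Record = {k: v for k, v in zip(header, line.split("\t"))}
        start, end = str(record["Location"]).split("..")
        record["Start"], record["End"] = int(start), int(end)
        records.append(record)
    return records


# Made-up sample in the assumed layout.
SAMPLE = [
    "Demo organism, complete genome - 1..5000",
    "2 proteins",
    "Location\tStrand\tLength\tPID\tGene\tSynonym\tCode\tCOG\tProduct",
    "190..255\t+\t21\t0001\tgeneA\tDEMO_0001\t-\t-\tdemo protein A",
    "400..1200\t-\t266\t0002\tgeneB\tDEMO_0002\t-\t-\tdemo protein B",
]

if __name__ == "__main__":
    for rec in parse_ptt(SAMPLE):
        print(rec["Synonym"], rec["Strand"], rec["Start"], rec["End"])
```

Locating the header row by its first column, rather than by a fixed line number, keeps the parser working if the number of descriptive lines differs between files.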
  • 28. Genome files in different formats: FAA (amino acid sequences in FASTA format), FNA (nucleic acid sequences in FASTA format), FFN (coding sequences in FASTA format), GBK (GenBank format), and PTT (CDS table in tab-delimited format).
  • 29. PATRIC: WGS annotations can be downloaded. For details, visit the website and its FAQ page: http://www.patricbrc.org/portal/portal/patric/Home
  • 31. CDS links: check out the CDS links for the searched organism.
  • 33. Exercise: explore all the databases thoroughly according to the problem given in the “part-1.doc” file in the “day-2” folder (on the desktop).