Bioinformatics in medicine today
David Montaner
dmontaner@cipf.es
Centro de Investigación Príncipe Felipe
Institute of Computational Genomics
9 May 2013
in Valencia
David Montaner Bioinformatics in medicine 1/26
Genomics
“Progress in science depends on new techniques, new
discoveries and new ideas, probably in that order.”
Sydney Brenner, 1980
Microarray devices and high-throughput sequencing allow us
to measure thousands or millions of genomic characteristics.
Genomics vs. genetics
Genetics:
• Single genes are responsible for biological changes.
• one gene → one hypothesis → one p-value → conclusions
Genomics:
• Genes or genomic features act together to produce
biological changes.
• many genes → many hypotheses → many p-values →
→ more data analysis
• Computational support is needed even for drawing
conclusions
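The "many p-values" step can be sketched with a multiple-testing correction. The slides do not name a method, so the Benjamini-Hochberg procedure below is an assumption chosen for illustration, and the p-values are made up:

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, one per gene.
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]
adjusted = benjamini_hochberg(p_values)
significant = [q <= 0.05 for q in adjusted]
```

In this toy input, four raw p-values fall below 0.05, but only the two smallest remain below 0.05 after adjustment, which is why genomic analyses need the extra data-analysis step.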
Genomic numbers
Microarray:
• 30,000 genes
• 2 million SNPs
• 100 Mb
Measured features:
• genes, isoforms
• SNPs, Polymorphisms
• IN-DELS
• loss of heterozygosity
• methylation
• copy number alterations
NGS:
• 30,000 genes
• 30,000 transcripts
• 20 million SNPs
• 10-100 GB
Registered information:
• Genomic characteristics:
position, chromosome ...
• Biological function
• Disease association
• miRNA targets
Genomic databases
Nucleic Acids Research lists more than 1500 online databases!
http://www.oxfordjournals.org/nar/database/c
• Many different databases for each category: which should I
use?
• No standards: different IDs, methods, servers, formats, ...
• Lack of international initiatives, many local and small
databases
• Different gene IDs, more than 50
• In vivo vs in silico databases
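The ID problem can be made concrete with a small lookup table. This is only a sketch: the table layout and the helper names `build_index` and `convert` are hypothetical, and the identifiers listed for TP53 and BRCA1 should be checked against current database releases before use.

```python
# Illustrative cross-reference records for two human genes.
# Verify the identifiers against the current databases before relying on them.
GENE_XREFS = [
    {"symbol": "TP53", "entrez": "7157",
     "ensembl": "ENSG00000141510", "uniprot": "P04637"},
    {"symbol": "BRCA1", "entrez": "672",
     "ensembl": "ENSG00000012048", "uniprot": "P38398"},
]


def build_index(records, id_type):
    """Index the records by one identifier type (hypothetical helper)."""
    return {rec[id_type]: rec for rec in records}


def convert(ids, source, target, records=GENE_XREFS):
    """Map IDs from one namespace to another; unknown IDs become None."""
    index = build_index(records, source)
    return [index[i][target] if i in index else None for i in ids]


ensembl_ids = convert(["TP53", "NOT_A_GENE"], "symbol", "ensembl")
```

Returning `None` for unmapped IDs keeps the output aligned with the input, so lost genes stay visible instead of silently disappearing from the analysis.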
Biological databases (Wikipedia)
1 Primary nucleotide sequence databases
2 Metadatabases
3 Genome databases
4 Protein sequence databases
5 Proteomics databases
6 Protein structure databases
7 Protein model databases
8 RNA databases
9 Carbohydrate structure databases
10 Protein-protein interactions
11 Signal transduction pathway databases
12 Metabolic pathway databases
13 Experimental data repositories (Microarrays, NGS, Sanger)
14 Exosomal databases
15 Mathematical model databases
16 PCR / real time PCR primer databases
17 Specialized databases
18 Taxonomic databases
19 Wiki-style databases
Primary nucleotide sequence databases
Contain any kind of nucleotide sequence, from genes to
genomes.
The International Nucleotide Sequence Database (INSD)
Collaboration:
• GenBank: National Center for Biotechnology Information (NCBI)
• European Nucleotide Archive (ENA): European Bioinformatics Institute (EBI)
• DNA Data Bank of Japan (DDBJ)
GenBank
Primary nucleotide sequence databases
• Available on the NCBI FTP site:
http://www.ncbi.nlm.nih.gov/Ftp/
• A new release is made every two months.
• 3 types of entries:
  • CoreNucleotide (the main collection)
  • dbEST (Expressed Sequence Tags)
  • dbGSS (Genome Survey Sequences)
Access:
• Search for sequence identifiers using Entrez Nucleotide:
http://www.ncbi.nlm.nih.gov/nucleotide/
• Align GenBank sequences to a query sequence using
BLAST (Basic Local Alignment Search Tool).
http://blast.ncbi.nlm.nih.gov/Blast.cgi
• Several other e-utilities (see book)
See an example of a GenBank record.
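Programmatic access through the e-utilities can be sketched as follows. This is a minimal example, assuming the public E-utilities `efetch` endpoint and the `nuccore` database name; the helper `efetch_genbank_url` is hypothetical, and U49845 is the accession NCBI uses in its sample GenBank record documentation. The block only builds the request URL and does not contact the server.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_genbank_url(accession):
    """Build an efetch URL asking for a flat-file GenBank record as text."""
    params = {
        "db": "nuccore",    # nucleotide collection
        "id": accession,    # accession number or GI
        "rettype": "gb",    # GenBank flat-file format
        "retmode": "text",
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


url = efetch_genbank_url("U49845")
```

The resulting URL can be opened in a browser or passed to `urllib.request.urlopen`; NCBI asks automated clients to limit their request rate.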
Metadatabases
• Collect and organize data from primary nucleotide
sequence databases and many other resources.
• Make the information available in a convenient format and
provide data handling resources: web pages, application
programming interface (API) …
• Focus on particular species, diseases …
Examples
• Entrez: searches through almost all NCBI resources.
http://www.ncbi.nlm.nih.gov/sites/gquery
• GeneCards: provides genomic, proteomic, transcriptomic,
genetic and functional information for human genes (known
and predicted)
http://www.genecards.org/
Entrez
Metadatabases
• Searches through almost all NCBI resources.
• Entrez search page: http://www.ncbi.nlm.nih.gov/sites/gquery
• Queries can be saved if you have a MyNCBI account:
http://www.ncbi.nlm.nih.gov/
Genome databases
Collect genome sequences and annotation (specification about
genes) for particular organisms, and try to improve them:
• Data curation.
• Complete missing information using in silico methods.
• Generate new relational organization.
• Complement feature IDs.
• Provide “easy” access, visualization …
Examples
• Ensembl: automatic annotation on selected eukaryote
genomes.
• UCSC Genome Browser: reference sequence and working
draft assemblies for a large collection of genomes
• WormBase: genome of the model organism C. elegans.
Ensembl
Genome databases
• Ensembl is a joint project between the European Bioinformatics
Institute (EBI), part of the European Molecular Biology Laboratory
(EMBL), and the Wellcome Trust Sanger Institute.
• It develops a software system that produces and maintains
automatic annotation of selected vertebrate and other
eukaryotic genomes.
• http://www.ensembl.org
UCSC Genome Browser
Genome databases
• UCSC: University of California, Santa Cruz.
• This site contains the reference sequence and working
draft assemblies for a large collection of genomes.
• http://genome.ucsc.edu/
Protein sequence databases
• Most of the time, proteins are the final unit of interest in research.
• There is a direct conversion from DNA/RNA sequences to
protein sequences.
• Gene IDs and protein IDs are used interchangeably by
researchers (biologists, not bioinformaticians …)
Examples
• UniProt: Universal Protein Resource (EBI)
• Swiss-Prot (Swiss Institute of Bioinformatics)
• InterPro: classifies proteins into families and predicts the
presence of domains and sites.
• Pfam: protein families database of alignments and HMMs
(Sanger Institute)
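The "direct conversion" from a coding sequence to a protein sequence can be sketched with the standard genetic code. This is a simplified illustration: it assumes the input is already a coding sequence in the correct reading frame, and the function name `translate` is our own.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence):
    """Translate a DNA or RNA coding sequence, stopping at the first stop codon."""
    dna = sequence.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        # Codons with ambiguous bases are reported as 'X'.
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


peptide = translate("ATGGCCTAA")
```

Real gene models add complications that this sketch ignores (introns, alternative isoforms, non-standard genetic codes), which is one reason gene IDs and protein IDs do not always map one to one.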
RNA databases
• Contain information about RNA molecules.
• Most of them concern gene regulatory factors (gene
information is usually in other repositories).
Examples
• miRBase: microRNAs
http://www.mirbase.org/
• TRANSFAC: transcription factors in eukaryotes (proprietary
database).
• JASPAR: transcription factor binding sites for eukaryotes
(open access, curated, non-redundant).
http://jaspar.genereg.net/
Protein-protein interactions
• Proteins are the main functional units.
• But they do not work in isolation.
• Pretty useless at the moment but promising in the future …
• Some information is experimental, but most of it is
generated in silico.
Examples
• IntAct: protein–small molecule and protein–nucleic acid interactions.
• BIND: Biomolecular Interaction Network Database.
Signal transduction pathway databases & Metabolic pathway databases
• Information about how genes (or proteins) interact with
each other.
• Not only physical interactions …
Examples
• Reactome: free online database of biological pathways.
http://www.reactome.org
• KEGG: Kyoto Encyclopedia of Genes and Genomes.
Metabolic pathways.
http://www.genome.jp/kegg/pathway.html
KEGG
Metabolic pathway databases
Experimental data repositories
Contain microarray, NGS, Sanger, and other high-throughput
experimental data.
• GEO: Gene Expression Omnibus (NCBI)
http://www.ncbi.nlm.nih.gov/geo/
• ArrayExpress: database of functional genomics
experiments (EBI)
http://www.ebi.ac.uk/arrayexpress/
• The Cancer Genome Atlas (TCGA): data on different
cancer-related tissues.
http://cancergenome.nih.gov/
Bioinformatics
Training
• Biology 1/3
• Statistics 1/3
• Computer science 1/3 ←
Efficiently combine:
• Experimental information
• Database registered knowledge
Time and resources:
• As in the wet lab
Example
Example I
Autistic children
1 (microarray) NGS data processing
• data quality control, filtering...
• map against reference genome
• CNV calling
2 CNV filtering
• just 75 rare de novo CNV events (not registered in
databases)
• filter out the long ones
• keep the ones that contain genes
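The CNV filtering step can be sketched as a chain of simple conditions. Everything specific here is hypothetical: the `CNV` record layout, the `MAX_LENGTH` cutoff, and the example events and gene names are placeholders, since the slides do not give the actual thresholds or data.

```python
from dataclasses import dataclass, field


@dataclass
class CNV:
    """Minimal, hypothetical record for one called de novo CNV event."""
    chrom: str
    start: int
    end: int
    known_in_databases: bool
    genes: list = field(default_factory=list)

    @property
    def length(self):
        return self.end - self.start


# Hypothetical length cutoff; the slides only say "filter out the long ones".
MAX_LENGTH = 5_000_000


def filter_cnvs(cnvs, max_length=MAX_LENGTH):
    """Keep rare events that are not too long and overlap at least one gene."""
    return [
        c for c in cnvs
        if not c.known_in_databases   # rare: not registered in databases
        and c.length <= max_length    # drop the long ones
        and c.genes                   # keep only gene-containing events
    ]


# Made-up events with placeholder gene names.
events = [
    CNV("chr1", 1_000_000, 1_200_000, False, ["GENE_A", "GENE_B"]),
    CNV("chr2", 5_000_000, 25_000_000, False, ["GENE_C"]),  # too long
    CNV("chr3", 100_000, 150_000, True, ["GENE_D"]),        # already known
    CNV("chr4", 300_000, 320_000, False, []),               # no genes
]
kept = filter_cnvs(events)
affected_genes = sorted({g for c in kept for g in c.genes})
```

Collecting `affected_genes` at the end corresponds to moving from the CNV level to the gene level.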
Example II
3 move to the gene level
• 47 loci in total affecting 433 human genes
4 Building the background likelihood network
• GO annotations
• KEGG pathways
• InterPro domains
• protein-protein interactions. Databases: BIND, BioGRID,
DIP, HPRD, InNetDB, IntAct, BiGG, MINT, and MIPS
• sequence homology between the gene pair (BLAST)
Example III
5 Search for high scoring clusters affected by CNVs
6 Evaluating significance of cluster scores:
10,000 simulations
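Evaluating significance by simulation can be sketched as an empirical p-value. This is a generic illustration, not the scoring scheme of the study: the function `empirical_p_value`, the toy gene scores, and the null model (random gene sets of the same size) are all assumptions.

```python
import random


def empirical_p_value(observed, simulate, n_sim=10_000, seed=0):
    """Fraction of simulated scores at least as large as the observed one.

    `simulate` takes a random.Random instance and returns one score drawn
    under the null model. The +1 terms avoid reporting a p-value of zero.
    """
    rng = random.Random(seed)
    hits = sum(simulate(rng) >= observed for _ in range(n_sim))
    return (hits + 1) / (n_sim + 1)


# Toy null model: score of a random cluster of 5 genes drawn from 100 genes.
gene_scores = [i / 100 for i in range(100)]


def random_cluster_score(rng, size=5):
    return sum(rng.sample(gene_scores, size))


p_value = empirical_p_value(observed=4.5, simulate=random_cluster_score)
```

With 10,000 simulations the smallest p-value this estimator can report is 1/10,001, so the number of simulations sets the resolution of the test.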
Example IV
7 Functional characterization of the identified network
8 And, finally, draw conclusions
Questions
Thanks
