Extracting biological meaning from large gene lists with DAVID
Huang et al., Curr Protoc Bioinformatics (2009)
http://david.abcc.ncifcrf.gov/home.jsp
Francesco Mattia Mancuso (francesco.mancuso@crg.es), Bioinformatics Core Facility
Short Tutorial
Introduction
 Proteomics
 Expression microarray
 Promoter microarray
 ChIP-on-chip
 … significant capabilities to study a large variety of biological mechanisms, including associations with diseases
Large ‘interesting’ gene lists (ranging in size from hundreds to thousands of genes) involved in the studied biological conditions
Data analysis of gene/protein lists
Challenging task
DAVID
 Released in 2003 (Dennis et al., Genome Biol.; Hosack et al., Genome Biol.)
 Able to extract biological features/meaning associated with large gene lists
 Able to handle any type of gene list
Common strategy shared with other tools:
 gene ontology terms
 to statistically highlight the most overrepresented biological annotation
 enrichment
Main objectives of the GO project
 Compile and provide GO terms;
 Use structured vocabularies in the annotation of gene products;
 Provide open access to the GO database and Web resource.
Independent sets of vocabularies
 Molecular Function (MF) – elemental activity or task performed, or potentially performed, by individual gene products (e.g. “DNA binding” and “catalytic activity”);
 Cellular Component (CC) – location of action for a gene product (e.g. “organelle membrane” and “cytoskeleton”);
 Biological Process (BP) – broad biological objective or goal in which a gene product participates (e.g. “DNA replication” and “response to stimulus”).
The accession ID belongs with the definition.
If a term changes (e.g., from “chromatin” to “structural component of chromatin”) but the definition of the term does not, the accession ID will remain the same.
Directed acyclic graphs (DAGs)
Semantic relationships between parent and child terms:
 part_of: the child is a component of the parent, such as a subprocess or physical part (e.g. nucleolus is part of nuclear lumen)
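As a rough illustration of why GO is a directed acyclic graph rather than a tree, the sketch below walks from a child term up to all of its ancestors. Only the nucleolus/nuclear lumen edge comes from the slide; the other edges, the `is_a` label, and the names `TOY_DAG` and `ancestors` are assumptions made up for this example and are not a copy of the real ontology.

```python
# Toy fragment of a GO-like DAG: child term -> list of (relationship, parent).
# Only "nucleolus part_of nuclear lumen" is taken from the slide; the
# remaining edges are illustrative assumptions.
TOY_DAG = {
    "nucleolus": [("part_of", "nuclear lumen")],
    "nuclear lumen": [("part_of", "nucleus"), ("is_a", "organelle lumen")],
    "nucleus": [("is_a", "organelle")],
    "organelle lumen": [("part_of", "organelle")],
}


def ancestors(term, dag):
    """Return every term reachable from `term` by following parent edges."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relationship, parent in dag.get(current, []):
            if parent not in seen:  # a term can be reached via several paths
                seen.add(parent)
                stack.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("nucleolus", TOY_DAG)))
```

Because a term may have more than one parent, "organelle" is reached here along two different paths but reported once.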
Enrichment and p-values calculated with a hypergeometric distribution
 N = number of genes in the universe (all genes)
 M = number of genes belonging to a pathway
 n = number of genes in your gene list
 m = number of genes in your gene list that belong to the pathway
Other well-known statistical methods: χ2, Fisher’s exact test, binomial probability
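The formula itself did not survive in the transcript. With the symbols defined above, and assuming the usual upper-tail convention (the probability of seeing at least m pathway genes in a list of n genes drawn at random from the universe), the p-value can be written as:

```latex
p \;=\; P(X \ge m) \;=\; \sum_{k=m}^{\min(M,\,n)}
      \frac{\binom{M}{k}\,\binom{N-M}{n-k}}{\binom{N}{n}}
```

For a toy case with N = 10, M = 4, n = 3 and m = 2, this gives (6·6 + 4·1) / 120 = 1/3.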
A 'good' gene list
 1. Contains many important genes (marker genes), as expected;
 2. Contains a reasonable number of genes, ranging from hundreds to thousands (e.g., 100–2,000 genes), not extremely low or high;
 3. Most of the genes significantly pass the statistical threshold;
 4. A portion of the up- or down-regulated genes is involved in certain interesting biological processes, rather than being randomly spread throughout all possible biological processes;
 5. Consistently contains more enriched biology than a random list in the same size range;
 6. High reproducibility: a similar gene list is generated under the same conditions;
 7. The high quality of the data can be confirmed by other independent experiments.
DAVID homepage: http://david.abcc.ncifcrf.gov/home.jsp
The wide-range collection of heterogeneous functional annotations in the DAVID Knowledgebase
Analytic tools/modules in DAVID (Huang et al., Nature Protocols, 2009)
GENE LIST MANAGEMENT PANEL: SUBMIT AND MANAGE USER’S GENE LISTS
GENE NAME BATCH VIEWER: EXPLORE GENE NAMES BASED ON USER’S GENE IDs
ID CONVERSION TOOL: CONVERT USERS’ GENE IDs TO DIFFERENT TYPES
Exercise 1: Submit data and convert the IDs
Cicala, C. et al. HIV envelope induces a cascade of cell signals in non-proliferating target cells that favor virus replication. Proc. Natl. Acad. Sci. USA 99, 9380–9385 (2002).
“Freshly isolated peripheral blood mononuclear cells were treated with an HIV envelope protein (gp120) and genome-wide gene expression changes were observed using Affymetrix U95A microarray chips. The aim of the experiment was to investigate cellular responses to viral envelope protein infection, which may help in understanding the mechanisms for HIV replication in resting or sub-optimally activated peripheral blood mononuclear cells.”
Download the dataset from: http://www.nature.com/nprot/journal/v4/n1/suppinfo/nprot.2008.211_S1.html (Supplementary Data 2)
GENE FUNCTIONAL CLASSIFICATION TOOL: CLASSIFY USERS’ GENES INTO CO-FUNCTIONAL GENE GROUPS
FUNCTIONAL ANNOTATION TOOL: IDENTIFY ENRICHED BIOLOGY WITHIN USERS’ GENE LISTS

  • 44. Functional Annotation Table: no statistics are applied in this report.
  • 45. Attention! DAVID enrichment analysis is more of an exploratory procedure than a pure statistical solution. “The final interpretation and analytic result decisions (in terms of accepting the results that make sense biologically in the context of the study, or rejecting ones that do not) should be made by the biologists/analysts themselves, rather than by any of the tools.” (Huang et al., 2009)
  • 48. EASE Score Threshold (Maximum Probability): the threshold of the EASE Score, a modified Fisher exact P-value, for gene-enrichment analysis. It ranges from 0 to 1. A Fisher exact P-value of 0 represents perfect enrichment.
  • 49. The Fold Enrichment is defined as the ratio of two proportions. For example, if 40/400 (i.e. 10%) of your input genes are involved in "kinase activity" and 300/30000 (i.e. 1%) of the background genes are associated with "kinase activity", the fold enrichment is roughly 10% / 1% = 10.
  • 50. In the DAVID annotation system, the Fisher exact test is adopted to measure gene enrichment in annotation terms. When members of two independent groups can fall into one of two mutually exclusive categories, the Fisher exact test is used to determine whether the proportions falling into each category differ by group.
  • 51. Benjamini-Hochberg, Bonferroni and FDR (False Discovery Rate) are different 'standard' statistics for multiple comparison corrections. They adjust P-values to be more conservative in order to control the family-wise error rate (Bonferroni) or the false discovery rate (Benjamini-Hochberg).
  • 52. LT (list total): number of genes in your gene list mapped to any term in this ontology ("system")
  • 53. PH (population hits): number of genes with this GO term on the background list (the whole chip)
  • 54. PT (population total): number of genes on the background list (the whole chip) mapped to any term in this ontology ("system")
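The quantities on slides 48 to 54 can be tied together in a short sketch. It is not DAVID's implementation: `lh` (list hits, the number of list genes carrying the term) is not defined on the surviving slides and is assumed here, the gene list is assumed to be a subset of the background, and the EASE-style score is approximated by removing one hit gene from the list before recomputing the one-sided Fisher exact (hypergeometric upper-tail) probability. DAVID's exact table layout may differ. All function names are hypothetical.

```python
from math import comb


def fold_enrichment(lh, lt, ph, pt):
    """(lh / lt) / (ph / pt), rearranged so integer counts stay exact."""
    return (lh * pt) / (lt * ph)


def fisher_one_sided(lh, lt, ph, pt):
    """Upper-tail probability of at least `lh` hits among `lt` list genes,
    given `ph` annotated genes in a background of `pt` genes."""
    upper = min(lt, ph)
    favourable = sum(
        comb(ph, k) * comb(pt - ph, lt - k) for k in range(lh, upper + 1)
    )
    return favourable / comb(pt, lt)


def ease_like_score(lh, lt, ph, pt):
    """Assumed EASE-style penalty: drop one hit gene from the list first."""
    return fisher_one_sided(max(lh - 1, 0), max(lt - 1, 0), ph, pt)


def bonferroni(pvalues):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # The "kinase activity" example from slide 49.
    lh, lt, ph, pt = 40, 400, 300, 30000
    print("fold enrichment:", fold_enrichment(lh, lt, ph, pt))  # 10.0
    print("Fisher one-sided p:", fisher_one_sided(lh, lt, ph, pt))
    print("EASE-like score:", ease_like_score(lh, lt, ph, pt))
    # Hypothetical raw p-values for four annotation terms.
    raw = [0.01, 0.04, 0.03, 0.005]
    print("Bonferroni:", bonferroni(raw))
    print("Benjamini-Hochberg:", benjamini_hochberg(raw))
```

In the notation of the hypergeometric slide, PT plays the role of N, PH of M, LT of n and the assumed LH of m. The EASE-like score is never smaller than the plain Fisher p-value, which is what makes it the more conservative of the two.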

Editor's notes

  1. GoMiner, GOstat, Onto-express, GoToolBox, FatiGO, GFINDer and GSEA
  2. 3 - e.g., selecting genes by comparing gene expression between control and experimental cells with t-test statistics (fold changes greater than or equal to 2 and P-values less than or equal to 0.05).
     6 - e.g., by independent experiments under the same conditions or by a leave-one-out statistical test.
  3. Functional classification: lets investigators explore and view functionally related genes together, as a unit, to concentrate on the larger biological network rather than on an individual gene.
     Functional Annotation chart: provides typical gene–term enrichment (overrepresentation) analysis to identify the most relevant (overrepresented) biological terms associated with a given gene list.
     Functional Annotation Clustering: uses a fuzzy clustering concept similar to that of functional classification, measuring relationships among the annotation terms on the basis of the degree of their coassociation with genes within the user’s list, to cluster somewhat heterogeneous yet highly similar annotation into functional annotation groups.
     Functional annotation table: a query engine for the DAVID knowledgebase, without statistical calculations. For a given gene list, the tool can quickly query the corresponding annotation for each gene and present it in a table format.