Extracting biological meaning from large gene lists with DAVID

Huang et al., Curr Protoc Bioinformatics (2009)
http://david.abcc.ncifcrf.gov/home.jsp
Francesco Mattia Mancuso (francesco.mancuso@crg.es)
Bioinformatics Core Facility
Short Tutorial

…significant capabilities to study a large variety of biological mechanisms, including associations with diseases.
Typical result: a large ‘interesting’ gene list (ranging in size from hundreds to thousands of genes) involved in the studied biological conditions.

Data analysis of gene/protein lists

Challenging task

Released in 2003 (Dennis et al., Genome Biol.; Hosack et al., Genome Biol.)

able to extract biological features/meaning associated with large gene lists

able to handle any type of gene list

Common strategy with other tools:

to statistically highlight the most overrepresented biological annotation

Main objectives of the GO project
Compile and provide GO terms;
Use of structured vocabularies in the annotation of gene products;
Provide open access to the GO database and Web resource.

Independent sets of vocabularies
Molecular Function (MF) – elemental activity or task performed, or potentially performed, by individual gene products (e.g. “DNA binding” and “catalytic activity”);
Cellular Component (CC) – location of action for a gene product (e.g. “organelle membrane” and “cytoskeleton”);
Biological Process (BP) – broad biological objective or goal in which a gene product participates (e.g. “DNA replication” and “response to stimulus”).

The accession ID belongs with the definition.

If the name of a term changes (e.g., from “chromatin” to “structural component of chromatin”) but its definition does not, the accession ID remains the same.

Directed acyclic graphs (DAGs)
Semantic relationships between parent and child terms:

part_of: the child is a component of the parent, such as a subprocess or physical part (e.g. nucleolus is part of nuclear lumen)
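The DAG structure can be illustrated with a small Python sketch. It assumes a toy set of edges (only the nucleolus/nuclear lumen edge comes from the slide; the other terms and edges are illustrative placeholders, not an excerpt of the real ontology) and a hypothetical helper `ancestors` that collects every term reachable by following is_a and part_of links upwards. Because a term may have more than one parent, the traversal keeps a set of visited terms.

```python
from collections import deque

# Toy graph: child term -> list of (relationship, parent term).
# Only "nucleolus part_of nuclear lumen" is taken from the slide;
# the remaining edges are illustrative assumptions.
TOY_EDGES = {
    "nucleolus": [("part_of", "nuclear lumen")],
    "nuclear lumen": [("is_a", "organelle lumen"), ("part_of", "nucleus")],
    "nucleus": [("is_a", "organelle")],
    "organelle lumen": [],
    "organelle": [],
}


def ancestors(term, edges):
    """Return all terms reachable from `term` via is_a/part_of links."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relationship, parent in edges.get(current, []):
            if parent not in seen:  # a DAG allows several paths to one parent
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(ancestors("nucleolus", TOY_EDGES)))
```

In this toy graph, a gene annotated to “nucleolus” would also count towards “nuclear lumen” and every term above it, which is why enrichment results often list broad parent terms alongside specific child terms.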

Enrichment and p-values calculated with a hypergeometric distribution
N = number of all genes (universe)
M = number of all genes belonging to a pathway
n = number of genes in your gene list
m = number of genes in your gene list that belong to the pathway
Other well-known statistical methods: χ², Fisher’s exact test, binomial probability
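As a worked illustration of the hypergeometric test, the sketch below computes the probability of seeing at least m pathway genes in a list of n genes drawn from a universe of N genes, M of which belong to the pathway. The function name `enrichment_p_value` and the numbers in the usage example are assumptions made for illustration. This is the plain upper-tail hypergeometric probability and not necessarily the exact score that DAVID reports.

```python
from math import comb


def enrichment_p_value(N, M, n, m):
    """Upper-tail hypergeometric probability P(X >= m).

    N: number of genes in the universe
    M: number of universe genes that belong to the pathway
    n: number of genes in the submitted list
    m: number of list genes that belong to the pathway
    """
    if not (0 <= M <= N and 0 <= n <= N and 0 <= m <= min(n, M)):
        raise ValueError("inconsistent counts for N, M, n, m")
    total = comb(N, n)  # all ways to draw a list of n genes
    tail = sum(
        comb(M, k) * comb(N - M, n - k)  # lists with exactly k pathway genes
        for k in range(m, min(n, M) + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 20,000 genes, 100 in the pathway,
    # a list of 500 genes of which 10 fall in the pathway.
    print(enrichment_p_value(20000, 100, 500, 10))
```

A small p-value means that an overlap of at least m genes would rarely arise from a randomly drawn list of the same size. When many annotation terms are tested at once, the p-values are usually corrected for multiple testing.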

A 'good' gene list
Contains many important genes (marker genes), as expected;
Contains a reasonable number of genes, ranging from hundreds to thousands (e.g., 100–2,000 genes), not extremely low or high;
Most of the genes significantly pass the statistical threshold;
A portion of the up- or down-regulated genes is involved in certain interesting biological processes, rather than being randomly spread throughout all possible biological processes;
Consistently contains more enriched biology than a random list in the same size range;
A similar gene list can be generated under the same conditions with high reproducibility;
High data quality can be confirmed by other independent experiments.

DAVID homepage: http://david.abcc.ncifcrf.gov/home.jsp

The wide-range collection of heterogeneous functional annotations in the DAVID Knowledgebase

Analytic tools/modules in DAVID (Huang et al., Nature Protocols, 2009)

GENE LIST MANAGEMENT PANEL: SUBMIT AND MANAGE USER’S GENE LISTS

GENE NAME BATCH VIEWER: EXPLORE GENE NAMES BASED ON USER’S GENE IDs

ID CONVERSION TOOL: CONVERT USERS’ GENE IDs TO DIFFERENT TYPES

Exercise 1
Submit data and convert the IDs
Cicala, C. et al. HIV envelope induces a cascade of cell signals in non-proliferating target cells that favor virus replication. Proc. Natl. Acad. Sci. USA 99, 9380–9385 (2002).
“Freshly isolated peripheral blood mononuclear cells were treated with an HIV envelope protein (gp120) and genome-wide gene expression changes were observed using Affymetrix U95A microarray chips. The aim of the experiment was to investigate cellular responses to viral envelope protein infection, which may help in understanding the mechanisms for HIV replication in resting or sub-optimally activated peripheral blood mononuclear cells.”
Download the dataset from: http://www.nature.com/nprot/journal/v4/n1/suppinfo/nprot.2008.211_S1.html (Supplementary Data 2)

GENE FUNCTIONAL CLASSIFICATION TOOL: CLASSIFY USERS’ GENES INTO CO-FUNCTIONAL GENE GROUPS

FUNCTIONAL ANNOTATION TOOL: IDENTIFY ENRICHED BIOLOGY WITHIN USERS’ GENE LISTS
