3. Recent Publications
R. Louhimo, T. Lepikhova, O. Monni, and S. Hautaniemi, “Comparative analysis of
algorithms for integration of copy number and expression data,” Nature
Methods, 2012.
The ENCODE Project Consortium, “An integrated encyclopedia of DNA elements in
the human genome,” Nature, 2012.
S. Aerts and J. Cools, “Cancer: Mutations close in on gene regulation,” Nature, Jul.
2013.
V. J. H. Powell and A. Acharya, “Disease Prevention: Data Integration,” Science, Dec.
2012.
A. Vinayagam, Y. Hu, M. Kulkarni, C. Roesel, R. Sopko, S. E. Mohr, and N. Perrimon,
“Protein Complex–Based Analysis Framework for High-Throughput Data Sets,”
Science Signaling, Feb. 2013.
Introduction to Data Integration in Bioinformatics
Dec. 2013
4. DNA: the molecule of life
Protein-coding DNA makes up barely 2% of the human
genome. About 80% of the bases in the genome may be expressed
without an identified function.
5. Gene Expression
DNA: two long
biopolymers made of
nucleotides, each containing
one of four nucleobases:
A: Adenine
T: Thymine
C: Cytosine
G: Guanine
[Figure labels: cap, start codon, termination codon, poly-A tail; sequence of amino acids]
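The flow from DNA to a sequence of amino acids, with the start codon and termination codon marked in the figure, can be illustrated with a small sketch. This is a toy example, not a full implementation: the function names `transcribe` and `translate` are hypothetical, the input is assumed to be the coding strand, and the codon table is deliberately partial (only the codons used in the example).

```python
# Partial codon table (assumption: only the codons needed for this toy example).
# "*" marks a termination codon.
CODON_TABLE = {
    "AUG": "M",  # start codon, also codes for methionine
    "GCU": "A",
    "AAA": "K",
    "UGG": "W",
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def transcribe(dna_coding_strand):
    """Return the mRNA that matches the coding strand (T replaced by U)."""
    return dna_coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first start codon up to the first termination codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "?")  # "?" = codon not in the toy table
        if amino_acid == "*":
            break  # termination codon reached
        protein.append(amino_acid)
    return "".join(protein)


dna = "GGATGGCTAAATGGTAACC"  # made-up sequence for illustration
mrna = transcribe(dna)
print(mrna)             # GGAUGGCUAAAUGGUAACC
print(translate(mrna))  # MAKW
```

The cap and poly-A tail from the figure are not modeled here; they flank the coding region and do not change which amino acids are produced.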
7. Next generation RNA-sequencing
EST: Expressed Sequence Tag
Reads of a single type of
nucleotide at one moment
The number of nucleotide reads
at one moment
Reference:
Open Reading Frame
[Figure: animation of nucleotide reads; axis: Time]
8. DNA structural variation: Copy number
CNV (Copy Number Variation):
• CNV regions cover about 12% of human genomic DNA
• About 0.4% of the genomes of unrelated people differ with respect
to copy number
• Range from 1,000 nucleotide bases to several megabases
• Inherited or caused by de novo mutation (not inherited
from either parent).
Relation to disease:
Higher EGFR (epidermal growth factor receptor) copy number
is found in non-small cell lung cancer (Cappuzzo et al., Journal of the
National Cancer Institute, 2005).
Higher copy number of CCL3L1 decreases susceptibility to HIV
(Gonzalez et al., Nature, 2005).
Low copy number of FCGR3B increases susceptibility to
inflammatory autoimmune disorders (Aitman et al., Nature, 2006).
9. Epigenome: DNA Methylation
Why do we look so
different even when we
have exactly
identical genes?
Genome
Epigenome: “what, when and where” directions
• Addition of a methyl group to the C or
A nucleotides of DNA.
• Permanent and unidirectional
• Can be copied across cell divisions or
even passed on to offspring
10. miRNA (microRNA)
Besides protein-coding genes, the genome also has genes that code for small RNAs
e.g., “transfer RNA”, which is used in translation, is coded by genes
e.g., “ribosomal RNA”, which forms part of the structure of the ribosome, is also
coded by genes
miRNA: 21-22 nucleotide non-coding RNA
miRNA Pathway
• Perfect complementary
binding leads to mRNA
degradation of the target
gene
• Imperfect pairing inhibits
translation of mRNA to
protein
RISC: RNA-induced silencing complex.
Uses miRNA as a template for
recognizing complementary mRNA
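The distinction between perfect and imperfect pairing can be sketched in a few lines. This is a simplified illustration only: the function names, the toy 8-nucleotide sequences, and the `min_fraction` cutoff are assumptions made for the example, only Watson-Crick pairs are counted (G:U wobble pairs are ignored), and real target prediction is considerably more involved.

```python
# Watson-Crick pairs for RNA (assumption: G:U wobble pairing is ignored).
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def count_pairs(mirna, target_site):
    """Count paired positions between a miRNA and an mRNA site of equal length.

    Both strands are written 5'->3'. They bind antiparallel, so the target
    site is reversed before comparing position by position.
    """
    site_reversed = target_site[::-1]
    return sum(1 for m, t in zip(mirna, site_reversed) if PAIR.get(m) == t)


def predicted_effect(mirna, target_site, min_fraction=0.7):
    """Classify the outcome; min_fraction is a hypothetical cutoff for this sketch."""
    if len(mirna) != len(target_site):
        raise ValueError("miRNA and target site must have the same length")
    paired = count_pairs(mirna, target_site)
    if paired == len(mirna):
        return "mRNA degradation"          # perfect complementarity
    if paired / len(mirna) >= min_fraction:
        return "translational repression"  # imperfect pairing
    return "no silencing predicted"


mirna = "UGAGGUAG"  # toy sequence, shorter than a real 21-22 nt miRNA
print(predicted_effect(mirna, "CUACCUCA"))  # mRNA degradation
print(predicted_effect(mirna, "CUACCUCG"))  # translational repression
print(predicted_effect(mirna, "AAAAAAAA"))  # no silencing predicted
```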
11. Clinical data
General clinical checkup data: temperature, blood pressure;
Pathology: blood test, antibody test;
Radiology: X-ray, CT (Computed tomography), Ultrasound, MRI (Magnetic
resonance imaging).
[Figure panels: Texture Heterogeneity (high score vs. low score); Internal Arteries (high score vs. low score)]
12. Challenges of data integration analysis
• Large highly connected data sources and
ontologies
• Heterogeneity: functions, structures, data access
and analysis methods, dissemination formats.
• Incomplete or overlapping data sources
• Frequent changes
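The "incomplete or overlapping data sources" challenge can be made concrete with a minimal sketch. It assumes two hypothetical per-sample sources keyed by a shared sample ID (an expression value and a copy number); the function names and all values are made up for illustration.

```python
def integrate(expression, copy_number):
    """Merge two per-sample sources on sample ID, keeping gaps explicit as None."""
    samples = sorted(set(expression) | set(copy_number))
    return {
        sample: {
            "expression": expression.get(sample),
            "copy_number": copy_number.get(sample),
        }
        for sample in samples
    }


def complete_cases(merged):
    """Return the sample IDs that have a value in every source."""
    return [
        sample
        for sample, record in merged.items()
        if all(value is not None for value in record.values())
    ]


# Made-up data: the two sources overlap only on S2 and S3.
expression = {"S1": 2.4, "S2": 0.9, "S3": 1.7}
copy_number = {"S2": 3, "S3": 2, "S4": 1}

merged = integrate(expression, copy_number)
print(merged["S1"])            # {'expression': 2.4, 'copy_number': None}
print(complete_cases(merged))  # ['S2', 'S3']
```

Keeping missing values explicit, rather than silently dropping samples, leaves the choice between complete-case analysis and imputation to a later step.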
13. Case I
E. Segal et al., “Decoding global gene expression programs in liver cancer by noninvasive
imaging,” Nature Biotechnology, May 2007.
E. Segal et al., “Module networks: identifying regulatory modules and their
condition-specific regulators from gene expression data,” Nature Genetics, 2003.
14. Case II
O. Gevaert et al., “Non–Small Cell Lung Cancer: Identifying Prognostic Imaging Biomarkers
by Leveraging Public Gene Expression Microarray Data—Methods and Preliminary Results,”
Radiology, Aug. 2012.
Editor's notes
Researchers are now learning that another level of information—the epigenome—controls gene expression in part by controlling access to DNA. The gene-reading machinery is blocked when methyl molecules bind to DNA or histones.