Introduction to Data Integration in Bioinformatics
Yan Xu
Dec. 2013

Data Integration
Data sources shown in the diagram:
• Copy number
• Epigenome
• Methylation
• miRNA
• Gene expression
• Clinical data
• Pathways

Recent Publications
R. Louhimo, T. Lepikhova, O. Monni, and S. Hautaniemi, “Comparative analysis of algorithms for integration of copy number and expression data,” Nature Methods, 2012.
The ENCODE Project Consortium, “An integrated encyclopedia of DNA elements in the human genome,” Nature, 2012.
S. Aerts and J. Cools, “Cancer: Mutations close in on gene regulation,” Nature, Jul. 2013.
V. J. H. Powell and A. Acharya, “Disease Prevention: Data Integration,” Science, Dec. 2012.
A. Vinayagam, Y. Hu, M. Kulkarni, C. Roesel, R. Sopko, S. E. Mohr, and N. Perrimon, “Protein Complex–Based Analysis Framework for High-Throughput Data Sets,” Science Signaling, Feb. 2013.

DNA: the molecule of life

Protein-coding DNA makes up barely 2% of the human genome. About 80% of the bases in the genome may be expressed without having an identified function.

Gene Expression
DNA: two long biopolymers made of nucleotides, each carrying one of four nucleobases:
A: Adenine
T: Thymine
C: Cytosine
G: Guanine

[Figure: gene expression; mRNA with cap, start codon, termination codon, and poly-A tail, translated into a sequence of amino acids]
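
The path from DNA to a sequence of amino acids can be illustrated with a minimal Python sketch. It assumes the input is the coding strand, uses the standard genetic code, and ignores the cap, the poly-A tail, and splicing. The function names and the toy sequence are hypothetical and are not part of the original slides.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate from the first start codon (AUG) to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # termination codon
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy sequence, not a real gene.
toy_dna = "GGATGGCCTGGTAAGG"
print(translate(transcribe(toy_dna)))  # prints MAW
```

In the toy sequence, translation starts at AUG (M), reads GCC (A) and UGG (W), and stops at the termination codon UAA.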

Microarray

[Figure: microarray workflow, from reverse transcription to the result]

Next-generation RNA sequencing
EST: Expressed Sequence Tag
[Animation: reads of a single type of nucleotide at one moment; the number of nucleotide reads at one moment is plotted against time]
Reference: Open Reading Frame

DNA structural variation: Copy number
CNV (Copy Number Variation):
• Accounts for 12% of human genomic DNA
• 0.4% of the genomes of unrelated people differ with respect to copy number
• Ranges from 1,000 nucleotide bases to several megabases
• Inherited, or caused by de novo mutation (not inherited from either parent)
Relation to disease:
• Higher EGFR (epidermal growth factor receptor) copy number is found in non-small cell lung cancer (Cappuzzo et al., Journal of the National Cancer Institute, 2005).
• Higher copy number of CCL3L1 decreases susceptibility to HIV (Gonzalez et al., Science, 2005).
• Low copy number of FCGR3B increases susceptibility to inflammatory autoimmune disorders (Aitman et al., Nature, 2006).

Epigenome: DNA Methylation
Why do we look so different even when we have exactly identical genes?
[Figure labels: Genome; Epigenome: what, when and where directions]

• Addition of a methyl group to the C or A DNA nucleotides
• Normally permanent and unidirectional
• Can be copied across cell divisions or even passed on to offspring

miRNA (microRNA)
The genome has protein-coding genes, but it also has genes that code for small RNAs:
• e.g., “transfer RNA”, which is used in translation, is coded by genes
• e.g., “ribosomal RNA”, which forms part of the structure of the ribosome, is also coded by genes
miRNA: 21-22 nucleotide non-coding RNA

miRNA Pathway

• Perfectly complementary binding leads to degradation of the target gene's mRNA
• Imperfect pairing inhibits translation of the mRNA into protein

RISC: RNA-induced silencing complex. It uses the miRNA as a template for recognizing complementary mRNA.
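
The two outcomes of the miRNA pathway can be illustrated with a minimal Python sketch. It compares a candidate mRNA site against the reverse complement of the miRNA and counts mismatches. The mismatch threshold, the function names, and the toy sequences are assumptions made for illustration; real target recognition also depends on factors such as the seed region and G:U wobble pairs, which this sketch ignores.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def classify_site(mirna: str, target_site: str, max_mismatches: int = 3) -> str:
    """Classify an mRNA site (5'->3', same length as the miRNA).

    max_mismatches is a hypothetical cutoff chosen for illustration only.
    """
    if len(mirna) != len(target_site):
        raise ValueError("miRNA and target site must have the same length")
    expected = reverse_complement(mirna)
    mismatches = sum(a != b for a, b in zip(expected, target_site))
    if mismatches == 0:
        return "degradation"  # perfect complementarity
    if mismatches <= max_mismatches:
        return "translation inhibition"  # imperfect pairing
    return "no binding"


# Hypothetical 22-nucleotide toy miRNA and candidate target sites.
mirna = "UGAGGUAGUAGGUUGUAUAGUU"
perfect_site = reverse_complement(mirna)
imperfect_site = "GG" + perfect_site[2:]  # introduces two mismatches

print(classify_site(mirna, perfect_site))
print(classify_site(mirna, imperfect_site))
```

The first call reports degradation and the second reports translation inhibition, mirroring the two bullets of the pathway.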

Clinical data
• General clinical checkup data: temperature, blood pressure
• Pathology: blood test, antibody test
• Radiology: X-ray, CT (computed tomography), ultrasound, MRI (magnetic resonance imaging)

[Figure: radiology image examples scored for two traits, texture heterogeneity and internal arteries, each with a high-score and a low-score case]

Challenges of data integration analysis
• Large, highly connected data sources and ontologies

• Heterogeneity: functions, structures, data access and analysis methods, dissemination formats
• Incomplete or overlapping data sources
• Frequent changes
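
The problem of incomplete or overlapping data sources can be illustrated with a minimal Python sketch. It outer-joins per-sample records from several sources and then keeps only the samples present in every source. The source names, sample IDs, and values are hypothetical toy data, and a real pipeline would also need to reconcile identifiers and formats.

```python
def integrate(sources):
    """Outer-join per-sample records from several named sources.

    sources maps a source name to a dict of {sample_id: value}.
    Returns {sample_id: {source_name: value or None}}.
    """
    sample_ids = sorted(set().union(*(records.keys() for records in sources.values())))
    return {
        sample_id: {name: records.get(sample_id) for name, records in sources.items()}
        for sample_id in sample_ids
    }


def complete_cases(merged):
    """Keep only the samples that have a value in every source."""
    return {
        sample_id: row
        for sample_id, row in merged.items()
        if all(value is not None for value in row.values())
    }


# Hypothetical toy data: each source covers a different subset of samples.
copy_number = {"S1": 2, "S2": 4, "S3": 1}
expression = {"S2": 7.9, "S3": 3.1, "S4": 5.0}
clinical = {"S1": "stage I", "S2": "stage II", "S3": "stage I"}

merged = integrate(
    {"copy_number": copy_number, "expression": expression, "clinical": clinical}
)
complete = complete_cases(merged)
print(sorted(complete))  # prints ['S2', 'S3']
```

Only two of the four samples survive the complete-case filter, which shows how quickly usable data shrinks as more sources are combined.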

Case I

E. Segal et al., “Decoding global gene expression programs in liver cancer by noninvasive imaging,” Nature Biotechnology, May 2007.

E. Segal et al., “Module networks: identifying regulatory modules and their condition-specific regulators from gene expression data,” Nature Genetics, 2003.

Case II

O. Gevaert et al., “Non–Small Cell Lung Cancer: Identifying Prognostic Imaging Biomarkers by Leveraging Public Gene Expression Microarray Data—Methods and Preliminary Results,” Radiology, Aug. 2012.



Editor's notes

  1. Researchers are now learning that another level of information—the epigenome—controls gene expression in part by controlling access to DNA. The gene-reading machinery is blocked when methyl molecules bind to DNA or histones.