4. Bioinformatics Fundamentals
Plan
PART I: Fundamentals
• From biotechnology to bioinformatics
• Bioinformatics world
• FOCUS: main areas
• 2 Key concepts between biology and computer science
PART II: Career
PART III: Applications
5. Bioinformatics Fundamentals
From Biotechnology to Bioinformatics 1
"Any technological application that uses biological
systems, living organisms, or derivatives thereof,
to make or modify products or processes for specific use." 1
1 The United Nations Convention on Biological Diversity, 2008
6. Bioinformatics Fundamentals
From Biotechnology to Bioinformatics: application areas 2
[Figure: map of biotechnology application areas, with Bioinformatics at the center]
• Agriculture: reduce dependence on fertilizer, pesticide, agrochemical; good yield; increase nutritional quality; novel substance; reduce vulnerability in crop plant
• Medicine: pharmacogenomics; gene therapy; genetic test (DNA); DNA vaccines; cloning; clinical trials
• Environment: bio-process; biochemical; biosystems; organism adapt.; contamination
• Education: Biotechnology Training Programs (BTPs)
2 Spellex BioScientific, v.2011
8. Bioinformatics Fundamentals
Challenges 4
• Accumulating mass of data
• Biological systems complexity
• Development of new research
interest on DNA
[Figure: timeline of bioinformatics milestones, 1950 to 2010]
4 Attwood T. K., 2012
9. Bioinformatics Fundamentals
Challenges 5
• Accumulating mass of data
• Biological systems complexity
• Development of new research
interest on DNA
5 MiPPI, 2007
10. Bioinformatics Fundamentals
Informatics world 6
• Data Manipulation / Management
– Creation
– Acquire / Collect
– Organize
– Store
– Secure
– Validate (standard, norms, safety)
– Workflow
– Analyze (statistics, mining)
– Visualize
– Share (security, import, export, clean, …)
– Archiving
• Math
– Calculus
– Representation tools
– Modeling & predicting tools (Learning, interpreting, deducing, simulation, ..)
– Formalisms
– Exploration tools
– Optimization tools
– Theories
– Inference tools
– Statistics
– Graphics (Surfaces, Volumes)
– Comparison and 3D Matching (Vision, recognition)
• Process
– Experiment process design
– Algorithm
– Process
• Material
– Server
– Network
– Storage supports
– Processor
• Software
– Data manipulation tools
– Programming tools
– Artificial intelligence tools
– High computing tools
– Singling tools
– Web
• Physics
– Quantum computing
– Signal treatment tools
– Biomedical material interaction (electric, optic fiber, Wi-Fi, radio wave)
– Signal
– Electrostatics
– Robotics
• Art & music
– Design (Human machine interaction)
– Usefulness (beauty, attractiveness)
– Philosophy
6 Etienne Gnimpieba, 2012
11. Bioinformatics Fundamentals
Bioinformatics World: some topics 7
Genome Sequence
• Finding Genes in Genomic DNA
• Characterizing Repeats in Genomic DNA
• Duplications in the Genome
• Secondary Structure "Prediction"
Protein Sequence
• Sequence Alignment
• Dynamic Programming for Local vs Global Alignment
• Multiple Alignment and Consensus Patterns
• Scoring schemes and Matching statistics (How to tell if a given alignment or match is statistically significant)
Genomics
• Expression Analysis
• Large scale cross referencing of information
• Function Classification and Orthologs
• The Genomic vs. Single molecule Perspective
• Genome Comparisons
• Structural Genomics
• Genome Trees
Structures
• Basic Protein Geometry and Least-Squares Fitting
• Calculating a helix axis in 3D via fitting a line
• Calculation of Volume and Surface
• Structural Alignment
Databases
• Relational Database Concepts
• Natural Join as "where" selection on cross product
• Array Referencing (perl/dbm)
• Protein Units? (sequence, structure; motifs, modules, domains)
• Clustering and Trees (UPGMA; single-linkage; multiple linkage; Parsimony, Maximum likelihood)
• The Bias Problem
Modeling & Simulation
• Molecular Simulation
• How to measure the change in a vector (gradient)
• Parameter Sets
• Number Density
• Poisson-Boltzmann Equation
• Lattice Models and Simplification
7 Etienne Gnimpieba, 2012
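The "Dynamic Programming for Local vs Global Alignment" topic on this slide can be illustrated with a minimal sketch. It assumes a simple linear scoring scheme (match, mismatch, and gap values chosen only for illustration) and computes the alignment score only, not the alignment itself. The function name `alignment_score` and its parameters are hypothetical, not taken from the slides.

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Score the best alignment of sequences a and b by dynamic programming.

    local=False: global alignment (Needleman-Wunsch style).
    local=True:  local alignment (Smith-Waterman style).
    The scoring values are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    table = [[0] * cols for _ in range(rows)]

    if not local:
        # Global alignment: leading gaps are penalized.
        for i in range(1, rows):
            table[i][0] = i * gap
        for j in range(1, cols):
            table[0][j] = j * gap

    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                table[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                table[i - 1][j] + gap,       # gap in b
                table[i][j - 1] + gap,       # gap in a
            )
            if local:
                # Local alignment: a negative prefix is dropped.
                cell = max(cell, 0)
                best = max(best, cell)
            table[i][j] = cell

    return best if local else table[-1][-1]
```

The only differences between the two modes are the initialization of the first row and column, the floor at zero, and where the final score is read from the table.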
12. Bioinformatics Fundamentals
Bioinformatics World: some topics 8
[Figure: biological levels studied from experiment to computation, linked by information technology]
Columns: Experiment (Hardware & Instrumentation; Methodology & Expertise) | Information Technology | Computation (Mathematical & Physical Models)
Biological levels, from sequence to physiology (and beyond):
• DNA Sequence
• Gene & Genome Organization
• Molecular Evolution
• Protein Structure, Folding, Function, & Interaction
• Metabolic Pathways
• Regulation, Signaling Networks
• Physiology & Cell Biology
• Interspecies Interaction
• Ecology & Environment
Experiment and information technology:
• Genome sequencing
• Proteomics
• Functional genomics (microarrays, 2D-PAGE, etc.)
• Data standards, data representations, and analytical tools for complex biological data
• High-tech field ecology
Computation:
• Genomic data analysis
• Statistical genetics
• Protein structure prediction, protein dynamics, protein folding and design
• Dynamical systems modeling
• Computational ecology
8 SABU M. THAMPI, Dept. of CSE, LBS College of Engineering, Kasaragod, Kerala-671542, 2011
13. Bioinformatics Fundamentals
Key concept: central dogma of Molecular Biology 9,10
[Figure: a gene on DNA is transcribed into mRNA, which is translated into an enzyme E; E catalyses the reaction S → P; mRNA and E are subject to degradation, and the gene to repression]
9 Barbeillini, 2003 10 Etienne Gnimpieba, 2012
14. Bioinformatics Fundamentals
Key concept: Lactose Operon (Lac) 11
Genes and their binding sites
• In the "induced" state, the lac repressor is NOT bound to the operator site.
• In the "repressed" state, the repressor IS bound to the operator.
11 blc.arizona.edu
18. Bioinformatics Career
Where can you be a bioinformatician? 12
• Public institution
– University (research project, training)
– Research center (research project)
– State & Federal agency (FDA, …)
• Companies
– Pharmaceuticals
– Biotech
– Agricultural & food
– Health
– Information systems
Fundamental research → Applied research → Development research (product) → Use, commercialization, market
• Owner (your own boss)
– Contractor (entrepreneur)
– Consultant
• International institutions
– WHO
– UN
12 Etienne Gnimpieba, 2012
19. Bioinformatics Career
What do you do in Bioinformatics?
As informaticians, you have a lot of tasks:
• Algorithms
• Databases and information systems
• Web technologies
• Artificial intelligence and soft computing
• Information and computation theory
• Software engineering
• Data mining
• Image processing
• Modeling and simulation
• Signal processing
• Discrete mathematics
• Control and system theory
• Statistics
• Integrative computing
• Database Administration
• DNA computing
• Neural computing
• Evolutionary computing
• Immuno-computing
• Swarm-computing
• Cellular-computing
• Visualization
• Decision making
• Sequence Assembly
• Genomic Sequence Analysis
• Functional genomics
• Genotyping
• Proteomics
• Pharmacogenomics
20. Bioinformatics Career
How to become a bioinformatician?
Skills Needed
• Database administration and programming skills (SQL Server, Oracle, Sybase, MySQL, CORBA, Perl, Java, C, C++, web scripting)
• Genomic sequence analysis
• Molecular modeling programs
• Biologists and computer scientists
• Skills for data analysis, storage and retrieval
• Skills to filter information and form possible relationships between datasets
Training
• Bachelor
• Master
• MD
• PhD
• High school diploma
Eligibility (biopharmaceutical):
• Life Sciences Graduates
• Computer Sciences Graduates
• Databases Specialists
• Engineering Graduates
• Marketing and Management Graduates
• MD-s, RN-s and Medical Professionals
21. Bioinformatics Career
Who does bioinformatics?
More than 100 job titles exist, depending on country, company, domain, experience, education profile, and competence.
From BIO-based profiles to informatics-based profiles:
• Bioinformatician
– Cheminformatician
– Computational Biologist
– Gene Analyst
– Genomic Scientist
– Molecular Modeler
– Phylogeneticist
– Protein Analyst
– Scientific Curator
– Structural Analyst
• Biomedical Computer Scientist
• Geneticist
• Computational Biologist
• Biostatistician
• Scientist
• Biomedical Chemist
• Clinical Data Manager
• Molecular Microbiologist
• Software/Database Programmer
• Medical Writer/Technical Writer
• Research Associates and Research Scientists
• Data analyst
• Data designer
22. Bioinformatics Career
Career profile: an example
An example of a bioinformatician work profile
23. Bioinformatics Career
Summary Part II 13
Data manipulation
• Cloud
• Databank
• Database
• Data designer
• Information manipulation
Informatics
• Create/collect information
Bio/life
• Statistical analysis
• Data inference, learning
• Model from data
• Model from SB
• Large scale model
Modeling & learning SB
13 Etienne Gnimpieba, 2012
25. Bioinformatics Applications
Overview 14
[Figure: a CORE of tools surrounded by application domains (Biology, Pharmacology, Ecology, Medicine, Computer Science, Molecular Nutrition), each reached through an Ad Hoc Interface]
14 COSBI Report, 2010
26. Bioinformatics Applications
Small synopsis view of bioinformatics 15
15 Korean Bioinformation Center, 2010
27. Bioinformatics Applications
Informatician’s view of bioinformatics
• Data manipulation
– Data analysis
– Designing database and databank
– Management (collect, store, explore, secure)
– Inference/ mining
– Statistics
• Model design
– From biological process to mathematical formalism
– Model checking and validation
• Program building
– Data analyzing tools (implement algorithm)
– Integration tools (data, program, model)
– Modeling & Simulink tools
– Data protection tools
– …
29. Lab #1 Molecular online tools and server 16
Context
Statement of problem / Case study:
The FXN gene provides instructions for making a protein called frataxin. This protein is found in cells throughout the body, with the highest levels in the heart, spinal cord, liver, pancreas, and muscles used for voluntary movement (skeletal muscles). Within cells, frataxin is found in energy-producing structures called mitochondria. Although its function is not fully understood, frataxin appears to help assemble clusters of iron and sulfur molecules that are critical for the function of many proteins, including those needed for energy production. Mutations in the FXN gene cause Friedreich ataxia. Friedreich ataxia is a genetic condition that affects the nervous system and causes movement problems. Most people with Friedreich ataxia begin to experience the signs and symptoms of the disorder around puberty.
Biological Hypothesis
Reduced expression of frataxin is the cause of Friedreich's ataxia (FRDA), a lethal neurodegenerative disease; how about liver cancer?
0. Specification & aims
Aim:
The purpose of this experiment is to introduce online tools for biological exploration of the human genome. We simulate an application (FXN gene and pancreatic cancer) to understand how a researcher can identify and cross-reference the biological knowledge available in data banks.
Keywords:
Bio: FXN, Frataxin, pancreatic cancer, CDKN4
Math: HMM
Informatics: programming, bioinformatics tools, getting and exporting data
[Figures: Frataxin molecule structure (pymol); FXN on chromosome 9; Biological DB; Tools; Pancreas anatomy; Pancreatic cancer]
Resolution process
T1. Genome exploration
Objective: use Ensembl online tools to localize FXN on the human genome and identify the genes implicated in pancreatic cancer, then get the appropriate data (sequences) in FASTA and BLAST formats.
T1.1. Locate a given gene on human genome
T1.2. Get a genomic sequence from NCBI
T1.3. Get the protein information and sequence from EBI
T1.4. Save the exported sequence data in data folder
T2. Sequences manipulation
Objective: find similar sequences using BLAST tools and make an alignment on given sequences.
T2.1. Find similar sequences using BLAST tool
T2.2. Align generated sequences with ClustalW tool
T2.3. Visualize result using phylogenetic tree on Jalview
T3. Bioextract server
Objective: use a server tool to optimize the data manipulation process, applied on the Bioextract server.
T3.1. Server Initialization
T3.2. Pancreatic cancer & Frataxin (FXN)
T3.3. Mapping, Alignment
T3.4. Workflow save & reuse
Acquired skills
Online and server tools:
- Query biological DB (fasta, Html, txt, figure formats)
- Sequence tools (protein and gene)
Mapping (tmap)
Alignment (clustalw2)
- Manage data result (select, keep, map, export)
- Build and reuse workflow
Conclusion: ?
16 Korean Bioinformation Center, 2010
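Once sequences have been exported in FASTA format, as in the lab's genome exploration task, they can be read back with a few lines of code. The sketch below is a minimal parser under simple assumptions: each record starts with a `>` header line, the first word of the header is used as the record name, and the sequence may span several lines. The function name `parse_fasta` and the toy records in the usage note are hypothetical and are not real FXN data.

```python
def parse_fasta(text):
    """Return a dict mapping each record name to its full sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: keep the first word as the record name.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            # Sequence line: belongs to the most recent header.
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}
```

For example, `parse_fasta(">seq1 demo\nACGT\nTTGA\n>seq2\nMKV\n")` groups the two sequence lines of `seq1` into a single string and keeps `seq2` as a separate record.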
31. Bioinformatics Applications
Biostatistics: gene expression data analysis
Gene expression data (microarray, NGS) analysis process
• Biological question (differentially expressed genes, sample class prediction, etc.)
• Experimental design
• Microarray experiment
• Image analysis
• Normalization
• Estimation, Testing, Clustering, Discrimination
• Biological verification and interpretation
32. Bioinformatics Applications
Example 3
Model design
Mathematical modeling of molecular nutrition
From food to molecule: folate absorption,
metabolism, and distribution
33. Bioinformatics Applications
Model design: Molecular nutrition and nutrigenomics 17
17 Achuthsankar S. Nair, 2007
34. Bioinformatics Applications
Example 2
Model design
Mathematical modeling of Biological systems
Folate-mediated one-carbon metabolism: MTHFR (gene) mutation and carcinogenesis
35. Bioinformatics Applications
Mathematical modeling of Biological systems 18
Folate metabolism (folic acid or Vitamin B9) and pathogenesis
Formalization of the model of metabolic networks
• Network: metabolites i and j linked by reactions rij(Eij, Vij), rji(Eji, Vji), rii(Eii, Vii), with rates vij = f(t, mij, Pij)
• [Equation: dm(t, P)/dt expressed with S, Vc(t, m(t, P), P) and Vr(t); initial condition m(t0, P) = m0(P)]
Transmethylation pathway: Homocysteine → Methionine (rate constant kc)
• d[Homocysteine]/dt = -kc · [Homocysteine]
• d[Methionine]/dt = kc · [Homocysteine]
[Figure panels, all plotted against Time (hours): AdoMet (µM), AdoHcy (µM) and the AdoMet/AdoHcy ratio; IntraCellCp.DNA and IntraCellCp.DNA_CH3 (states versus time); Uracil methylation: dUMP (µM), dTMP (µM) and the dUMP/dTMP ratio]
18 J. M. Scott, 1994
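The transmethylation step on this slide (homocysteine converted to methionine with rate constant kc) can be sketched numerically. The code below is a minimal illustration, assuming first-order kinetics in which homocysteine is consumed at rate kc · [Homocysteine] and methionine is produced at the same rate. The function name, the forward Euler scheme, the value of kc, and the initial concentrations are all hypothetical choices for illustration and are not taken from the slides.

```python
import math


def simulate_transmethylation(hcy0, met0, kc, t_end, dt=0.001):
    """Integrate the two-species system with the forward Euler method.

    Assumed model (first-order kinetics):
        d[Hcy]/dt = -kc * [Hcy]
        d[Met]/dt = +kc * [Hcy]
    Returns the concentrations (hcy, met) at time t_end.
    """
    hcy, met = hcy0, met0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        converted = kc * hcy * dt  # amount converted during this step
        hcy -= converted
        met += converted
    return hcy, met


if __name__ == "__main__":
    # Hypothetical values: 10 µM homocysteine, 20 µM methionine, kc = 0.5 per hour.
    hcy, met = simulate_transmethylation(10.0, 20.0, kc=0.5, t_end=1.0)
    exact = 10.0 * math.exp(-0.5)  # analytical solution for homocysteine
    print(f"Hcy: {hcy:.3f} µM (exact {exact:.3f}), Met: {met:.3f} µM")
```

Because every unit of homocysteine removed is added to methionine, the total concentration stays constant, which is a quick sanity check for this kind of model.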
36. Bioinformatics Applications
Example 4
Model design
Drug-DNA interaction
[1] Saffroy & al., 2004
[2] Chango & al., 2008
37. Bioinformatics Applications
Model design: drug-DNA interaction 19
Inputs: Protein/DNA; Ligand (drug molecule)
• Evaluate the uploaded molecule through Lipinski's Rule of Five
• Predict the possible target protein allosteric site
• Target protein ready for docking
• Docking & scoring
19 B. Jayaram, 2011
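The Lipinski's Rule of Five evaluation step can be sketched as a simple filter on four molecular descriptors. The thresholds are the commonly cited ones (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). The class and function names are hypothetical, the descriptors are assumed to be computed elsewhere, and the tolerance of one violation is a common convention exposed here as a parameter, not something stated on the slide.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors of a candidate molecule (assumed inputs)."""
    molecular_weight: float  # in daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d):
    """Return the list of Rule of Five criteria that the molecule violates."""
    checks = [
        ("molecular weight > 500 Da", d.molecular_weight > 500),
        ("logP > 5", d.log_p > 5),
        ("more than 5 H-bond donors", d.h_bond_donors > 5),
        ("more than 10 H-bond acceptors", d.h_bond_acceptors > 10),
    ]
    return [label for label, violated in checks if violated]


def passes_rule_of_five(d, max_violations=1):
    """Accept the molecule if it has at most max_violations violations."""
    return len(lipinski_violations(d)) <= max_violations
```

As a usage example, a small molecule such as aspirin (roughly 180 Da, logP near 1.2, 1 donor, 4 acceptors, approximate illustrative values) has no violations, while a hypothetical ligand of 900 Da with logP 6.5, 7 donors and 12 acceptors violates all four criteria.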
38. Bioinformatics Applications
Example 5
Model design
3D Modeling /simulation in biology
39. Bioinformatics Applications
Model design: 3D Modeling 20, 21
Google Body browser E-cell project
20 Google, 2011 21 E-Cell.org, 2011
40. Bioinformatics Applications
Example 6
Model design
Cancer tumor model
Hi, welcome to this bioinformatics course. I'm Etienne. I have a bachelor's degree in mathematics and informatics, a master's degree in computer science, a master's degree in computer science and mathematics for integrative biology, and a PhD in biotechnology and bioinformatics. During this course, I will call "informaticians" the people who have a background in information systems, such as computer scientists, and "biologists" the people who have a background in the life sciences, such as biology, ecology, agriculture, and so on. [Next]
We will talk about fundamentals, then career opportunities, and end with the applications of bioinformatics. Don't try to memorize the slide content. The aim of this talk is to give you an overview of bioinformatics as a discipline. At the end of the talk, you will be able to indicate what role bioinformatics plays in the whole biotechnology area, give some application examples, and describe the career opportunities for bioinformaticians. It is an interactive talk: if you have a question, don't hesitate to interrupt me.
How we present bioinformatics fundamentals depends on the audience. Here, we will use some terms most often used by informaticians, but I'm sure you have strong computational backgrounds.
- This part talks about bioinformatics in the biotechnology area.
- It covers what bioinformatics is and the main areas developed in bioinformatics, such as metabolic pathways, epigenetics, genomics, transcriptomics, and proteomics.
- We will also talk about 2 key concepts between biology and computer science: the lactose operon and the central dogma of molecular biology.
We cannot speak about bioinformatics fundamentals without biotechnology. According to the United Nations Convention on Biological Diversity, biotechnology describes any technological application that uses biological systems, living organisms, or derivatives thereof, to make or modify products or processes for specific use, for example biochemistry and genetics applied to genetically modified organisms (GMOs).
From this conception, we can draw a simple map of the biotechnology area. This map is organized around four principal axes: green biotech is for plants, red biotech is for animals and healthcare, blue biotech is for aquatic bioengineering, and white biotech is for industry. It is important to note that this map is just one representation of the area. To give an idea of its size, a reference bioscience dictionary, Spellex BioScientific, identified more than 13,000 biotechnical terms in its 2011 version. So you may ask me: where in the figure does bioinformatics fall? Informatics (information technology with computers) is used in each of these four branches. Bioinformatics tools (informatics in the bio world) are therefore used in every applied domain: agriculture for green biotech, bioengineering such as biodegradation and bioremediation for white biotech, medicine for red biotech, and education. Now, let us talk about two important fields: the bio world and the informatics world.
The bio world is very large and complex. It is large because there are billions of different elements to study, from the smallest, invisible elements (the nucleotides A, C, G, T, U) to ecosystems and environmental factors. It is complex because the elements interact with one another, and some interactions are unknown. Researchers in biology work in a specific domain (physiology), on a specialized theme (cell growth and apoptosis in cancer development), and, at a given moment, on one subject based on a single hypothesis to verify (in an epigenetic context, DNA methylation modifies the cell growth profile). The problem is having the right information at the right moment in the research process.
Fig. Historical milestones that have placed bioinformatics at the heart of 21st century biology, from the determination of the first amino acid sequence, to the development of an archive of 500 billion nucleotide sequences. Some major milestones are denoted in black; key computing innovations are indicated in purple; example databases are indicated in blue; organizations and institutions in green; numbers of sequences in red, the growing mass of which is highlighted both in the red curve and the background gradient – the impact of genomic sequencing in the mid ‘90s is clear.
On the other hand, we have the informatics world, which specializes in information manipulation (remember that information is not data).
In this focus, we have the bioinformatics world, which revolves around:
We have spoken about biological species and the interactions between them. The main issue is how a mathematician can understand these interactions. Biologists propose two key tools in this direction: the lactose operon, to detail the mechanism of gene expression, and the central dogma of molecular biology, for interactions. The gene regions of the DNA in the nucleus of the cell are copied (transcribed) into RNA, and the RNA travels to protein production sites and is translated into proteins. In short, DNA → RNA → proteins is the central dogma of molecular biology. Imagine: there are trillions of cells in your body, and the DNA of each of them is churning out thousands of RNAs, which in turn cause thousands of proteins to be produced, every moment. One of them is making your hair strong, another giving the glitter in your eyes, another one carrying oxygen to different parts, and yet another one helping in the making of proteins themselves! No wonder that the famous life scientist Russell Doolittle exclaimed: "We are our proteins."
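For informaticians, the DNA → RNA → protein flow can be written as two small string transformations. The sketch below assumes the input DNA is the coding strand (so transcription is modeled as replacing T with U), uses the standard genetic code, and starts translation at the first AUG codon. The function names `transcribe` and `translate` are illustrative, and the sketch ignores real-world details such as splicing and reading-frame selection.

```python
from itertools import product

# Standard genetic code, listed in UCAG order for each codon position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Model transcription of a coding-strand DNA sequence into mRNA."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA from the first AUG until a stop codon or the end."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, no protein
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate(transcribe("ATGGCCTAA"))` reads the codons AUG, GCC, UAA and yields the two-residue peptide `"MA"` (methionine, alanine) before the stop codon. Characters other than A, C, G, T/U are not handled and would raise a `KeyError`.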
With the study of the lactose operon, François Jacob, André Lwoff and Jacques Monod were the first scientists to describe a system for regulating gene transcription. They proposed the existence of two classes of genes that differ in their function: structural genes and regulatory genes. The concept of gene regulation was born from this work (Nobel Prize in Physiology or Medicine, 1965).
Rearrange for poster
Research and academic institutes have also become big players in the employment market, as more candidates look to acquire a PhD and some essential research skills in the hope that this will lead to better opportunities in the future.
This figure represents an informatician's perception of bioinformatics. Bioinformatics has many application sides: biology, pharmacology, and so on. Each application side is accessible through an ad hoc interface adapted to the user's environment.
On the other hand, we can view bioinformatics as an integrated set of tools, processes, and databanks, organized according to the aims of our work. In that case, we have:
In all cases, the practice of bioinformatics depends on several parameters. One of the most important is the context and profile of the practitioner. The basic user is often a specialist in the life sciences, so practices differ. From a biological standpoint, bioinformatics means taking computing tools that are useful for solving known biological problems (BIO-informatics). From a computing point of view, bioinformatics is the construction of programs and processes relevant to biology (computational biology). Here lies the relationship between bioinformatics and computational biology. Why do I bother to make this distinction? Because in this perception of bioinformatics, both the biologist and the bioinformatician act as computer scientists, and there are computer scientists who work on bioinformatics with only a rudimentary level of biology gained through experience. For an informatician, bioinformatics refers to three things:
- Data manipulation
- Program building
- Model design
For example… [Next]
This is the lab template. The context is a biological one, based on a real biological problem and a given hypothesis. I don't use strong computer science terms. When you read this template, you have a different view than an informatician. You want to understand the process used to build the tools: the architecture of the system, the algorithm implementation, the quality of the resulting data, and so on.
This model illustrates the growth of a tumor and how it resists chemical treatment. A tumor consists of two kinds of cells: stem cells (blue) and transitory cells (all other colors). HOW IT WORKS: During mitosis, a stem cell can divide either asymmetrically or symmetrically. In asymmetric mitosis, one of the two daughter cells remains a stem cell, replacing its parent. So a stem cell effectively never dies: it is quasi reincarnated after each division. The other daughter cell turns into a transitory cell that moves outward. Young transitory cells may divide, breeding other transitory cells. The transitory cells stop dividing at a certain age and change color from red to white to black, eventually dying. A stem cell may also divide symmetrically into two stem cells (blue). In this example the original stem cell divides symmetrically only once. The first stem cell remains static, but the second stem cell moves to the right. This activity, in which the cell advances into distant sites and creates another tumor colony, is called metastasis. Notice that the metastasis is red. It is made of cells that die young, when they are still red, rather than ending as black dots as in the static tumor. As the disease progresses, cells die younger and younger.
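The asymmetric division rule described above can be sketched as a small discrete-time simulation. This is a deliberately simplified illustration, not the model shown on the slide: it tracks only cell counts and ages, ignores positions, colors, symmetric division, and metastasis, and assumes that every young transitory cell divides at every tick. The function names and the parameters `max_division_age` and `lifespan` are hypothetical.

```python
def step(stem_cells, transitory_ages, max_division_age=1, lifespan=5):
    """Advance the tumor by one tick and return the new state.

    stem_cells: number of stem cells (constant under asymmetric mitosis).
    transitory_ages: list with the age, in ticks, of each transitory cell.
    """
    # Transitory cells age by one tick; those reaching `lifespan` die.
    survivors = [age + 1 for age in transitory_ages if age + 1 < lifespan]
    # Young transitory cells divide, each breeding one new transitory cell.
    newborn = sum(1 for age in survivors if age <= max_division_age)
    # Asymmetric mitosis: each stem cell replaces itself and
    # produces one transitory daughter cell.
    newborn += stem_cells
    return stem_cells, survivors + [0] * newborn


def simulate(ticks, **params):
    """Return the total tumor size after each tick, starting from one stem cell."""
    stem_cells, ages = 1, []
    sizes = []
    for _ in range(ticks):
        stem_cells, ages = step(stem_cells, ages, **params)
        sizes.append(stem_cells + len(ages))
    return sizes


if __name__ == "__main__":
    print(simulate(10))
```

Lowering `lifespan` in this sketch mimics the remark that cells die younger as the disease progresses, while the stem cell count never decreases, which is why removing transitory cells alone does not eliminate the tumor.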
Epidemiologists use bioinformatics models as well. Our last example models epidemiology inquiries. HIV or other diseases can be modeled to show how they will spread or subside under certain conditions.