Exploring Your Personal
Genome with Free, Online
Bioinformatics Tools
by
Shannon Bohle, BA, MLIS, CDS (Cantab), FRAS, AHIP
.org 2014 Tech Conference
What is the future of genomic sciences and bioinformatics?
Ethical considerations of newborn screening:
privacy, inaccuracy, discrimination, eugenics
Video: Gattaca (1997): http://www.youtube.com/watch?v=1Q67bMYOm7E
Reduced cost: The $1,000 genome
“Illumina’s DNA Supercomputer Ushers in the $1,000 Human Genome” (January 14, 2014)
http://www.businessweek.com/articles/2014-01-14/illuminas-dna-supercomputer-ushers-in-the-1-000-human-genome
+ Genome sequencing at birth:
“Baby DNA Analysis Ushers in Brave New World of Treatment” (January 16, 2014)
http://www.bloomberg.com/news/2014-01-16/baby-dna-analysis-ushers-in-brave-new-world-of-treatment-health.html
= Big industry
“Illumina and a Billionaire Want to Jump-Start Genomics Upstarts” (February 17, 2014)
http://www.businessweek.com/articles/2014-02-17/illimuna-and-billionaire-yuri-milner-to-aid-genomics-startups
The future of genomic sciences and bioinformatics is NOW.
Presentation Overview: Predictive Pathology
Hopefully you will learn a great deal today about the biological basis of disease.
Specifically, we will discuss the following pathways in which disease can occur:
• At conception, chromosomes from both parents combine to pass on genetic material to a child. Earlier, when the eggs and sperm
form, paired chromosomes exchange material at crossover points called chiasmata. Errors in this crossing-over process produce
variations that are new in the child rather than inherited from a parent's own genome.
• Chromosomal abnormalities like a duplication, deletion, translocation, inversion, or insertion can be inherited. A common example of a
chromosomal abnormality is Down syndrome, where there is an additional copy of chromosome 21 (a change in chromosome number
rather than structure, and usually not inherited).
• Also at conception, because chromosomes contain DNA, the genetic code (called the genotype) is transferred, and it shapes the
observable traits (called the phenotype). The genotype is always present, while its effect on the phenotype may be expressed (dominant)
or hidden (recessive) in an individual. Dominant traits typically appear in every generation, while recessive traits can skip
generations, carried unexpressed down the family line. A common example of an autosomal recessive heritable disease is sickle cell anemia.
• During childhood and adulthood, factors like the environment (such as exposure to chemicals), diet, exercise, aging, et cetera can
damage genes (mutating them) or change how genes are switched on and off, and this may lead to disease. The branch of study
examining context-dependent changes in gene activity that do not alter the DNA sequence is called epigenetics. An example of this is
DNA methylation. Protein misfolding is a further non-inherited route to disease.
• Inherited and de novo (chiasmic, protein misfolding, and epigenetically-caused) variations can be studied in detail when looking at
the level of either proteins (which are made of amino acids) or DNA (which is made of nucleotides). Therefore, sequencing of plant,
animal, and other forms of life has been done to try to understand and control biology, specifically biological function. The field of
functional genomics designs technology tools that aid in diagnoses when biology malfunctions. About 40-60% of genes in a sequenced
genome are related to biological function. Under different conditions, genes may be expressed in novel, transient ways. These
expression patterns are difficult to detect.
Trained professionals identify specific biomarkers, like JAK2, that have a high association with diseases. Knowing these in advance
can sometimes influence a person’s lifestyle choices, such as having children, diet, and medical decisions. Because bioinformatics is a
very new field, a genetic counselor should interpret test results to provide patients with guidance on two items: first, their level of
risk by percentage, and second, the level of confidence scientists have that a specific biomarker actually causes a disease. Scientists
determine this by looking across species, through phylogenetics. But most importantly they learn about the genetic basis of human
disease by using bioinformatics tools to compare the DNA of patients who share the same disease and by creating cell lines. That is why
projects like the Personal Genome Project not only benefit the individual participant, but also contribute to advances in medicine
and personalized medicine. “Personalized medicine is an emerging practice of medicine that uses an individual's genetic profile to
guide decisions made in regard to the prevention, diagnosis, and treatment of disease” (NLM’s GHR glossary).
Having your genome sequenced provides an overview of your genetic background
as well as the state of your genes at a given time.
Courtesy of the Genetics & Public Policy Center with support from The Pew Charitable Trusts
Genome: all hereditary genetic material of an organism
Chromosome: DNA, protein, and RNA found in cells
Gene: strands of 5’ to 3’ DNA (promoters, exons, introns)
(Humans have about 22,000 genes)
Allele: one of 2 or more variants of each gene (two of
which are inherited from parents)
Genotype: coded information
2 Types: Homozygote: same alleles – AA, aa
Heterozygote: different alleles – Aa
Phenotype: physical manifestation of a characteristic
Dominant Trait: expressed
Recessive Trait: not expressed
a) Autosomal Recessive: Two abnormal copies must
be present to get the disorder
b) X-linked Recessive: Females are usually carriers; males are more often affected
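The homozygote/heterozygote and autosomal recessive definitions above can be made concrete with a small Punnett-square sketch. It assumes a single gene with two alleles ("A" normal, "a" abnormal) and that each parent passes on either allele with equal chance; the function name `cross` is made up for this illustration.

```python
from collections import Counter
from itertools import product


def cross(parent1, parent2):
    """Return the probability of each offspring genotype for a one-gene cross.

    Each parent is a two-letter genotype such as "Aa". Every allele is
    assumed to be passed on with equal probability.
    """
    counts = Counter()
    for allele1, allele2 in product(parent1, parent2):
        # Sort the pair so "aA" and "Aa" are counted as the same genotype
        # (uppercase letters sort before lowercase ones).
        genotype = "".join(sorted(allele1 + allele2))
        counts[genotype] += 1
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


# Two carriers (heterozygotes) of an autosomal recessive condition.
for genotype, probability in sorted(cross("Aa", "Aa").items()):
    print(genotype, probability)
```

Under these assumptions, only the "aa" offspring carry two abnormal copies, which is why a child of two carriers does not necessarily inherit the disorder.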
GENETICS (GENES/CHROMOSOMES)
A Short Overview of Biological Inheritance (Heredity)
Described Through Cell Biology
CELL
GENOME
CHROMOSOME
GENE
DNA
AMINO ACIDS
Image Courtesy of Mayo Clinic:
http://www.mayoclinic.org/procedure/genetic-testing/multimedia/genetic-disorders/sls-20076216
If you have a genetic disorder or are a carrier,
will your children inherit it?
NOT NECESSARILY. SEE AN MD OR GENETIC COUNSELOR.
GENETICS (GENES/CHROMOSOMES)
Mitosis v. Meiosis
Some chromosome abnormalities are
not inherited. De novo variants appear
for the first time in an individual.
They can occur in recombination or
“crossing over” during mitosis or meiosis.
Image Credit: OpenStax College. "Laws of
Inheritance." Connexions. February 24, 2014.
http://cnx.org/content/m44479/1.3.
Mitosis occurs with somatic cells. It results in
two cells that are duplicates of the original cell.
In other words, one cell with 46 chromosomes
becomes two cells with 46 chromosomes each.
This kind of cell division occurs throughout the
body, except in the reproductive organs. This is
how most of the cells that make up our body
are made and replaced. Mutations that arise during
mitosis are not passed on to children.
Meiosis occurs with germ cells. It results in
cells with half the number of chromosomes (in
diploid humans, 23 instead of the normal 46).
These are the eggs and sperm. Mutations in germ cells
can be passed on to children, beginning in the embryo's stem cells.
During gestation, the stem cells gain specificity
as somatic cells of various types and germ cells
to become a male or female child.
Source: http://www.genome.gov/11508982#6
Chiasma
During meiosis
chromosomal material
crosses over
Video: Cell Division and the Cell Cycle
http://www.youtube.com/watch?v=Q6ucKWIIFmg
BIOCHEMISTRY (PROTEINS)
A Short Overview of Molecular Biology
and Bioinformatics
Video: Central dogma of molecular biology (1958): replication, transcription and translation
Variations (mutations) can occur during these processes, sometimes causing diseases
that can be passed on to children.
http://www.youtube.com/watch?v=Q_WRFw8KQk4
http://www.youtube.com/watch?v=D3fOXt4MrOM
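As a rough companion to the central dogma videos, the sketch below transcribes a DNA coding strand to mRNA and translates it to a protein using the standard genetic code. The function names and the short sequences are made up for illustration; they are not taken from any real gene. The example also shows how a single-base change can swap one amino acid for another, here valine (V) for phenylalanine (F).

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(coding_dna):
    """Transcription: the mRNA matches the coding strand, with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate(coding_dna):
    """Translation: read codons three bases at a time until a stop codon (*)."""
    dna = coding_dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Toy sequences: the second differs from the first by one base (G -> T).
print(transcribe("ATGGTCTAA"))
print(translate("ATGGTCTAA"))
print(translate("ATGTTCTAA"))
```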
Video animation:
The central dogma of molecular biology
"DNA The Secret of Life” by PBS
After proteins are formed they fold into various shapes based on their chemical makeup.
Misfolding is a second cause of de novo variants. Misfolding sometimes causes disease, and is
passed on to children. A linear analysis of amino acid chains in a protein cannot anticipate
which amino acids end up near each other once the protein folds, so 3D modeling is used.
http://www.youtube.com/watch?v=Pjt1Q2ZZVjA
“Simulating How Proteins Self-Assemble, Or Fold” by Stanford University
Video: Protein folding
When cells go bad,
control decisions must be made
that regulate the micro-“society.”
Reform or Remove?
DNA ligase, an enzyme,
(shown left, in color)
repairs mistakes in DNA.
Some proteins, like p53,
(shown below)
enforce cell death (apoptosis).
p53 malfunction is one cause of
cancer, where cells with mutations
grow out of control.
The Life Cycle of
DNA
Sir John Gurdon:
Epigenetics Founder &
Nobel Laureate
"for the discovery that
mature cells can be
reprogrammed to become
pluripotent"
Turning back the clock on disease:
Mature, specialized cells can be reverted to their embryonic stem cell state.
University of Cambridge, 2012, the year Gurdon won the Nobel Prize
Xenopus
Protein-Protein Interaction
How proteins interact with one another
is key to understanding their function in the body.
Only 1%
of the human genome
codes for our 20,000 proteins.
Function is largely determined
by how proteins interact.
Epigenetics
“Epigenetic mechanisms are affected by
several factors and processes including
development in utero and in
childhood, environmental chemicals, drugs
and pharmaceuticals, aging, and diet. DNA
methylation is what occurs when methyl
groups, an epigenetic factor found in some
dietary sources, can tag DNA and activate
or repress genes. Histones are proteins
around which DNA can wind for
compaction and gene regulation. Histone
modification occurs when the binding of
epigenetic factors to histone “tails” alters
the extent to which DNA is wrapped
around histones and the availability of
genes in the DNA to be activated. All of
these factors and processes can have an
effect on people’s health and influence
their health possibly resulting in
cancer, autoimmune disease, mental
disorders, or diabetes among other
Image and description credit:
National Institutes of Health
Comparative Genomics and Phylogeny
To locate new disease markers and learn how pathogens function,
it is helpful to examine ultra-conserved regions in cross-species protein & nucleic acid production,
because these are most often linked to important bodily functions, disease and health.
(See: 1) Kumar S, Sanderford M, Gray VE, Ye J, Liu L. Evolutionary diagnosis method for variants
in personal exomes. Nature Methods (2012) 9(9):855-6. doi:10.1038/nmeth.2147.
2) Liu L, Kumar S. (2013) Evolutionary Balancing is Critical for Correctly Forecasting Disease
Associated Amino Acid Variants. Molecular Biology and Evolution 30:1252-1257 (Epub 2013 March 5))
About 5%-10% of the human genome consists of regulatory motifs, conserved across species, that turn genes “on” and “off”
to control gene expression, in addition to the 1% used for coding proteins.
Visualization of a Phylogenetic Tree Using MEGA 6
Newick notation:
((((Cucumis sativus,Ricinus communis),Solanum lycopersicum),Medicago truncatula),(Arabidopsis thaliana,Capsella rubella));
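Newick notation nests groups in parentheses, separates sibling branches with commas, and ends the tree with a semicolon. The sketch below is a minimal parser for name-only trees like the plant tree on this slide (no branch lengths or support values); it is an illustration with made-up function names, not how MEGA 6 reads trees.

```python
def parse_newick(text):
    """Parse a name-only Newick string into nested lists of leaf names."""
    text = text.strip()
    if text.endswith(";"):
        text = text[:-1]
    pos = 0

    def peek():
        # Current character, or "" once the input is exhausted.
        return text[pos] if pos < len(text) else ""

    def skip_spaces():
        nonlocal pos
        while peek().isspace():
            pos += 1

    def parse_node():
        nonlocal pos
        skip_spaces()
        if peek() == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            while peek() == ",":
                pos += 1  # consume ","
                children.append(parse_node())
            if peek() != ")":
                raise ValueError(f"expected ',' or ')' at position {pos}")
            pos += 1  # consume ")"
            skip_spaces()
            return children
        # Otherwise this is a leaf: read the name up to the next delimiter.
        start = pos
        while peek() not in ("", ",", "(", ")"):
            pos += 1
        return text[start:pos].strip()

    tree = parse_node()
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def leaf_names(tree):
    """Flatten a parsed tree into a list of its leaf names, left to right."""
    if isinstance(tree, str):
        return [tree]
    names = []
    for child in tree:
        names.extend(leaf_names(child))
    return names


PLANTS = ("((((Cucumis sativus,Ricinus communis),Solanum lycopersicum),"
          "Medicago truncatula),(Arabidopsis thaliana,Capsella rubella));")
print(leaf_names(parse_newick(PLANTS)))
```

A string that omits the comma between two sibling groups, such as `(A,B)(C,D)` inside an outer pair of parentheses, is rejected by this parser with a `ValueError`.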
“Proteins are clustered on branches on the basis of the
similarity of their amino acid sequences. The phylogenetic
representation tends to cluster structurally (and sometimes
functionally) related proteins. Drugs targeting a specific
protein are more likely to be active against other proteins on
the same branch. Distinct phylogenetic branches are
highlighted with distinct colours (in the case of the malignant
brain tumour (MBT) family, where only a few MBT domains
are actually binding methyl-lysines, the red colour coding
indicates the branch where all known methyl-lysine-binding
domains are clustered). We assembled protein families by
looking for domains associated with 'writing', 'reading' and
'erasing' acetyl and methyl marks in the Human Protein
Reference Database, and by complementing the list with
data from the literature, as well as data from the Pfam
protein family database and the SMART (Simple Modular
Architecture Research Tool) database. The phylogeny
outlined in the trees is derived from multiple sequence
alignments of the domain after which the family was named
(full-length sequences were used for acetyltransferases as
the catalytic domain is not always clearly defined for this
family). If a domain is present multiple times in a protein, the
protein is shown multiple times in the corresponding tree,
followed by the sequential iteration of the domain in
parenthesis for example, L3MBTL(2) corresponds to the
second MBT domain of the protein L3MBTL. If multiple
variants with insertions or deletions were reported for a
gene, the variant number according to Swiss-Prot
nomenclature is indicated after a hyphen: for example,
TRIM33-2 in the tree of bromodomain-containing proteins
corresponds to the second Swiss-Prot variant of the TRIM33
(tripartite motif-containing protein 33) bromodomain. For
each tree, a seed alignment was derived from available
protein structures by aligning residues that were
superimposed in the three-dimensional space. Additional
sequences were appended by aligning them to the closest
seed sequence.”
http://www.nature.com/nrd/journal/v11/n5/fig_tab/nrd3674_F2.html
Phylogenetic trees of epigenetic protein families.
Mega-genomics
and
Next Generation
Sequence
Analysis
Sequencing human DNA:
The Human Genome Project and the Personal Genome Project
First Human Genomes Sequenced:
1) Dr. J. Craig Venter
2) Dr. James D. Watson:
Molecular Biology Founder &
Nobel Laureate
3) Personal Genome Project
4) Hundred Person Wellness Project
5) UK Personal Genome Project
Cold Spring Harbor Laboratory, 2006
Genome-Wide Association Studies (GWAS)
compare the genomes of many people, with and
without a disease, to look for variants that
might contribute to the disease.
Current understanding of the human genome,
categorized by function of each gene product,
given both as number of genes
and as percentage of all genes.
Image description and credit: Mikael Häggström (Wikimedia Commons)
Our understanding of function within the human genome is incomplete.
More samples are needed for improved results.
The Cost Reduction for Sequencing Genomes
Greatly Outpaced Moore’s Law
State Direct-to-Consumer Testing Statutes and Regulations
Courtesy of the Genetics & Public Policy Center with support from The Pew Charitable Trusts
Limitations of GINA
“The Genetic Information Nondiscrimination Act,
known as GINA, does not apply to three types of
insurance — life, disability and long-term care —
that are especially important to people who may
have serious inherited diseases … The American
Medical Association’s code of ethics states that
‘it may be necessary’ for doctors to maintain a
separate file for genetic test results so the
information is not sent to insurers.”
-- “Fearing Punishment for Bad Genes,”
The New York Times, April 7, 2014.
http://www.nytimes.com/2014/04/08/science/fearing-punishment-for-bad-genes.html?_r=1
Genetic Information Nondiscrimination Act (GINA) of 2008: http://www.genome.gov/24519851
Henrietta Lacks:
The Ethics of Cell Line Development and Research
Henrietta Lacks, 1945. Image courtesy of The Lacks Family. (Source: Wikipedia).
Do you own
your DNA?
Testing Companies
23andMe
454 Life Sciences
Advanced Healthcare, Inc
AIBioTech
Ancestry DNA
Atlas Sports Genetics
Athleticode
Biologis Personal Genomics Service
Bioresolve
Counsyl
Complete Genomics
deCODE Genetics
deCODEme.com
DNA-CARDIOCHECK
DNA DTC
DNATraits
Eastern Biotech & Life Sciences
easyDNA
EnteroLab
Family Tree DNA
Future Genetics
Geenitesti
Genelex
GenePlanet
Genetic Health
Genetic Technologies
Genetic Testing Laboratories
Geneyouin
Genographic Project
Genotek
Gentle Labs
Graceful Earth
HealthCheckUSA
HelloGene / HelloGenome
Holistic Health
IDNA.com
i-gene
Illumina
Indian Biosciences
InoLife Technologies
Interleukin Genetics
JCVI
Knome
Lumigenix
Map My Gene
MapMyGenome
meragenome.com
MyGene23
Navigenics
Oxford Nanopore Technologies
Pacific Biosciences
Pathway Genomics
Pediatrix Medical Group
Perkin Elmer Genetics
Personal Genome Project
Personalis
PHENOM Biosciences
Positive Bioscience
Sequenom
SNPedia
Test Country
Ubiome
Viaguard/Accu-metrics
vuGene
Xcode Life Sciences
As of March 10, 2014, 23andMe
had 650,000+ genotyped customers
Screenings
More than 420 Conditions and Traits are Screened for During Genetic Testing.
CONDITIONS
CANCER
LIVER
HEART
HEARING
SIGHT
DIABETES
PSYCHIATRIC/PSYCHOLOGICAL
REPRODUCTIVE / STD (FERTILITY)
REGULATORY FUNCTIONS
(BREATHING, SLEEP, WEIGHT, RENAL)
ADDICTION (ALCOHOL, DRUG)
IMMUNE SYSTEM (HIV, AIDS)
MUSCULO-SKELETAL (MARFANS)
PHARMACOGENOMICS/DRUG EFFICACY
(CANCER, WARFARIN)
NEUROLOGICAL
(PARKINSON’S, ALZHEIMER’S, MS)
SKIN
ABILITIES & PHYSICAL TRAITS
INTELLIGENCE
ENDURANCE
EYE & HAIR COLOR
NCBI Resources
https://www.ncbi.nlm.nih.gov/variation
http://www.ncbi.nlm.nih.gov/guide/genetics-medicine
http://www.ncbi.nlm.nih.gov/books/NBK1116
http://www.ncbi.nlm.nih.gov/medgen
http://www.ncbi.nlm.nih.gov/mesh
Other Resources
http://www.omim.org
http://www.orpha.net
http://www.genome.gov
http://www.dnapolicy.org
See handout for specific tests
Asclepius
How to Submit Your DNA for Sequencing and Analysis with the Personal Genome Project
Basic eligibility:
1. US citizen age 21 or older
2. Additional details: http://www.personalgenomes.org/harvard/protocols
How it Works:
We will be using an existing volunteer’s genome for this presentation.
Steps:
1. Provide Open Consent (form)
2. Supply Medical History (form)
3. Donate DNA Samples (saliva, hair, blood, tissue) by self-collection or at a designated facility
4. Samples Sent to Lab (blood=DNA, tissue=exome, saliva=microbiome); tissue samples may be
used to develop cell lines for research purposes
5. Harvard’s PGP Team Analyzes Data for Anomalies and Creates a Personalized Health
Prognosis Report
6. The PGP Team Publishes Your Information Online
(Your data is associated with a volunteer number, but your name can also be used if you
would like to do this)
7. Safety follow-up monitoring by email
8. Additional details: http://www.personalgenomes.org/harvard/howitworks
Volunteer
huA90CE6
John Lauerman
In His Own Words
Whole Genome Sequence (WGS) Analysis
http://youtu.be/YGIxMYiPLOU
Volunteer huA90CE6 = Case Study: John Lauerman (Harvard Analysis)
JAK2-V617F and APOE-C130R variations
Step 1
Create a C:\data folder and download John Lauerman’s genome from the PGP website:
https://my.pgp-hms.org/profile/huA90CE6. Examine the variant report on the same page.
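Variant names such as JAK2-V617F and APOE-C130R pack three facts into a few characters: the reference amino acid, its position in the protein, and the amino acid that replaces it. The sketch below unpacks names written in that gene-hyphen-substitution style; the function name `parse_variant` and the returned dictionary layout are assumptions made for this example.

```python
import re

# One-letter to three-letter amino acid abbreviations.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}

VARIANT_NAME = re.compile(
    r"^(?P<gene>[A-Za-z0-9]+)-(?P<ref>[A-Z])(?P<pos>\d+)(?P<alt>[A-Z])$"
)


def parse_variant(name):
    """Split a name like 'JAK2-V617F' into gene, position, and amino acids."""
    match = VARIANT_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a gene-substitution name: {name!r}")
    return {
        "gene": match["gene"],
        "reference": THREE_LETTER[match["ref"]],
        "position": int(match["pos"]),
        "variant": THREE_LETTER[match["alt"]],
    }


for name in ("JAK2-V617F", "APOE-C130R"):
    print(parse_variant(name))
```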
Locating and Interpreting Errors: Cytogenetic Location
JAK2-V617F is located on the short arm of chromosome 9 (9p). Loss of heterozygosity in this region is written 9pLOH.
Source:
Kralovics R, Passamonti F, Buser AS, Teo SS, Tiedt R, Passweg JR, Tichelli A, Cazzola M, Skoda RC.
A gain-of-function mutation of JAK2 in myeloproliferative disorders. N Engl J Med. 2005 Apr 28;352(17):1779-90.
There are 22 numbered chromosomes (autosomes) plus the sex chromosomes X and Y.
The first part of a location is the chromosome number (or X or Y).
The second part is the letter p or q, where p is the
“short arm” and q is the “long arm.”
The position on the arm is usually designated by two digits
(representing a region and a band), which are
sometimes followed by a decimal point and one or
more additional digits (representing sub-bands within
a light or dark area).
http://ghr.nlm.nih.gov/handbook/howgeneswork/genelocation
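The naming scheme described above (chromosome, arm, region, band, optional sub-band) can be checked mechanically. The sketch below covers only the simple single-location form, such as 9p24; ranges and other variants of the notation are not handled, and the names `CYTOGENETIC_LOCATION` and `parse_location` are made up for this example.

```python
import re

CYTOGENETIC_LOCATION = re.compile(
    r"^(?P<chromosome>1[0-9]|2[0-2]|[1-9]|X|Y)"  # chromosome 1-22, X, or Y
    r"(?P<arm>[pq])"                             # p = short arm, q = long arm
    r"(?P<region>\d)(?P<band>\d)"                # two digits: region, band
    r"(?:\.(?P<sub_band>\d+))?$"                 # optional sub-band digits
)

ARM_NAMES = {"p": "short arm", "q": "long arm"}


def parse_location(text):
    """Break a location such as '9p24' into its named parts."""
    match = CYTOGENETIC_LOCATION.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized cytogenetic location: {text!r}")
    parts = match.groupdict()
    parts["arm_name"] = ARM_NAMES[parts["arm"]]
    return parts


print(parse_location("9p24"))      # the JAK2 location given on the next slide
print(parse_location("19q13.32"))  # a location that includes a sub-band
```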
LIST OF COMMON ERRORS BY CHROMOSOME NUMBER: http://ghr.nlm.nih.gov/chromosomes
9pLOH
Janus kinase 2 –
Cytogenetic Location: 9p24
http://ghr.nlm.nih.gov/chromosome/9
Human Gene JAK2
Transcript (Including UTRs)
Position: chr9:4,985,245-5,128,183 Size: 142,939 Total Exon Count: 25 Strand: +
Coding Region
Position: chr9:5,021,988-5,126,791 Size: 104,804 Coding Exon Count: 23
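The Size values on this slide follow from the Position values: both endpoints are counted, so the size is end minus start plus one. A small sketch, with made-up helper names, that reads a position string in the chr:start-end form shown above and reproduces both sizes:

```python
def parse_position(text):
    """Split 'chr9:4,985,245-5,128,183' into (chromosome, start, end)."""
    chromosome, _, span = text.replace(",", "").partition(":")
    start, _, end = span.partition("-")
    return chromosome, int(start), int(end)


def span_size(start, end):
    """Number of bases in a range whose start and end are both included."""
    return end - start + 1


for label, position in (
    ("Transcript", "chr9:4,985,245-5,128,183"),
    ("Coding region", "chr9:5,021,988-5,126,791"),
):
    chromosome, start, end = parse_position(position)
    print(label, chromosome, f"{span_size(start, end):,}")
```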
JAK2-V617F
Human Reference Genome - “Normal” JAK2 using the UCSC Genome Browser
Right-click over JAK2 and choose “Get DNA for JAK2.”
Then, in the popup window, choose “get DNA.”
Using the shift key, highlight all the information.
“Save As” JAK2.
Open the file with Notepad to see JAK2 in more detail.
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr9%3A4985245-5128183
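The browser link above encodes the assembly (hg19) and the region as query parameters, with the colon in the position written as %3A. A minimal sketch of building such a link for any region follows; the function name `browser_url` is made up here, and only the two parameters visible in the link above are used.

```python
from urllib.parse import urlencode

UCSC_TRACKS_URL = "http://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_url(chromosome, start, end, assembly="hg19"):
    """Build a genome browser link for one region of one assembly."""
    # urlencode percent-encodes the ":" in the position as %3A.
    query = urlencode({"db": assembly, "position": f"{chromosome}:{start}-{end}"})
    return f"{UCSC_TRACKS_URL}?{query}"


# The JAK2 transcript region used in this case study.
print(browser_url("chr9", 4985245, 5128183))
```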
We will examine a volunteer’s “Variant” JAK2 with
two free bioinformatics tools using Windows.
At the end of the talk there will be a list of additional tools for
non-Windows systems like Linux, Mac, and iPad.
PGA: Personal Genome Analyzer from Archivopedia
BLAST: National Center for Biotechnology Information (NCBI) (Web-based)
Volunteer huA90CE6 -- Case Study: John Lauerman (Looking Closer with PGA & PyMOL)
SETTING UP PYTHON (Win 7, 64-bit)
Step 1
Download and install Python https://www.python.org/download/releases/2.7.5
Windows X86-64 MSI Installer (2.7.5)
Step 2
Download and install the PyMOL extension for a free 3D molecule viewer
http://www.lfd.uci.edu/~gohlke/pythonlibs/#pymol (pymol-1.7.1.0.win-amd64-py2.7.exe)
Note: Installing the extension may open a command prompt window to compile.
Find the application folder C:\Python27\PyMOL
Find the PyMOL application file in the list and create a shortcut.
Drag the shortcut to the desktop.
Double-click the icon on the desktop to run PyMOL.
Step 3
Download and install the wxPython extension:
http://downloads.sourceforge.net/wxpython/wxPython3.0-win64-3.0.0.0-py27.exe
Volunteer huA90CE6 -- Case Study: John Lauerman (Looking Closer with PGA)
Step 4
Use the Python-driven tool designed for this project to convert an isolated chromosome in
your whole genome sequence from TSV to FASTA and SQL formats in under 5 minutes.
Note: The following sources were used to create the tool:
Search engine - http://wiki.personal-genome.org/index.php?title=Talk:MtDNA_haplogroup
Human reference genome (rCRS) -
http://www.ncbi.nlm.nih.gov/nuccore/251831106?report=fasta
Insert
• Browse your hard drive for the Volunteer’s Whole Genome Sequence
• File name: huA90CE6--GS000006909-ASM.tsv
Insert
• Enter a single chromosome number you wish to examine
• 1-22, X, or Y; or leave blank for whole genome. [Enter 9]
Insert
• Enter an exact location or leave at defaults if you wish to scan the
whole chromosome or whole genome. [Use Default]
Check mark “Generate FA” for FASTA
Check mark “Generate SQL” for SQL
Click the PROCESS BUTTON.
Go to C:\data for the converted files in FASTA and SQL formats.
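The internals of PGA are not shown in these slides, so the sketch below illustrates only the output side of the conversion: what a FASTA record looks like once a sequence is in hand. The function name, the record name, and the toy sequence are all hypothetical, and reading the volunteer's TSV file is deliberately left out because its column layout is not described here.

```python
def to_fasta(name, sequence, width=60):
    """Format one sequence as a FASTA record: a '>' header, then wrapped lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# A made-up 80-base sequence, wrapped at 60 characters per line.
record = to_fasta("chr9_demo", "ACGT" * 20)
print(record, end="")
```

Written to a file with a .fa extension, a record in this shape is what FASTA-aware tools expect as input.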
Volunteer huA90CE6 -- Case Study: John Lauerman (Looking Closer with PGA)
UNDER DEVELOPMENT
After a single search of a whole
genome or chromosome,
use PGA to view the FASTA file in
the “View FASTA” window.
Or, view the exact location of
variants simply by clicking on the
“Variants” tab.
This image shows some variants in
John Lauerman’s Chr1 compared
to the Human Reference Genome.
Future plans include adding
reports with graphs and other
visualizations.
Volunteer huA90CE6 Case Study:
John Lauerman
(Looking Closer with BLAST)
Step 5
Use the generated FASTA file
to perform a BLASTn search.
In this case, John Lauerman’s Chr9 file
was used
(after using PGA, it is located in C:\data
with a .fa extension).
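A whole-chromosome FASTA file may be larger than is practical to submit in one search, so one option is to cut out a region of interest first. The sketch below reads FASTA text and extracts a slice using 1-based, inclusive coordinates; it assumes the sequence in the file starts at position 1 of the chromosome, which may not hold for every file, and the function names are made up for this example.

```python
import io


def read_fasta(handle):
    """Read FASTA text from a file-like object into {header: sequence}."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {header: "".join(parts) for header, parts in records.items()}


def subsequence(sequence, start, end):
    """Return bases start..end, counting from 1 and including both ends."""
    return sequence[start - 1:end]


# A tiny in-memory stand-in for a .fa file.
demo = io.StringIO(">demo\nACGTAC\nGTTT\n")
sequences = read_fasta(demo)
print(subsequence(sequences["demo"], 2, 5))
```

With a real file, `open(path)` would take the place of the `io.StringIO` object.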
Free Tools for Other Platforms
CGATools
(MacOS or LINUX only)
Download Complete Genomics Analysis Tools software and User Guide documentation:
http://cgatools.sourceforge.net
CGA Tools 1.8.0 Software:
CGA Tools 1.8.0 User Guide:
http://cgatools.sourceforge.net/docs/1.8.0/cgatools-user-guide.pdf
Illumina’s MyGenome App
Requires iOS 6.1 or later. Compatible with iPad.
http://www.illumina.com/clinical/clinical_informatics/mygenome_app.ilmn
Complete Genomics’ Genome Voyager
http://www.completegenomics.com/analysis-tools/voyager
Complete Genomics’ List of Third Party Tools:
http://www.completegenomics.com/analysis-tools/third-party-tools
PyMOL for Linux and Mac:
http://www.pymolwiki.org/index.php/Linux_Install
http://www.pymolwiki.org/index.php/MAC_Install
Analyzing Collections of Whole Human Genomes
Through Multiple Sequence Alignments and Analysis
Create your own database
Using a MySQL database, it is possible to import many whole human genome sequences from
the PGP project by following the example in Slide #36 using PGA.
In silico human genome scientific studies can be conducted for the following applications:
● disease biomarker identification
● pharmacogenetics
The PGP aims for a collection of 100,000 sequenced genomes.
TO GET STARTED
1. Determine the minimum and ideal sample sizes (number of volunteer DNA sequences)
for significance in your study (usually 10,000).
2. Consider the needed space allocation.
● Each unzipped TSV file of an entire genome is about 1.3 MB
3. Consider needed time for conversion and import into a MySQL database.
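The planning steps on this slide can be tried out on a small scale before committing to a full database. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, purely so it runs without a server; the table layout, the volunteer IDs, and the variant rows are all invented for illustration and do not come from PGP data. The storage estimate simply multiplies the per-file size quoted on the slide by the number of genomes.

```python
import sqlite3


def estimate_storage_mb(genome_count, mb_per_genome=1.3):
    """Rough space needed, using the per-file size quoted on the slide."""
    return genome_count * mb_per_genome


def build_demo_database():
    """Create an in-memory table of variants with a few made-up rows."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE variants ("
        "volunteer TEXT, chromosome TEXT, position INTEGER, "
        "reference TEXT, variant TEXT)"
    )
    rows = [
        ("demo1", "9", 5050000, "G", "T"),  # hypothetical rows, not real data
        ("demo2", "9", 6000000, "A", "C"),
        ("demo3", "1", 5050000, "C", "T"),
    ]
    connection.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?)", rows)
    connection.commit()
    return connection


def volunteers_with_variant_in(connection, chromosome, start, end):
    """List volunteers with any variant inside a chromosome region."""
    cursor = connection.execute(
        "SELECT DISTINCT volunteer FROM variants "
        "WHERE chromosome = ? AND position BETWEEN ? AND ? "
        "ORDER BY volunteer",
        (chromosome, start, end),
    )
    return [row[0] for row in cursor]


database = build_demo_database()
# Region of the JAK2 transcript on chromosome 9 from the case study.
print(volunteers_with_variant_in(database, "9", 4985245, 5128183))
print(estimate_storage_mb(10000), "MB for 10,000 genomes")
```

The same table and query would carry over to MySQL with a different connection library; only the connection setup is specific to sqlite3.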
Selected Bibliography
ALBERTS, B. (1983). Molecular biology of the cell. New York, Garland Pub.
CAREY, N. (2013). Epigenetics revolution: how modern biology is rewriting our
understanding of genetics, disease, and inheritance.
CHURCH, G. M., & REGIS, E. (2012). Regenesis: how synthetic biology will reinvent
nature and ourselves. New York, Basic Books.
SCHRÖDINGER, E. (2012). What is life?: the physical aspect of the living cell. Cambridge,
Univ. Press.
SKLOOT, R. (2010). The immortal life of Henrietta Lacks. New York, Crown Publishers.
VENTER, J. C. (2007). A life decoded: my genome, my life. New York, Viking.
WATSON, J. D. (1968). The double helix; a personal account of the discovery of the
structure of DNA. New York, Atheneum. [SIGNED FIRST EDITION]
WATSON, J. D. (2008). Molecular biology of the gene. San Francisco, Pearson/Benjamin
Cummings.
ZVELEBIL, M. J., & BAUM, J. O. (2008). Understanding bioinformatics. New York,
Garland Science.
Credits
Personal Genome Project (Harvard)
MITx: 7.00x: Introduction to Biology - The Secret of Life
(14 weeks) : Eric Lander (MIT, Harvard)
Bioinformatic Methods I | Coursera
(6 weeks): Nicholas Provart - (University of Toronto)
Bioinformatic Methods II | Coursera
(6 weeks): Nicholas Provart - (University of Toronto)
Illumina
Gattaca (screenshot)
Genetics & Public Policy Center
Mayo Clinic
Stanford University
MEGA 6
JMOL
NIH
UCSC Genome Database
PyMOL
CGA Tools
Complete Genomics
MyGenome App
NCBI – BLAST
PBS
MG – RAST
Nature
John Lauerman
Tracy Kovach
Mikael Häggström
Database of Genomic Variants
National Human Genome Research Institute
International Society of Genetic Genealogy
Personal Genome Analyzer:
Architect: S. Bohle, Programmers: D. Yount
Contact Information
Archivopedia.com

Contenu connexe

Tendances

Uberon EBI industry workshop
Uberon EBI industry workshopUberon EBI industry workshop
Uberon EBI industry workshop
Chris Mungall
 
Hum. reprod. 2013-enciso-1707-15
Hum. reprod. 2013-enciso-1707-15Hum. reprod. 2013-enciso-1707-15
Hum. reprod. 2013-enciso-1707-15
t7260678
 
Mapping Phenotype Ontologies for Obesity and Diabetes
Mapping Phenotype Ontologies for Obesity and DiabetesMapping Phenotype Ontologies for Obesity and Diabetes
Mapping Phenotype Ontologies for Obesity and Diabetes
Chris Mungall
 
Human genome project 2007
Human genome project 2007Human genome project 2007
Human genome project 2007
Hesham Gaber
 
1 s2.0-s1472648313005798-main
1 s2.0-s1472648313005798-main1 s2.0-s1472648313005798-main
1 s2.0-s1472648313005798-main
鋒博 蔡
 
Presentation istanbul
Presentation istanbulPresentation istanbul
Presentation istanbul
pranayashakya
 
Dna profiling presentation x2
Dna profiling presentation x2Dna profiling presentation x2
Dna profiling presentation x2
Eli Rosenthal
 

Tendances (20)

Genetic Research
Genetic ResearchGenetic Research
Genetic Research
 
Embryonic stem cells – Promises and Issues
Embryonic stem cells – Promises and IssuesEmbryonic stem cells – Promises and Issues
Embryonic stem cells – Promises and Issues
 
Uberon EBI industry workshop
Uberon EBI industry workshopUberon EBI industry workshop
Uberon EBI industry workshop
 
Hum. reprod. 2013-enciso-1707-15
Hum. reprod. 2013-enciso-1707-15Hum. reprod. 2013-enciso-1707-15
Hum. reprod. 2013-enciso-1707-15
 
GIGA2 Structuring Phenotype Data
GIGA2 Structuring Phenotype DataGIGA2 Structuring Phenotype Data
GIGA2 Structuring Phenotype Data
 
Genome power point presentation (2)
Genome power point presentation (2)Genome power point presentation (2)
Genome power point presentation (2)
 
History of Cloning and Ethical Issues of Human Cloning
History of Cloning and Ethical Issues of Human CloningHistory of Cloning and Ethical Issues of Human Cloning
History of Cloning and Ethical Issues of Human Cloning
 
Mapping Phenotype Ontologies for Obesity and Diabetes
Mapping Phenotype Ontologies for Obesity and DiabetesMapping Phenotype Ontologies for Obesity and Diabetes
Mapping Phenotype Ontologies for Obesity and Diabetes
 
research paper
research paperresearch paper
research paper
 
Genetic variation
Genetic variationGenetic variation
Genetic variation
 
Introduction to the concept of genomics april quiapo ho maed2_s
Introduction to the concept of genomics april quiapo ho maed2_sIntroduction to the concept of genomics april quiapo ho maed2_s
Introduction to the concept of genomics april quiapo ho maed2_s
 
Human genome project 2007
Human genome project 2007Human genome project 2007
Human genome project 2007
 
Project powerpoint
Project powerpointProject powerpoint
Project powerpoint
 
Extinct Animal Cloning
Extinct Animal CloningExtinct Animal Cloning
Extinct Animal Cloning
 
Organ cloning
Organ cloningOrgan cloning
Organ cloning
 
1 s2.0-s1472648313005798-main
1 s2.0-s1472648313005798-main1 s2.0-s1472648313005798-main
1 s2.0-s1472648313005798-main
 
An Introduction to Genomics
An Introduction to GenomicsAn Introduction to Genomics
An Introduction to Genomics
 
다낭성증후군이 환자 자궁에 미치는 영향에 관한 연구
다낭성증후군이 환자 자궁에 미치는 영향에 관한 연구 다낭성증후군이 환자 자궁에 미치는 영향에 관한 연구
다낭성증후군이 환자 자궁에 미치는 영향에 관한 연구
 
Presentation istanbul
Presentation istanbulPresentation istanbul
Presentation istanbul
 
Dna profiling presentation x2
Dna profiling presentation x2Dna profiling presentation x2
Dna profiling presentation x2
 

Similaire à Exploring your personal genome with free, online bioinformatics tools

Adler migge genetics research march 13-template
Adler migge genetics research march 13-templateAdler migge genetics research march 13-template
Adler migge genetics research march 13-template
MorganScience
 
Applications of Cell Biology & Genetics.pptx
Applications of Cell Biology & Genetics.pptxApplications of Cell Biology & Genetics.pptx
Applications of Cell Biology & Genetics.pptx
GayatriHande1
 
Applications of Cell Biology & Genetics.pptx
Applications of Cell Biology & Genetics.pptxApplications of Cell Biology & Genetics.pptx
Applications of Cell Biology & Genetics.pptx
GayatriHande1
 

Exploring your personal genome with free, online bioinformatics tools

  • 1. Exploring Your Personal Genome with Free, Online Bioinformatics Tools by Shannon Bohle, BA, MLIS, CDS (Cantab), FRAS, AHIP .org 2014 Tech Conference
  • 2. What is the future of genomic sciences and bioinformatics? Ethical considerations of newborn screening: privacy, inaccuracy, discrimination, eugenics. Video: Gattaca (1997): http://www.youtube.com/watch?v=1Q67bMYOm7E
  • 3. Reduced cost: The $1,000 genome “Illumina’s DNA Supercomputer Ushers in the $1,000 Human Genome” (January 14, 2014) http://www.businessweek.com/articles/2014-01-14/illuminas-dna-supercomputer-ushers-in-the-1-000-human-genome + Genome sequencing at birth: “Baby DNA Analysis Ushers in Brave New World of Treatment” (January 16, 2014) http://www.bloomberg.com/news/2014-01-16/baby-dna-analysis-ushers-in-brave-new-world-of-treatment-health.html = Big industry “Illumina and a Billionaire Want to Jump-Start Genomics Upstarts” (February 17, 2014) http://www.businessweek.com/articles/2014-02-17/illimuna-and-billionaire-yuri-milner-to-aid-genomics-startups The future of genomic sciences and bioinformatics is NOW.
  • 4. Presentation Overview: Predictive Pathology. Today you will learn about the biological basis of disease. Specifically, we will discuss the following pathways by which disease can occur:
    • At conception, chromosomes from both parents combine to pass genetic material on to a child. Problems sometimes occur during crossing over (the exchange points are called chiasmata) when the parents' egg and sperm cells form, and the resulting variations are new to the child rather than inherited.
    • Chromosomal abnormalities such as an addition, deletion, translocation, inversion, or insertion can be inherited. A common example of a chromosomal abnormality is Down syndrome, in which there is an additional copy of chromosome 21.
    • Also at conception, because chromosomes contain DNA, the genetic code (the genotype) is transferred, and with it the basis for specific traits (phenotypes). Genotypes are always present, while the associated traits may be expressed (dominant) or hidden (recessive) in an individual. Dominant traits typically express themselves down the family line, while recessive traits can be passed on unexpressed and so skip generations. A common example of an autosomal recessive heritable disease is sickle cell anemia.
    • During childhood and adulthood, factors such as the environment (for example, exposure to chemicals), diet, exercise, and aging can also damage genes, mutating them, and this may lead to disease. The branch of study examining context-dependent, non-inherited factors is called epigenetics. An example of this is protein misfolding.
    • Inherited and de novo (chiasmic, protein-misfolding, and epigenetically caused) variations can be studied in detail at the level of either DNA (which is made of nucleotides) or proteins (which are made of amino acids). Therefore, plants, animals, and other forms of life have been sequenced to try to understand and control biology, specifically biological function. The field of functional genomics designs technology tools that aid in diagnosis when biology malfunctions.
    About 40-60% of genes in a sequenced genome are related to biological function. Under different conditions, proteins may express themselves in novel, transient ways. These gene expressions are difficult to detect. Trained professionals identify specific biomarkers, like JAK2, that have a high association with diseases. Knowing these in advance can sometimes influence a person’s lifestyle choices, such as having children, diet, and medical decisions. Because bioinformatics is a very new field, a genetic counselor should interpret test results to provide patients with guidance on two items: first, their level of risk by percentage, and second, the level of confidence scientists have that a specific biomarker actually causes a disease. Scientists determine this in part by looking across species, through phylogenetics. Most importantly, they learn about the genetic basis of human disease by using bioinformatics tools to compare the DNA of patients who share the same disease and by creating cell lines. That is why projects like the Personal Genome Project not only benefit the individual participant, but also contribute to advances in medicine and personalized medicine. “Personalized medicine is an emerging practice of medicine that uses an individual's genetic profile to guide decisions made in regard to the prevention, diagnosis, and treatment of disease” (NLM’s GHR glossary). Having your genome sequenced provides an overview of your genetic background as well as the state of your genes at a given time.
  • 5. Courtesy of the Genetics & Public Policy Center with support from The Pew Charitable Trusts
  • 6. GENETICS (GENES/CHROMOSOMES): A Short Overview of Biological Inheritance (Heredity) Described Through Cell Biology
    Diagram labels: CELL, GENOME, CHROMOSOME, GENE, DNA, AMINO ACIDS
    Genome: all hereditary genetic material of an organism
    Chromosome: DNA, protein, and RNA found in cells
    Gene: strands of 5’ to 3’ DNA (promoters, exons, introns); humans have about 22,000 genes
    Allele: one of two or more variants of each gene (two of which are inherited, one from each parent)
    Genotype: coded information. Two types:
      Homozygote: same alleles (AA, aa)
      Heterozygote: different alleles (Aa)
    Phenotype: physical manifestation of a characteristic
      Dominant trait: expressed
      Recessive trait: not expressed
        a) Autosomal recessive: two abnormal copies must be present to get the disorder
        b) X-linked recessive: females are usually carriers only
    Image courtesy of Mayo Clinic: http://www.mayoclinic.org/procedure/genetic-testing/multimedia/genetic-disorders/sls-20076216
    If you have a genetic disorder or are a carrier, will your children inherit it? NOT NECESSARILY. SEE AN MD OR GENETIC COUNSELOR.
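The "not necessarily" answer on this slide can be illustrated with a Punnett square. The following is a minimal sketch, assuming a single gene with one dominant allele (`A`) and one recessive allele (`a`); the function name `punnett` is hypothetical and the model ignores everything except simple Mendelian inheritance.

```python
from collections import Counter
from itertools import product


def punnett(parent1: str, parent2: str) -> Counter:
    """Count the offspring genotypes for one gene, given two parental genotypes."""
    offspring = Counter()
    # Each parent passes on one of its two alleles; enumerate all four pairings.
    for allele1, allele2 in product(parent1, parent2):
        # Sort so that "aA" and "Aa" are counted as the same genotype.
        genotype = "".join(sorted(allele1 + allele2))
        offspring[genotype] += 1
    return offspring


# Two carriers (heterozygotes) of an autosomal recessive condition.
counts = punnett("Aa", "Aa")
total = sum(counts.values())
for genotype, n in sorted(counts.items()):
    print(f"{genotype}: {n}/{total}")
```

For two carriers, the four equally likely pairings are one `AA`, two `Aa`, and one `aa`, so under this simplified model only one pairing in four has the two abnormal copies needed for an autosomal recessive disorder.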
  • 7. GENETICS (GENES/CHROMOSOMES): Mitosis v. Meiosis. Some chromosome abnormalities are not inherited. De novo variants appear for the first time in an individual. They can occur in recombination or “crossing over” during mitosis or meiosis. Image Credit: OpenStax College. "Laws of Inheritance." Connexions. February 24, 2014. http://cnx.org/content/m44479/1.3.
    Mitosis occurs with somatic cells. It results in two cells that are duplicates of the original cell. In other words, one cell with 46 chromosomes becomes two cells with 46 chromosomes each. This kind of cell division occurs throughout the body, except in the reproductive organs. This is how most of the cells that make up our body are made and replaced. Mutations that arise during mitosis are not passed on to children.
    Meiosis occurs with germ cells. It results in cells with half the number of chromosomes (in diploid humans, 23 instead of the normal 46). These are the eggs and sperm. Mutations that arise during meiosis can be passed on to children in their stem cells. During gestation, the stem cells gain specificity as somatic cells of various types and germ cells to become a male or female child. Source: http://www.genome.gov/11508982#6
    Chiasma: during meiosis, chromosomal material crosses over.
  • 8. Video: Cell Division and the Cell Cycle http://www.youtube.com/watch?v=Q6ucKWIIFmg
  • 9. BIOCHEMISTRY (PROTEINS) A Short Overview of Molecular Biology and Bioinformatics
  • 10.
  • 11. Video: Central dogma of molecular biology (1958): replication, transcription and translation Variations (mutations) can occur during these processes, sometimes causing diseases that can be passed on to children. http://www.youtube.com/watch?v=Q_WRFw8KQk4
  • 12. Video animation: The central dogma of molecular biology, “DNA The Secret of Life” by PBS. http://www.youtube.com/watch?v=D3fOXt4MrOM
  • 13. Video: Protein folding. After proteins are formed, they fold into various shapes based on their chemical makeup. Misfolding is a second cause of de novo variants. Misfolding sometimes causes disease, and is passed on to children. A linear analysis of the amino acid chain in a protein cannot anticipate which amino acids end up near each other when the protein folds, so 3D modeling is used. “Simulating How Proteins Self-Assemble, Or Fold” by Stanford University: http://www.youtube.com/watch?v=Pjt1Q2ZZVjA
  • 14. The Life Cycle of DNA. When cells go bad, control decisions must be made that regulate the micro-“society.” Reform or remove? DNA ligase, an enzyme (shown left, in color), repairs mistakes in DNA. Some proteins, like p53 (shown below), enforce cell death (apoptosis). p53 malfunction is one cause of cancer, where cells with mutations grow out of control.
  • 15. Sir John Gurdon: Epigenetics Founder & Nobel Laureate "for the discovery that mature cells can be reprogrammed to become pluripotent" Turning back the clock on disease: Mature, specialized cells can be reverted to their embryonic stem cell state. University of Cambridge, 2012, the year Gurdon won the Nobel Prize Xenopus
  • 16. Protein-Protein Interaction. How proteins interact with one another is key to understanding their function in the body. Only 1% of the human genome codes for our 20,000 proteins. Function is largely determined by how proteins interact.
    Epigenetics. “Epigenetic mechanisms are affected by several factors and processes including development in utero and in childhood, environmental chemicals, drugs and pharmaceuticals, aging, and diet. DNA methylation is what occurs when methyl groups, an epigenetic factor found in some dietary sources, can tag DNA and activate or repress genes. Histones are proteins around which DNA can wind for compaction and gene regulation. Histone modification occurs when the binding of epigenetic factors to histone “tails” alters the extent to which DNA is wrapped around histones and the availability of genes in the DNA to be activated. All of these factors and processes can have an effect on people’s health and influence their health possibly resulting in cancer, autoimmune disease, mental disorders, or diabetes among other
    Image and description credit: National Institutes of Health
  • 17. Comparative Genomics and Phylogeny. To locate new disease markers and learn how pathogens function, it is helpful to examine ultra-conserved regions in cross-species protein and nucleic acid production, because these are most often linked to important bodily functions, disease, and health. (See: 1) Kumar S, Sanderford M, Gray VE, Ye J, Liu L. Evolutionary diagnosis method for variants in personal exomes. Nature Methods (2012) 9(9):855-6. doi:10.1038/nmeth.2147. 2) Liu L, Kumar S. Evolutionary Balancing is Critical for Correctly Forecasting Disease Associated Amino Acid Variants. Molecular Biology and Evolution (2013) 30:1252-1257 (Epub 2013 March 5).) About 5%-10% of the human genome consists of regulatory motifs shared across species that turn genes “on” and “off” to control gene expression, in addition to the 1% used for coding proteins.
  • 18. Visualization of a Phylogenetic Tree Using MEGA 6. Newick notation: ((((Cucumis sativus,Ricinus communis),Solanum lycopersicum),Medicago truncatula),(Arabidopsis thaliana,Capsella rubella));
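To make the Newick notation on this slide concrete, here is a minimal sketch of a parser that turns a names-only Newick string into nested tuples. It is an illustration only, not how MEGA 6 reads trees: the function names are hypothetical, and branch lengths and quoted labels are deliberately not handled. In Newick, sibling clades are separated by commas and the tree ends with a semicolon.

```python
def parse_newick(text: str):
    """Parse a simple Newick string (names only, no branch lengths) into nested tuples."""
    text = text.strip().rstrip(";")
    pos = 0

    def peek() -> str:
        return text[pos] if pos < len(text) else ""

    def skip_spaces() -> None:
        nonlocal pos
        while peek().isspace():
            pos += 1

    def parse_node():
        nonlocal pos
        skip_spaces()
        if peek() == "(":
            pos += 1  # consume "("
            children = [parse_node()]
            skip_spaces()
            while peek() == ",":
                pos += 1  # consume ","
                children.append(parse_node())
                skip_spaces()
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ")"
            return tuple(children)
        # Leaf: read the species name up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos].strip()

    tree = parse_node()
    skip_spaces()
    if pos != len(text):
        raise ValueError(f"unexpected text at position {pos}")
    return tree


def leaves(tree) -> list:
    """Return the leaf names of a parsed tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaves(child)]


NEWICK = ("((((Cucumis sativus,Ricinus communis),Solanum lycopersicum),"
          "Medicago truncatula),(Arabidopsis thaliana,Capsella rubella));")
tree = parse_newick(NEWICK)
print(leaves(tree))
```

Each pair of parentheses becomes one tuple, so the nesting depth of a species in the result mirrors how deeply it sits in the phylogenetic tree.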
  • 19. “Proteins are clustered on branches on the basis of the similarity of their amino acid sequences. The phylogenetic representation tends to cluster structurally (and sometimes functionally) related proteins. Drugs targeting a specific protein are more likely to be active against other proteins on the same branch. Distinct phylogenetic branches are highlighted with distinct colours (in the case of the malignant brain tumour (MBT) family, where only a few MBT domains are actually binding methyl-lysines, the red colour coding indicates the branch where all known methyl-lysine-binding domains are clustered). We assembled protein families by looking for domains associated with 'writing', 'reading' and 'erasing' acetyl and methyl marks in the Human Protein Reference Database, and by complementing the list with data from the literature, as well as data from the Pfam protein family database and the SMART (Simple Modular Architecture Research Tool) database. The phylogeny outlined in the trees is derived from multiple sequence alignments of the domain after which the family was named (full-length sequences were used for acetyltransferases as the catalytic domain is not always clearly defined for this family). If a domain is present multiple times in a protein, the protein is shown multiple times in the corresponding tree, followed by the sequential iteration of the domain in parenthesis for example, L3MBTL(2) corresponds to the second MBT domain of the protein L3MBTL. If multiple variants with insertions or deletions were reported for a gene, the variant number according to Swiss-Prot nomenclature is indicated after a hyphen: for example, TRIM33-2 in the tree of bromodomain-containing proteins corresponds to the second Swiss-Prot variant of the TRIM33 (tripartite motif-containing protein 33) bromodomain. For each tree, a seed alignment was derived from available protein structures by aligning residues that were superimposed in the three-dimensional space. 
Additional sequences were appended by aligning them to the closest seed sequence.” Phylogenetic trees of epigenetic protein families. http://www.nature.com/nrd/journal/v11/n5/fig_tab/nrd3674_F2.html
  • 21. Sequencing human DNA: The Human Genome Project and the Personal Genome Project. First Human Genomes Sequenced: 1) Dr. J. Craig Venter 2) Dr. James D. Watson: Molecular Biology Founder & Nobel Laureate 3) Personal Genome Project 4) Hundred Person Wellness Project 5) UK Personal Genome Project. Cold Spring Harbor Laboratory, 2006. Genome-Wide Association Studies (GWAS) compare human genomes to one another to look for similarities and differences that might be associated with disease.
  • 22. Current understanding of the human genome, categorized by function of each gene product, given both as number of genes and as percentage of all genes. Image description and credit: Mikael Häggström (Wikimedia Commons) Our understanding of function within the human genome is incomplete. More samples are needed for improved results.
  • 23. The Cost Reduction for Sequencing Genomes Greatly Outpaced Moore’s Law
  • 24. State Direct-to-Consumer Testing Statutes and Regulations Courtesy of the Genetics & Public Policy Center with support from The Pew Charitable Trusts
  • 25. Limitations of GINA “The Genetic Information Nondiscrimination Act, known as GINA, does not apply to three types of insurance — life, disability and long-term care — that are especially important to people who may have serious inherited diseases … The American Medical Association’s code of ethics states that ‘it may be necessary’ for doctors to maintain a separate file for genetic test results so the information is not sent to insurers.” -- “Fearing Punishment for Bad Genes,” The New York Times, April 7, 2014. http://www.nytimes.com/2014/04/08/science/fearing-punishment-for-bad-genes.html?_r=1 Genetic Information Nondiscrimination Act (GINA) of 2008: http://www.genome.gov/24519851
  • 26. Henrietta Lacks: The Ethics of Cell Line Development and Research Henrietta Lacks, 1945. Image courtesy of The Lacks Family. (Source: Wikipedia). Do you own your DNA?
  • 27. Testing Companies 23andMe 454 Life Sciences Advanced Healthcare, Inc AIBioTech Ancestry DNA Atlas Sports Genetics Athleticode Biologis Personal Genomics Service Bioresolve Counsyl Complete Genomics deCODE Genetics deCODEme.com DNA-CARDIOCHECK DNA DTC DNATraits Eastern Biotech & Life Sciences easyDNA EnteroLab Family Tree DNA Future Genetics Geenitesti Genelex GenePlanet Genetic Health Genetic Technologies Genetic Testing Laboratories Geneyouin Genographic Project Genotek Gentle Labs Graceful Earth HealthCheckUSA HelloGene / HelloGenome Holistic Health IDNA.com i-gene Illumina Indian Biosciences InoLife Technologies Interleukin Genetics JCVI Knome Lumigenix Map My Gene MapMyGenome meragenome.com MyGene23 Navigenics Oxford Nanopore Technologies Pacific Biosciences Pathway Genomics Pediatrix Medical Group Perkin Elmer Genetics Personal Genome Project Personalis PHENOM Biosciences Positive Bioscience Sequenom SNPedia Test Country Ubiome Viaguard/Accu-metrics vuGene Xcode Life Sciences As of March 10, 2014, 23andMe had 650,000+ genotyped customers
  • 28. Screenings. More than 420 conditions and traits are screened for during genetic testing. See handout for specific tests.
    CONDITIONS: CANCER; LIVER; HEART; HEARING; SIGHT; DIABETES; PSYCHIATRIC/PSYCHOLOGICAL; REPRODUCTIVE / STD (FERTILITY); REGULATORY FUNCTIONS (BREATHING, SLEEP, WEIGHT, RENAL); ADDICTION (ALCOHOL, DRUG); IMMUNE SYSTEM (HIV, AIDS); MUSCULO-SKELETAL (MARFANS); PHARMACOGENOMICS/DRUG EFFICACY (CANCER, WARFARIN); NEUROLOGICAL (PARKINSON’S, ALZHEIMER’S, MS); SKIN
    ABILITIES & PHYSICAL TRAITS: INTELLIGENCE; ENDURANCE; EYE & HAIR COLOR
    NCBI Resources: https://www.ncbi.nlm.nih.gov/variation http://www.ncbi.nlm.nih.gov/guide/genetics-medicine http://www.ncbi.nlm.nih.gov/books/NBK1116 http://www.ncbi.nlm.nih.gov/medgen http://www.ncbi.nlm.nih.gov/mesh
    Other Resources: http://www.omim.org http://www.orpha.net http://www.genome.gov http://www.dnapolicy.org
    Asclepius
  • 29. How to Submit Your DNA for Sequencing and Analysis with the Personal Genome Project
    Basic eligibility:
    1. US citizen age 21 or older
    2. Additional details: http://www.personalgenomes.org/harvard/protocols
    How it Works: We will be using an existing volunteer’s genome for this presentation.
    Steps:
    1. Provide Open Consent (form)
    2. Supply Medical History (form)
    3. Donate DNA Samples (saliva, hair, blood, tissue) by self-collection or at a designated facility
    4. Samples Sent to Lab (blood=dna, tissue=exome, saliva=microbiome); tissue samples may be used to develop cell lines for research purposes
    5. Harvard’s PGP Team Analyzes Data for Anomalies and Creates a Personalized Health Prognosis Report
    6. The PGP Team Publishes Your Information Online (your data is associated with a volunteer number, but your name can also be used if you would like to do this)
    7. Safety follow-up monitoring by email
    8. Additional details: http://www.personalgenomes.org/harvard/howitworks
  • 30. Volunteer huA90CE6 John Lauerman In His Own Words Whole Genome Sequence (WGS) Analysis http://youtu.be/YGIxMYiPLOU
  • 31. Volunteer huA90CE6 = Case Study: John Lauerman (Harvard Analysis). JAK2-V617F and APOE-C130R variations. Step 1: Create a C:\data folder and download John Lauerman’s genome from the PGP website: https://my.pgp-hms.org/profile/huA90CE6. Examine the variant report on the same page.
  • 32. Locating and Interpreting Errors: Cytogenetic Location. JAK2-V617F is located on the short arm of chromosome 9 (9p; 9pLOH). Source: Kralovics R, Passamonti F, Buser AS, Teo SS, Tiedt R, Passweg JR, Tichelli A, Cazzola M, Skoda RC. A gain-of-function mutation of JAK2 in myeloproliferative disorders. N Engl J Med. 2005 Apr 28;352(17):1779-90.
    There are 22 numbered chromosomes plus X and Y. In a cytogenetic location, the first part is the chromosome number (or X or Y). The second part is the letter p or q, where p is the “short arm” and q is the “long arm.” The position on the arm is usually designated by two digits (representing a region and a band), which are sometimes followed by a decimal point and one or more additional digits (representing sub-bands within a light or dark area). http://ghr.nlm.nih.gov/handbook/howgeneswork/genelocation
    LIST OF COMMON ERRORS BY CHROMOSOME NUMBER: http://ghr.nlm.nih.gov/chromosomes
    9pLOH Janus kinase 2. Cytogenetic Location: 9p24. http://ghr.nlm.nih.gov/chromosome/9
    Human Gene JAK2. Transcript (Including UTRs): Position: chr9:4,985,245-5,128,183; Size: 142,939; Total Exon Count: 25; Strand: +. Coding Region: Position: chr9:5,021,988-5,126,791; Size: 104,804; Coding Exon Count: 23
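The naming scheme and coordinates on this slide can be checked with a few lines of code. The sketch below, with hypothetical function names, splits a cytogenetic location such as 9p24 into chromosome, arm, region, and band, and recomputes the sizes of the JAK2 transcript and coding region from the positions listed on the slide (both ends of a range are counted).

```python
import re

# Chromosome (1-22, X, or Y), arm (p or q), region digit, band digit, optional sub-band.
CYTO_PATTERN = re.compile(
    r"^(?P<chromosome>[1-9]|1[0-9]|2[0-2]|X|Y)"
    r"(?P<arm>[pq])"
    r"(?P<region>\d)(?P<band>\d)"
    r"(?:\.(?P<sub_band>\d+))?$"
)


def parse_cytogenetic_location(location: str) -> dict:
    """Split a location like '9p24' into its named parts."""
    match = CYTO_PATTERN.match(location)
    if match is None:
        raise ValueError(f"not a recognized cytogenetic location: {location!r}")
    return match.groupdict()


def parse_position(position: str) -> tuple:
    """Parse 'chr9:4,985,245-5,128,183' into ('chr9', 4985245, 5128183)."""
    chromosome, span = position.split(":")
    start, end = (int(part.replace(",", "")) for part in span.split("-"))
    return chromosome, start, end


def region_size(position: str) -> int:
    """Number of bases in the range, counting both the first and last position."""
    _, start, end = parse_position(position)
    return end - start + 1


print(parse_cytogenetic_location("9p24"))
print(region_size("chr9:4,985,245-5,128,183"))  # transcript, including UTRs
print(region_size("chr9:5,021,988-5,126,791"))  # coding region
```

The two computed sizes agree with the 142,939 and 104,804 figures listed for JAK2 on the slide.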
  • 33. JAK2-V617F: Human Reference Genome, “Normal” JAK2, Using the UCSC Genome Browser (UCSCGB). Right-click over JAK2 and choose “Get DNA for JAK2.” Then, in the popup window, choose “get DNA.” Using the Shift key, highlight all the information. “Save As” JAK2. Open the file with Notepad to see JAK2 in more detail. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr9%3A4985245-5128183
  • 34. We will examine a volunteer’s “Variant” JAK2 with two free bioinformatics tools using Windows. At the end of the talk there will be a list of additional tools for non-Windows systems like Linux, Mac, and iPad. PGA: Personal Genome Analyzer from Archivopedia. BLAST: National Center for Biotechnology Information (NCBI) (Web-based).
  • 35. Volunteer huA90CE6 -- Case Study: John Lauerman (Looking Closer with PGA & PyMOL). SETTING UP PYTHON (Win 7, 64-bit)
    Step 1: Download and install Python: https://www.python.org/download/releases/2.7.5 (Windows X86-64 MSI Installer (2.7.5))
    Step 2: Download and install the PyMOL extension for a free 3D molecule viewer: http://www.lfd.uci.edu/~gohlke/pythonlibs/#pymol (pymol-1.7.1.0.win-amd64-py2.7.exe). Open the folder C:\Python27\PyMOL, find the PyMOL application file in the list, and create a shortcut. Drag the shortcut to the desktop. Double-click the icon on the desktop to run PyMOL. Note: Installing the extension may open a command prompt window to compile.
    Step 3: Download and install the wxPython extension: http://downloads.sourceforge.net/wxpython/wxPython3.0-win64-3.0.0.0-py27.exe
  • 36. Volunteer huA90CE6 -- Case Study: John Lauerman (Looking Closer with PGA)
    Step 4: Use the Python-driven tool designed for this project to convert an isolated chromosome in your whole genome sequence from TSV to FASTA and SQL formats in under 5 minutes.
    • Browse your hard drive for the volunteer’s whole genome sequence. File name: huA90CE6--GS000006909-ASM.tsv
    • Enter a single chromosome number you wish to examine: 1-22, X, or Y; or leave blank for the whole genome. [Enter 9]
    • Enter an exact location, or leave at the defaults if you wish to scan the whole chromosome or whole genome. [Use Default]
    • Check “Generate FA” for FASTA and “Generate SQL” for SQL.
    • Click the PROCESS button. Go to C:\data for the converted files in FASTA and SQL formats.
    Note: The following sources were used to create the tool: Search engine - http://wiki.personal-genome.org/index.php?title=Talk:MtDNA_haplogroup Human reference genome (rCRS) - http://www.ncbi.nlm.nih.gov/nuccore/251831106?report=fasta
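The general idea of the TSV-to-FASTA step can be sketched in Python. This is not PGA's actual code, which the slides do not show. The column names (`chromosome`, `begin`, `end`, `varType`, `reference`, `alleleSeq`) are assumptions about the file layout, the function name is hypothetical, and the sample rows are invented purely for illustration.

```python
import csv
import io


def variants_to_fasta(tsv_text: str, chromosome: str) -> str:
    """Write one FASTA record per variant row on the requested chromosome.

    Assumes a tab-separated table with a header row containing the
    (hypothetical) columns chromosome, begin, end, varType, and alleleSeq.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    records = []
    for row in reader:
        if row["chromosome"] != f"chr{chromosome}":
            continue  # keep only the chromosome the user asked for
        if not row["alleleSeq"]:
            continue  # e.g. a deletion leaves no sequence to write
        header = f">chr{chromosome}:{row['begin']}-{row['end']} {row['varType']}"
        records.append(f"{header}\n{row['alleleSeq']}")
    return "\n".join(records) + ("\n" if records else "")


# Invented sample rows, not taken from any volunteer's file.
SAMPLE_TSV = (
    "chromosome\tbegin\tend\tvarType\treference\talleleSeq\n"
    "chr9\t100\t101\tsnp\tG\tT\n"
    "chr1\t200\t201\tsnp\tA\tC\n"
    "chr9\t300\t302\tdel\tAC\t\n"
)

print(variants_to_fasta(SAMPLE_TSV, "9"))
```

A FASTA record is a header line starting with `>` followed by the sequence, which is the form the web-based BLAST search accepts as input.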
  • 37. Volunteer huA90CE6 -- Case Study: John Lauerman (Looking Closer with PGA) UNDER DEVELOPMENT: After a single search of a whole genome or chromosome, use PGA to view the FASTA file in the “View FASTA” window, or view the exact location of variants by clicking the “Variants” tab. This image shows some variants in John Lauerman’s Chr1 compared to the Human Reference Genome. Future plans include adding reports with graphs and other visualizations.
  • 38. Volunteer huA90CE6 -- Case Study: John Lauerman (Looking Closer with BLAST) Step 5: Use the generated FASTA file to perform a BLASTn search. In this case, John Lauerman’s Chr9 file was used; after running PGA, it is located in C:\data with a .fa extension.
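Before a BLASTn search it can help to pull a short region out of the FASTA file and use that as the query. The sketch below is an illustration rather than a step from the slides: it parses FASTA text with plain Python and formats a slice as a new FASTA record that could be pasted into the NCBI BLAST web form. The function names and the zero-based `start`/`end` convention are assumptions.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def make_query(header, sequence, start, end, width=60):
    """Return sequence[start:end] as a FASTA record wrapped at `width` columns."""
    piece = sequence[start:end]
    lines = [piece[i:i + width] for i in range(0, len(piece), width)]
    return ">{} [{}:{}]\n{}\n".format(header, start, end, "\n".join(lines))


if __name__ == "__main__":
    # Toy sequence for illustration only.
    demo = ">chr9 demo\nACGTTTGA\nCCGGAATT\n"
    name, seq = read_fasta(demo)[0]
    print(make_query(name, seq, 4, 12))
```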
  • 39. Volunteer huA90CE6 Case Study: John Lauerman (Looking Closer with BLAST)
  • 40. Free Tools for Other Platforms CGA Tools (macOS or Linux only): Download the Complete Genomics Analysis Tools software and User Guide documentation: http://cgatools.sourceforge.net CGA Tools 1.8.0 Software: CGA Tools 1.8.0 User Guide: http://cgatools.sourceforge.net/docs/1.8.0/cgatools-user-guide.pdf Illumina’s MyGenome App (requires iOS 6.1 or later; compatible with iPad): http://www.illumina.com/clinical/clinical_informatics/mygenome_app.ilmn Complete Genomics’ Genome Voyager: http://www.completegenomics.com/analysis-tools/voyager Complete Genomics’ List of Third Party Tools: http://www.completegenomics.com/analysis-tools/third-party-tools PyMOL for Linux and Mac: http://www.pymolwiki.org/index.php/Linux_Install http://www.pymolwiki.org/index.php/MAC_Install
  • 41. Analyzing Collections of Whole Human Genomes Through Multiple Sequence Alignments and Analysis. Create your own database: Using a MySQL database, it is possible to import many whole human genome sequences from the PGP by following the example in Slide #36 using PGA. In silico human genome scientific studies can be conducted for the following applications: ● disease biomarker identification ● pharmacogenetics TO GET STARTED 1. Determine the minimum and ideal sample sizes (number of volunteer DNA sequences) for significance in your study (usually 10,000). The PGP aims for a collection of 100,000 sequenced genomes. 2. Consider the needed space allocation. ● Each unzipped TSV file of an entire genome is about 1.3 MB 3. Consider the time needed for conversion and import into a MySQL database.
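The multi-genome database idea can be tried at small scale before committing to a server. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL, which is a substitution made here for portability. The table layout, column names, and function names are hypothetical, and the rows are toy values rather than PGP data.

```python
import sqlite3


def build_database(genomes):
    """Load {volunteer_id: [(chromosome, begin, end, reference, allele), ...]}
    into an in-memory SQLite database and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variants ("
        "volunteer TEXT, chromosome TEXT, begin_pos INTEGER, "
        "end_pos INTEGER, reference TEXT, allele_seq TEXT)")
    for volunteer, rows in genomes.items():
        conn.executemany(
            "INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?)",
            [(volunteer,) + tuple(row) for row in rows])
    conn.commit()
    return conn


def carrier_counts(conn):
    """Count how many distinct volunteers carry each variant allele."""
    cursor = conn.execute(
        "SELECT chromosome, begin_pos, allele_seq, "
        "COUNT(DISTINCT volunteer) AS carriers "
        "FROM variants "
        "GROUP BY chromosome, begin_pos, allele_seq "
        "ORDER BY carriers DESC, chromosome, begin_pos")
    return cursor.fetchall()


if __name__ == "__main__":
    # Toy data for two made-up volunteers.
    demo = {
        "vol_a": [("chr9", 100, 101, "G", "T")],
        "vol_b": [("chr9", 100, 101, "G", "T"), ("chr1", 300, 301, "C", "A")],
    }
    for row in carrier_counts(build_database(demo)):
        print(row)
```

Counting carriers per variant is the kind of query a biomarker study would start from; the same SQL should carry over to MySQL with little change.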
  • 42. Selected Bibliography ALBERTS, B. (1983). Molecular biology of the cell. New York, Garland Pub. CAREY, N. (2013). Epigenetics revolution: how modern biology is rewriting our understanding of genetics, disease, and inheritance. CHURCH, G. M., & REGIS, E. (2012). Regenesis: how synthetic biology will reinvent nature and ourselves. New York, Basic Books. SCHRÖDINGER, E. (2012). What is life?: the physical aspect of the living cell. Cambridge, Univ. Press. SKLOOT, R. (2010). The immortal life of Henrietta Lacks. New York, Crown Publishers. VENTER, J. C. (2007). A life decoded: my genome, my life. New York, Viking. WATSON, J. D. (1968). The double helix; a personal account of the discovery of the structure of DNA. New York, Atheneum. [SIGNED FIRST EDITION] WATSON, J. D. (2008). Molecular biology of the gene. San Francisco, Pearson/Benjamin Cummings. ZVELEBIL, M. J., & BAUM, J. O. (2008). Understanding bioinformatics. New York, Garland Science.
  • 43. Credits: Personal Genome Project (Harvard); MITx 7.00x: Introduction to Biology - The Secret of Life (14 weeks), Eric Lander (MIT, Harvard); Bioinformatic Methods I, Coursera (6 weeks), Nicholas Provart (University of Toronto); Bioinformatic Methods II, Coursera (6 weeks), Nicholas Provart (University of Toronto); Illumina; Gattaca (screenshot); Genetics & Public Policy Center; Mayo Clinic; Stanford University; MEGA6; Jmol; NIH; UCSC Genome Database; PyMOL; CGA Tools; Complete Genomics; MyGenome App; NCBI BLAST; PBS; MG-RAST; Nature; John Lauerman; Tracy Kovach; Mikael Häggström; Database of Genomic Variants; National Human Genome Research Institute; International Society of Genetic Genealogy. Personal Genome Analyzer: Architect: S. Bohle; Programmers: D. Yount