This presentation, prepared for a course on bibliographic e-research, uses the two human genome papers published in 2001 as an early example of e-science. The Nature paper, by the International Human Genome Sequencing Consortium, reports a draft sequence produced by an international public collaboration. The Science paper, by Venter et al., lists 276 authors from 14 institutions. Analyzing these sequences provides new insights into human biology and evolution.
Initial sequencing and analysis of the human genome by international consortium
1. Course: Bibliographic E-research
Biological and Biomedical Sciences
Session
Tuesday 22, Thursday 24
February 2011
Layla Michán
Department of Evolutionary Biology,
Faculty of Sciences, UNAM
3. e-science / cyberinfrastructure
• Cyberinfrastructure (USA)
– United States National Science Foundation (NSF) blue-ribbon committee in 2003
– Describes the new research environments that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet
• e-science (Europe)
– United Kingdom's Office of Science and Technology in 1999
– Will refer to the large scale science that will increasingly be carried out through distributed global collaborations enabled by the Internet
4. Cyberinfrastructure
• A technological and social environment that makes it possible to create, disseminate
and preserve data, information and knowledge
through acquisition, storage, management,
integration, computing, mining, visualization and other
services over the Internet (NSF 2003, 2007).
• It comprises an interoperable set of diverse
elements:
– 1) Infrastructure: computing systems (hardware,
software and networks), services, instruments and tools.
– 2) Data collections.
– 3) Virtual research groups (collaboratories and
observatories).
5. E-science
• Results from the use and application of
cyberinfrastructure in scientific practice.
• Is characterized by inter- and multidisciplinarity.
• Collaboration: the participation of a large number
of researchers (in some cases hundreds)
located in different regions and with different
specialties, who form working groups (Hey
and Trefethen, 2005; Barbera et al., 2009).
6. E-science
One of the first e-science projects was the human genome
project, published in 2001 in two articles, one day
apart, in the journals Nature and Science.
Nature: Initial sequencing and analysis of the human genome
79 authors
48 institutions
181 references
All authors come from departments of genomic sciences
(or genetics), except for the following:
Department of Cellular and Structural Biology
Department of Molecular Genetics
Department of Molecular Biology
Science: The Sequence of the Human Genome
276 authors
14 institutions
452 references
All authors come from departments of genomic sciences
(or genetics), except for the following: Department of
Biology and Medical Informatics
8. GenBank
• An annotated collection of all publicly available
nucleotide sequences and their protein
translations.
• National Center for Biotechnology Information (NCBI)
• European Molecular Biology Laboratory (EMBL) Data
Library of the European Bioinformatics Institute (EBI)
• DNA Data Bank of Japan (DDBJ).
• They receive sequences produced in laboratories around the
world from more than 100,000 distinct organisms.
• It grows at an exponential rate, doubling every 10
months. Release 134, produced in February 2003, contained
more than 29.3 billion nucleotide bases in more than
23.0 million sequences.
• It is built from direct submissions by individual
laboratories and by large-scale sequencing centers.
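The growth figures on this slide can be turned into a rough projection. The sketch below assumes idealized exponential growth with a constant 10-month doubling time, starting from the 29.3 billion bases reported for Release 134. The function name `projected_bases` and the constant names are illustrative only and are not part of any GenBank tool.

```python
# Illustrative sketch: idealized exponential growth with a fixed doubling time.
# The starting size and doubling time come from the slide; a constant doubling
# time is an assumption, not a statement about GenBank's actual growth.

BASES_RELEASE_134 = 29.3e9    # bases in Release 134 (February 2003)
DOUBLING_TIME_MONTHS = 10.0   # doubling time quoted on the slide


def projected_bases(months_elapsed: float,
                    start_bases: float = BASES_RELEASE_134,
                    doubling_months: float = DOUBLING_TIME_MONTHS) -> float:
    """Return the projected size after `months_elapsed` months: N0 * 2**(t/T)."""
    return start_bases * 2 ** (months_elapsed / doubling_months)


if __name__ == "__main__":
    for months in (0, 10, 20, 30):
        size_billions = projected_bases(months) / 1e9
        print(f"{months:>2} months after Release 134: {size_billions:.1f} billion bases")
```

Under this assumption the collection would double to about 58.6 billion bases ten months after Release 134; real release sizes should be checked against the GenBank release notes.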
12. Nature. 2001 Feb 15;409(6822):860-921.
Initial sequencing and analysis of the human genome.
Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh
W, Funke R, Gage D, Harris K, Heaford A, Howland J, Kann L, Lehoczky J, LeVine R, McEwan P, McKernan
K, Meldrim J, Mesirov JP, Miranda C, Morris W, Naylor J, Raymond C, Rosetti M, Santos R, Sheridan
A, Sougnez C, Stange-Thomann N, Stojanovic N, Subramanian A, Wyman D, Rogers J, Sulston J, Ainscough
R, Beck S, Bentley D, Burton J, Clee C, Carter N, Coulson A, Deadman R, Deloukas P, Dunham A, Dunham
I, Durbin R, French L, Grafham D, Gregory S, Hubbard T, Humphray S, Hunt A, Jones M, Lloyd C, McMurray
A, Matthews L, Mercer S, Milne S, Mullikin JC, Mungall A, Plumb R, Ross M, Shownkeen R, Sims
S, Waterston RH, Wilson RK, Hillier LW, McPherson JD, Marra MA, Mardis ER, Fulton LA, Chinwalla
AT, Pepin KH, Gish WR, Chissoe SL, Wendl MC, Delehaunty KD, Miner TL, Delehaunty A, Kramer JB, Cook
LL, Fulton RS, Johnson DL, Minx PJ, Clifton SW, Hawkins T, Branscomb E, Predki P, Richardson P, Wenning
S, Slezak T, Doggett N, Cheng JF, Olsen A, Lucas S, Elkin C, Uberbacher E, Frazier M, Gibbs RA, Muzny
DM, Scherer SE, Bouck JB, Sodergren EJ, Worley KC, Rives CM, Gorrell JH, Metzker ML, Naylor
SL, Kucherlapati RS, Nelson DL, Weinstock GM, Sakaki Y, Fujiyama A, Hattori M, Yada T, Toyoda A, Itoh
T, Kawagoe C, Watanabe H, Totoki Y, Taylor T, Weissenbach J, Heilig R, Saurin W, Artiguenave F, Brottier
P, Bruls T, Pelletier E, Robert C, Wincker P, Smith DR, Doucette-Stamm L, Rubenfield M, Weinstock K, Lee
HM, Dubois J, Rosenthal A, Platzer M, Nyakatura G, Taudien S, Rump A, Yang H, Yu J, Wang J, Huang
G, Gu J, Hood L, Rowen L, Madan A, Qin S, Davis RW, Federspiel NA, Abola AP, Proctor MJ, Myers
RM, Schmutz J, Dickson M, Grimwood J, Cox DR, Olson MV, Kaul R, Raymond C, Shimizu N, Kawasaki
K, Minoshima S, Evans GA, Athanasiou M, Schultz R, Roe BA, Chen F, Pan H, Ramser J, Lehrach
H, Reinhardt R, McCombie WR, de la Bastide M, Dedhia N, Blöcker H, Hornischer K, Nordsiek G, Agarwala
R, Aravind L, Bailey JA, Bateman A, Batzoglou S, Birney E, Bork P, Brown DG, Burge CB, Cerutti L, Chen
HC, Church D, Clamp M, Copley RR, Doerks T, Eddy SR, Eichler EE, Furey TS, Galagan J, Gilbert
JG, Harmon C, Hayashizaki Y, Haussler D, Hermjakob H, Hokamp K, Jang W, Johnson LS, Jones TA, Kasif
S, Kaspryzk A, Kennedy S, Kent WJ, Kitts P, Koonin EV, Korf I, Kulp D, Lancet D, Lowe TM, McLysaght
A, Mikkelsen T, Moran JV, Mulder N, Pollara VJ, Ponting CP, Schuler G, Schultz J, Slater G, Smit AF, Stupka
E, Szustakowski J, Thierry-Mieg D, Thierry-Mieg J, Wagner L, Wallis J, Wheeler R, Williams A, Wolf YI, Wolfe
KH, Yang SP, Yeh RF, Collins F, Guyer MS, Peterson J, Felsenfeld A, Wetterstrand KA, Patrinos A, Morgan
MJ, de Jong P, Catanese JJ, Osoegawa K, Shizuya H, Choi S, Chen YJ; International Human Genome
Sequencing Consortium.
13. • The human genome holds an extraordinary trove of
information about human development, physiology, medicine
and evolution. Here we report the results of an international
collaboration to produce and make freely available a draft
sequence of the human genome. We also present an initial
analysis of the data, describing some of the insights that can
be gleaned from the sequence.
• Here we report the results of a collaboration involving 20
groups from the United States, the United Kingdom, Japan,
France, Germany and China to produce a draft sequence of
the human genome.
• Of course, navigating information spanning nearly ten orders
of magnitude requires computational tools to extract the full
value.
14. FIGURE 1. Timeline of large-scale genomic analyses.
Nature 409, 860-921 (15 February 2001) doi:10.1038/35057062
http://www.nature.com/nature/journal/v409/n6822/fig_tab/409860a0_F1.html
15. FIGURE 3. The automated production line for sample preparation at the Whitehead
Institute, Center for Genome Research.
Nature 409, 860-921 (15 February 2001) doi:10.1038/35057062
http://www.nature.com/nature/journal/v409/n6822/images/409860ac.2.jpg
16. • Science 16 February 2001:
Vol. 291. no. 5507, pp. 1304 - 1351
DOI: 10.1126/science.1058040
• REVIEW
• The Sequence of the Human Genome
• J. Craig Venter,1* Mark D. Adams,1 Eugene W. Myers,1 Peter W. Li,1 Richard J. Mural,1 Granger G. Sutton,1 Hamilton O.
Smith,1 Mark Yandell,1 Cheryl A. Evans,1 Robert A. Holt,1 Jeannine D. Gocayne,1 Peter Amanatides,1 Richard M. Ballew,1 Daniel
H. Huson,1 Jennifer Russo Wortman,1 Qing Zhang,1 Chinnappa D. Kodira,1 Xiangqun H. Zheng,1 Lin Chen,1 Marian
Skupski,1 Gangadharan Subramanian,1 Paul D. Thomas,1 Jinghui Zhang,1 George L. Gabor Miklos,2 Catherine Nelson,3 Samuel
Broder,1 Andrew G. Clark,4 Joe Nadeau,5 Victor A. McKusick,6 Norton Zinder,7 Arnold J. Levine,7 Richard J. Roberts,8 Mel
Simon,9 Carolyn Slayman,10 Michael Hunkapiller,11 Randall Bolanos,1 Arthur Delcher,1 Ian Dew,1 Daniel Fasulo,1 Michael
Flanigan,1 Liliana Florea,1 Aaron Halpern,1 Sridhar Hannenhalli,1 Saul Kravitz,1 Samuel Levy,1 Clark Mobarry,1 Knut
Reinert,1 Karin Remington,1 Jane Abu-Threideh,1 Ellen Beasley,1 Kendra Biddick,1 Vivien Bonazzi,1 Rhonda Brandon,1 Michele
Cargill,1 Ishwar Chandramouliswaran,1 Rosane Charlab,1 Kabir Chaturvedi,1 Zuoming Deng,1 Valentina Di Francesco,1 Patrick
Dunn,1 Karen Eilbeck,1 Carlos Evangelista,1 Andrei E. Gabrielian,1 Weiniu Gan,1 Wangmao Ge,1 Fangcheng Gong,1 Zhiping
Gu,1 Ping Guan,1 Thomas J. Heiman,1 Maureen E. Higgins,1 Rui-Ru Ji,1 Zhaoxi Ke,1 Karen A. Ketchum,1 Zhongwu Lai,1 Yiding
Lei,1 Zhenya Li,1 Jiayin Li,1 Yong Liang,1 Xiaoying Lin,1 Fu Lu,1 Gennady V. Merkulov,1 Natalia Milshina,1 Helen M.
Moore,1 Ashwinikumar K Naik,1 Vaibhav A. Narayan,1 Beena Neelam,1 Deborah Nusskern,1 Douglas B. Rusch,1 Steven
Salzberg,12 Wei Shao,1 Bixiong Shue,1 Jingtao Sun,1 Zhen Yuan Wang,1 Aihui Wang,1 Xin Wang,1 Jian Wang,1 Ming-Hui
Wei,1 Ron Wides,13 Chunlin Xiao,1 Chunhua Yan,1 Alison Yao,1 Jane Ye,1 Ming Zhan,1 Weiqing Zhang,1 Hongyu Zhang,1 Qi
Zhao,1 Liansheng Zheng,1 Fei Zhong,1 Wenyan Zhong,1 Shiaoping C. Zhu,1 Shaying Zhao,12 Dennis Gilbert,1 Suzanna
Baumhueter,1 Gene Spier,1 Christine Carter,1 Anibal Cravchik,1 Trevor Woodage,1 Feroze Ali,1 Huijin An,1 Aderonke Awe,1 Danita
Baldwin,1 Holly Baden,1 Mary Barnstead,1 Ian Barrow,1 Karen Beeson,1 Dana Busam,1 Amy Carver,1 Angela Center,1 Ming Lai
Cheng,1 Liz Curry,1 Steve Danaher,1 Lionel Davenport,1 Raymond Desilets,1 Susanne Dietz,1 Kristina Dodson,1 Lisa
Doup,1 Steven Ferriera,1 Neha Garg,1 Andres Gluecksmann,1 Brit Hart,1 Jason Haynes,1 Charles Haynes,1 Cheryl
Heiner,1 Suzanne Hladun,1 Damon Hostin,1 Jarrett Houck,1 Timothy Howland,1 Chinyere Ibegwam,1 Jeffery Johnson,1 Francis
Kalush,1 Lesley Kline,1 Shashi Koduru,1 Amy Love,1 Felecia Mann,1 David May,1 Steven McCawley,1 Tina McIntosh,1 Ivy
McMullen,1 Mee Moy,1 Linda Moy,1 Brian Murphy,1 Keith Nelson,1 Cynthia Pfannkoch,1 Eric Pratts,1 Vinita Puri,1 Hina
Qureshi,1 Matthew Reardon,1 Robert Rodriguez,1 Yu-Hui Rogers,1 Deanna Romblad,1 Bob Ruhfel,1 Richard Scott,1 Cynthia
Sitter,1 Michelle Smallwood,1 Erin Stewart,1 Renee Strong,1 Ellen Suh,1 Reginald Thomas,1 Ni Ni Tint,1 Sukyee Tse,1 Claire
Vech,1 Gary Wang,1 Jeremy Wetter,1 Sherita Williams,1 Monica Williams,1 Sandra Windsor,1 Emily Winn-Deen,1 Keriellen
Wolfe,1 Jayshree Zaveri,1 Karena Zaveri,1 Josep F. Abril,14 Roderic Guigó,14 Michael J. Campbell,1 Kimmen V. Sjolander,1 Brian
Karlak,1 Anish Kejariwal,1 Huaiyu Mi,1 Betty Lazareva,1 Thomas Hatton,1 Apurva Narechania,1 Karen Diemer,1 Anushya
Muruganujan,1 Nan Guo,1 Shinji Sato,1 Vineet Bafna,1 Sorin Istrail,1 Ross Lippert,1 Russell Schwartz,1 Brian Walenz,1 Shibu
Yooseph,1 David Allen,1 Anand Basu,1 James Baxendale,1 Louis Blick,1 Marcelo Caminha,1 John Carnes-Stine,1 Parris
Caulk,1 Yen-Hui Chiang,1 My Coyne,1 Carl Dahlke,1 Anne Deslattes Mays,1 Maria Dombroski,1 Michael Donnelly,1 Dale
Ely,1 Shiva Esparham,1 Carl Fosler,1 Harold Gire,1 Stephen Glanowski,1 Kenneth Glasser,1 Anna Glodek,1 Mark Gorokhov,1 Ken
Graham,1 Barry Gropman,1 Michael Harris,1 Jeremy Heil,1 Scott Henderson,1 Jeffrey Hoover,1 Donald Jennings,1 Catherine
Jordan,1 James Jordan,1 John Kasha,1 Leonid Kagan,1 Cheryl Kraft,1 Alexander Levitsky,1 Mark Lewis,1 Xiangjun Liu,1 John
Lopez,1 Daniel Ma,1 William Majoros,1 Joe McDaniel,1 Sean Murphy,1 Matthew Newman,1 Trung Nguyen,1 Ngoc Nguyen,1 Marc
Nodell,1 Sue Pan,1 Jim Peck,1 Marshall Peterson,1 William Rowe,1 Robert Sanders,1 John Scott,1 Michael Simpson,1 Thomas
Smith,1 Arlan Sprague,1 Timothy Stockwell,1 Russell Turner,1 Eli Venter,1 Mei Wang,1 Meiyuan Wen,1 David Wu,1 Mitchell
Wu,1 Ashley Xia,1 Ali Zandieh,1 Xiaohong Zhu1
17. • A 2.91-billion base pair (bp) consensus sequence of the euchromatic portion of the human genome was
generated by the whole-genome shotgun sequencing method. The 14.8-billion bp DNA sequence
was generated over 9 months from 27,271,853 high-quality sequence reads (5.11-fold coverage of the
genome) from both ends of plasmid clones made from the DNA of five individuals. Two assembly
strategies--a whole-genome assembly and a regional chromosome assembly--were used, each
combining sequence data from Celera and the publicly funded genome effort. The public data were
shredded into 550-bp segments to create a 2.9-fold coverage of those genome regions that had been
sequenced, without including biases inherent in the cloning and assembly procedure used by the publicly
funded group. This brought the effective coverage in the assemblies to eightfold, reducing the number and
size of gaps in the final assembly over what would be obtained with 5.11-fold coverage. The two
assembly strategies yielded very similar results that largely agree with independent mapping data. The
assemblies effectively cover the euchromatic regions of the human chromosomes. More than 90% of the
genome is in scaffold assemblies of 100,000 bp or more, and 25% of the genome is in scaffolds of
10 million bp or larger. Analysis of the genome sequence revealed 26,588 protein-encoding transcripts for
which there was strong corroborating evidence and an additional ~12,000 computationally derived genes
with mouse matches or other weak supporting evidence. Although gene-dense clusters are obvious,
almost half the genes are dispersed in low G+C sequence separated by large tracts of apparently
noncoding sequence. Only 1.1% of the genome is spanned by exons, whereas 24% is in introns, with
75% of the genome being intergenic DNA. Duplications of segmental blocks, ranging in size up to
chromosomal lengths, are abundant throughout the genome and reveal a complex evolutionary history.
Comparative genomic analysis indicates vertebrate expansions of genes associated with neuronal
function, with tissue-specific developmental regulation, and with the hemostasis and immune
systems. DNA sequence comparisons between the consensus sequence and publicly funded genome
data provided locations of 2.1 million single-nucleotide polymorphisms (SNPs). A random pair of human
haploid genomes differed at a rate of 1 bp per 1250 on average, but there was marked heterogeneity in
the level of polymorphism across the genome. Less than 1% of all SNPs resulted in variation in proteins,
but the task of determining which SNPs have functional consequences remains an open challenge.
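Several figures in this abstract are related by simple arithmetic, and checking them is a useful reading exercise. The sketch below is illustrative only: the variable and function names are hypothetical, and dividing by the 2.91-billion-bp consensus length is an assumption, since the abstract does not state which genome size underlies the reported 5.11-fold coverage.

```python
# Illustrative arithmetic on figures quoted in the abstract.
# All names are hypothetical; the inputs are the numbers from the slide.

TOTAL_BASES_SEQUENCED = 14.8e9   # bp of sequence generated
NUM_READS = 27_271_853           # high-quality sequence reads
CONSENSUS_LENGTH = 2.91e9        # bp in the consensus sequence
CELERA_COVERAGE = 5.11           # fold coverage reported for the reads
PUBLIC_SHRED_COVERAGE = 2.9      # fold coverage from shredded public data
SNP_SPACING_BP = 1250            # one difference per 1250 bp on average


def fold_coverage(total_bases: float, genome_length: float) -> float:
    """Fold coverage = total sequenced bases / genome length."""
    return total_bases / genome_length


# Average read length implied by the totals.
mean_read_length = TOTAL_BASES_SEQUENCED / NUM_READS

# Coverage if the consensus length is used as the denominator (an assumption).
naive_coverage = fold_coverage(TOTAL_BASES_SEQUENCED, CONSENSUS_LENGTH)

# Adding the shredded public data gives the "eightfold" effective coverage.
combined_coverage = CELERA_COVERAGE + PUBLIC_SHRED_COVERAGE

# Expected differences between two random haploid genomes at 1 per 1250 bp.
expected_differences = CONSENSUS_LENGTH / SNP_SPACING_BP

if __name__ == "__main__":
    print(f"Mean read length: {mean_read_length:.0f} bp")
    print(f"Coverage using the consensus length: {naive_coverage:.2f}-fold")
    print(f"Combined coverage: {combined_coverage:.2f}-fold")
    print(f"Expected pairwise differences: {expected_differences / 1e6:.2f} million")
```

The coverage computed this way comes out slightly below the reported 5.11-fold, which is consistent with the paper using a somewhat different genome-size estimate as the denominator; the sum of 5.11 and 2.9 matches the "eightfold" figure.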
18. Fig. 2. Flow diagram for sequencing pipeline. Samples are received, selected, and processed in compliance with standard operating procedures, with a focus on quality within and across departments. Each process has defined inputs and outputs with the capability to exchange samples and data with both internal and external entities according to defined quality guidelines. Manufacturing pipeline processes, products, quality control measures, and responsible parties are indicated and are described further in the text.
J. C. Venter et al., Science 291, 1304-1351 (2001)
23. Definitions: cyberinfrastructure
• Describes the new research environments that support advanced data
acquisition, data storage, data management, data integration, data
mining, data visualization and other computing and information
processing services over the Internet (NSF, 2003).
• The comprehensive infrastructure needed to capitalize on dramatic
advances in information technology. Integrates hardware for
computing, data and networks, digitally-enabled sensors, observatories
and experimental facilities, and an interoperable suite of software and
middleware services and tools. Investments in interdisciplinary teams
and cyberinfrastructure professionals with expertise in algorithm
development, system operations, and applications development are
also essential to exploit the full power of cyberinfrastructure to create,
disseminate, and preserve scientific data, information and knowledge
(NSF 2007).
• Technological solution to the problem of efficiently connecting data,
computers, and people with the goal of enabling derivation of novel
scientific theories and knowledge (Wikipedia 2009).
26. e-Science
• Originally referred to experiments that connected together a few powerful
computers located at different sites and, later, a very large number of
modest PCs across the world in order to undertake enormous calculations or
process huge amounts of data. The coordination of geographically dispersed
computing and data resources has become known as the Grid. This is
shorthand for the emerging standards and technology – hardware and
software – being developed to enable and simplify the sharing of resources.
The analogy is an electric power grid, which comprises numerous varied
resources connected together to contribute power into a shared pool that
users can easily access when they need it.
• What is exciting about the Grid is that the combination of extensive
connectivity, massive computer power and vast quantities of digitized data –
all three of which are still rapidly expanding – is making possible new
applications that are orders of magnitude more potent than even a few years
ago.
• The term 'e-research' is sometimes used instead of 'e-science', with the
advantage that it gives more emphasis to the end result of better, richer, faster
or new research results, rather than the technologies used to get them.
National Centre for e-Social Science. 2008. Frequently Asked Questions.
Available at:
http://www.ncess.ac.uk/about_eSS/faq/?q=General_1#General_1
27. E-research
• Research activities that use a range of advanced ICT capabilities and
embrace new research methodologies emerging from increased
access to:
– Broadband communication networks, research instruments and
facilities, sensor networks and data repositories;
– Software and infrastructure services that ensure connectivity and
interoperability;
– Application tools that encompass discipline-specific tools and
interaction tools.
– Advances and augments, rather than replaces, traditional research
methodologies.
• Will enable researchers to carry out their research more
creatively, efficiently and collaboratively across long distances, and to disseminate their
research results with greater effect.
• Collaboration
• New emerging fields of research, using new data mining
and analysis techniques, advanced computational algorithms, and
resource-sharing networks.
28. E-research
• e-journal: electronic
• e-social sciences: from "enabling"
(National Centre for e-Social Science,
2008)
• e-research: high-speed digital network
available anytime,
anywhere (Anderson and Kanuka, 2002)
33. Bibliographic e-research
• Bibliographic research based on the
use of the Web and cyberinfrastructure
– Web 2.0 resources evolving toward Web 3.0
– Applications, tools, services.
– Digital data collections (repositories,
databases).
• Systemic analysis of the literature
• Meta-analysis
34. Assignments
• Find three examples of e-science
(cyberinfrastructure) in your area of
interest.
• Bookmark them in Diigo and share them with the group.
• Describe one of them in one page.
• Send it to the group as a Google document.
35. • This project is carried out thanks to
funding from:
DGAPA, UNAM
PAPIME Project PE 201509
36. Creative Commons
License
How to cite this work
Michán, L. 2011. Presentation
http://creativecommons.org/licenses/by/3.0/deed.es_GT