Editing Annotations with Apollo

A workshop for the scientific community gathered at BIOS
Monica Munoz-Torres, PhD | @monimunozto

Berkeley Bioinformatics Open-Source Projects (BBOP)

Lawrence Berkeley National Laboratory | 

University of California Berkeley | U.S. Department of Energy


BIOS, Manizales, Colombia | September 21, 2015
APOLLO DEVELOPMENT
APOLLO DEVELOPERS
http://GenomeArchitect.org/
Nathan Dunn
Eric Yao
JBrowse, UC Berkeley
Christine Elsik’s Lab,
University of Missouri
Suzi Lewis
Principal Investigator
BBOP	
  
Moni Munoz-Torres
Stephen Ficklin
GenSAS,
Washington State University
Colin Diesh
Deepak Unni
OUTLINE

Web Apollo: Collaborative Curation and Interactive Analysis of Genomes
•  Today we will find out how to overcome obstacles and extract the most valuable information from a genome sequencing & annotation project.
BY THE END OF THIS TALK

you will

•  Better understand genome curation in the context of annotation:
   assembled genome → automated annotation → manual annotation
•  Become familiar with the environment and functionality of the Apollo genome annotation editing tool.
•  Learn to identify homologs of known genes of interest in a newly sequenced genome.
•  Learn about corroborating and modifying automatically annotated gene models using available evidence in Apollo.
Introduction
How is the map of a genome drawn?
The genome map
Introduction
[Figure: stages of a genome project. Design & sampling, sequencing, assembly, automated annotation, manual annotation, consensus gene set, comparative analyses, synthesis & publication. A second version of the figure marks quality control (QC) checkpoints throughout.]
CURATING GENOMES

steps involved
1  Generation of gene models
   calling ORFs, one or more rounds of gene prediction, etc.
2  Annotation of gene models
   describing function, expression patterns, metabolic network memberships.
3  Manual annotation
GENOME ANNOTATION

requires precision and depth
The gene set of each organism informs a variety of analyses:
•  Number of genes, % GC, TE composition, repetitive regions
•  Assigning function
•  Molecular evolution, sequence conservation
•  Gene families
•  Metabolic pathways
•  What makes each organism unique?
   What makes a bee a “bee”?
Marbach et al. 2011. Nature Methods | Shutterstock.com | Alexander Wild
Let's refresh our memory.
REVIEW ON YOUR OWN

for manual annotation
To remember… Biological concepts to better understand manual annotation
FOOD FOR THOUGHT
•  GLOSSARY
   from contig to splice site
•  CENTRAL DOGMA
   in molecular biology
•  WHAT IS A GENE?
   defining your goal
•  TRANSCRIPTION
   mRNA in detail
•  TRANSLATION
   and other definitions
•  GENOME CURATION
   steps involved
BIO-REFRESHER
What is a gene?
•  The definition of a gene paints a very complex picture of molecular activity, and it is a continuously evolving concept.
   -  From the Sequence Ontology (SO):
      “A gene is a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions”.
“Evolving Concept” at http://goo.gl/LpsajQ
What is a gene?
•  In our lifetime, the Encyclopedia of DNA Elements (ENCODE) project updated this concept yet again. Long transcripts & dispersed regulation!
   “A gene is a DNA segment that contributes to phenotype/function. In the absence of demonstrated function, a gene may be characterized by sequence, transcription or homology.”
https://www.encodeproject.org/
What is a gene?

let’s think computationally!
•  Think of the genome as an operating system for a living being.
   -  Considering that the nucleotides of the genome are put together into a code that is executed through the process of transcription and translation…
   -  … think of genes as subroutines that are repetitively called in the process of transcription.
Gerstein et al., 2007. Genome Res.
What is a gene?

considerations
•  Also consider:
   -  A gene is a genomic sequence (DNA or RNA) directly encoding functional product molecules, either RNA or protein.
   -  If several functional products share overlapping regions, we take the union of all overlapping genomic sequences coding for them.
   -  This union must be coherent (i.e., processed separately for final protein and RNA products) but does not require that all products necessarily share a common subsequence.
Gerstein et al., 2007. Genome Res.
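One way to picture the "union of overlapping sequences" idea is as interval merging. The sketch below is only an illustration under simplifying assumptions: the function name `union_of_regions` and the coordinates are hypothetical, and a real gene model would also track strand and which product each region encodes.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive; hypothetical coordinates


def union_of_regions(regions: List[Interval]) -> List[Interval]:
    """Merge overlapping (start, end) regions into their union."""
    merged: List[Interval] = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous block: extend that block.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Three products whose regions overlap pairwise; the first and the last
# share no bases, yet all three fall into a single union.
products = [(100, 400), (350, 700), (650, 900)]
print(union_of_regions(products))
```

In this toy case the first and third products have no common subsequence, but they still belong to the same union because each overlaps the second product.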
“The gene is a union of genomic sequences encoding a coherent set of functional products that may or may not overlap.”
Gerstein et al., 2007. Genome Res.
The gene: a moving target.
WHAT IS A GENE?
TRANSLATION

reading frame
•  Reading frame is a manner of dividing the sequence of nucleotides in mRNA (or DNA) into a set of consecutive, non-overlapping triplets (codons).
•  Three frames can be read in the 5’ → 3’ direction. Given that DNA has two anti-parallel strands, an additional three frames can be read on the anti-sense strand. Six total possible reading frames exist.
•  In eukaryotes, only one reading frame per section of DNA is biologically relevant at a time: it has the potential to be transcribed into RNA and translated into protein. This is called the OPEN READING FRAME (ORF).
   -  ORF = Start signal + coding sequence (divisible by 3) + Stop signal
•  The sections of the mature mRNA transcribed with the coding sequence but not translated are called UnTranslated Regions (UTR); one at each end.
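The six reading frames and the ORF definition above can be sketched in a few lines. This is a minimal illustration, not a gene finder: it assumes ATG as the only Start signal, the three standard Stop codons, and an intron-free sequence; the function names are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the anti-sense strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def codons(seq: str, frame: int) -> List[str]:
    """Split seq into consecutive, non-overlapping triplets from offset 0, 1 or 2."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def find_orfs(seq: str) -> List[Tuple[str, int, str]]:
    """Return (strand, frame, orf) for each Start...Stop stretch in the six frames."""
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for frame in range(3):
            current = None  # codons collected since the last Start signal
            for codon in codons(s, frame):
                if current is None and codon == "ATG":
                    current = [codon]
                elif current is not None:
                    current.append(codon)
                    if codon in STOP_CODONS:
                        orfs.append((strand, frame, "".join(current)))
                        current = None
    return orfs


for strand, frame, orf in find_orfs("CCATGAAATGACC"):
    print(strand, frame, orf, len(orf) % 3 == 0)
```

Every ORF reported this way has a length divisible by 3, because it is built codon by codon from the Start signal through the Stop signal.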
  
[Figure: "Reading Frame" by Hornung Ákos, Wikimedia Commons]
[Figure: "ORF" by Thatsonginc, Wikimedia Commons]
TRANSLATION

reading frame: splice sites
•  The spliceosome catalyzes the removal of introns and the ligation of flanking exons.
   -  introns: spaces inside the gene, not part of the coding sequence
   -  exons: expression units (of the coding sequence)
•  Splicing “signals” (from the point of view of an intron):
   -  There is a 5’ end splice “signal” (site): usually GT (less common: GC)
   -  And a 3’ end splice site: usually AG
   -  …]5’-GT/AG-3’[…
•  It is possible to produce more than one protein (polypeptide) sequence from the same genic region by alternatively bringing exons together: alternative splicing. For example, the gene Dscam (Drosophila) has 38,000 alternatively spliced mRNAs (isoforms).
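The splice "signals" described above can be checked with a short helper. This is a sketch under stated assumptions: the intron lies on the forward strand, coordinates are 0-based and end-exclusive, and the function names and the toy sequence are hypothetical.

```python
from typing import Tuple


def intron_boundaries(genome: str, intron_start: int, intron_end: int) -> Tuple[str, str]:
    """Return the first two and last two bases of the intron (5' and 3' splice sites)."""
    intron = genome[intron_start:intron_end].upper()
    return intron[:2], intron[-2:]


def is_expected_splice_pair(donor: str, acceptor: str) -> bool:
    """True for the usual GT/AG pair, or the less common GC/AG pair."""
    return donor in {"GT", "GC"} and acceptor == "AG"


# Toy gene: exon, intron, exon.
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTCAG", "GGATAA"
genome = exon1 + intron + exon2
donor, acceptor = intron_boundaries(genome, len(exon1), len(exon1) + len(intron))
print(donor, acceptor, is_expected_splice_pair(donor, acceptor))
```

An intron that fails this check is not necessarily wrong, but it is a good candidate for a closer look against the evidence.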
  
[Figure: "Gene structure" by Daycd, Wikimedia Commons]
BIO-REFRESHER
TRANSLATION

now in your mind
•  Although mRNAs are short-lived, understanding them is crucial, as they will become the center of your work.
BIO-REFRESHER
TRANSLATION

reading frame: phase
•  Introns can interrupt the reading frame of a gene by inserting a sequence between two consecutive codons (phase 0),
•  between the first and second nucleotide of a codon (phase 1),
•  or between the second and third nucleotide of a codon (phase 2).
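The three intron positions above correspond to intron phases 0, 1, and 2. A small sketch of how the phase follows from the coding lengths of the exons; the function name and the example lengths are hypothetical, and only the coding part of each exon is counted.

```python
from typing import List


def intron_phases(cds_exon_lengths: List[int]) -> List[int]:
    """Phase of each intron: number of coding bases upstream of it, modulo 3.

    0 = between two codons, 1 = after the first base of a codon,
    2 = after the second base of a codon.
    """
    phases = []
    upstream = 0
    for length in cds_exon_lengths[:-1]:  # one intron follows every exon but the last
        upstream += length
        phases.append(upstream % 3)
    return phases


# Four coding exons of 6, 4, 4 and 4 bases (18 coding bases in total).
print(intron_phases([6, 4, 4, 4]))
```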
  
"Exon and Intron classes”. Licensed under Fair use via Wikipedia
[Figure: "Protein synthesis" by Kelvinsong, Wikimedia Commons]
TRANSLATION

in detail
HICCUPS

in transcription and translation
•  Premature Stop codons can appear in the message as a result of incomplete splicing, DNA mutations, transcription errors, or leaky scanning by the ribosome that changes the reading frame (frameshifts). A process called nonsense-mediated decay (NMD) checks for such messages and degrades them.
•  Insertions and deletions (indels) can cause frameshifts when the indel length is not divisible by three (3). As a result, the peptide can be abnormally long or abnormally short, depending on where the first in-frame Stop signal is located.
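The effect of an indel on the reading frame can be illustrated with a toy coding sequence. This is a sketch only: the sequence is made up, the function names are hypothetical, and only the three standard Stop codons are considered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def causes_frameshift(indel_length: int) -> bool:
    """An indel shifts the frame when its length is not divisible by 3."""
    return indel_length % 3 != 0


def translated_codons(cds: str) -> int:
    """Count codons read from the first base up to and including the first in-frame Stop."""
    count = 0
    for i in range(0, len(cds) - 2, 3):
        count += 1
        if cds[i:i + 3] in STOP_CODONS:
            break
    return count


original = "ATGGCTGACCCGGGTTAA"                   # ATG GCT GAC CCG GGT TAA
one_base = original[:3] + "A" + original[3:]      # 1-base insertion after the Start
three_base = original[:3] + "AAA" + original[3:]  # 3-base insertion after the Start

print(translated_codons(original), translated_codons(one_base), translated_codons(three_base))
```

Here the one-base insertion brings an out-of-frame TGA into frame and truncates the product, while the three-base insertion adds one codon and leaves the frame intact.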
  
Prediction & Annotation
GENE PREDICTION
•  The identification of structural features of the genome:
   -  Primarily focused on protein-coding genes.
   -  Also predicts transfer RNAs (tRNA), ribosomal RNAs (rRNA), regulatory motifs, long and small non-coding RNAs (ncRNA), repetitive elements (masked), etc.
   -  Two methods for identification.
   -  Some are self-trained and some must be trained.
GENE PREDICTION

methods for discovery
1) Ab initio:
   - based on DNA composition,
   - deals strictly with genomic sequences,
   - makes use of statistical approaches to search for coding regions and typical gene signals.
   •  E.g. Augustus, GENSCAN, geneid, fgenesh, etc.
Nat Rev Genet. 2015 Jun;16(6):321-32. doi: 10.1038/nrg3920
Nucleic Acids Res. 2003 vol. 31 no. 13 3738-3741
GENE PREDICTION

methods for discovery (ctd)
2) Homology-based:
   - evidence-based,
   - finds genes using either similarity searches in the main databases or experimental data including RNAseq, expressed sequence tags (ESTs), full-length complementary DNAs (cDNAs), etc.
   •  E.g. fgenesh++, Just Annotate My genome (JAMg), SGP2
GENE ANNOTATION
Integration of data from computational & experimental evidence with data from prediction tools, to generate a reliable set of structural annotations.

Involves:
1) ab initio predictions
2) assessment of biological evidence to drive the gene prediction process
3) synthesis of these results to produce a set of consensus gene models
In some cases, the algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene’s representation.
GENE ANNOTATION
Gene models may be organized into “sets” using:
•  automatic integration of predicted sets (combiners); e.g. GLEAN, EvidenceModeler
or
•  tools packaged into pipelines; e.g. MAKER, PASA, Gnomon, Ensembl, etc.
ANNOTATION

an imperfect art
No one is perfect, least of all automated annotation.
New technologies bring new challenges:
•  Assembly errors can cause fragmented annotations
•  Limited coverage makes confident identification difficult
Image: www.BroadInstitute.org
MANUAL ANNOTATION

improving predictions
Schiex et al. Nucleic Acids Res. 2003 (31) 13: 3738-3741
Automated predictions
Experimental evidence: cDNAs, HMM searches, RNAseq, genes from other species.
Manual annotation to the rescue.
It is therefore necessary to refine the predictions of the biological elements encoded in the genome, which requires careful review.
BIOCURATION

structural and functional adjustments
1  Identify the genome elements that best represent the underlying biology, and eliminate the elements that reflect systemic errors of automated analyses.
2  Assign functions through comparative analyses of similar genomic elements from closely related organisms, using literature, databases, and experimental data.
http://GeneOntology.org
MANUAL ANNOTATION
BUT, IN CURATION

it was not always possible to scale up these efforts
Too many sequences and not enough hands.
1  Museum: a small group of highly trained experts (e.g. GO).
2  Jamboree: a few very good biologists and a few very good bioinformaticians camping together for intense but short periods of time.
3  Cottage: researchers on their own; may or may not publicize results; may be a dead end, with very few people ever aware of these results.
Elsik et al. 2006. Genome Res. 16(11):1329-33.
ANNOTATION

a collaborative exercise
COLLABORATING
Researchers usually seek the opinions and insights of colleagues with expertise in specific areas of knowledge, for example, conserved domains or gene families.
Apollo

a tool for editing annotations
•  Web-based, integrated with JBrowse.
•  Enables real-time collaboration!
•  Automatically generates data in common formats for analysis.
•  Manual annotation of genes, pseudogenes, tRNAs, snRNAs, snoRNAs, ncRNAs, miRNAs, TEs, and repetitive fragments.
•  Intuitive functions and drop-down menus create and edit transcript and exon structures, and insert comments (CV, free text), GO terms, etc.
INTRODUCING APOLLO
http://GenomeArchitect.org/
ARCHITECTURE

simple, flexible
Web client + annotation editing engine + server-side data service
[Figure: Apollo v2.0 architecture.
 Curators work in the web client: Google Web Toolkit (GWT) / Bootstrap, with JBrowse on DOJO / jQuery.
 The client communicates with the Annotation Engine (server) over REST / JSON and Websockets; the security options shown are Shiro, LDAP, and OAuth.
 A single data store (PostgreSQL, MySQL, MongoDB, ElasticSearch) holds annotations, security, preferences, organisms, and tracks.
 Each organism (Organism 1, Organism 2) has its own JBrowse data, loaded with genomic evidence in BAM, BED, VCF, GFF3, and BigWig formats.]
WEB CLIENT

curator panel
Uses GWT/Bootstrap on the front end to give the application versatile behavior.
New in Apollo v2.0: the curator panel.
[Figure: curators use the web client (GWT / Bootstrap, JBrowse on DOJO / jQuery), which loads BAM, BED, VCF, GFF3, and BigWig data and reaches the Annotation Engine (server) over REST / JSON and Websockets.]
ANNOTATION ENGINE

editing logic
New in Apollo v2.0: Grails controllers (J2EE servlet) route requests to the appropriate JBrowse data directory for each organism.
[Figure: the web client connects over REST / JSON and Websockets to the Annotation Engine (server), secured with Shiro, LDAP, or OAuth; the engine uses the single data store and the per-organism JBrowse data (Organism 1, Organism 2), loaded with genomic evidence for each organism.]
SERVER-SIDE DATA SERVICE

single data store
New in Apollo v2.0: a single, queryable data store for saving annotations.
Supported back ends: PostgreSQL, MySQL, MongoDB, ElasticSearch.
Stored by the Annotation Engine (server): annotations, security, preferences, organisms, data tracks.
LET'S COLLABORATE!

Apollo is open source and extensible
HIGHLIGHTED IMPROVEMENTS
Users can add programs to support their own workflows.
Examples:
•  GenSAS, the Genome Sequence Annotation Server: a platform for structural genome annotation.
•  i5K:
   - Space at the National Agricultural Library (NAL) for sharing assemblies and gene sets, and for manual annotation.
   - Pilot project with >40 genomes: 47 talks and 9 posters at the Arthropod Genomics Symposium.
DISPERSED COMMUNITIES
collaborative manual annotation efforts
We train and support hundreds of geographically dispersed scientists from diverse research communities to conduct manual annotations and to recover coding sequences in agreement with all available biological evidence using Apollo.
•  Gatekeeping and monitoring.
•  Tutorials, training workshops, and “geneborees”.
APOLLO
LESSONS LEARNED

What we have learned:
•  Collaborative work distills invaluable knowledge
•  We must enforce strict rules and formats
•  We must evolve with the data
•  A little training goes a long way
•  NGS poses additional challenges
What is the curator's task?
Becoming Acquainted with Web Apollo
GENERAL PROCESS OF CURATION

main steps to remember
1.  Select or find a region of interest, e.g. a scaffold.
2.  Select appropriate evidence tracks to review the gene model.
3.  Determine whether a feature in an existing evidence track will provide a reasonable gene model to start working with.
4.  If necessary, adjust the gene model.
5.  Check your edited gene model for integrity and accuracy by comparing it with available homologs.
6.  Comment and finish.
CURATING GENOMES
WHAT ANNOTATORS SHOULD LOOK FOR

annotators: that’s you!
•  Annotating a simple case: the official prediction is correct or nearly correct. That is, no aligned data extend beyond the gene model (or, if they do, the extension is not likely to be coding sequence), and/or the gene prediction matches what you know about the gene.
   a.  Can you add UTRs?
   b.  Check exon structures.
   c.  Check splice sites: …]5’-GT/AG-3’[…
   d.  Check ‘start’ and ‘stop’ sites.
   e.  Check the predicted protein product(s).
   f.  If the protein product still does not look correct, go on to “Annotating more complex cases”.
WHAT ANNOTATORS SHOULD LOOK FOR

continued
•  Additional functionality. You may also need to learn how to:
   a.  Get genomic sequence
   b.  Merge exons
   c.  Add/Delete an exon
   d.  Create an exon de novo (within an intron or outside existing annotations).
   e.  Right/apple-click on a feature to get the feature ID and additional information
   f.  Look up homolog descriptions by going to the accession's web page at UniProt/Swiss-Prot
WHAT ANNOTATORS SHOULD LOOK FOR

continued
•  Annotating more complex cases:
   a.  Incomplete annotation: protein integrity checks, indicate gaps, missing 5’ sequences or missing 3’ sequences.
   b.  Merge of 2 gene predictions on the same scaffold
   c.  Merge of 2 gene predictions on different scaffolds (uh-oh!).
   d.  Split of a gene prediction
   e.  Frameshifts, Selenocysteine, single-base errors, and other inconvenient phenomena
WHAT ANNOTATORS SHOULD LOOK FOR

continued
•  Adding important project information in the form of Canned and/or Customized Comments:
   a.  NCBI ID, RefSeq ID, gene symbol(s), common name(s), synonyms, top BLAST hits (GenBank IDs), orthologs with species names, and anything else you can think of, because you are the expert.
   b.  Type of annotation (e.g. whether or not the gene model was changed)
   c.  Data source (for example, if the Fgeneshpp predicted gene was the starting point for your annotation)
   d.  The kinds of changes you made to the gene model, e.g. split, merge
   e.  Functional description
   f.  Whether you would like your MOD curator to check the annotation
   g.  Whether part of your gene is on a different scaffold.
TRAINING CURATORS

a little training goes a long way!
Provided with adequate tools, wet lab scientists make exceptional curators who can easily learn to maximize the generation of accurate, biologically supported gene models.
APOLLO
Let's get to know Apollo
http://genomearchitect.org/web_apollo_user_guide
Apollo

annotation editing environment
BECOMING ACQUAINTED WITH APOLLO
[Figure: annotated screenshot of the Apollo editing window. Callouts:]
•  Color by reading frame, toggle strand, change color scheme, highlighter
•  Load experimental evidence (GFF3, BAM, BigWig), combination tracks, and search tracks.
•  Query the genome using BLAT.
•  Navigation and zoom.
•  Search for a gene or a group
•  Get coordinates, and zoom with “rubber-band selection”
•  Login
•  User-created annotations
•  Curator panel
•  Evidence tracks
•  Stage- and cell-type-specific transcriptomic data.
Now let's play!
Instructions
APOLLO ON THE WEB

instructions
Email: firstname.lastname@example.com
Password: firstnamelastname

Email                           Password          Server   Start at
user.one@example.com            userone           1        1
user.two@example.com            usertwo           2        1
user.three@example.com          userthree         3        1
user.four@example.com           userfour          4        1
user.five@example.com           userfive          5        1
user.six@example.com            usersix           1        7
user.seven@example.com          userseven         2        7
user.eight@example.com          usereight         3        7
user.nine@example.com           usernine          4        7
user.ten@example.com            userten           5        7
user.eleven@example.com         usereleven        1        1
user.twelve@example.com         usertwelve        2        1
user.thirteen@example.com       userthirteen      3        1
user.fourteen@example.com       userfourteen      4        1
user.fifteen@example.com        userfifteen       5        1
user.sixteen@example.com        usersixteen       1        7
user.seventeen@example.com      userseventeen     2        7
user.eightteen@example.com      usereighteen      3        7
user.nineteen@example.com       usernineteen      4        7
user.twenty@example.com         usertwenty        5        7
user.twentyone@example.com      usertwentyone     1        1
user.twentytwo@example.com      usertwentytwo     2        1
user.twentythree@example.com    usertwentythree   3        1
user.twentyfour@example.com     usertwentyfour    4        1
user.twentyfive@example.com     usertwentyfive    5        1
user.twentysix@example.com      usertwentysix     1        7
user.twentyseven@example.com    usertwentyseven   2        7
user.twentyeight@example.com    usertwentyeight   3        7
user.twentynine@example.com     usertwentynine    4        7

Server   URL
1        http://54.94.132.228:8080/apollo/annotator/index
2        http://54.207.71.112:8080/apollo/annotator/index
3        http://54.207.106.136:8080/apollo/annotator/index
4        http://54.207.113.253:8080/apollo/annotator/index
5        http://54.232.217.84:8080/apollo/annotator/index
Functionality and navigation.
Apollo

functionality
BECOMING ACQUAINTED WITH APOLLO
•  Add:
   -  IDs from public databases (e.g. GenBank, using DBXRef); the symbol(s) of each gene, common name(s), synonyms, the top BLAST hit, orthologs with the species name.
   -  Appropriate functional assignments (e.g. via RNA-Seq data, literature searches, HMM searches, etc.)
   -  Comments about the modifications that were made, or a note that none was needed.
   -  Any other notes that occur to the biocurator.
•  Correct ‘Start’ and ‘Stop’ sites
•  Fix non-canonical splice sites
•  Annotate UTRs (e.g. using RNA-Seq)
•  Obtain & correct predicted protein products
   -  Align them with relevant genes or gene families.
   -  Use blastp against NCBI's RefSeq or nr
•  Check for missing data in the assembly
•  Merge 2 gene predictions on the same scaffold
•  Split a gene prediction
•  Correct frameshifts and other assembly errors
•  Annotate selenocysteines, single-base errors, etc.
REMOVABLE SIDE DOCK

with customizable tabs
HIGHLIGHTED IMPROVEMENTS
[Figure: side dock tabs: Annotations, Tracks, Reference Sequence, Organism, Users, Groups, Admin.]
EDITS & EXPORTS

annotation details, exon boundaries, data export
[Figure: (1, 2) the Annotations tab; (3) the Reference Sequences tab, with export as FASTA or GFF3.]
USER NAVIGATION
Becoming Acquainted with Web Apollo.
Annotator panel.
•  Choose appropriate evidence tracks from the list on the annotator panel.
•  Select & drag elements from an evidence track into the ‘User-created Annotations’ area.
•  Hovering over an annotation in progress brings up an information pop-up.
•  Annotation right-click menu
Annotations, annotation edits, and History: stored in a centralized database.
The Annotation Information Editor
DBXRefs are database cross-references: if you have reason to believe that this gene is linked to a gene in a public database (including your own), then add it here.
The Annotation Information Editor
•  Add PubMed IDs
•  Include GO terms as appropriate from any of the three ontologies
•  Write comments stating how you have validated each model.
•  ‘Zoom to base level’ option reveals the DNA Track.
•  Color exons by CDS from the ‘View’ menu.
Zoom in/out with keyboard: shift + arrow keys up/down
•  Toggle reference DNA sequence and translation frames in forward strand. Toggle models in either direction.
Annotation: simple cases
“Simple case”:
 - the predicted gene model is correct or nearly correct, and
 - this model is supported by evidence that completely or mostly agrees with the prediction.
 - evidence that extends beyond the predicted model is assumed to be non-coding sequence.

The following are simple modifications.
ANNOTATING SIMPLE CASES
Becoming Acquainted with Web Apollo. SIMPLE CASES
ADDING EXONS
•  A confirmation box will warn you if the receiving transcript is not on the same strand as the feature where the new exon originated.
•  Check ‘Start’ and ‘Stop’ signals after each edit.
If	
  transcript	
  alignment	
  data	
  are	
  available	
  and	
  extend	
  beyond	
  your	
  original	
  annota(on,	
  you	
  
may	
  extend	
  or	
  add	
  UTRs.	
  	
  
1.  Right	
  click	
  at	
  the	
  exon	
  edge	
  and	
  ‘Zoom	
  to	
  base	
  level’.	
  	
  
2.  Place	
  the	
  cursor	
  over	
  the	
  edge	
  of	
  the	
  exon	
  un1l	
  it	
  becomes	
  a	
  black	
  arrow	
  then	
  click	
  
and	
  drag	
  the	
  edge	
  of	
  the	
  exon	
  to	
  the	
  new	
  coordinate	
  posi(on	
  that	
  includes	
  the	
  UTR.	
  	
  
72 |
To	
  add	
  a	
  new	
  spliced	
  UTR	
  to	
  an	
  exis(ng	
  annota(on	
  
follow	
  the	
  procedure	
  for	
  adding	
  an	
  exon.	
  
72	
ADDING UTRs
1.  Zoom in to clearly resolve each exon as a distinct rectangle.
2.  Two exons from different tracks sharing the same start and/or end coordinates will display a red bar to indicate matching edges.
3.  Selecting the whole annotation or one exon at a time, use this ‘edge-matching’ function and scroll along the length of the annotation, verifying exon boundaries against available data. Use square brackets [ ] to scroll from exon to exon.
4.  Check whether cDNA / RNA-Seq reads lack one or more of the annotated exons or include additional exons.
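The edge-matching idea in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `matching_edges`, the tuple representation of exons, and the coordinates are hypothetical and are not part of Apollo itself.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: an exon is a (start, end) pair on one scaffold.
Exon = Tuple[int, int]


def matching_edges(annotation: List[Exon], evidence: List[Exon]) -> List[Dict]:
    """For each annotated exon, report whether its start and its end
    coincide with the start or end of any exon in the evidence track."""
    evidence_starts = {start for start, _ in evidence}
    evidence_ends = {end for _, end in evidence}
    report = []
    for start, end in annotation:
        report.append({
            "exon": (start, end),
            "start_matches": start in evidence_starts,
            "end_matches": end in evidence_ends,
        })
    return report


# Made-up coordinates: the second annotated exon ends earlier than the evidence.
annotation = [(100, 250), (400, 520), (700, 910)]
rnaseq = [(100, 250), (400, 535), (700, 910)]
for row in matching_edges(annotation, rnaseq):
    print(row)
```

An edge that does not match is not necessarily wrong; it only marks the exon boundary that deserves a closer look against the evidence.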
CHECK EXON INTEGRITY
To modify an exon boundary so that it matches data in the evidence tracks: select both the offending exon and the feature with the expected boundary, then right-click on the annotation and select ‘Set 3’ end’ or ‘Set 5’ end’ as appropriate.

In some cases all the data may disagree with the annotation; in other cases some data support the annotation and some of the data support one or more alternative transcripts. Try to annotate as many alternative transcripts as are well supported by the data.
EXON STRUCTURE INTEGRITY
[Figure labels: ‘User-created Annotations’ track; Evidence Tracks area; edge-matching; selection of features and sub-features; flags on non-canonical splice sites.]
Apollo’s editing logic (brain):
§  selects the longest ORF as the CDS
§  flags non-canonical splice sites
ORFs AND SPLICE SITES
Non-canonical splices are indicated by an orange circle with a white exclamation point inside, placed over the edge of the offending exon.
Canonical splice sites:
forward strand: 5’-…exon]GT / AG[exon…-3’
reverse strand, not reverse-complemented: 3’-…exon]GA / TG[exon…-5’
[Figure labels: exon/intron junction with a possible error; original model; curated model.]
SPLICE SITES
Zoom in to review non-canonical splice site warnings. Although these may not always have to be corrected (e.g. a GC donor), they should be flagged with the appropriate comment.
  
Web Apollo calculates the longest possible open reading frame (ORF) that includes canonical ‘Start’ and ‘Stop’ signals within the predicted exons.

If the ‘Start’ appears to be incorrect, modify it by selecting an in-frame ‘Start’ codon further upstream or downstream, depending on the evidence (protein database, additional evidence tracks).

It may be present outside the predicted gene model, within a region supported by another evidence track.

In very rare cases, the actual ‘Start’ codon may be non-canonical (non-ATG).
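The longest-ORF rule described above can be sketched as follows: splice the exons together, scan the three reading frames for an ATG followed by an in-frame stop codon, and keep the longest stretch. This is a simplified illustration, not Apollo's own code; the function names, the half-open exon coordinates, and the toy genome are assumptions.

```python
from typing import List, Optional, Tuple

STOPS = {"TAA", "TAG", "TGA"}


def splice(genome: str, exons: List[Tuple[int, int]]) -> str:
    """Join forward-strand exons given as 0-based, half-open (start, end) pairs."""
    return "".join(genome[start:end] for start, end in sorted(exons))


def longest_orf(mrna: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the longest ATG...stop ORF in mRNA coordinates
    (end is exclusive and includes the stop codon), or None if there is none."""
    mrna = mrna.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(mrna) - 2, 3):
            codon = mrna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None
    return best


# Toy gene: exon 1, intron, exon 2.
genome = "CCATGGCT" + "GTAAGTTCAG" + "GAAATGCCCTAAGG"
exons = [(0, 8), (18, 32)]
mrna = splice(genome, exons)
orf = longest_orf(mrna)
print(mrna, orf, mrna[orf[0]:orf[1]])
```

Note that the second ATG in the toy transcript is in frame with the first, which mirrors the situation in the text: an annotator may move the ‘Start’ to another in-frame ATG when the evidence supports it.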
‘START’ AND ‘STOP’ SITES
Complex cases
Evidence may support joining two or more different gene models.
Warning: protein alignments may have incorrect splice sites and lack non-conserved regions!

1.  In the ‘User-created Annotations’ area, shift-click to select an intron from each gene model and right-click to select the ‘Merge’ option from the menu.
2.  Drag supporting evidence tracks over the candidate models to corroborate overlap, or review edge matching and coverage across models.
3.  Check the resulting translation by querying a protein database, e.g. UniProt. Add comments to record that this annotation is the result of a merge.
Red lines around exons: ‘edge-matching’ allows annotators to confirm whether the evidence is in agreement without examining each exon at the base level.
  
COMPLEX CASES
merge two gene predictions on the same scaffold
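Conceptually, a merge combines the exons of two models on the same strand into one transcript and records that fact in a comment. The sketch below shows that bookkeeping only; the `Transcript` class, the model names, and the coordinates are hypothetical and do not reflect Apollo's internal data model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Transcript:
    name: str
    strand: str                          # "+" or "-"
    exons: List[Tuple[int, int]]         # 0-based, half-open (start, end)
    comments: List[str] = field(default_factory=list)


def merge_transcripts(a: Transcript, b: Transcript) -> Transcript:
    """Combine two gene models into one, collapsing overlapping exons."""
    if a.strand != b.strand:
        raise ValueError("models are on opposite strands; review before merging")
    exons = sorted(a.exons + b.exons)
    merged = [exons[0]]
    for start, end in exons[1:]:
        last_start, last_end = merged[-1]
        if start <= last_end:
            # Overlapping or touching exons become a single exon.
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return Transcript(
        name=f"{a.name}+{b.name}",
        strand=a.strand,
        exons=merged,
        comments=[f"Result of merging {a.name} and {b.name}."],
    )


model_a = Transcript("modelA", "+", [(100, 200), (300, 400)])
model_b = Transcript("modelB", "+", [(380, 450), (600, 700)])
merged_model = merge_transcripts(model_a, model_b)
print(merged_model)
```

The translation of the merged model still has to be checked against a protein database, as step 3 above recommends; merging coordinates says nothing about whether the reading frame is preserved.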
Becoming Acquainted with Web Apollo. COMPLEX CASES
One or more splits may be recommended when:
- different segments of the predicted protein align to two or more different gene families
- the predicted protein doesn’t align to known proteins over its entire length
Transcript data may support a split, but first verify whether they are alternative transcripts.
COMPLEX CASES
split a gene prediction
[Figure labels: DNA Track; ‘User-created Annotations’ track.]
COMPLEX CASES
correcting frameshifts and single-base errors
Always remember: when annotating gene models using Apollo, you are looking at a ‘frozen’ version of the genome assembly, and you will not be able to modify the assembly itself.
COMPLEX CASES
correcting selenocysteine containing proteins
1.  Apollo allows annotators to make single-base modifications or frameshifts that are reflected in the sequence and structure of any transcripts overlapping the modification. These manipulations do NOT change the underlying genomic sequence.
2.  If you determine that you need to make one of these changes, zoom in to the nucleotide level and right-click over a single nucleotide on the genomic sequence to access a menu that provides options for creating insertions, deletions or substitutions.
3.  The ‘Create Genomic Insertion’ feature will require you to enter the necessary string of nucleotide residues that will be inserted to the right of the cursor’s current location. The ‘Create Genomic Deletion’ option will require you to enter the length of the deletion, starting with the nucleotide where the cursor is positioned. The ‘Create Genomic Substitution’ feature asks for the string of nucleotide residues that will replace the ones on the DNA track.
4.  Once you have entered the modifications, Apollo will recalculate the corrected transcript and protein sequences, which will appear when you use the right-click menu ‘Get Sequence’ option. Since the underlying genomic sequence is reflected in all annotations that include the modified region, you should alert the curators of your organism’s database using the ‘Comments’ section to report the CDS edits.
5.  In special cases such as selenocysteine-containing proteins (read-throughs), right-click over the offending/premature ‘Stop’ signal and choose the ‘Set readthrough stop codon’ option from the menu.
COMPLEX CASES
correcting frameshifts, single-base errors, and selenocysteines
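The three kinds of sequence alteration can be pictured as edits applied to a copy of the sequence while the reference stays untouched. The sketch below is a minimal model of that idea; the dictionary layout, the 0-based positions, and the function name are assumptions for illustration and do not describe how Apollo stores alterations.

```python
from typing import Dict, List


def apply_alterations(reference: str, alterations: List[Dict]) -> str:
    """Return an edited copy of `reference`; the reference itself is unchanged.

    Each alteration is a dict with a 'kind' ('insertion', 'deletion' or
    'substitution'), a 0-based 'pos', and either 'residues' or 'length'.
    """
    edited = reference
    # Apply from right to left so earlier positions are not shifted.
    for alt in sorted(alterations, key=lambda a: a["pos"], reverse=True):
        pos = alt["pos"]
        if alt["kind"] == "insertion":
            # Inserted to the right of the selected nucleotide.
            edited = edited[:pos + 1] + alt["residues"] + edited[pos + 1:]
        elif alt["kind"] == "deletion":
            # Deletes `length` bases starting at the selected nucleotide.
            edited = edited[:pos] + edited[pos + alt["length"]:]
        elif alt["kind"] == "substitution":
            residues = alt["residues"]
            edited = edited[:pos] + residues + edited[pos + len(residues):]
        else:
            raise ValueError(f"unknown alteration kind: {alt['kind']}")
    return edited


reference = "ATGGCCTAAGGA"
edits = [
    {"kind": "deletion", "pos": 3, "length": 1},
    {"kind": "substitution", "pos": 6, "residues": "C"},
]
corrected = apply_alterations(reference, edits)
print(reference, corrected)
```

Keeping the alterations as a separate list, rather than rewriting the reference, matches the point made in step 1: the genome assembly stays frozen and only the derived transcript and protein sequences change.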
Follow the checklist until you are happy with the annotation!
And remember to…
–  comment to validate your annotation, even if you made no changes to an existing model. Think of comments as your vote of confidence.
–  or add a comment to inform the community of unresolved issues you think this model may have.
Always Remember: Web Apollo curation is a community effort so please use comments to communicate the reasons for your annotation (your comments will be visible to everyone).
COMPLETING THE ANNOTATION
Checklist
1.  Can you add UTRs (e.g. via RNA-Seq)?
2.  Check exon structures
3.  Check splice sites: most splice sites display these residues …]5’-GT/AG-3’[…
4.  Check ‘Start’ and ‘Stop’ sites
5.  Check the predicted protein product(s)
    –  Align it against relevant genes/gene family.
    –  blastp against NCBI’s RefSeq or nr
6.  If the protein product still does not look correct then check:
    –  Are there gaps in the genome?
    –  Merge of 2 gene predictions on the same scaffold
    –  Merge of 2 gene predictions from different scaffolds
    –  Split a gene prediction
    –  Frameshifts
    –  Error in the genome assembly?
    –  Selenocysteines, single-base errors, etc.
7.  Finalize annotation by adding:
    –  Important project information in the form of comments
    –  IDs from public databases e.g. GenBank (via DBXRef), gene symbol(s), common name(s), synonyms, top BLAST hits, orthologs with species names, and everything else you can think of, because you are the expert.
    –  Whether your model replaces one or more models from the official gene set (so it can be deleted).
    –  The kinds of changes you made to the gene model of interest, if any.
    –  Any appropriate functional assignments of interest to the community (e.g. via BLAST, RNA-Seq data, literature searches, etc.)
THE CHECKLIST
for accuracy and integrity
MANUAL ANNOTATION CHECKLIST
Example
A public Apollo Demo using the Honey Bee genome is available at http://genomearchitect.org/WebApolloDemo
- Demonstration using the Hyalella azteca genome (amphipod crustacean).
What do we know about this genome?
•  Currently publicly available data at NCBI:
   •  >37,000 nucleotide seqs → scaffolds, mitochondrial genes
   •  300 amino acid seqs → mitochondrion
   •  53 ESTs
   •  0 conserved domains identified
   •  0 “gene” entries submitted
•  Data at i5K Workspace@NAL (annotation hosted at USDA)
   - 10,832 scaffolds: 23,288 transcripts: 12,906 proteins
PubMed Search: what’s new?
“Ten populations (3 cultures, 7 from California water bodies) differed by at least 550-fold in sensitivity to pyrethroids.”
“By sequencing the primary pyrethroid target site, the voltage-gated sodium channel (vgsc), we show that point mutations and their spread in natural populations were responsible for differences in pyrethroid sensitivity.”
“The finding that a non-target aquatic species has acquired resistance to pesticides used only on terrestrial pests is troubling evidence of the impact of chronic pesticide transport from land-based applications into aquatic systems.”
How many sequences for our gene of interest?
•  Para (voltage-gated sodium channel alpha subunit; Nasonia vitripennis).
•  NaCP60E (Sodium channel protein 60 E; D. melanogaster).
   –  MF: voltage-gated cation channel activity (IDA, GO:0022843).
   –  BP: olfactory behavior (IMP, GO:0042048), sodium ion transmembrane transport (ISS, GO:0035725).
   –  CC: voltage-gated sodium channel complex (IEA, GO:0001518).
And what do we know about them?
Retrieving sequences for sequence similarity searches.
>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG
QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
BLAT search
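Before pasting a query into a BLAT or BLAST search box it can help to confirm that the FASTA record is well formed. The small reader below is a sketch written for this example (the function name `read_fasta` is hypothetical); it parses the query shown above and checks that it contains only the 20 standard amino acid letters.

```python
import io
from typing import Dict, TextIO

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Parse FASTA text into a {record name: sequence} dictionary."""
    records: Dict[str, list] = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


QUERY = """>vgsc-Segment3-DomainII
RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG
QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
"""

records = read_fasta(io.StringIO(QUERY))
sequence = records["vgsc-Segment3-DomainII"]
unexpected = set(sequence) - AMINO_ACIDS
print(len(sequence), "residues; unexpected characters:", unexpected or "none")
```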
Customizations: high-scoring segment pairs (hsp) in “BLAST+ Results” track
Creating a new gene model: drag and drop
•  Apollo automatically calculates the ORF. In this case, the ORF includes the high-scoring segment pairs (hsp).
Available Tracks
Get Sequence
http://blast.ncbi.nlm.nih.gov/Blast.cgi
Also, flanking sequences (other gene models) vs. NCBI nr
In this case, two gene models upstream, at 5’ end.
BLAST hsps
Review alignments
HaztTmpM006234
HaztTmpM006233
HaztTmpM006232
Hypothesis for vgsc gene model
Editing: merge the three models
Merge by dropping an exon or gene model onto another.
or…
Merge by selecting two exons (holding down “Shift”) and using the right-click menu.
Editing: correct boundaries, delete exons
Modify exon / intron boundary:
-  Drag the end of the exon to the nearest canonical splice site.
-  Use right-click menu.
Delete first exon from HaztTmpM006233
Editing: set translation start
Editing: modify boundaries
Modify intron / exon boundary also at coord. 78,999.
Finished model
Corroborate integrity and accuracy of the model:
- Start and Stop
- Exon structure and splice sites …]5’-GT/AG-3’[…
- Check the predicted protein product vs. NCBI nr
Information Editor
•  DBXRefs: e.g. NP_001128389.1, N. vitripennis, RefSeq
•  PubMed identifier: PMID: 24065824
•  Gene Ontology IDs: GO:0022843, GO:0042048, GO:0035725, GO:0001518.
•  Comments (if applicable).
•  Name, Symbol.
•  Approve / Delete radio button.
Video demonstration
APOLLO demonstration
Apollo demo video available at: https://youtu.be/VgPtAP_fvxY
CONTENTS
Web Apollo Collaborative Curation and Interactive Analysis of Genomes
OUTLINE
•  BIO-REFRESHER: concepts we need
•  ANNOTATION: automated predictions
•  MANUAL ANNOTATION: necessary, in collaboration
•  APOLLO: advancing curation through collaboration
•  EXAMPLE: demonstrations
•  EXERCISES
Exercises
Live Demonstration using the Apis mellifera genome.

1. Evidence in support of protein coding gene models.

1.1 Consensus Gene Sets:
Official Gene Set v3.2
Official Gene Set v1.0

1.2 Consensus Gene Sets comparison:
OGSv3.2 genes that merge OGSv1.0 and RefSeq genes
OGSv3.2 genes that split OGSv1.0 and RefSeq genes

1.3 Protein Coding Gene Predictions Supported by Biological Evidence:
NCBI Gnomon
Fgenesh++ with RNASeq training data
Fgenesh++ without RNASeq training data
NCBI RefSeq Protein Coding Genes and Low Quality Protein Coding Genes

1.4 Ab initio protein coding gene predictions:
Augustus Set 12, Augustus Set 9, Fgenesh, GeneID, N-SCAN, SGP2

1.5 Transcript Sequence Alignment:
NCBI ESTs, Apis cerana RNA-Seq, Forager Bee Brain Illumina Contigs, Nurse Bee Brain Illumina Contigs, Forager RNA-Seq reads, Nurse RNA-Seq reads, Abdomen 454 Contigs, Brain and Ovary 454 Contigs, Embryo 454 Contigs, Larvae 454 Contigs, Mixed Antennae 454 Contigs, Ovary 454 Contigs, Testes 454 Contigs, Forager RNA-Seq HeatMap, Forager RNA-Seq XY Plot, Nurse RNA-Seq HeatMap, Nurse RNA-Seq XY Plot

1.6 Protein homolog alignment:
Acep_OGSv1.2
Aech_OGSv3.8
Cflo_OGSv3.3
Dmel_r5.42
Hsal_OGSv3.3
Lhum_OGSv1.2
Nvit_OGSv1.2
Nvit_OGSv2.0
Pbar_OGSv1.2
Sinv_OGSv2.2.3
Znev_OGSv2.1
Metazoa_Swissprot

2. Evidence in support of non protein coding gene models

2.1 Non-protein coding gene predictions:
NCBI RefSeq Noncoding RNA
NCBI RefSeq miRNA

2.2 Pseudogene predictions:
NCBI RefSeq Pseudogene
Instructions
APOLLO ON THE WEB: instructions

Server  URL
1       http://54.94.132.228:8080/apollo/annotator/index
2       http://54.94.132.228:8080/apollo/annotator/index
3       http://54.94.132.228:8080/apollo/annotator/index
4       http://54.94.132.228:8080/apollo/annotator/index
5       http://54.94.132.228:8080/apollo/annotator/index

Email: firstname.lastname@example.com
Password: firstnamelastname

Email                           Password           Server  Start at
user.one@example.com            userone            1       1
user.two@example.com            usertwo            2       1
user.three@example.com          userthree          3       1
user.four@example.com           userfour           4       1
user.five@example.com           userfive           5       1
user.six@example.com            usersix            1       7
user.seven@example.com          userseven          2       7
user.eight@example.com          usereight          3       7
user.nine@example.com           usernine           4       7
user.ten@example.com            userten            5       7
user.eleven@example.com         usereleven         1       1
user.twelve@example.com         usertwelve         2       1
user.thirteen@example.com       userthirteen       3       1
user.fourteen@example.com       userfourteen       4       1
user.fifteen@example.com        userfifteen        5       1
user.sixteen@example.com        usersixteen        1       7
user.seventeen@example.com      userseventeen      2       7
user.eighteen@example.com       usereighteen       3       7
user.nineteen@example.com       usernineteen       4       7
user.twenty@example.com         usertwenty         5       7
user.twentyone@example.com      usertwentyone      1       1
user.twentytwo@example.com      usertwentytwo      2       1
user.twentythree@example.com    usertwentythree    3       1
user.twentyfour@example.com     usertwentyfour     4       1
user.twentyfive@example.com     usertwentyfive     5       1
user.twentysix@example.com      usertwentysix      1       7
user.twentyseven@example.com    usertwentyseven    2       7
user.twentyeight@example.com    usertwentyeight    3       7
user.twentynine@example.com     usertwentynine     4       7
Thank you.
•  Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).
•  § Christine G. Elsik (PI). University of Missouri.
•  * Ian Holmes (PI). University of California Berkeley.
•  Arthropod genomics community: i5K Steering Committee (esp. Sue Brown (Kansas State)), Alexie Papanicolaou (UWS), and the Honey Bee Genome Sequencing Consortium.
•  Stephen Ficklin, GenSAS, Washington State University
•  Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI. Both projects are also supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231
•  For your attention, thank you!

BBOP
Apollo: Nathan Dunn, Colin Diesh §, Deepak Unni §
Gene Ontology: Chris Mungall, Seth Carbon, Heiko Dietze
JBrowse: Eric Yao *
NAL at USDA: Monica Poelchau, Christopher Childers, Gary Moore
HGSC at BCM: fringy Richards, Kim Worley

Apollo: http://GenomeArchitect.org
GO: http://GeneOntology.org
i5K: http://arthropodgenomes.org/wiki/i5K
Apolo Taller en BIOS

 
Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...
Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...
Apollo Genome Annotation Editor: Latest Updates, Including New Galaxy Integra...
 
Gene Ontology Consortium: Website & COmmunity
Gene Ontology Consortium: Website & COmmunityGene Ontology Consortium: Website & COmmunity
Gene Ontology Consortium: Website & COmmunity
 
Essential Requirements for Community Annotation Tools
Essential Requirements for Community Annotation ToolsEssential Requirements for Community Annotation Tools
Essential Requirements for Community Annotation Tools
 
Data Visualization And Annotation Workshop at Biocuration 2015
Data Visualization And Annotation Workshop at Biocuration 2015Data Visualization And Annotation Workshop at Biocuration 2015
Data Visualization And Annotation Workshop at Biocuration 2015
 
Apollo: developers call 2015-02-05
Apollo: developers call 2015-02-05Apollo: developers call 2015-02-05
Apollo: developers call 2015-02-05
 
Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
Apollo and i5K: Collaborative Curation and Interactive Analysis of GenomesApollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
Apollo and i5K: Collaborative Curation and Interactive Analysis of Genomes
 
Web Apollo Workshop University of Exeter
Web Apollo Workshop University of ExeterWeb Apollo Workshop University of Exeter
Web Apollo Workshop University of Exeter
 
Munoz torres web-apollo-workshop_exeter-2014_ss
Munoz torres web-apollo-workshop_exeter-2014_ssMunoz torres web-apollo-workshop_exeter-2014_ss
Munoz torres web-apollo-workshop_exeter-2014_ss
 

Dernier

Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
MohamedFarag457087
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Dernier (20)

module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
chemical bonding Essentials of Physical Chemistry2.pdf
chemical bonding Essentials of Physical Chemistry2.pdfchemical bonding Essentials of Physical Chemistry2.pdf
chemical bonding Essentials of Physical Chemistry2.pdf
 
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai YoungDubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
Dubai Call Girls Beauty Face Teen O525547819 Call Girls Dubai Young
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx300003-World Science Day For Peace And Development.pptx
300003-World Science Day For Peace And Development.pptx
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 

Apollo Workshop at BIOS

  • 1. Editing annotations with Apollo
 A workshop for the scientific community gathered at BIOS
 Monica Munoz-Torres, PhD | @monimunozto
 Berkeley Bioinformatics Open-Source Projects (BBOP)
 Lawrence Berkeley National Laboratory | University of California Berkeley | U.S. Department of Energy
 BIOS, Manizales, Colombia | 21 September 2015
  • 2. APOLLO DEVELOPMENT: APOLLO DEVELOPERS
 http://GenomeArchitect.org/
 Nathan Dunn; Eric Yao, JBrowse, UC Berkeley; Christine Elsik's Lab, University of Missouri; Suzi Lewis, Principal Investigator, BBOP; Moni Munoz-Torres; Stephen Ficklin, GenSAS, Washington State University; Colin Diesh; Deepak Unni
  • 3. OUTLINE
 Web Apollo: Collaborative Curation and Interactive Analysis of Genomes
     • Today we will discover how to get around obstacles in order to extract the most valuable information from a genome sequencing and annotation project.
  • 4. BY THE END OF THIS TALK
 you will
     • Better understand genome curation in the context of annotation: assembled genome → automated annotation → manual annotation.
     • Become familiar with the environment and functionality of the Apollo genome annotation editing tool.
     • Learn to identify homologs of known genes of interest in a newly sequenced genome.
     • Learn about corroborating and modifying automatically annotated gene models using available evidence in Apollo.
  • 5. How is the map of a genome drawn?
  • 6. Introduction: The genome map
 [Workflow diagram; steps as labeled on the slide: Design & sampling; Comparative analyses; Consensus gene set; Manual annotation; Automated annotation; Sequencing; Assembly; Synthesis & publication]
  • 7. Introduction: The genome map
 [The same workflow diagram, with QC (quality control) labels added alongside the steps]
  • 8. CURATING GENOMES
 steps involved
     1. Generation of gene models: calling ORFs, one or more rounds of gene prediction, etc.
     2. Annotation of gene models: describing function, expression patterns, metabolic network memberships.
     3. Manual annotation.
  • 9. GENOME ANNOTATION
 requires precision and depth
 The gene set of each organism informs a variety of analyses:
     • Number of genes, % GC, TE composition, repetitive regions
     • Assigning function
     • Molecular evolution, sequence conservation
     • Gene families
     • Metabolic pathways
     • What makes each organism unique? What makes a bee a "bee"?
 Marbach et al. 2011. Nature Methods | Shutterstock.com | Alexander Wild
  • 11. FOOD FOR THOUGHT: REVIEW ON YOUR OWN
 for manual annotation
 To remember: biological concepts to better understand manual annotation
     • GLOSSARY: from contig to splice site
     • CENTRAL DOGMA: in molecular biology
     • WHAT IS A GENE?: defining your goal
     • TRANSCRIPTION: mRNA in detail
     • TRANSLATION: and other definitions
     • GENOME CURATION: steps involved
  • 12. BIO-REFRESHER: What is a gene?
     • The definition of a gene paints a very complex picture of molecular activity, and it is a continuously evolving concept.
     • From the Sequence Ontology (SO): "A gene is a locatable region of genomic sequence, corresponding to a unit of inheritance, which is associated with regulatory regions, transcribed regions and/or other functional sequence regions".
 "Evolving Concept" at http://goo.gl/LpsajQ
  • 13. BIO-REFRESHER: What is a gene?
     • In our lifetime, the Encyclopedia of DNA Elements (ENCODE) project updated this concept yet again. Long transcripts & dispersed regulation!
 "A gene is a DNA segment that contributes to phenotype/function. In the absence of demonstrated function, a gene may be characterized by sequence, transcription or homology."
 https://www.encodeproject.org/
  • 14. BIO-REFRESHER: What is a gene?
 let's think computationally!
     • Think of the genome as an operating system for a living being.
     • Considering that the nucleotides of the genome are put together into a code that is executed through the process of transcription and translation...
     • ...think of genes as subroutines that are repetitively called in the process of transcription.
 Gerstein et al., 2007. Genome Res.
  • 15. BIO-REFRESHER: What is a gene?
 considerations
     • Also consider:
         • A gene is a genomic sequence (DNA or RNA) directly encoding functional product molecules, either RNA or protein.
         • If several functional products share overlapping regions, we take the union of all overlapping genomic sequences coding for them.
         • This union must be coherent (i.e., processed separately for final protein and RNA products) but does not require that all products necessarily share a common subsequence.
 Gerstein et al., 2007. Genome Res.
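 One way to picture the "union of overlapping genomic sequences" is as a merge of coordinate intervals. The sketch below is only an illustration of that idea, not a method from the talk: the product names and coordinates are invented, and intervals are assumed to be 1-based and inclusive on a single strand.

```python
def merge_overlapping(intervals):
    """Merge overlapping (start, end) intervals into their union."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous block: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical segments encoding three functional products at one locus.
products = {
    "mRNA-A": [(100, 200), (300, 400)],
    "mRNA-B": [(150, 250), (300, 350)],
    "ncRNA-C": [(380, 450)],
}

all_segments = [seg for segs in products.values() for seg in segs]
gene_union = merge_overlapping(all_segments)
print(gene_union)
```

 In this toy locus, mRNA-B and ncRNA-C share no bases with each other, yet both fall into the same union because each overlaps mRNA-A, which mirrors the point that the products need not all share a common subsequence.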
  • 16. BIO-REFRESHER: WHAT IS A GENE?
 The gene: a moving target.
 "The gene is the union of genomic sequences encoding a coherent set of functional products that may or may not overlap."
 Gerstein et al., 2007. Genome Res.
  • 17. BIO-REFRESHER: TRANSLATION
 reading frame
     • A reading frame is a way of dividing the sequence of nucleotides in mRNA (or DNA) into a set of consecutive, non-overlapping triplets (codons).
     • Three frames can be read in the 5' → 3' direction. Given that DNA has two anti-parallel strands, an additional three frames can be read on the anti-sense strand. Six reading frames are possible in total.
     • In eukaryotes, only one reading frame per section of DNA is biologically relevant at a time: it has the potential to be transcribed into RNA and translated into protein. This is called the OPEN READING FRAME (ORF).
         • ORF = Start signal + coding sequence (divisible by 3) + Stop signal
     • The sections of the mature mRNA that are transcribed along with the coding sequence but not translated are called UnTranslated Regions (UTRs); there is one at each end.
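 The six reading frames and the "Start + coding sequence + Stop" definition of an ORF can be sketched in a few lines. This is a minimal illustration under stated assumptions: the sequence is invented, ATG is taken as the only start signal, and the function names are hypothetical rather than part of Apollo.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the anti-sense strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def codons(seq, offset):
    """Split seq into consecutive, non-overlapping triplets from offset."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def six_frames(seq):
    """Return the codons of frames +1..+3 (sense) and -1..-3 (anti-sense)."""
    frames = {}
    rc = reverse_complement(seq)
    for offset in range(3):
        frames[f"+{offset + 1}"] = codons(seq, offset)
        frames[f"-{offset + 1}"] = codons(rc, offset)
    return frames


def find_orfs(codon_list):
    """Return ORFs as codon lists: ATG up to the first in-frame stop."""
    orfs, current = [], None
    for codon in codon_list:
        if current is None:
            if codon == "ATG":
                current = [codon]
        else:
            current.append(codon)
            if codon in STOP_CODONS:
                orfs.append(current)
                current = None
    return orfs


frames = six_frames("AATGGCCTGATT")  # invented 12-base example
for name, frame_codons in sorted(frames.items()):
    print(name, frame_codons, find_orfs(frame_codons))
```

 In this toy sequence only frame +2 contains a complete ORF (ATG GCC TGA); the other five frames do not.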
  • 18. BIO-REFRESHER: TRANSLATION
 reading frame
 [Figure: "Reading Frame" by Hornung Ákos, Wikimedia Commons]
  • 19. BIO-REFRESHER: TRANSLATION
 reading frame
 [Figure: "ORF" by Thatsonginc, Wikimedia Commons]
  • 20. BIO-REFRESHER: TRANSLATION
 reading frame: splice sites
     • The spliceosome catalyzes the removal of introns and the ligation of flanking exons.
         • introns: spaces inside the gene, not part of the coding sequence
         • exons: expression units (of the coding sequence)
     • Splicing "signals" (from the point of view of an intron):
         • There is a 5' end splice "signal" (site): usually GT (less common: GC)
         • And a 3' end splice site: usually AG
         • …]5'-GT/AG-3'[…
     • It is possible to produce more than one protein (polypeptide) sequence from the same genic region by bringing exons together in alternative combinations: alternative splicing. For example, the Drosophila gene Dscam can potentially produce about 38,000 alternatively spliced mRNAs (isoforms).
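 The splice-site rule above (GT, or less commonly GC, at the 5' end of the intron and AG at its 3' end) translates directly into a small check. The sketch below assumes a forward-strand intron given as 0-based, half-open coordinates; the sequence and function names are invented for illustration.

```python
def intron_splice_sites(genome, intron_start, intron_end):
    """Return the (5' donor, 3' acceptor) dinucleotides of an intron."""
    intron = genome[intron_start:intron_end]
    return intron[:2], intron[-2:]


def classify_splice_sites(donor, acceptor):
    """Label an intron by the dinucleotides at its two ends."""
    if (donor, acceptor) == ("GT", "AG"):
        return "canonical"
    if (donor, acceptor) == ("GC", "AG"):
        return "less common (GC donor)"
    return "non-canonical"


# Invented example: exon 1 + intron + exon 2
exon1, intron, exon2 = "ATGGCC", "GTAAGTTTTCAG", "GGATAA"
genome = exon1 + intron + exon2

donor, acceptor = intron_splice_sites(genome, len(exon1), len(exon1) + len(intron))
print(donor, acceptor, classify_splice_sites(donor, acceptor))
```

 An intron on the reverse strand would need to be reverse-complemented before applying the same test.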
  • 21. BIO-REFRESHER: TRANSLATION
 now in your mind
     • Although mRNAs exist only briefly, understanding them is crucial, as they will become the center of your work.
 [Figure: "Gene structure" by Daycd, Wikimedia Commons]
  • 22. BIO-REFRESHER: TRANSLATION
 reading frame: phase
     • Introns can interrupt the reading frame of a gene by inserting a sequence:
         • between two consecutive codons (phase 0),
         • between the first and second nucleotide of a codon (phase 1),
         • or between the second and third nucleotide of a codon (phase 2).
 [Figure: "Exon and Intron classes". Licensed under Fair use via Wikipedia]
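 Intron phase follows from simple arithmetic: count the coding bases that precede the intron and take the remainder after division by three. A remainder of 0 means the intron sits between two codons, 1 means it falls after the first nucleotide of a codon, and 2 means it falls after the second. The sketch below assumes the coding lengths of the exons are already known; the numbers are invented.

```python
def intron_phases(cds_exon_lengths):
    """Return the phase (0, 1 or 2) of each intron between coding exons.

    cds_exon_lengths lists the number of coding bases in each exon,
    in 5' to 3' order along the transcript.
    """
    phases = []
    coding_bases = 0
    for length in cds_exon_lengths[:-1]:  # no intron after the last exon
        coding_bases += length
        phases.append(coding_bases % 3)
    return phases


# Invented gene with four coding exons (24 coding bases in total)
print(intron_phases([9, 4, 4, 7]))
```

 Here the first intron follows 9 coding bases (phase 0), the second follows 13 (phase 1), and the third follows 17 (phase 2).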
  • 23. CURATING GENOMES: TRANSLATION
 in detail
 [Figure: "Protein synthesis" by Kelvinsong, Wikimedia Commons]
  • 24. BIO-REFRESHER: HICCUPS
 in transcription and translation
     • Premature Stop codons can be present in the message. They can result from incomplete splicing, DNA mutations, transcription errors, and leaky scanning of the ribosome, which cause changes in the reading frame (frameshifts). A process called nonsense-mediated decay detects messages carrying premature Stop codons and degrades them.
     • Insertions and deletions (indels) can cause frameshifts when the length of the indel is not divisible by three. As a result, the peptide can be abnormally long or abnormally short, depending on where the first in-frame Stop signal is located.
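 The effect of an indel on the reading frame is easy to see by translating a short coding sequence before and after the change. The sketch below uses the standard genetic code; the sequences are invented, and translation simply stops at the first in-frame Stop codon or at the end of the sequence.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate until the first in-frame stop or the end of the sequence."""
    peptide = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        peptide.append(aa)
    return "".join(peptide)


reference = "ATGGCTGAAACCTGGTAA"                    # ATG GCT GAA ACC TGG TAA
deletion_1 = reference[:4] + reference[5:]          # 1 base removed
insertion_1 = reference[:3] + "A" + reference[3:]   # 1 base added
deletion_3 = reference[:3] + reference[6:]          # 3 bases removed

for label, seq in [("reference", reference), ("1-base deletion", deletion_1),
                   ("1-base insertion", insertion_1), ("3-base deletion", deletion_3)]:
    print(f"{label:17s} {translate(seq)}")
```

 The single-base deletion shifts the frame so that the original Stop is never read, the single-base insertion creates an early in-frame Stop, and the three-base deletion removes one residue but leaves the rest of the frame intact.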
  • 26. GENE PREDICTION
     • The identification of structural features of the genome:
         • Primarily focused on protein-coding genes.
         • Also predicts transfer RNAs (tRNA), ribosomal RNAs (rRNA), regulatory motifs, long and small non-coding RNAs (ncRNA), repetitive elements (masked), etc.
         • Two methods for identification.
         • Some tools are self-trained and some must be trained.
  • 27. GENE PREDICTION
 methods for discovery
 1) Ab initio:
     - based on DNA composition,
     - deals strictly with genomic sequences,
     - makes use of statistical approaches to search for coding regions and typical gene signals.
     • E.g. Augustus, GENSCAN, geneid, fgenesh, etc.
 Nat Rev Genet. 2015 Jun;16(6):321-32. doi: 10.1038/nrg3920
  • 28. GENE PREDICTION
 methods for discovery (ctd)
 2) Homology-based:
     - evidence-based,
     - finds genes using either similarity searches in the main databases or experimental data including RNAseq, expressed sequence tags (ESTs), full-length complementary DNAs (cDNAs), etc.
     • E.g. fgenesh++, Just Annotate My genome (JAMg), SGP2
 Nucleic Acids Res. 2003 vol. 31 no. 13 3738-3741
  • 29. GENE ANNOTATION
 Integration of data from computational & experimental evidence with data from prediction tools, to generate a reliable set of structural annotations.
 Involves:
     1) ab initio predictions
     2) assessment of biological evidence to drive the gene prediction process
     3) synthesis of these results to produce a set of consensus gene models
  • 30. GENE ANNOTATION
 Gene models may be organized into "sets" using:
     • automatic integration of predicted sets (combiners), e.g. GLEAN, EvidenceModeler
 or
     • tools packaged into pipelines, e.g. MAKER, PASA, Gnomon, Ensembl, etc.
 In some cases, the algorithms and metrics used to generate consensus sets may actually reduce the accuracy of the gene's representation.
  • 31. ANNOTATION
 an imperfect art
 No one is perfect, least of all automated annotation.
 New technologies bring new challenges:
     • Assembly errors can cause fragmented annotations
     • Limited coverage makes confident identification difficult
 Image: www.BroadInstitute.org
  • 32. MANUAL ANNOTATION
 improving predictions
 Manual annotation, to the rescue.
 [Figure labels: Automated predictions; Experimental evidence (cDNAs, HMM searches, RNAseq, genes from other species)]
 It is therefore necessary to refine the predictions of biological elements encoded in the genome, which requires careful review.
 Schiex et al. Nucleic Acids Res. 2003; 31(13): 3738-3741
  • 33. BIOCURATION
 structural and functional adjustments
     1. Identify the genome elements that best represent the underlying biology, and eliminate the elements that reflect systemic errors of the automated analyses.
     2. Assign functions through comparative analyses of similar genomic elements from closely related organisms, using literature, databases, and experimental data.
 http://GeneOntology.org
  • 34. MANUAL ANNOTATION: BUT, IN CURATION
 it was not always possible to scale up these efforts
 Too many sequences and not enough hands.
     1. Museum: a small group of highly trained experts (e.g. GO).
     2. Jamboree: a few very good biologists and a few very good bioinformaticians camping together for intense but short periods of time.
     3. Cottage: researchers on their own; they may or may not publicize results; may be a dead end, with very few people ever aware of these results.
 Elsik et al. 2006. Genome Res. 16(11):1329-33.
  • 35. COLLABORATING: ANNOTATION
 an exercise in collaboration
 As researchers, we usually seek the opinions and insights of colleagues with expertise in specific areas of knowledge, for example conserved domains or gene families.
  • 36. INTRODUCING APOLLO
 Apollo: a tool for editing annotations
     • Web-based, integrated with JBrowse.
     • Enables real-time collaboration!
     • Automatically generates data in common formats for analysis.
     • Manual annotation of genes, pseudogenes, tRNAs, snRNAs, snoRNAs, ncRNAs, miRNAs, TEs, and repetitive fragments.
     • Intuitive functions and drop-down menus create and edit transcript and exon structures, and insert comments (CV, free text), GO terms, etc.
 http://GenomeArchitect.org/
  • 37. ARCHITECTURE
 simple, flexible
 Web client + annotation editing engine + server-side data service
 [Architecture diagram, Apollo v2.0; labels:]
     • Curators; Google Web Toolkit (GWT) / Bootstrap; JBrowse (DOJO / jQuery); BAM, BED, VCF, GFF3, BigWig
     • REST / JSON; Websockets
     • Annotation Engine (server); Shiro, LDAP, OAuth
     • Loading of genomic evidence data for each organism: JBrowse data, Organism 1; JBrowse data, Organism 2
     • Single storage service (PostgreSQL, MySQL, MongoDB, ElasticSearch): Annotations, Security, Preferences, Organisms, Tracks
  • 38. ARCHITECTURE: WEB CLIENT
 curator panel
 Uses GWT/Bootstrap on the front end to give the application versatile behavior.
 Curator panel: NEW!
 [Diagram, Apollo v2.0; labels: Curators; Google Web Toolkit (GWT) / Bootstrap; JBrowse (DOJO / jQuery); BAM, BED, VCF, GFF3, BigWig; REST / JSON; Websockets; Annotation Engine (server)]
  • 39. ARCHITECTURE: ANNOTATION ENGINE
 editing logic
 Grails controllers (J2EE servlet) route requests to the appropriate JBrowse data directory for each organism. NEW!
 [Diagram, Apollo v2.0; labels: Web client; REST / JSON; Websockets; Annotation Engine (server); Shiro, LDAP, OAuth; loading of genomic evidence data for each organism (JBrowse data, Organism 1; JBrowse data, Organism 2); single storage service]
  • 40. ARCHITECTURE: SERVER-SIDE DATA SERVICE
 single storage service
 A single, queryable storage service for saving annotations. NEW!
 [Diagram, Apollo v2.0; labels: Annotation Engine (server); single storage service (PostgreSQL, MySQL, MongoDB, ElasticSearch); Annotations, Security, Preferences, Organisms, Data tracks]
  • 41. HIGHLIGHTED IMPROVEMENTS: LET'S COLLABORATE!
 Apollo is open source and extensible
 Users can add programs to support their own workflows.
 Examples:
     • GenSAS (The Genome Sequence Annotation Server): a platform for structural genome annotation.
     • i5K:
         - Space at the NAL (National Agricultural Library) for sharing assemblies and gene sets, and for manual annotation.
         - Pilot project, >40 genomes: 47 talks and 9 posters at the Arthropod Genomics Symposium.
  • 42. APOLLO: DISPERSED COMMUNITIES
 collaborative manual annotation efforts
 We train and support hundreds of geographically dispersed scientists from diverse research communities to conduct manual annotations and to recover coding sequences in agreement with all available biological evidence using Apollo.
     • Gate keeping and monitoring.
     • Tutorials, training workshops, and "geneborees".
  • 43. LESSONS LEARNED
 What we have learned:
     • Collaborative work distills invaluable knowledge
     • We must enforce strict rules and formats
     • We must evolve with the data
     • A little training goes a long way
     • NGS poses additional challenges
  • 44. What is the curator's task?
  • 45. Becoming Acquainted with Web Apollo: GENERAL PROCESS OF CURATION
 main steps to remember
     1. Select or find a region of interest, e.g. a scaffold.
     2. Select appropriate evidence tracks to review the gene model.
     3. Determine whether a feature in an existing evidence track will provide a reasonable gene model to start working with.
     4. If necessary, adjust the gene model.
     5. Check your edited gene model for integrity and accuracy by comparing it with available homologs.
     6. Comment and finish.
  • 46. CURATING GENOMES: WHAT ANNOTATORS SHOULD LOOK FOR
 annotators: that's you!
     • Annotating a simple case: when the official prediction is correct, or nearly correct, assuming that no aligned data extends beyond the gene model (or, if it does, that it is not likely to be coding sequence), and/or the gene prediction matches what you know about the gene:
         a. Can you add UTRs?
         b. Check exon structures.
         c. Check splice sites: …]5'-GT/AG-3'[…
         d. Check 'start' and 'stop' sites.
         e. Check the predicted protein product(s).
         f. If the protein product still does not look correct, go on to "Annotating more complex cases".
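 Several items on this checklist (splice sites, start and stop sites, an intact reading frame) can be expressed as quick sanity checks on a gene model. The sketch below is not Apollo's own validation; it is a hypothetical helper that assumes a forward-strand model whose coding exons are given as 0-based, half-open coordinates on an invented sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_gene_model(genome, cds_exons):
    """Run basic integrity checks on a forward-strand coding gene model.

    cds_exons: list of (start, end) pairs, sorted 5' to 3'.
    """
    cds = "".join(genome[start:end] for start, end in cds_exons)
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    # Each intron spans from the end of one exon to the start of the next.
    introns = [(left_end, right_start)
               for (_, left_end), (right_start, _) in zip(cds_exons, cds_exons[1:])]
    return {
        "length_multiple_of_3": len(cds) % 3 == 0,
        "starts_with_ATG": cds.startswith("ATG"),
        "ends_with_stop": cds[-3:] in STOP_CODONS,
        "no_internal_stop": not any(c in STOP_CODONS for c in codons[:-1]),
        "expected_splice_sites": all(
            genome[s:s + 2] in ("GT", "GC") and genome[e - 2:e] == "AG"
            for s, e in introns
        ),
    }


# Invented locus: 2 flanking bases, exon, intron, exon, 2 flanking bases
genome = "CC" + "ATGGCC" + "GTAAGTCAG" + "GAATAA" + "TT"

good_model = check_gene_model(genome, [(2, 8), (17, 23)])
shifted_model = check_gene_model(genome, [(2, 8), (18, 23)])  # acceptor off by one

print(good_model)
print(shifted_model)
```

 Moving the second exon boundary by a single base breaks both the splice-site check and the divisible-by-three check, which is the kind of discrepancy a curator would then resolve against the evidence tracks.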
  • 47. CURATING GENOMES: WHAT ANNOTATORS SHOULD LOOK FOR
 continued
     • Additional functionality. You may also need to learn how to:
         a. Get the genomic sequence
         b. Merge exons
         c. Add/delete an exon
         d. Create an exon de novo (within an intron or outside existing annotations)
         e. Right-click (or Apple-click) on a feature to get its feature ID and additional information
         f. Look up homolog descriptions by going to the accession web page at UniProt/Swiss-Prot
  • 48. CURATING GENOMES: WHAT ANNOTATORS SHOULD LOOK FOR
 continued
     • Annotating more complex cases:
         a. Incomplete annotation: protein integrity checks, indicate gaps, missing 5' sequences or missing 3' sequences.
         b. Merge of 2 gene predictions on the same scaffold
         c. Merge of 2 gene predictions on different scaffolds (uh-oh!)
         d. Split of a gene prediction
         e. Frameshifts, selenocysteine, single-base errors, and other inconvenient phenomena
  • 49. CURATING GENOMES: WHAT ANNOTATORS SHOULD LOOK FOR
 continued
     • Adding important project information in the form of canned and/or customized comments:
         a. NCBI ID, RefSeq ID, gene symbol(s), common name(s), synonyms, top BLAST hits (GenBank IDs), orthologs with species names, and anything else you can think of, because you are the expert.
         b. Type of annotation (e.g. whether or not the gene model was changed)
         c. Data source (for example, whether the Fgeneshpp predicted gene was the starting point for your annotation)
         d. The kinds of changes you made to the gene model, e.g. split, merge
         e. Functional description
         f. Whether you would like your MOD curator to check the annotation
         g. Whether part of your gene is on a different scaffold.
  • 50. APOLLO: TRAINING CURATORS
 a little training goes a long way!
 Provided with adequate tools, wet lab scientists make exceptional curators who can easily learn to maximize the generation of accurate, biologically supported gene models.
  • 51. Let's get to know Apollo: http://genomearchitect.org/web_apollo_user_guide
  • 52. BECOMING ACQUAINTED WITH APOLLO: Apollo
 annotation editing environment
 [Annotated screenshot of the Apollo window; callouts:]
     • Login
     • Navigation and zoom.
     • Search for a gene or a group.
     • Get coordinates, and zoom with "rubber-band" selection.
     • Color by reading frame, toggle strand, change color scheme, highlighter.
     • Load experimental evidence (GFF3, BAM, BigWig), combination tracks, and search tracks.
     • Query the genome using BLAT.
     • User-created annotations.
     • Curator panel.
     • Evidence tracks.
     • Stage-specific and cell-type-specific transcriptomic data.
  • 54. APOLLO ON THE WEB
 instructions
 Email: firstname.lastname@example.com
 Password: firstnamelastname

 Email                           Password          Server  Start at
 user.one@example.com            userone           1       1
 user.two@example.com            usertwo           2       1
 user.three@example.com          userthree         3       1
 user.four@example.com           userfour          4       1
 user.five@example.com           userfive          5       1
 user.six@example.com            usersix           1       7
 user.seven@example.com          userseven         2       7
 user.eight@example.com          usereight         3       7
 user.nine@example.com           usernine          4       7
 user.ten@example.com            userten           5       7
 user.eleven@example.com         usereleven        1       1
 user.twelve@example.com         usertwelve        2       1
 user.thirteen@example.com       userthirteen      3       1
 user.fourteen@example.com       userfourteen      4       1
 user.fifteen@example.com        userfifteen       5       1
 user.sixteen@example.com        usersixteen       1       7
 user.seventeen@example.com      userseventeen     2       7
 user.eighteen@example.com       usereighteen      3       7
 user.nineteen@example.com       usernineteen      4       7
 user.twenty@example.com         usertwenty        5       7
 user.twentyone@example.com      usertwentyone     1       1
 user.twentytwo@example.com      usertwentytwo     2       1
 user.twentythree@example.com    usertwentythree   3       1
 user.twentyfour@example.com     usertwentyfour    4       1
 user.twentyfive@example.com     usertwentyfive    5       1
 user.twentysix@example.com      usertwentysix     1       7
 user.twentyseven@example.com    usertwentyseven   2       7
 user.twentyeight@example.com    usertwentyeight   3       7
 user.twentynine@example.com     usertwentynine    4       7

 Server  URL
 1       http://54.94.132.228:8080/apollo/annotator/index
 2       http://54.207.71.112:8080/apollo/annotator/index
 3       http://54.207.106.136:8080/apollo/annotator/index
 4       http://54.207.113.253:8080/apollo/annotator/index
 5       http://54.232.217.84:8080/apollo/annotator/index
  • 56. Apollo functionality. BECOMING ACQUAINTED WITH APOLLO
    •  Correct ‘Start’ and ‘Stop’ sites
    •  Fix non-canonical splice sites
    •  Annotate UTRs (e.g. using RNA-Seq)
    •  Obtain & correct predicted protein products
       -  Align them with relevant genes or gene families.
       -  Use blastp against NCBI RefSeq or nr
    •  Check for missing data in the assembly
    •  Merge 2 gene predictions on the same scaffold (group)
    •  Split a gene prediction
    •  Correct frameshifts and other errors in the assembly
    •  Annotate selenocysteines, single-base errors, etc.
    •  Add:
       –  IDs from public databases (e.g. GenBank, via DBXRef); gene symbol(s), common name(s), synonyms, the top BLAST hit, orthologs with the species name.
       –  Appropriate functional assignments (e.g. via RNA-Seq data, literature searches, HMM searches, etc.)
       –  Comments about the modifications that were made, or noting that none was needed.
       –  Any other notes that occur to the biocurator.
  • 57. REMOVABLE SIDE DOCK with customizable tabs. HIGHLIGHTED IMPROVEMENTS
    Tabs: Annotations, Tracks, Reference Sequence, Organism, Users, Groups, Admin
  • 58. EDITS & EXPORTS: annotation details, exon boundaries, data export. HIGHLIGHTED IMPROVEMENTS
    [Figure: Annotations tab, callouts 1 and 2.]
  • 59. EDITS & EXPORTS: annotation details, exon boundaries, data export. HIGHLIGHTED IMPROVEMENTS
    [Figure: Reference Sequences tab, callout 3: FASTA, GFF3.]
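    A file exported as GFF3 can be sanity-checked outside the browser. The following is a minimal Python sketch, assuming only the standard nine-column, tab-separated GFF3 layout; the function name `count_feature_types` and the sample records are hypothetical and are not taken from an actual Apollo export.

```python
from collections import Counter


def count_feature_types(gff3_text):
    """Count features by type (GFF3 column 3), skipping comments and embedded FASTA."""
    counts = Counter()
    for line in gff3_text.splitlines():
        line = line.strip()
        if line == "##FASTA":
            break  # everything after this directive is sequence, not features
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
        counts[fields[2]] += 1
    return counts


# Hypothetical records, written by hand for illustration only.
SAMPLE_GFF3 = (
    "##gff-version 3\n"
    "scaffold1\texample\tgene\t100\t900\t.\t+\t.\tID=gene1\n"
    "scaffold1\texample\tmRNA\t100\t900\t.\t+\t.\tID=mrna1;Parent=gene1\n"
    "scaffold1\texample\texon\t100\t300\t.\t+\t.\tParent=mrna1\n"
    "scaffold1\texample\texon\t500\t900\t.\t+\t.\tParent=mrna1\n"
    "##FASTA\n"
    ">scaffold1\n"
    "ACGT\n"
)

if __name__ == "__main__":
    print(dict(count_feature_types(SAMPLE_GFF3)))
```

    Counting feature types is a quick way to confirm that an export contains the genes, transcripts, and exons you expect before passing it to other tools.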
  • 60. USER NAVIGATION. Becoming Acquainted with Web Apollo.
    Annotator panel.
    •  Choose appropriate evidence tracks from the list on the annotator panel.
    •  Select & drag elements from an evidence track into the ‘User-created Annotations’ area.
    •  Hovering over an annotation in progress brings up an information pop-up.
  • 61. USER NAVIGATION
    •  Annotation right-click menu
  • 62. USER NAVIGATION
    Annotations, annotation edits, and History are stored in a centralized database.
  • 63. USER NAVIGATION
    The Annotation Information Editor
    DBXRefs are database cross-references: if you have reason to believe that this gene is linked to a gene in a public database (including your own), then add it here.
  • 64. USER NAVIGATION
    The Annotation Information Editor
    •  Add PubMed IDs
    •  Include GO terms as appropriate from any of the three ontologies
    •  Write comments stating how you have validated each model.
  • 65. USER NAVIGATION
    •  The ‘Zoom to base level’ option reveals the DNA Track.
  • 66. USER NAVIGATION
    •  Color exons by CDS from the ‘View’ menu.
  • 67. USER NAVIGATION
    •  Toggle reference DNA sequence and translation frames in forward strand. Toggle models in either direction.
    •  Zoom in/out with keyboard: shift + arrow keys up/down
  • 70. ANNOTATING SIMPLE CASES
    “Simple case”:
    -  the predicted gene model is correct or nearly correct, and
    -  this model is supported by evidence that completely or mostly agrees with the prediction.
    -  evidence that extends beyond the predicted model is assumed to be non-coding sequence.
    The following are simple modifications.
  • 71. ADDING EXONS (SIMPLE CASES)
    •  A confirmation box will warn you if the receiving transcript is not on the same strand as the feature where the new exon originated.
    •  Check ‘Start’ and ‘Stop’ signals after each edit.
  • 72. ADDING UTRs (SIMPLE CASES)
    If transcript alignment data are available and extend beyond your original annotation, you may extend or add UTRs.
    1.  Right click at the exon edge and ‘Zoom to base level’.
    2.  Place the cursor over the edge of the exon until it becomes a black arrow, then click and drag the edge of the exon to the new coordinate position that includes the UTR.
    To add a new spliced UTR to an existing annotation, follow the procedure for adding an exon.
  • 73. CHECK EXON INTEGRITY (SIMPLE CASES)
    1.  Zoom in to clearly resolve each exon as a distinct rectangle.
    2.  Two exons from different tracks sharing the same start and/or end coordinates will display a red bar to indicate matching edges.
    3.  Selecting the whole annotation or one exon at a time, use this ‘edge-matching’ function and scroll along the length of the annotation, verifying exon boundaries against available data. Use square [ ] brackets to scroll from exon to exon.
    4.  Check if cDNA / RNAseq reads lack one or more of the annotated exons or include additional exons.
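    The edge-matching idea in step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: exons are given as (start, end) coordinate pairs, the function name `matching_edges` and the coordinates are hypothetical, and nothing here reflects how Apollo itself implements the red-bar display.

```python
def matching_edges(annotation_exons, evidence_exons):
    """Report, for each annotated exon, whether its start and end coincide
    with the start or end of any exon in an evidence track."""
    evidence_starts = {start for start, _ in evidence_exons}
    evidence_ends = {end for _, end in evidence_exons}
    report = []
    for start, end in annotation_exons:
        report.append({
            "exon": (start, end),
            "start_matches": start in evidence_starts,
            "end_matches": end in evidence_ends,
        })
    return report


if __name__ == "__main__":
    # Hypothetical coordinates for an annotation and one evidence alignment.
    annotation = [(100, 200), (300, 400), (500, 650)]
    evidence = [(100, 200), (300, 410), (480, 650)]
    for row in matching_edges(annotation, evidence):
        print(row)
```

    Edges that do not match any evidence are the ones worth inspecting at base level.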
  • 74. EXON STRUCTURE INTEGRITY (SIMPLE CASES)
    To modify an exon boundary and match data in the evidence tracks: select both the offending exon and the feature with the expected boundary, then right click on the annotation to select ‘Set 3’ end’ or ‘Set 5’ end’ as appropriate.
    In some cases all the data may disagree with the annotation; in other cases some data support the annotation and some of the data support one or more alternative transcripts. Try to annotate as many alternative transcripts as are well supported by the data.
  • 75. ORFs AND SPLICE SITES (SIMPLE CASES)
    Apollo’s editing logic (brain):
    §  selects longest ORF as CDS
    §  flags non-canonical splice sites
    [Figure labels: Evidence Tracks Area; ‘User-created Annotations’ Track; selection of features and sub-features; edge-matching; flags non-canonical splice sites.]
  • 76. SPLICE SITES (SIMPLE CASES)
    Non-canonical splices are indicated by an orange circle with a white exclamation point inside, placed over the edge of the offending exon.
    Canonical splice sites:
    -  forward strand: 5’-…exon]GT / AG[exon…-3’
    -  reverse strand, not reverse-complemented: 3’-…exon]GA / TG[exon…-5’
    Zoom to review non-canonical splice site warnings. Although these may not always have to be corrected (e.g. a GC donor), they should be flagged with the appropriate comment.
    [Figure labels: Exon/intron junction possible error; Original model; Curated model.]
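    The canonical GT/AG rule can be checked programmatically. The sketch below is a minimal illustration, assuming exons are given as 1-based inclusive coordinates on the forward strand of the reference; the function name `noncanonical_introns` and the toy sequence are hypothetical, and the check flags every intron that is not GT..AG (including GC donors, which a curator may decide to accept).

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def noncanonical_introns(genome, exons, strand="+"):
    """Return (intron_start, intron_end, donor, acceptor) for each intron
    whose boundaries are not GT..AG when read on the transcript strand.

    exons: (start, end) pairs, 1-based inclusive, forward-strand coordinates.
    """
    exons = sorted(exons)
    flagged = []
    for (_, left_end), (right_start, _) in zip(exons, exons[1:]):
        # Intron spans bases left_end+1 .. right_start-1 (1-based).
        intron = genome[left_end:right_start - 1]
        if strand == "-":
            intron = reverse_complement(intron)
        donor, acceptor = intron[:2].upper(), intron[-2:].upper()
        if (donor, acceptor) != ("GT", "AG"):
            flagged.append((left_end + 1, right_start - 1, donor, acceptor))
    return flagged


if __name__ == "__main__":
    # Toy sequence: exon, GT..AG intron, exon, GC..AG intron, exon.
    genome = "AAAAA" + "GTCCCCAG" + "TTTTT" + "GCCCCCAG" + "AAAA"
    exons = [(1, 5), (14, 18), (27, 30)]
    print(noncanonical_introns(genome, exons, strand="+"))
```

    For a reverse-strand model the intron is reverse-complemented before the check, which corresponds to reading GA / TG on the displayed forward sequence.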
  • 77. ‘START’ AND ‘STOP’ SITES (SIMPLE CASES)
    Web Apollo calculates the longest possible open reading frame (ORF) that includes canonical ‘Start’ and ‘Stop’ signals within the predicted exons.
    If ‘Start’ appears to be incorrect, modify it by selecting an in-frame ‘Start’ codon further upstream or downstream, depending on evidence (protein database, additional evidence tracks).
    The correct ‘Start’ may be present outside the predicted gene model, within a region supported by another evidence track.
    In very rare cases, the actual ‘Start’ codon may be non-canonical (non-ATG).
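    To make the “longest ORF” idea concrete, here is a minimal Python sketch. It assumes the input is the already spliced transcript sequence in its sense orientation, considers only ATG starts and the three standard stop codons, and is not Apollo’s actual implementation; the function name `longest_orf` and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(transcript):
    """Return (start, end) of the longest ATG..stop ORF as 0-based,
    half-open coordinates on the transcript, or None if there is none.
    The stop codon is included in the returned span."""
    seq = transcript.upper()
    best = None
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if best is None or (i + 3 - start) > (best[1] - best[0]):
                    best = (start, i + 3)
                start = None  # look for the next ORF in this frame
    return best


if __name__ == "__main__":
    # Toy transcript containing a short ORF and a longer one in another frame.
    transcript = "CC" + "ATGAAATAG" + "G" + "ATGCCCGGGTTTTGA" + "AA"
    span = longest_orf(transcript)
    print(span, transcript[span[0]:span[1]])
```

    A curator who moves the ‘Start’ to a different in-frame ATG is, in effect, overriding the choice such a search would make.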
  • 79. COMPLEX CASES: merge two gene predictions on the same scaffold
    Evidence may support joining two or more different gene models.
    Warning: protein alignments may have incorrect splice sites and lack non-conserved regions!
    1.  In the ‘User-created Annotations’ area, shift-click to select an intron from each gene model and right click to select the ‘Merge’ option from the menu.
    2.  Drag supporting evidence tracks over the candidate models to corroborate overlap, or review edge matching and coverage across models.
    3.  Check the resulting translation by querying a protein database, e.g. UniProt. Add comments to record that this annotation is the result of a merge.
    Red lines around exons: ‘edge-matching’ allows annotators to confirm whether the evidence is in agreement without examining each exon at the base level.
  • 80. COMPLEX CASES: split a gene prediction
    One or more splits may be recommended when:
    -  different segments of the predicted protein align to two or more different gene families
    -  the predicted protein doesn’t align to known proteins over its entire length
    Transcript data may support a split, but first verify whether they are alternative transcripts.
  • 81. COMPLEX CASES: correcting frameshifts and single-base errors
    Always remember: when annotating gene models using Apollo, you are looking at a ‘frozen’ version of the genome assembly and you will not be able to modify the assembly itself.
    [Figure labels: DNA Track; ‘User-created Annotations’ Track.]
  • 82. COMPLEX CASES: correcting selenocysteine-containing proteins
  • 83. COMPLEX CASES: correcting selenocysteine-containing proteins
  • 84. COMPLEX CASES: correcting frameshifts, single-base errors, and selenocysteines
    1.  Apollo allows annotators to make single-base modifications or frameshifts that are reflected in the sequence and structure of any transcripts overlapping the modification. These manipulations do NOT change the underlying genomic sequence.
    2.  If you determine that you need to make one of these changes, zoom in to the nucleotide level and right click over a single nucleotide on the genomic sequence to access a menu that provides options for creating insertions, deletions or substitutions.
    3.  The ‘Create Genomic Insertion’ feature will require you to enter the necessary string of nucleotide residues that will be inserted to the right of the cursor’s current location. The ‘Create Genomic Deletion’ option will require you to enter the length of the deletion, starting with the nucleotide where the cursor is positioned. The ‘Create Genomic Substitution’ feature asks for the string of nucleotide residues that will replace the ones on the DNA track.
    4.  Once you have entered the modifications, Apollo will recalculate the corrected transcript and protein sequences, which will appear when you use the right-click menu ‘Get Sequence’ option. Because the edit is reflected in all annotations that include the modified region, you should alert the curators of your organism’s database using the ‘Comments’ section to report the CDS edits.
    5.  In special cases such as selenocysteine-containing proteins (read-throughs), right-click over the offending/premature ‘Stop’ signal and choose the ‘Set readthrough stop codon’ option from the menu.
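    The three kinds of edits in step 3 can be modeled as operations on a copy of the sequence, which mirrors the point that the reference itself is never changed. This is a minimal sketch under stated assumptions: positions are 1-based, alterations do not overlap, and the function name `apply_alterations` and the tuple layout are hypothetical rather than Apollo’s data model.

```python
def apply_alterations(reference, alterations):
    """Return an altered copy of `reference`; the input string is untouched.

    alterations: non-overlapping tuples with 1-based positions:
      ("insertion", pos, bases)     insert `bases` to the right of `pos`
      ("deletion", pos, length)     delete `length` bases starting at `pos`
      ("substitution", pos, bases)  replace bases starting at `pos`
    Edits are applied from right to left so that earlier positions stay valid.
    """
    seq = reference
    ordered = sorted(alterations, key=lambda alt: alt[1], reverse=True)
    for kind, pos, value in ordered:
        if kind == "insertion":
            seq = seq[:pos] + value + seq[pos:]
        elif kind == "deletion":
            seq = seq[:pos - 1] + seq[pos - 1 + value:]
        elif kind == "substitution":
            seq = seq[:pos - 1] + value + seq[pos - 1 + len(value):]
        else:
            raise ValueError(f"unknown alteration type: {kind}")
    return seq


if __name__ == "__main__":
    reference = "ATGAAATAG"
    edited = apply_alterations(reference, [("insertion", 3, "C"),
                                           ("substitution", 7, "C")])
    print(reference, "->", edited)
```

    Keeping the alterations as a separate list, rather than editing the reference, is what allows them to be reported to the assembly’s curators later.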
  • 85. COMPLETING THE ANNOTATION
    Follow the checklist until you are happy with the annotation!
    And remember to…
    –  comment to validate your annotation, even if you made no changes to an existing model. Think of comments as your vote of confidence.
    –  or add a comment to inform the community of unresolved issues you think this model may have.
    Always remember: Web Apollo curation is a community effort, so please use comments to communicate the reasons for your annotation (your comments will be visible to everyone).
  • 87. MANUAL ANNOTATION CHECKLIST: THE CHECKLIST for accuracy and integrity
    1.  Can you add UTRs (e.g. via RNA-Seq)?
    2.  Check exon structures
    3.  Check splice sites: most splice sites display these residues …]5’-GT/AG-3’[…
    4.  Check ‘Start’ and ‘Stop’ sites
    5.  Check the predicted protein product(s)
        –  Align it against relevant genes/gene family.
        –  blastp against NCBI’s RefSeq or nr
    6.  If the protein product still does not look correct then check:
        –  Are there gaps in the genome?
        –  Merge of 2 gene predictions on the same scaffold
        –  Merge of 2 gene predictions from different scaffolds
        –  Split a gene prediction
        –  Frameshifts
        –  Error in the genome assembly?
        –  Selenocysteines, single-base errors, etc.
    7.  Finalize annotation by adding:
        –  Important project information in the form of comments
        –  IDs from public databases e.g. GenBank (via DBXRef), gene symbol(s), common name(s), synonyms, top BLAST hits, orthologs with species names, and everything else you can think of, because you are the expert.
        –  Whether your model replaces one or more models from the official gene set (so it can be deleted).
        –  The kinds of changes you made to the gene model of interest, if any.
        –  Any appropriate functional assignments of interest to the community (e.g. via BLAST, RNA-Seq data, literature searches, etc.)
  • 89. Example
    A public Apollo Demo using the Honey Bee genome is available at http://genomearchitect.org/WebApolloDemo
    -  Demonstration using the Hyalella azteca genome (amphipod crustacean).
  • 90. What do we know about this genome?
    •  Currently publicly available data at NCBI:
       •  >37,000 nucleotide seqs → scaffolds, mitochondrial genes
       •  300 amino acid seqs → mitochondrion
       •  53 ESTs
       •  0 conserved domains identified
       •  0 “gene” entries submitted
    •  Data at i5K Workspace@NAL (annotation hosted at USDA)
       -  10,832 scaffolds: 23,288 transcripts: 12,906 proteins
  • 91. PubMed Search: what’s new?
  • 92. PubMed Search: what’s new?
    “Ten populations (3 cultures, 7 from California water bodies) differed by at least 550-fold in sensitivity to pyrethroids.”
    “By sequencing the primary pyrethroid target site, the voltage-gated sodium channel (vgsc), we show that point mutations and their spread in natural populations were responsible for differences in pyrethroid sensitivity.”
    “The finding that a non-target aquatic species has acquired resistance to pesticides used only on terrestrial pests is troubling evidence of the impact of chronic pesticide transport from land-based applications into aquatic systems.”
  • 93. How many sequences for our gene of interest? And what do we know about them?
    •  Para (voltage-gated sodium channel alpha subunit; Nasonia vitripennis).
    •  NaCP60E (Sodium channel protein 60 E; D. melanogaster).
       –  MF: voltage-gated cation channel activity (IDA, GO:0022843).
       –  BP: olfactory behavior (IMP, GO:0042048), sodium ion transmembrane transport (ISS, GO:0035725).
       –  CC: voltage-gated sodium channel complex (IEA, GO:0001518).
  • 94. Retrieving sequences for sequence similarity searches.
    >vgsc-Segment3-DomainII
    RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG
    QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
  • 95. BLAT search
    >vgsc-Segment3-DomainII
    RVFKLAKSWPTLNLLISIMGKTVGALGNLTFVLCIIIFIFAVMGMQLFGKNYTEKVTKFKWSQDG
    QMPRWNFVDFFHSFMIVFRVLCGEWIESMWDCMYVGDFSCVPFFLATVVIGNLVVSFMHR
  • 97. Customizations: high-scoring segment pairs (hsp) in “BLAST+ Results” track
  • 98. Creating a new gene model: drag and drop
    •  Apollo automatically calculates the ORF. In this case, the ORF includes the high-scoring segment pairs (hsp).
  • 101. Also, flanking sequences (other gene models) vs. NCBI nr
    In this case, two gene models upstream, at 5’ end.
    [Figure label: BLAST hsps.]
  • 102. Review alignments
    HaztTmpM006234, HaztTmpM006233, HaztTmpM006232
  • 103. Hypothesis for vgsc gene model
  • 104. Editing: merge the three models
    Merge by dropping an exon or gene model onto another.
    or…
    Merge by selecting two exons (holding down “Shift”) and using the right click menu.
  • 105. Editing: correct boundaries, delete exons
    Modify exon / intron boundary:
    -  Drag the end of the exon to the nearest canonical splice site.
    -  Use right-click menu.
    Delete first exon from HaztTmpM006233
  • 106. Editing: set translation start
  • 107. Editing: modify boundaries
    Modify intron / exon boundary also at coord. 78,999.
  • 108. Finished model
    Corroborate integrity and accuracy of the model:
    -  Start and Stop
    -  Exon structure and splice sites …]5’-GT/AG-3’[…
    -  Check the predicted protein product vs. NCBI nr
  • 109. Information Editor
    •  DBXRefs: e.g. NP_001128389.1, N. vitripennis, RefSeq
    •  PubMed identifier: PMID: 24065824
    •  Gene Ontology IDs: GO:0022843, GO:0042048, GO:0035725, GO:0001518.
    •  Comments (if applicable).
    •  Name, Symbol.
    •  Approve / Delete radio button.
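    Identifiers typed into the Information Editor are easy to mistype, so it can help to normalize them first. The sketch below is a hedged illustration: it assumes GO identifiers have the form “GO:” followed by seven digits and that cross-references are written as “DB:accession” (the GFF3 Dbxref convention); the helper names `normalize_go_id` and `make_dbxref` are hypothetical and are not part of Apollo.

```python
import re

GO_ID_PATTERN = re.compile(r"^GO:\d{7}$")


def normalize_go_id(raw):
    """Remove stray whitespace and check the GO:NNNNNNN form."""
    candidate = re.sub(r"\s+", "", raw)
    if not GO_ID_PATTERN.match(candidate):
        raise ValueError(f"not a GO identifier: {raw!r}")
    return candidate


def make_dbxref(database, accession):
    """Join a database name and an accession as 'DB:accession'."""
    database, accession = database.strip(), accession.strip()
    if not database or not accession:
        raise ValueError("database and accession must both be non-empty")
    return f"{database}:{accession}"


if __name__ == "__main__":
    print(normalize_go_id("GO: 0042048"))
    print(make_dbxref("RefSeq", "NP_001128389.1"))
```

    Validating before pasting avoids cross-references that silently fail to resolve in downstream databases.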
  • 111. APOLLO demonstration
    Apollo demo video available at: https://youtu.be/VgPtAP_fvxY
  • 112. CONTENTS: Web Apollo Collaborative Curation and Interactive Analysis of Genomes
    •  BIO-REFRESHER: concepts we need
    •  ANNOTATION: automated predictions
    •  MANUAL ANNOTATION: necessary, and collaborative
    •  APOLLO: advancing collaborative curation
    •  EXAMPLE: demonstrations
    •  EXERCISES
  • 115. Exercises: Live Demonstration using the Apis mellifera genome.
    1.  Evidence in support of protein coding gene models.
        1.1  Consensus Gene Sets:
             Official Gene Set v3.2
             Official Gene Set v1.0
        1.2  Consensus Gene Sets comparison:
             OGSv3.2 genes that merge OGSv1.0 and RefSeq genes
             OGSv3.2 genes that split OGSv1.0 and RefSeq genes
        1.3  Protein Coding Gene Predictions Supported by Biological Evidence:
             NCBI Gnomon
             Fgenesh++ with RNASeq training data
             Fgenesh++ without RNASeq training data
             NCBI RefSeq Protein Coding Genes and Low Quality Protein Coding Genes
        1.4  Ab initio protein coding gene predictions:
             Augustus Set 12, Augustus Set 9, Fgenesh, GeneID, N-SCAN, SGP2
        1.5  Transcript Sequence Alignment:
             NCBI ESTs, Apis cerana RNA-Seq, Forager Bee Brain Illumina Contigs, Nurse Bee Brain Illumina Contigs, Forager RNA-Seq reads, Nurse RNA-Seq reads, Abdomen 454 Contigs, Brain and Ovary 454 Contigs, Embryo 454 Contigs, Larvae 454 Contigs, Mixed Antennae 454 Contigs, Ovary 454 Contigs, Testes 454 Contigs, Forager RNA-Seq HeatMap, Forager RNA-Seq XY Plot, Nurse RNA-Seq HeatMap, Nurse RNA-Seq XY Plot
  • 116. Exercises: Live Demonstration using the Apis mellifera genome.
    1.  Evidence in support of protein coding gene models (Continued).
        1.6  Protein homolog alignment:
             Acep_OGSv1.2
             Aech_OGSv3.8
             Cflo_OGSv3.3
             Dmel_r5.42
             Hsal_OGSv3.3
             Lhum_OGSv1.2
             Nvit_OGSv1.2
             Nvit_OGSv2.0
             Pbar_OGSv1.2
             Sinv_OGSv2.2.3
             Znev_OGSv2.1
             Metazoa_Swissprot
    2.  Evidence in support of non protein coding gene models
        2.1  Non-protein coding gene predictions:
             NCBI RefSeq Noncoding RNA
             NCBI RefSeq miRNA
        2.2  Pseudogene predictions:
             NCBI RefSeq Pseudogene
  • 117. Instructions: APOLLO ON THE WEB
    Server   URL
    1        http://54.94.132.228:8080/apollo/annotator/index
    2        http://54.94.132.228:8080/apollo/annotator/index
    3        http://54.94.132.228:8080/apollo/annotator/index
    4        http://54.94.132.228:8080/apollo/annotator/index
    5        http://54.94.132.228:8080/apollo/annotator/index

    Email format: nombre.apellido@example.com (first name.last name)
    Password format: nombreapellido (first name + last name)

    Email                          Password          Server   Start at
    user.one@example.com           userone           1        1
    user.two@example.com           usertwo           2        1
    user.three@example.com         userthree         3        1
    user.four@example.com          userfour          4        1
    user.five@example.com          userfive          5        1
    user.six@example.com           usersix           1        7
    user.seven@example.com         userseven         2        7
    user.eight@example.com         usereight         3        7
    user.nine@example.com          usernine          4        7
    user.ten@example.com           userten           5        7
    user.eleven@example.com        usereleven        1        1
    user.twelve@example.com        usertwelve        2        1
    user.thirteen@example.com      userthirteen      3        1
    user.fourteen@example.com      userfourteen      4        1
    user.fifteen@example.com       userfifteen       5        1
    user.sixteen@example.com       usersixteen       1        7
    user.seventeen@example.com     userseventeen     2        7
    user.eighteen@example.com      usereighteen      3        7
    user.nineteen@example.com      usernineteen      4        7
    user.twenty@example.com        usertwenty        5        7
    user.twentyone@example.com     usertwentyone     1        1
    user.twentytwo@example.com     usertwentytwo     2        1
    user.twentythree@example.com   usertwentythree   3        1
    user.twentyfour@example.com    usertwentyfour    4        1
    user.twentyfive@example.com    usertwentyfive    5        1
    user.twentysix@example.com     usertwentysix     1        7
    user.twentyseven@example.com   usertwentyseven   2        7
    user.twentyeight@example.com   usertwentyeight   3        7
    user.twentynine@example.com    usertwentynine    4        7
  • 118. Thank you. Thank you for your attention!
    •  Berkeley Bioinformatics Open-source Projects (BBOP), Berkeley Lab: Apollo and Gene Ontology teams. Suzanna E. Lewis (PI).
    •  § Christine G. Elsik (PI). University of Missouri.
    •  * Ian Holmes (PI). University of California Berkeley.
    •  Arthropod genomics community: i5K Steering Committee (esp. Sue Brown (Kansas State)), Alexie Papanicolaou (UWS), and the Honey Bee Genome Sequencing Consortium.
    •  Stephen Ficklin, GenSAS, Washington State University
    •  Apollo is supported by NIH grants 5R01GM080203 from NIGMS, and 5R01HG004483 from NHGRI. Both projects are also supported by the Director, Office of Science, Office of Basic Energy Sciences, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231
    BBOP
    -  Apollo: Nathan Dunn, Colin Diesh §, Deepak Unni §
    -  Gene Ontology: Chris Mungall, Seth Carbon, Heiko Dietze
    JBrowse: Eric Yao *
    NAL at USDA: Monica Poelchau, Christopher Childers, Gary Moore
    HGSC at BCM: fringy Richards, Kim Worley
    Apollo: http://GenomeArchitect.org
    GO: http://GeneOntology.org
    i5K: http://arthropodgenomes.org/wiki/i5K