wings2014 Workshop 1 Design, sequence, align, count, visualize

Workshops in next-generation science at UNC Charlotte 2014

Workshop 1 - Design, sequence, align, count, visualize

Workshop Locations

•  Section 1 - Room 801
– Ann Loraine, UNC Charlotte
– Naim Matasci, University of Arizona, iPlant
•  Section 2 - Room 802
– Ivory Clabaugh Blakley, UNC Charlotte
– Xiangqin Cui, University of Alabama Birmingham
•  Please stay in your section
– Both sections cover the same material, but timing may vary

Meet your TAs

•  Graduate students from UNCC Dept of Bioinformatics and Genomics
–  801 Roshonda Barner, Ibro Mujacic, Chi-Yu "Jack" Yen, Warren (G.) Cole, Tony Dao, Greg Linchango, Sushma Madamanchi, Anuja Jain
–  802 Richard Linchangco, Fred Lin, Chris Ball, Lu Tian, Shawn Chaffin, Natascha Moestl, Walter Clemens, Adriano Schneider
•  Loraine Lab members
–  801 Kyle Suttlemyre (IGB support), April Estrada (Research Specialist, Expert IGB User)
–  802 David Norris (IGB Developer)

Schedule

•  Workshop 1 - planning an experiment, data processing, visualization
– 9:00 to 11:30, then lunch
•  Workshop 2 - introduction to R & RStudio for data analysis, differential expression
– 12:30 to 2:30, then a 30-minute break
•  Workshop 3 - biological interpretation using pathway tools, Gene Ontology, the Web
– 3:00 to 5:00, then done

Using RNA-Seq data set for WiNGS2014

pollennetwork.org

•  Sponsored by Pollen Research Coordination Network in Integrative Pollen Biology (annual meeting starts tonight)
•  Visit Web site for more info

RNA-Seq data set for the workshop

•  Goal: Provide resources for pollen biology
–  Example RNA-Seq data analysis
–  Catalog of genes expressed in pollen
–  Highlight important area of pollen research
•  Problem: Pollen in some plant species is vulnerable to heat stress, which reduces yields
–  Exposure to mild heat stress (acclimation) can protect against more severe stress later; this is called acquired thermotolerance (Firon 2012)
•  To learn more, we sequenced RNA extracted from pollen undergoing a mild heat stress
–  Same temperature that can establish thermotolerance

Samples from the lab of Nurit Firon, Volcani Institute, Israel

•  Firon lab studies effects of heat stress on tomato pollen
•  Showed (along with others) that high temperature reduces pollen viability and sugar content
•  Studying a heat-tolerant tomato cultivar: Hazera 3042
– Pollen is sensitive to heat stress but not as much as other varieties

Nurit's experiment: RNA-Seq of heat-tolerant tomato cultivar Hazera 3042

•  Collected pollen from plants growing in temperature-controlled greenhouses
–  Control 25/18 °C, optimal temperature
–  Treatment 32/26 °C, mild chronic heat stress
•  Collected batches of pollen from ~10 plants during Sep. & Oct. 2013
–  One treatment, one control per collection
–  Made RNA from five collections: 5 treatment, 5 control "batches"
–  Sequenced at UCLA (69 base, PE)

Arabidopsis cold stress RNA-Seq

•  Simpler data set with one treatment & control
–  Using data from part of chr1, treatment sample, to illustrate data processing, visualization, and effects of parameter settings on results (maximum intron size in the tophat spliced alignment program)
•  For details, see:
–  experiment record at the Short Read Archive http://www.ncbi.nlm.nih.gov/sra/SRP029896
–  sample http://www.ncbi.nlm.nih.gov/sra/SRX348640
•  Published in Methods in Molecular Biology http://www.ncbi.nlm.nih.gov/pubmed/24792048

Workshop 1: RNA-Seq: Design, sequence, align, count, visualize

Goals

•  Learn the basics (20')
– Plan an experiment
– Library prep for RNA-Seq
– Illumina sequencing
•  Practice: Quality analysis using FastQC (30')
•  Practice: Data processing (30')
– Align reads (make BAM files and junction files)
– Make counts files for statistical analysis
– Merge reads into transcript models w/ Cufflinks
•  Practice: Visualize results in IGB (60')
– Compare to data set in Galaxy, TAIR10 gene models

[Figure: Workshop 1 overview. Panel labels: Sequencing Strategy; FASTQ files (WildType1a.fastq); FastQC; Alignment onto Genome (command line), WildType1a.bam; Generation of Counts Data, Counts.txt; Visualization using IGB; Workshop 1; Workshop 2]

RNA-seq: ultra-high throughput cDNA sequencing

•  Several papers published in 2008, first in May
http://blog.sbgenomics.com/rna-seq-the-first-wave/

[Figure labels: Ecker lab; Snyder lab; 999 cites; 1,076 cites]

Mortazavi 2008 "Mapping and quantifying mammalian transcriptomes by RNA-Seq" Nature Methods

•  Published later in 2008, but > 3000 citations (Google Scholar)
•  Why? Maybe because it emphasized RNA-Seq as a replacement for expression DNA microarrays
•  Comment in same issue: "Beginning of the end for microarrays?"

RNA-Seq Overview - Illumina

1. Design
•  Plan experiment
–  Biological replication
–  Sequencing strategy
–  Data analysis strategy
•  Collect samples
2. Making Libraries
•  Extract RNA, purify polyA+
•  Fragment
•  Synthesize cDNA (random hexamers)
•  Repair ends
•  Add "A" bases to 3' ends
•  Ligate adapters
•  Amplify
•  Library reflects RNA from original sample
•  Quality assessment
3. Sequencing
•  Prepare flowcell
•  Sequence by synthesis
•  Data: fastq sequence files, millions of reads per library
4. Data Analysis
•  Map to genome (alignments)
•  Count reads per gene
•  Identify differentially expressed genes
•  Improve gene models
•  Analyze splicing
•  and much more...

Five steps for design

1.  Articulate your questions or hypothesis
2.  Define your unit of biological replication
3.  Write up your sample collection protocol in detail
–  Does the protocol allow you to test your hypothesis?
4.  Define library synthesis & sequencing strategy
–  Read lengths, paired end vs. single end, depth, barcoding
5.  Ask an experienced data analyst to review your plan, revise as needed

Library synthesis

[Figure labels: Fork or "Y" adapters; size selection]
Image: David C Corney Ph.D. http://www.labome.com/method/RNA-seq-Using-Next-Generation-Sequencing.html

Example library molecule

•  Y adapters contain indexes, allow multiplexing
•  Rd1 & Rd2 come from reverse-complementary strands and might overlap
Ref: http://nextgen.mgh.harvard.edu/IlluminaChemistry.html

[Figure labels: Unknown sequence; Rd1; Rd2; barcode; Universal adapter; Index; Primer; P5; P7]

Flow cell preparation & sequencing by synthesis

https://www.youtube.com/watch?v=HMyCqWhwB8E

Review: Paired End vs Single End

•  Single End (SE) – cheaper
•  Paired End (PE) – more expensive
– two reads per fragment
– counting fragments, not reads
– call normalized counts FPKM, not RPKM

[Figure labels: sequenced in SE; sequenced in PE; indexed adapter]
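The FPKM naming can be made concrete with a small sketch. It uses the common definition, fragments per kilobase of transcript per million mapped fragments. The function name and the example numbers are hypothetical and do not come from the workshop data.

```python
def fpkm(fragments: int, gene_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments Per Kilobase of transcript per Million mapped fragments."""
    if gene_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("gene length and library size must be positive")
    kilobases = gene_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / kilobases / millions


# Hypothetical numbers: 500 fragments on a 2,000 bp gene in a library
# of 10 million mapped fragments.
print(fpkm(500, 2_000, 10_000_000))
```

For single-end data the same arithmetic applied to reads gives RPKM; with paired-end data each fragment is counted once even though it produced two reads.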

Get the reads in a FASTQ file

•  File contains millions of records
– Each record has four lines, represents ONE sequence
•  Line 1 – the name, starts with @
•  Line 2 – the sequence, starts at new line
•  Line 3 – some other stuff, optional, starts with +
•  Line 4 – the quality scores, starts at new line

@SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12
CCTAAATGGTGCCATGCTAGGAGGCCGTGCCCTTCTTGAAAAGTTGTATGTGAA
+
BBBFFFFFFBFFFIIIIFI<FFIIIIIFIIIIFBFIIIIIIIIFFFIIIIFIII

Example: base = T, score = F = 37
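The four-line record layout can be read with a few lines of Python. This is a minimal sketch, assuming uncompressed text and records that are not wrapped across extra lines. The names `FastqRecord` and `read_fastq` and the short example record are made up for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str       # line 1 without the leading "@"
    sequence: str   # line 2
    comment: str    # line 3 without the leading "+", often empty
    quality: str    # line 4, one character per base


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield one record for every four lines of FASTQ text."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        rest = [next(it, None) for _ in range(3)]
        if None in rest:
            raise ValueError(f"truncated record near {header!r}")
        seq, plus, qual = (line.rstrip("\n") for line in rest)
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, plus[1:], qual)


# Hypothetical two-record example (not from the workshop files).
example = ["@read1\n", "ACGT\n", "+\n", "FFI<\n",
           "@read2\n", "GGCA\n", "+\n", "BBFF\n"]
for record in read_fastq(example):
    print(record.name, record.sequence, record.quality)
```

For a real file, pass an open file handle instead of the list; for a `.gz` file, `gzip.open(path, "rt")` returns a handle that works the same way.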

Phred Quality score Q

http://en.wikipedia.org/wiki/FASTQ_format

Describes how exponentially unlikely it is that a given base call is wrong:

$Q = -10 \log_{10} p_e$

where $p_e$ is the probability that the base call is wrong.
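A short sketch of the Phred arithmetic follows. It assumes the Phred+33 encoding, in which the quality character's ASCII code minus 33 gives Q; this is consistent with the earlier example where "F" decodes to 37. Other pipelines used other offsets, so the `offset` parameter is left adjustable. The function names are illustrative.

```python
import math


def decode_quality(char: str, offset: int = 33) -> int:
    """Convert one FASTQ quality character to its Phred score Q."""
    return ord(char) - offset


def phred_to_error_probability(q: float) -> float:
    """Invert Q = -10 * log10(p_e) to get p_e."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p_e: float) -> float:
    """Apply Q = -10 * log10(p_e)."""
    return -10 * math.log10(p_e)


for char in "F<":
    q = decode_quality(char)
    print(char, q, phred_to_error_probability(q))
```

Each step of 10 in Q is a tenfold drop in error probability: Q = 20 corresponds to 1 error in 100 and Q = 30 to 1 in 1,000.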

Different Illumina data processing pipelines used different score encodings

http://drive5.com/usearch/manual/quality_score.html

Get two files - Read1 & Read2 - from paired end sequencing

•  Read1 and Read2 have the same read identifier and come from reverse-complementary strands of the same fragment
•  Example is from processing pipeline CASAVA 1.8; older versions used different naming conventions

R1:
@SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12
CCTAAATGGTGCCATGCTAGGAGGCCGTGCCCTTCTTGAAAAGTTGTATGTGAA
+
BBBFFFFFFBFFFIIIIFI<FFIIIIIFIIIIFBFIIIIIIIIFFFIIIIFIII

R2:
@SN1083:379:H8VA1ADXX:2:1101:1248:2144 2:N:0:12
CATTTTCGACGTTGTTAATAAGCTCTGCGTACTTGCAAGCTATCTGCGCGAACG
+
BBBFFFFFFFFFFIIIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIFFF
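Because mates share an identifier, a simple sanity check is to walk the two files in lockstep and confirm that the names agree. The sketch below assumes identifiers in the style shown above (fragment name, a space, then a field beginning with the read number). The helper names are hypothetical.

```python
from typing import Iterable


def fragment_name(identifier_line: str) -> str:
    """Part of the identifier shared by both mates (text before the space)."""
    return identifier_line.lstrip("@").split()[0]


def read_number(identifier_line: str) -> str:
    """Read number ("1" or "2") from the field after the space."""
    return identifier_line.split()[1].split(":")[0]


def check_pairs(read1_ids: Iterable[str], read2_ids: Iterable[str]) -> int:
    """Return the number of matched pairs, or raise on the first mismatch."""
    count = 0
    for id1, id2 in zip(read1_ids, read2_ids):
        if fragment_name(id1) != fragment_name(id2):
            raise ValueError(f"mates out of order: {id1!r} vs {id2!r}")
        if (read_number(id1), read_number(id2)) != ("1", "2"):
            raise ValueError(f"unexpected read numbers: {id1!r} vs {id2!r}")
        count += 1
    return count


r1 = ["@SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12"]
r2 = ["@SN1083:379:H8VA1ADXX:2:1101:1248:2144 2:N:0:12"]
print(check_pairs(r1, r2))
```

Note that `zip` stops at the shorter input, so a separate length check is needed if one file might be truncated.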

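Sequence identifier line in CASAVA 1.8

@SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12

Fields, left to right:
•  SN1083 – machine
•  379 – run#
•  H8VA1ADXX – flow-cell-id
•  2 – lane
•  1101 – tile
•  1248 – x-pos
•  2144 – y-pos
•  1 – read#
•  N – is-filtered
•  0 – control
•  12 – index (barcode)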
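The identifier can be split into its named fields with plain string operations. This is a sketch only: the `CasavaId` type and `parse_casava_id` function are illustrative names, and the code assumes exactly the eleven-field layout shown in the example identifier.

```python
from typing import NamedTuple


class CasavaId(NamedTuple):
    machine: str
    run: int
    flowcell_id: str
    lane: int
    tile: int
    x_pos: int
    y_pos: int
    read: int
    is_filtered: bool   # "Y" means the read was filtered out
    control: int
    index: str          # barcode sequence or sample number


def parse_casava_id(line: str) -> CasavaId:
    """Split '@machine:run:flowcell:lane:tile:x:y read:filtered:control:index'."""
    left, right = line.strip().lstrip("@").split(" ", 1)
    machine, run, flowcell, lane, tile, x, y = left.split(":")
    read, filtered, control, index = right.split(":")
    return CasavaId(machine, int(run), flowcell, int(lane), int(tile),
                    int(x), int(y), int(read), filtered == "Y",
                    int(control), index)


parsed = parse_casava_id("@SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12")
print(parsed)
```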

FastQC

•  Many groups use FastQC as a first pass quality assessment
•  Free from Babraham http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
•  Run interactively (point-and-click) or from the command line (won't cover this)

Practice: Using FastQC

•  Go to Conference DropBox link:
–  http://bitly.com/rnaseq2014
•  Note two folders – FastQC and FastQC-Examples
–  FastQC-Examples has FastQC reports from different species, sample types (next slide)
•  From the FastQC folder, download
–  Example.fastq
–  FastQC_Manual.pdf
•  Start FastQC, open Example.fastq

Practice: Watch FastQC video

•  https://www.youtube.com/watch?v=bz93ReOv87Y (start around 34 sec)
•  Take-home #1: FastQC assesses whether your data files are typical
•  Take-home #2: A "bad result" from FastQC doesn't always mean your data are not useful or valuable
•  Explore on your own! (~15 minutes)

Practice: View reports in FastQC-Examples (~15 min)

•  Blueberry
– OnealRipe_1
– OzarkblueGreen_1
•  Tomato pollen
– T2_1
– C2_1
•  Rice
– Control2h-R2

[Figure label: Per read %GC]

Practice: Data processing

•  Double-click "Alignment.tar.gz" on your Desktop to unpack it
•  Also available from http://bitly.com/rnaseq2014

Practice: Look at "align.sh"

•  Open Alignment folder
•  Right-click "align.sh"
•  Select "open with text editor"
•  This is a shell script
–  Commands executed in sequence
–  Very useful for automating tasks
•  First line is the "she-bang" line
–  tells Terminal it's a shell script
•  All other lines starting with # are comments (not run)

[Figure: book cover, "Learning the bash shell". Great guide to writing shell scripts]

align.sh - simple pipeline for RNA-Seq data processing

•  Aligns a sample fastq file to genome
–  tophat2, bowtie2
–  fastq file is from Arabidopsis cold stress experiment (Short Read Archive SRX348640)
–  file ColdTreatment-little.fastq.gz (gzip-compressed, .gz)
•  Counts reads that align to TAIR10 genes
–  featureCounts
–  only counting reads that uniquely align
•  Merges alignments into transcript models
–  cufflinks
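The three pipeline stages can be pictured as three command lines. The actual contents of align.sh are not reproduced here, so the sketch below only assembles plausible argument lists and prints them without running anything. It uses options the tools are known to accept (`-o` for an output directory and `-I` for maximum intron length in tophat2; `-a` and `-o` in featureCounts; `-o` in cufflinks). The index prefix `A_thaliana_index`, the annotation file `TAIR10.gtf`, the output names, and the intron limit of 2000 are all hypothetical.

```python
import shlex
from typing import List, Optional


def tophat_command(index_prefix: str, fastq: str, out_dir: str,
                   max_intron: Optional[int] = None) -> List[str]:
    """Spliced alignment; writes accepted_hits.bam and junctions.bed to out_dir."""
    cmd = ["tophat2", "-o", out_dir]
    if max_intron is not None:
        cmd += ["-I", str(max_intron)]  # omit to use the tool's default
    return cmd + [index_prefix, fastq]


def featurecounts_command(annotation_gtf: str, bam: str, counts_txt: str) -> List[str]:
    """Count reads overlapping annotated genes."""
    return ["featureCounts", "-a", annotation_gtf, "-o", counts_txt, bam]


def cufflinks_command(bam: str, out_dir: str) -> List[str]:
    """Assemble alignments into transcript models (transcripts.gtf in out_dir)."""
    return ["cufflinks", "-o", out_dir, bam]


steps = [
    tophat_command("A_thaliana_index", "ColdTreatment-little.fastq.gz",
                   "tophat_cold", max_intron=2000),
    featurecounts_command("TAIR10.gtf", "tophat_cold/accepted_hits.bam", "counts.txt"),
    cufflinks_command("tophat_cold/accepted_hits.bam", "cufflinks_cold"),
]
for step in steps:
    print(shlex.join(step))
```

Building each command as a list keeps file names with spaces intact; passing a list to `subprocess.run(step, check=True)` would execute it if the tools are installed.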

Practice: Intro to Terminal

•  Terminal shortcut on desktop
–  Program for entering commands or running scripts
–  Also called a "shell" or "Unix shell"
–  Can open multiple Terminal windows, each called a "shell" or "Unix shell"
•  Terminal shows hierarchical view of file system
–  An upside-down tree, where every folder is inside another folder
–  Folders are also called "directories"
–  The top folder (that contains everything else) is called the "root" directory - / (forward slash)

Practice: Open Terminal, try these commands

•  cd - "change directory"
–  by itself means "go to user home directory"
–  with an argument means "go there"
–  with ".." means "go up one"
•  pwd - "print the current working directory"; use it to find out where you are

Practice: Try these commands

•  ls - lists files and directories in the current directory

Practice: Try these commands

•  ls -l - "list long"
– reports more information about files
– "d" means it's a directory (folder)
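The leading "d" in a long listing comes from the file's mode. As an illustration outside the shell, Python's standard library can produce the same ten-character mode string. The helper name `mode_string` is made up for this sketch.

```python
import os
import stat
import tempfile


def mode_string(path: str) -> str:
    """Return the mode column that `ls -l` would show, e.g. 'drwxr-xr-x'."""
    return stat.filemode(os.stat(path).st_mode)


# Create a throwaway directory and file to compare their first characters.
demo_dir = tempfile.mkdtemp()
demo_file = os.path.join(demo_dir, "example.txt")
with open(demo_file, "w") as handle:
    handle.write("hello\n")

print(mode_string(demo_dir))   # starts with "d" for a directory
print(mode_string(demo_file))  # starts with "-" for a regular file
```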

Practice: Run align.sh in Terminal

•  Go to home directory
•  Go to Desktop
•  Go to Alignment
•  Run align.sh

Now running: tophat2 spliced alignment tool

[Figure 1 from "TopHat: discovering splice junctions with RNA-Seq", Cole Trapnell, Lior Pachter and Steven L. Salzberg]

Tophat Output - we'll open in IGB

•  Creates new folder with files, including...
•  accepted_hits.bam - "binary alignments" file, contains read alignments
–  BAM - compressed version of SAM ("sequence alignment"), needs index ".bai" file (made using samtools)
•  junctions.bed - reports boundaries of introns, called "junction" features
–  BED format, tab-delimited plain text file
–  one junction feature per line
–  fifth field is score, the number of spliced reads aligned across the junction
–  see: http://genome.ucsc.edu/FAQ/FAQformat.html#format1
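Since a junction file is tab-delimited text with the score in the fifth field, it is easy to inspect in a script. The sketch below parses the first six BED fields and keeps junctions supported by at least a chosen number of spliced reads. The example lines and the threshold are invented for illustration; the code also skips a leading `track` line, which BED files often carry.

```python
from typing import Iterable, Iterator, NamedTuple


class Junction(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    score: int    # number of spliced reads aligned across the junction
    strand: str


def parse_junction(line: str) -> Junction:
    """Parse the first six tab-delimited BED fields of one junction line."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 BED fields, got {len(fields)}")
    return Junction(fields[0], int(fields[1]), int(fields[2]),
                    fields[3], int(fields[4]), fields[5])


def well_supported(lines: Iterable[str], min_reads: int) -> Iterator[Junction]:
    """Yield junctions whose score is at least min_reads."""
    for line in lines:
        if not line.strip() or line.startswith("track"):
            continue  # skip blank lines and the optional track header
        junction = parse_junction(line)
        if junction.score >= min_reads:
            yield junction


# Hypothetical junction lines (first six columns only).
bed_lines = [
    "track name=junctions\n",
    "chr1\t1000\t2000\tJUNC00000001\t12\t+\n",
    "chr1\t3000\t3500\tJUNC00000002\t1\t-\n",
]
for j in well_supported(bed_lines, min_reads=5):
    print(j.name, j.score)
```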

Practice: Start IGB while script runs

•  Use the IGB desktop icon
•  Click Arabidopsis flower on start screen

Practice: How to get IGB if you're using your own computer

•  Go to http://bioviz.org
•  Follow Download link
•  Choose Medium Memory option (typical)

TAIR10 annotations, June 2009 Columbia-0 genome release

•  TAIR10 protein-coding gene models loaded automatically from IGB data server
•  Forward & reverse strand in separate tracks

[Figure labels: Forward; Reverse]

RNA-Seq, ChIP-Seq, other data sets available in Data Access tab

•  IGB data servers, can set up your own

Arabidopsis pollen data sets

•  Read alignments, coverage graphs, junction files
•  From 2013 Plant Phys. Pollen RNA-Seq paper

Practice: Combine Plus & Minus Tracks

•  Click "+/-" to combine tracks
•  Use Data Management Table to change track color, name, visibility, load options, strand options

Summary of moving and zooming

•  Animated zooming
–  click to position zoom stripe, sets zoom focus
–  horizontal zoom & vertical stretch
•  Moving from side to side (panning)
–  arrows in toolbar
–  hand icon - the move tool
•  Jump-zooming
–  Click-drag coordinate axis with arrow tool
–  Double-click to zoom in on a feature
–  Search by name

Practice: Zoom in on a feature

•  Zoom in on alt-spliced gene models (*) on chr1
•  This is animated zooming
1. Click to set zoom focus
2. Drag slider to zoom in

Practice: Click move arrows to reposition during zoom

•  Click data display to re-focus zoom on target location

Practice: Or use move tool (hand) to reposition during zoom

•  Click display to focus zoom on target
1. Select move tool (hand)
2. Click-drag to move

Practice: Click-drag sequence axis to jump-zoom to a region

1. Select pointer tool
2. Click number line
3. Drag
4. Release
•  Highlighted region becomes new view

Practice: Jump-zoom to gene model

•  Double-click the label, the space a little above the exon blocks, or an intron to jump-zoom to a gene model
–  Also selects it; selected items are outlined in red
1. Select pointer tool
2. Double-click label or intron

After jump-zoom, gene model is selected

•  Arrows indicate direction of transcription
•  Selected gene model outlined in red

Practice: Gene model close-up

•  Use vertical slider to make gene models taller
•  Increase window size to make more room

[Figure label: Drag slider to stretch vertically]

Practice: Interact with data using pointer. Select pointer (arrow) in toolbar

•  Click intron, label, or region above blocks to select whole gene model
•  Click blocks to select parts of a gene model
•  SHIFT-click to multi-select
•  Click-drag to select & count everything in a region
•  Selection Info, top right, reports counts
–  "i" button shows info if one item selected

Practice: View Edge Matching

•  Edges that match selected item edges are highlighted in red
•  To change edge-match color choose File > Preferences > Other Options
•  To turn off or on, see View > Edge Matching

Practice: To work with sequence data, click Load Sequence

•  Sequence appears in Coordinates track

Practice: Zoom in to see amino acids

•  Note: Must load genomic sequence first

Practice: Zoom in on end of translation

•  Click the "thick end" and then zoom in
•  Note: Variants encode same C-term amino acids

Practice: Select genomic sequence

1. Choose pointer tool in toolbar
2. Click-drag genomic sequence to select a region
3. CTRL-click to copy
•  Length of selected region reported in Selection Info box (top right)
•  Useful for designing primers, measuring regions

Practice: Right-click (or CTRL-click) gene model

•  Shows options to run a Web search, BLAST search, view sequence

Practice: Quick Search

•  Enter search text, select option
•  Jump-zoom to selected gene

[Figure labels: Choose At-SR30; Zoomed to At-SR30, RNA-binding protein involved in splicing]

Looking ahead to Workshop 3

•  Some genes that were highly expressed in tomato pollen are annotated as "Unknown" proteins & have no counterpart in Arabidopsis.
•  You can use IGB to quickly find those genes and then run BLASTX or BLASTP searches at NCBI to find out...
– Are they unique to tomato?
– Could they be non-coding?

Practice: Open files from align.sh

•  Zoom out to show more of At-SR30 region
•  Choose File > Open
– Select "accepted_hits.bam" & "junctions.bed"
•  A new empty track appears for each file
•  Click Load Data to load reads and junctions

[Figure labels: read alignments stack; reads at top of stack not being shown (too many to fit); junction features, summarizing spliced reads]

Practice: Configure view - Load Sequence

•  Click Load Sequence to load genomic bases for this region

Practice: Configure view - Lock mRNA track height

1. Click TAIR10 mRNA track label to select it
2. Open Annotation tab
3. Select Lock Track Height, enter 170, click Apply

Practice: Configure view - configure junction track

1. Click junctions track label to select junctions track
2. Open Annotation tab
3. Select score in Label Field
4. Select +/- in Strand

Practice: Configure view - lock junction track height

1. Click junctions track label to select it
2. Open Annotation tab
3. Select Lock Track Height, enter 120, click Apply

Practice: Change read stack height to see more reads

1. CTRL-click (or right-click) accepted_hits.bam track label
2. Choose Set Stack Height...
3. Enter 50

Practice: Set mRNA stack height

1. Right-click TAIR10 mRNA track label, choose Set Stack Height
2. Enter 3 - tallest stack has 3 models
Note: Tabs are minimized to make more space

Practice: Note read support for alternative splicing

•  Take-home: Many spliced reads support both variants, but there are also many reads inside the introns, indicating failure to splice. This may be typical of alt-spliced introns?

Practice: Use junction track to quantify support for splice variants

1.  Click-drag the junction track to the genes track
2.  Scores are number of spliced reads supporting each junction.

Practice: Compare Cufflinks GTF file to gene models

•  Open Alignments > cufflinks_cold > transcripts.gtf

Practice: View Cufflinks gene models

1. Click Load Data to see Cufflinks models
2. Click-drag new track next to gene models
3. Use vertical slider to make more room
•  Take-home: Cufflinks annotations close, but incomplete.

Practice: Load data from Galaxy

1. Go to usegalaxy.org
2. Open Shared Data
3. Choose Published Histories

Practice: Load data from Galaxy

1. Search for Cold
3. Select Cold stress in Arabidopsis (with default maximum intron size)

Practice: Load data from Galaxy

•  Illustrates results when tophat is run with default settings:
–  default maximum intron size is 500,000 bases
•  Tophat was developed with human data in mind, where large introns are common
•  Select Import History
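One way to see the effect of the maximum intron size setting is to compute intron lengths from a junctions file and flag the implausibly long ones. The sketch below assumes the 12-column, two-block BED layout that TopHat uses for junctions, in which each block is the read overhang on one side and the intron lies between the blocks. The example lines and the 5,000-base cutoff are invented; a sensible cutoff depends on the genome.

```python
from typing import Iterable, Iterator, Tuple


def intron_span(bed_line: str) -> Tuple[str, int, int]:
    """Return (chrom, intron_start, intron_end) for a two-block BED12 junction."""
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected a 12-column BED line")
    chrom_start = int(fields[1])
    block_sizes = [int(x) for x in fields[10].split(",") if x]
    block_starts = [int(x) for x in fields[11].split(",") if x]
    if len(block_sizes) != 2 or len(block_starts) != 2:
        raise ValueError("expected exactly two blocks per junction")
    # The intron runs from the end of the first block to the start of the second.
    return fields[0], chrom_start + block_sizes[0], chrom_start + block_starts[1]


def long_introns(lines: Iterable[str],
                 max_plausible_bp: int) -> Iterator[Tuple[str, int, int, int]]:
    """Yield (chrom, start, end, length) for introns longer than the cutoff."""
    for line in lines:
        if not line.strip() or line.startswith("track"):
            continue
        chrom, start, end = intron_span(line)
        if end - start > max_plausible_bp:
            yield chrom, start, end, end - start


# Hypothetical junction lines: one short intron, one enormous one.
SHORT = "chr1\t1000\t2000\tJUNC1\t12\t+\t1000\t2000\t255,0,0\t2\t50,60\t0,940"
LONG = "chr1\t5000\t405000\tJUNC2\t1\t+\t5000\t405000\t255,0,0\t2\t40,30\t0,399970"
for hit in long_introns([SHORT, LONG], max_plausible_bp=5_000):
    print(hit)
```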

Practice: Select "start using this history"

1. Select Treatment junctions
2. Select display in IGB View
3. New tab opens. Select "Click to go to IGB"
4. A new track appears. Click Load Data

Practice: Remove reads - don't need them now

1.  Right-click accepted_hits.bam
2.  Choose Delete Track

1.  Zoom out all the way
2.  Click Load Data

[Figure label: Your data are here]

Take-home: Tophat run with default parameters predicts enormous introns. It is important to understand parameter settings; defaults are not always best.

Now you can

•  Describe Illumina library synthesis, sequencing
•  Evaluate data quality using FastQC
•  Run a data processing pipeline (shell script)
•  View and explore data in a genome browser
– and load data sets from Galaxy, local files

Thank you for your attention!
