This document provides information about workshops on next-generation science being held at UNC Charlotte in 2014. It details the schedule, locations, instructors, and teaching assistants for Workshop 1 which will cover designing an RNA-Seq experiment, processing and visualizing the resulting data. The workshop will use a real RNA-Seq dataset from tomato pollen undergoing heat stress treatment, with the goal of understanding genes involved in pollen thermotolerance.
1. Workshops in next-generation science at UNC Charlotte 2014
Workshop 1 - Design, sequence, align, count, visualize
2. Workshop Locations
• Section 1 - Room 801
– Ann Loraine, UNC Charlotte
– Naim Matasci, University of Arizona, iPlant
• Section 2 - Room 802
– Ivory Clabaugh Blakley, UNC Charlotte
– Xiangqin Cui, University of Alabama Birmingham
• Please stay in your section
– Cover same material, but timing may vary
3. Meet your TAs
• Graduate students from UNCC Dept of Bioinformatics and Genomics
– 801 Roshonda Barner, Ibro Mujacic, Chi-Yu "Jack" Yen, Warren (G.) Cole, Tony Dao, Greg Linchango, Sushma Madamanchi, Anuja Jain
– 802 Richard Linchangco, Fred Lin, Chris Ball, Lu Tian, Shawn Chaffin, Natascha Moestl, Walter Clemens, Adriano Schneider
• Loraine Lab members
– 801 Kyle Suttlemyre (IGB support), April Estrada (Research Specialist, Expert IGB User)
– 802 David Norris (IGB Developer)
4. Schedule
• Workshop 1 - planning an experiment, data processing, visualization
– 9:00 to 11:30, then Lunch
• Workshop 2 - introduction to R & RStudio for data analysis, differential expression
– 12:30 to 2:30, then a 30' Break
• Workshop 3 - biological interpretation using pathway tools, Gene Ontology, the Web
– 3:00 to 5:00, then Done
5. Using RNA-Seq data set for WiNGS2014
pollennetwork.org
• Sponsored by Pollen Research Coordination Network in Integrative Pollen Biology (annual meeting starts tonight)
• Visit Web site for more info
6. RNA-Seq data set for the workshop
• Goal: Provide resources for pollen biology
– Example RNA-Seq data analysis
– Catalog of genes expressed in pollen
– Highlight important area of pollen research
• Problem: Pollen in some plant species is vulnerable to heat stress, which reduces yields
– Exposure to mild heat stress (acclimation) can protect against more severe stress later - called acquired thermotolerance (Firon 2012)
• To learn more, we sequenced RNA extracted from pollen undergoing a mild heat stress
– Same temperature that can establish thermotolerance
7. Samples from the lab of Nurit Firon, Volcani Institute, Israel
• Firon lab studies effects of heat stress on tomato pollen
• Showed (along with others) that high temp. reduces pollen viability, sugar content
• Studying a heat-tolerant tomato cultivar: Hazera 3042
– Pollen is sensitive to heat stress but not as much as other varieties
8. Nurit's experiment: RNA-Seq of heat-tolerant tomato cultivar Hazera 3042
• Collected pollen from plants growing in temperature-controlled greenhouses
– Control 25/18 °C optimal temperature
– Treatment 32/26 °C mild chronic heat stress
• Collected batches of pollen from ~10 plants during Sep. & Oct 2013
– One treatment, one control per collection
– Made RNA from five collections, 5 treatment, 5 control "batches"
– Sequenced at UCLA (69 base, PE)
9. Arabidopsis cold stress RNA-Seq
• Simpler data set with one treatment & control
– Using data from part of chr1, treatment sample to illustrate data processing, visualization, effects of parameter settings on results (maximum intron size in tophat spliced alignment program)
• For details, see:
– experiment record at the Short Read Archive http://www.ncbi.nlm.nih.gov/sra/SRP029896
– sample http://www.ncbi.nlm.nih.gov/sra/SRX348640
• Published in Methods in Molecular Biology http://www.ncbi.nlm.nih.gov/pubmed/24792048
11. Goals
• Learn the basics (20')
– Plan an experiment
– Library prep for RNA-Seq
– Illumina sequencing
• Practice: Quality analysis using FastQC (30')
• Practice: Data processing (30')
– Align reads (make BAM files and junction files)
– Make counts files for statistical analysis
– Merge reads into transcript models w/ Cufflinks
• Practice: Visualize results in IGB (60')
– Compare to data set in Galaxy, TAIR10 gene models
12. Workshop 1 Overview
[Flowchart labels: Sequencing Strategy; FASTQ files (WildType1a.fastq); FASTQC; Alignment onto Genome ($Command Line…); WildType1a.bam; Visualization using IGB; Generation of Counts Data (Counts.txt); Workshop 2]
13. RNA-seq: ultra-high throughput cDNA sequencing
• Several papers published in 2008, first in May
http://blog.sbgenomics.com/rna-seq-the-first-wave/
[Figure labels: Ecker lab, Snyder lab; 999 cites, 1,076 cites]
14. Mortazavi 2008 "Mapping and quantifying mammalian transcriptomes by RNA-Seq" Nature Methods
• Published later in 2008, but > 3000 citations
• Why? Maybe because it emphasized RNA-Seq as a replacement for expression DNA microarrays
• Comment in same issue: "Beginning of the end for microarrays?"
Source: Google Scholar
15. RNA-Seq Overview - Illumina
1. Design
• Plan experiment
– Biological replication
– Sequencing strategy
– Data analysis strategy
• Collect samples
2. Making Libraries
• Extract RNA, purify polyA+
• Fragment
• Synthesize cDNA (random hexamers)
• Repair ends
• Add "A" bases to 3' ends
• Ligate adapters
• Amplify
• Library reflects RNA from original sample
• Quality assessment
3. Sequencing
• Prepare flowcell
• Sequence by synthesis
• Data, fastq sequence files
• Millions of reads per library
4. Data Analysis
• Map to genome (alignments)
• Count reads per gene
• Identify differentially expressed genes
• Improve gene models
• Analyze splicing
• and much more..
16. Five steps for design
1. Articulate your questions or hypothesis
2. Define your unit of biological replication.
3. Write up your sample collection protocol in detail
– Does the protocol allow you to test your hypothesis?
4. Define library synthesis & sequencing strategy
– Read lengths, paired end vs. single end, depth, barcoding
5. Ask an experienced data analyst to review your plan, revise as needed
17. Library synthesis
• Fork or "Y" adapters, size selection
• Y adapters contain indexes, allow multiplexing
Image: David C Corney Ph. D. http://www.labome.com/method/RNA-seq-Using-Next-Generation-Sequencing.html
18. Example library molecule
[Diagram labels: P5, Universal adapter, Rd1, Unknown sequence, Rd2, barcode, Index Primer, P7]
• Rd1 & Rd2 are from reverse complements, might overlap.
Ref: http://nextgen.mgh.harvard.edu/IlluminaChemistry.html
19. Flow cell preparation & sequencing by synthesis
https://www.youtube.com/watch?v=HMyCqWhwB8E
20. Review: Paired End vs Single End
• Single End – cheaper
• Paired End – more expensive
– two reads per fragment
– counting fragments, not reads
– call normalized counts FPKM not RPKM
[Diagram labels: sequenced in SE; sequenced in PE; indexed adapter]
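To make the FPKM idea concrete (fragments per kilobase of transcript per million mapped fragments), here is a minimal sketch. The function name and the example numbers are invented for illustration, and tools such as Cufflinks use more refined length estimates than the raw transcript length assumed here.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


# Hypothetical numbers: 500 fragments on a 2,000-base transcript,
# in a library with 10 million mapped fragments.
example_fpkm = fpkm(500, 2_000, 10_000_000)
print(example_fpkm)
```

For paired-end data each read pair counts as one fragment, which is why the unit is FPKM and not RPKM.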
21. Get the reads in a FASTQ file
• File contains millions of records
– Each record has four lines, represents ONE sequence
• Line 1 – the name, starts with @
• Line 2 – the sequence, starts at new line
• Line 3 – some other stuff, optional, starts with +
• Line 4 – the quality scores, starts at new line
@SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12
CCTAAATGGTGCCATGCTAGGAGGCCGTGCCCTTCTTGAAAAGTTGTATGTGAA
+
BBBFFFFFFBFFFIIIIFI<FFIIIIIFIIIIFBFIIIIIIIIFFFIIIIFIII
Example: base = T, score = F = 37
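The four-line record layout can be sketched in a few lines of Python. This is an illustrative reader, not part of the workshop materials: the names `FastqRecord`, `read_fastq` and `phred_scores` are made up here, and the decoding assumes the Phred+33 encoding, under which the character F (ASCII 70) corresponds to a score of 37.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record for every four lines: @name, sequence, +, qualities."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line)
            return
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a quality string, assuming the Phred+33 encoding."""
    return [ord(char) - offset for char in quality]


example = io.StringIO(
    "@SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12\n"
    "CCTAAATGGTGCCATGCTAGGAGGCCGTGCCCTTCTTGAAAAGTTGTATGTGAA\n"
    "+\n"
    "BBBFFFFFFBFFFIIIIFI<FFIIIIIFIIIIFBFIIIIIIIIFFFIIIIFIII\n"
)
record = next(read_fastq(example))
scores = phred_scores(record.quality)
print(record.sequence[6], record.quality[6], scores[6])  # T F 37
```

Each base has exactly one quality character, so the sequence and quality lines always have the same length.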
22. Phred Quality score Q
http://en.wikipedia.org/wiki/FASTQ_format
Describes how exponentially unlikely it is that a given base call is wrong.
$Q = -10 \log_{10} p_e$, where $p_e$ is the probability that the base call is wrong.
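The Phred formula is easy to check numerically. The sketch below converts in both directions; the function names are chosen here for illustration only.

```python
import math


def error_probability(q: float) -> float:
    """Probability that a base call is wrong, given its Phred score Q."""
    return 10 ** (-q / 10)


def phred_quality(p_error: float) -> float:
    """Phred score Q for a given error probability."""
    return -10 * math.log10(p_error)


for q in (10, 20, 30, 37):
    print(q, error_probability(q))
```

A score of 20 means a 1 in 100 chance of error, 30 means 1 in 1,000, and the score of 37 from the example read is roughly 1 in 5,000.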
24. Get two files - Read1 & Read2 - from paired end sequencing
• Read1 and Read2 have the same read identifier; they are read from opposite ends of the same fragment, on opposite strands
• Example is from processing pipeline CASAVA 1.8, older versions used different naming conventions
R1
@SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12
CCTAAATGGTGCCATGCTAGGAGGCCGTGCCCTTCTTGAAAAGTTGTATGTGAA
+
BBBFFFFFFBFFFIIIIFI<FFIIIIIFIIIIFBFIIIIIIIIFFFIIIIFIII
R2
@SN1083:379:H8VA1ADXX:2:1101:1248:2144 2:N:0:12
CATTTTCGACGTTGTTAATAAGCTCTGCGTACTTGCAAGCTATCTGCGCGAACG
+
BBBFFFFFFFFFFIIIIIIIIIIIIIIIIFIIIIIIIIIIIIIIIIIIIIIFFF
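One practical consequence of the shared identifier is that the Read1 and Read2 files can be checked for consistent ordering. The sketch below compares two name lines; the helper names are invented here, and it assumes the naming style of the example above, where the read number is the first item after the space.

```python
from typing import Tuple


def split_header(header: str) -> Tuple[str, int]:
    """Return (fragment identifier, read number) from a name line."""
    identifier, _, tags = header.lstrip("@").partition(" ")
    read_number = int(tags.split(":")[0])
    return identifier, read_number


def is_mate_pair(read1_header: str, read2_header: str) -> bool:
    """True when both names refer to the same fragment as read 1 and read 2."""
    id1, number1 = split_header(read1_header)
    id2, number2 = split_header(read2_header)
    return id1 == id2 and (number1, number2) == (1, 2)


r1 = "@SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12"
r2 = "@SN1083:379:H8VA1ADXX:2:1101:1248:2144 2:N:0:12"
print(is_mate_pair(r1, r2))  # True
```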
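25. Sequence identifier line in CASAVA 1.8
@SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12
• Before the space: machine (SN1083), run# (379), flow-cell-id (H8VA1ADXX), lane (2), tile (1101), x-pos (1248), y-pos (2144)
• After the space: read# (1), is-filtered (N), control (0), index (barcode) (12)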
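The identifier fields can be pulled apart by splitting on the space and then on colons. This is a sketch under the assumption that every name line follows the layout shown above; the `ReadId` type and `parse_read_id` function are illustrative names, not part of any Illumina tool.

```python
from typing import NamedTuple


class ReadId(NamedTuple):
    machine: str
    run: int
    flowcell: str
    lane: int
    tile: int
    x_pos: int
    y_pos: int
    read: int
    is_filtered: bool
    control: int
    index: str


def parse_read_id(line: str) -> ReadId:
    """Split a name line of the form machine:run:flowcell:lane:tile:x:y read:filtered:control:index."""
    location, tags = line.lstrip("@").split(" ", 1)
    machine, run, flowcell, lane, tile, x_pos, y_pos = location.split(":")
    read, filtered, control, index = tags.split(":")
    return ReadId(
        machine, int(run), flowcell, int(lane), int(tile), int(x_pos), int(y_pos),
        int(read), filtered == "Y", int(control), index,
    )


read_id = parse_read_id("@SN1083:379:H8VA1ADXX:2:1101:1248:2144 1:N:0:12")
print(read_id.machine, read_id.lane, read_id.tile, read_id.read, read_id.index)
```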
26. FastQC
• Many groups use FastQC as a first pass quality assessment
• Free from Babraham http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
• Run interactively (point-and-click) or command line (won't cover this)
27. Practice: Using FastQC
• Go to Conference DropBox link:
– http://bitly.com/rnaseq2014
• Note two folders – FastQC and FastQC-Examples
– FastQC-Examples has FastQC reports from different species, sample types (next slide)
• FastQC folder, download
– Example.fastq
– FastQC_Manual.pdf
• Start FastQC, open Example.fastq
28. Practice: Watch FastQC video
• https://www.youtube.com/watch?v=bz93ReOv87Y (start around 34 sec)
• Take-home #1: FastQC assesses whether your data files are typical
• Take-home #2: A "bad result" from FastQC doesn't always mean your data are not useful or valuable
• Explore on your own! (~15 minutes)
30. Practice: Data processing
• Double-click "Alignment.tar.gz" on your Desktop to unpack it
• Also available from http://bitly.com/rnaseq2014
31. Practice: Look at "align.sh"
• Open Alignment folder
• Right-click "align.sh"
• Select "open with text editor"
• This is a shell script
– Commands executed in sequence
– Very useful for automating tasks
• First line is "she-bang" line
– tells Terminal it's a shell script
• All other lines starting with # are comments (not run)
Learning the bash shell: Great guide to writing shell scripts
32. align.sh - simple pipeline for RNA-Seq data processing
• Aligns a sample fastq file to genome
– tophat2, bowtie2
– fastq file is from Arabidopsis cold stress experiment (Short Read Archive SRX348640)
– file ColdTreatment-little.fastq.gz (gzip-compressed, .gz)
• Counts reads that align to TAIR10 genes
– featureCounts
– only counting reads that uniquely align
• Merges alignments into transcript models
– cufflinks
33. Practice: Intro to Terminal
• Double-click Terminal shortcut on desktop
– Program for entering commands or running scripts
– Also called a "shell" or "Unix shell"
– Can open multiple Terminal windows
• Each window called a "shell" or "Unix shell"
• Terminal shows hierarchical view of file system
– An upside-down tree, where every folder is inside another folder
– Folders are also called "directories"
– The top folder (that contains everything else) is called "root" directory - / (forward slash)
34. Practice: Open Terminal, try these commands
• cd change directory
– by itself means "go to user home directory"
– with an argument means: go there
– with ".." means go up one
• pwd - "print the current working directory" & find out where you are
35. Practice: Try these commands
• ls lists files and directories in the current directory
36. Practice: Try these commands
• ls -l "list long"
– report more information about files
– "d" means it's a directory (folder)
37. Practice: Run align.sh in Terminal
• Go to home directory
• Go to Desktop
• Go to Alignment
• Run align.sh
38. Now Running: tophat2 spliced alignment tool
TopHat: discovering splice junctions with RNA-Seq
Cole Trapnell, Lior Pachter and Steven L. Salzberg
[Figure 1 from the paper]
39. Tophat Output - we'll open in IGB
• Creates new folder with files, including...
• accepted_hits.bam - "binary alignments" file contains read alignments
– BAM - compressed version of SAM - "sequence alignment", needs index ".bai" file (made using samtools)
• junctions.bed - reports boundaries of introns, called "junction" features
– BED format, tab-delimited plain text file
– one junction feature per line
– fifth field is score, no. spliced reads aligned across the junction
– see: http://genome.ucsc.edu/FAQ/FAQformat.html#format1
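Because a junction file is tab-delimited plain text with one feature per line, it can be read with ordinary string splitting. The sketch below extracts the first six BED fields, including the score in the fifth; the `Junction` type, the function name, and the two sample lines are invented for illustration and are not taken from the workshop data.

```python
from typing import Iterable, Iterator, NamedTuple


class Junction(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    score: int  # fifth BED field: number of spliced reads across the junction
    strand: str


def read_junctions(lines: Iterable[str]) -> Iterator[Junction]:
    """Parse BED lines, skipping blank lines and track/browser/comment lines."""
    for line in lines:
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, name, score, strand = line.rstrip("\n").split("\t")[:6]
        yield Junction(chrom, int(start), int(end), name, int(score), strand)


# Hypothetical file contents for illustration only.
sample_lines = [
    "track name=junctions\n",
    "chr1\t1000\t1300\tJUNC00000001\t12\t+\n",
    "chr1\t5000\t5400\tJUNC00000002\t3\t-\n",
]
junctions = list(read_junctions(sample_lines))
best = max(junctions, key=lambda junction: junction.score)
print(best.name, best.score)  # JUNC00000001 12
```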
40. Practice: Start IGB while script runs
• Double-click IGB desktop icon
• Click Arabidopsis flower on start screen
41. Practice: How to get IGB if you're using your own computer
• Go to http://bioviz.org
• Follow Download link
• Choose Medium Memory option (typical)
42. TAIR10 annotations, June 2009 Columbia-0 genome release
• TAIR10 protein-coding gene models loaded automatically from IGB data server
• Forward & reverse strand in separate tracks
[Screenshot labels: Forward, Reverse]
44. Arabidopsis pollen data sets
• Read alignments, coverage graphs, junction files
• From 2013 Plant Phys. Pollen RNA-Seq paper
45. Practice: Combine Plus & Minus Tracks
• Click "+/-" to combine tracks
• Use Data Management Table to change track color, name, visibility, load options, strand options
46. Summary of moving and zooming
• Animated zooming
– click to position zoom stripe, sets zoom focus
– horizontal zoom & vertical stretch
• Moving from side to side (panning)
– arrows in toolbar
– hand icon - the move tool
• Jump-zooming
– Click-drag coordinate axis with arrow tool
– Double-click to zoom in on a feature
– Search by name
47. Practice: Zoom in on a feature
• Zoom in on alt-spliced gene models on chr1 (marked * in the screenshot)
• This is animated zooming
1. Click to set zoom focus
2. Drag slider to zoom in
48. Practice: Click move arrows to reposition during zoom
• Click data display to re-focus zoom on target location
49. Practice: Or use move tool (hand) to reposition during zoom
• Click display to focus zoom on target
1. Select move tool (hand)
2. Click-drag to move
50. Practice: Click-drag sequence axis to jump-zoom to a region
1. Select pointer tool
2. Click number line
3. Drag
4. Release
• Highlighted region becomes new view
51. Practice: Jump-zoom to gene model
• Double-click label, space a little above exon blocks, or intron to jump-zoom to a gene model
– Also selects it, selected items outlined in red
1. Select pointer tool
2. Double-click label or intron
52. After jump-zoom, gene model is selected
• Arrows indicate direction of transcription
• Selected gene model outlined in red
53. Practice: Gene model close-up
• Use vertical slider to make gene models taller
• Increase window size to make more room
• Drag slider to stretch vertically
54. Practice: Interact with data using pointer. Select pointer (arrow) in toolbar
• Click intron, label, or region above blocks to select whole gene model
• Click blocks to select parts of a gene model
• SHIFT-click to multi-select
• CLICK-drag to select & count everything in a region
• Selection Info, top right, reports counts
– "i" button shows info if one item selected
55. Practice: View Edge Matching
• Edges that match selected item edges are highlighted in red
• To change edge-match color choose File > Preferences > Other Options
• To turn off or on, see View > Edge Matching
56. Practice: To work with sequence data, click Load Sequence
• Sequence appears in Coordinates track
57. Practice: Zoom in to see amino acids
• Note: Must load genomic sequence first
58. Practice: Zoom in on end of translation
• Click the "thick end" and then zoom in
• Note: Variants encode same C-term amino acids
59. Practice: Select genomic sequence
1. Choose pointer tool in toolbar
2. Click-drag genomic sequence to select a region
3. CTRL-click to copy
• Length of selected region reported in Selection Info box (top right)
• Useful for designing primers, measuring regions
60. Practice: Right-click (or CTRL-click) gene model
• Shows options to run a Web search, BLAST search, view sequence
61. Practice: Quick Search
• Enter search text, select option
• Jump-zoom to selected gene
• Choose At-SR30
63. Looking ahead to Workshop 3
• Some genes that were highly expressed in tomato pollen are annotated as "Unknown" proteins & have no counterpart in Arabidopsis.
• You can use IGB to quickly find those genes and then run BLASTX or BLASTP searches at NCBI to find out...
– Are they unique to tomato?
– Could they be non-coding?
64. Practice: Open files from align.sh
• Zoom out to show more of At-SR30 region
• Choose File > Open
– Select "accepted_hits.bam" & "junctions.bed"
• A new empty track appears for each file
• Click Load Data to load reads and junctions
65. Read alignments stack
• Reads at top of stack not being shown (too many to fit)
67. Practice: Configure view - Load Sequence
• Click Load Sequence to load genomic bases for this region
68. Practice: Configure view - Lock mRNA track height
1. Click TAIR10 mRNA track label to select it
2. Open Annotation tab
3. Select Lock Track Height, enter 170, click Apply
69. Practice: Configure view - configure junction track
1. Click junctions track label to select junctions track
2. Open Annotation tab
3. Select score in Label Field
4. Select +/- in Strand
70. Practice: Configure view - lock junction track height
1. Click junctions track label to select it
2. Open Annotation tab
3. Select Lock Track Height, enter 120, click Apply
71. Practice: Change read stack height to see more reads
1. CTRL-click (or right-click) accepted_hits.bam track label
2. Choose Set Stack Height...
72. Practice: Change read stack height to see more reads
3. Enter 50
73. Practice: Set mRNA stack height
1. Right-click TAIR10 mRNA track label, choose Set Stack Height
2. Enter 3 - tallest stack has 3 models
Note: Tabs are minimized to make more space
74. Practice: Note read support for alternative splicing
Take-home: Many spliced reads support both variants, but there are also many reads inside the introns, indicating failure to splice. This may be typical of alt-spliced introns?
75. Practice: Use junction track to quantify support for splice variants
1. Click-drag to genes track
2. Scores are number of spliced reads supporting each junction.
76. Practice: Compare Cufflinks GTF file to Gene models
• Open Alignments > cufflinks_cold > transcripts.gtf
77. Practice: View Cufflinks gene models
1. Click Load Data to see Cufflinks models
2. Click-drag new track next to gene models
3. Use vertical slider to make more room
Take-home: Cufflinks annotations close, but incomplete.
78. Practice: Load data from Galaxy
1. Go to usegalaxy.org
2. Open Shared Data
3. Choose Published Histories
79. Practice: Load data from Galaxy
1. Search for Cold
3. Select Cold stress in Arabidopsis (with default maximum intron size)
80. Practice: Load data from Galaxy
• Illustrates results when tophat is run with default settings:
– default maximum intron size is 500,000 bases
• Tophat was developed with human data in mind, where large introns are common
• Select Import History
85. Practice: Remove reads - don't need them now
1. Right-click accepted_hits.bam
2. Choose Delete Track
86.
1. Zoom out all the way
2. Click Load Data
[Screenshot label: Your data are here]
87. Take-home: Tophat run with default parameters predicts enormous introns. Important to understand parameter settings: defaults are not always best.
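One way to spot implausibly large introns outside a genome browser is to compute intron lengths from a junctions file. The sketch below assumes the TopHat convention that each junction is a two-block BED feature whose blocks are the flanking read overhangs, so the intron is the gap between the blocks. The function names, the sample lines, and the 5,000-base cutoff are all illustrative assumptions, not values from the workshop.

```python
from typing import Iterable, List, Tuple


def intron_length(bed_line: str) -> int:
    """Gap between the two blocks of a 12-column BED junction feature."""
    fields = bed_line.rstrip("\n").split("\t")
    block_sizes = [int(value) for value in fields[10].split(",") if value]
    block_starts = [int(value) for value in fields[11].split(",") if value]
    # The intron starts where block 1 ends and stops where block 2 begins.
    return block_starts[1] - block_sizes[0]


def long_introns(lines: Iterable[str], cutoff: int) -> List[Tuple[str, int]]:
    """Return (junction name, intron length) for introns longer than cutoff."""
    flagged = []
    for line in lines:
        if line.startswith("track") or not line.strip():
            continue
        length = intron_length(line)
        if length > cutoff:
            flagged.append((line.split("\t")[3], length))
    return flagged


# Hypothetical junction lines for illustration only.
example_lines = [
    "track name=junctions\n",
    "chr1\t1000\t1300\tJUNC00000001\t12\t+\t1000\t1300\t255,0,0\t2\t50,40\t0,260\n",
    "chr1\t2000\t402100\tJUNC00000002\t1\t-\t2000\t402100\t255,0,0\t2\t60,40\t0,400060\n",
]
# 5,000 bases is an arbitrary illustrative cutoff, not a recommendation.
print(long_introns(example_lines, cutoff=5_000))
```

In TopHat itself the limit is set with the `-I` / `--max-intron-length` option; which value suits a given genome is a judgment call that depends on the species.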
88. Now you can
• Describe Illumina library synthesis, sequencing
• Evaluate data quality using FastQC
• Run a data processing pipeline (shell script)
• View and explore data in a genome browser
– and load data sets from Galaxy, local files
Thank you for your attention!