The Arabidopsis Information Portal (araport.org) is a resource for the plant genomics research community. The AIP conducts developer workshops to help other labs get involved. This presentation introduces the web site with a case study about contributing a new module built around a legacy data set.
4. JBrowse
• Data types: Chromosomes, Transcripts, Exons
• Actions: Scroll & zoom, Track layering, Data integration
5. ThaleMine
• Data types: Function, Interaction, Expression, Publications
• Actions: Search, Drill down, List manipulation, Save results
6. Science Apps
Growing list of applications.
Contributed by community members.
Visualization apps, computation apps.
Eleanor Pence, 2014 summer intern.
7. Arabidopsis Information Portal
ThaleMine
• Instance of InterMine software
• Classic data mart:
  – Data snapshot
  – Optimized for retrieval
• Many roles within AIP
  – Interactive app
  – Ontology master
  – Index of terms
  – Web services engine
• Web services:
  – Standard services
  – User template queries
JBrowse
• Interactive browser app
• Scroll & semantic zoom
• Displays GFF, BED, BAM
• Consumes web services
Tripal
• Chado database explorer
• AIP community annotation
Science Apps
• Viz tools & front ends
• Community contributed
• Consume web services
Adama
• Community web services
• AIP data API manager
• Mediation & pass-thru
ThaleMine
• Interactive data mining app
• Build your own template query
9. Incoming Skills
Skill                       | What is it? | Read about it | Have used it | Use it a lot | Could teach it
HTML/CSS                    | 0%          | 0%            | 17%          | 44%          | 39%
Client-side JavaScript      | 0%          | 17%           | 17%          | 56%          | 11%
JSON                        | 0%          | 22%           | 22%          | 44%          | 11%
XML                         | 0%          | 0%            | 28%          | 56%          | 17%
REST                        | 11%         | 39%           | 22%          | 28%          | 0%
OAuth2                      | 50%         | 50%           | 0%           | 0%           | 0%
Web services                | 0%          | 6%            | 28%          | 61%          | 6%
Git and GitHub              | 0%          | 0%            | 50%          | 39%          | 11%
Mobile-responsive design    | 6%          | 56%           | 33%          | 6%           | 0%
cURL                        | 22%         | 28%           | 44%          | 6%           | 0%
Framework like Bootstrap.js | 17%         | 44%           | 11%          | 22%          | 6%
Python                      | 0%          | 6%            | 39%          | 39%          | 17%
JavaScript                  | 0%          | 0%            | 39%          | 44%          | 17%
Java                        | 0%          | 6%            | 44%          | 22%          | 28%
Perl                        | 0%          | 22%           | 39%          | 28%          | 11%
Average                     | 7%          | 20%           | 29%          | 33%          | 11%
Survey taken Sep-Oct 2014 in advance of Araport Developer Workshop, Nov 5-6, at TACC.
10. The TIGR Catalog of the Arabidopsis Transcriptome: a case study for exposing legacy data
Presented by Jason Miller, JCVI
11. Web Design for Dynamic Pages
[Diagram: browser, server, and DB in two designs, connected by HTTP. Traditional labels: HTML <form>, URL, CGI, HTML <table>, HTML3. Modern labels: HTML <script>, URL, WebServices, JavaScript <table>, HTML5, CSS3, CSS, JS. The server holds static files and active code.]
Traditional
• Requests by HTTP(S) GET or POST.
• Server-side static HTML content.
• Server/database interaction.
• Content generation by e.g. perl CGI.
• Deliver content-type=HTML (etc.).
Modern
• Requests by HTTP(S) GET or POST.
• Browser-side dynamic HTML content.
• Browser/database interaction.
• WebServices by e.g. perl CGI.
• Deliver content-type=JSON (etc.).
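
The difference between the two designs can be sketched in a few lines of Python: the same records are delivered either as a finished HTML table (traditional) or as JSON that browser-side code turns into a table (modern). This is only an illustration; the record fields, the values, and the function names are assumptions, not code from JCVI or Araport.

```python
import json
from html import escape

# Placeholder records for illustration; field names and values are assumptions.
ROWS = [
    {"gene": "AT1G33930.1", "tissue": "Leaf", "value": 1.5},
    {"gene": "AT1G33930.1", "tissue": "Root", "value": 0.25},
]


def render_html(rows):
    """Traditional design: the server sends a finished HTML table."""
    if not rows:
        return "text/html", "<table></table>"
    header = "".join(f"<th>{escape(str(key))}</th>" for key in rows[0])
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row.values()) + "</tr>"
        for row in rows
    )
    return "text/html", f"<table><tr>{header}</tr>{body}</table>"


def render_json(rows):
    """Modern design: the server sends data; browser-side JavaScript builds the table."""
    return "application/json", json.dumps(rows)


if __name__ == "__main__":
    for content_type, payload in (render_html(ROWS), render_json(ROWS)):
        print(content_type, payload)
```

In the modern design only `render_json` runs on the server; paging and sorting of the table can then happen in the browser without another round trip.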
12. Expression Data
RT-qPCR expression values for 3000 genes of interest from 8 experiment types (single tissue or single condition).
Images of localized expression for GFP reporter + promoter construct for 1000 genes in 1000 ecotypes, 135K images in all.
http://www.jcvi.org/arabidopsis/qpcr/
This data was collected at TIGR (now JCVI) by Chris Town with funding from the NSF.
13. The data offers exciting possibilities for apps:
• Compare expression root vs leaf
• Compare expression per treatment
• Correlate images to genotypes
• Integrate data from other sources
14. Legacy web architecture
[Diagram labels: browser (HTML form, HTML table); server (PHP, perl, MySQL).]
Human-facing front end in HTML form. Scripts and database on back end.
Jun Zhuang, 2009. Hui Quan, 2007.
15. Legacy web code
PHP:
<?php
$username = "access";
$password = "access";
$hostName = "mysql51-dmz-pro";
if (!($connection = @mysql_pconnect($hostName, $username, $password)))
    showerror();
?>
Perl:
if ($format ne "text") {
    $tmpl = HTML::Template->new(filename => "search_return1.tmpl");
    $tmpl->param(search_table => @result_presentation);
    print header;
    print $tmpl->output;
}
16. Legacy bugs
Choose option to return plain text...
Returns error text instead...
Software error:
HTML::Template->output() : fatal error in loop output : HTML::Template : Attempt to set nonexistent parameter 'elem_conc2' - this parameter name doesn't match any declarations in the template file : (die_on_bad_params => 1) at /usr/local/packages/perl-5.16.1/lib/5.16.1/HTML/Template.pm line 3340. at /opt/www/arabidopsis/cgi-bin/arabidopsis/qpcr/SingleSearch line 179.
For help, please send mail to the webmaster (helpdesk@jcvi.org), giving this error message and the time and date of the error.
17. Legacy databases
expression stats
For each locus + “tissue”
• Absolute expression
• Relative expression
reporter images
Metadata per image
• Line ID
• PO code
• Locus ID (free text)
18. Desired Improvements
• Break the monolith
  – Separate the data access from the presentation app
  – Expose the data with documented RESTful web services
• Enable dynamic interaction
  – Allow table interaction
    • e.g. buttons for “next page” and “sort by this column”
  – Allow query refinement or hide & expose
• Improve programmatic accessibility and interoperability
  – Expose a documented HTTP GET API taking parameters in the URL
  – Provide precise means to supply multiple accessions
    • was using human-readable text formats
  – Translate antiquated ID formats to AGI
    • was mixing TAIR and pre-TAIR accessions, like “AT1G33930.1, AT.CHR4.7.322”
  – Use precise ontological terms
    • was mixing tissue and condition (“Leaf” and “NaCl”) within the “tissue” attribute
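
Two of the improvements above, supplying multiple accessions precisely and separating tissue from condition, can be sketched in Python. This is a minimal sketch under stated assumptions: the function names and the lookup table are hypothetical, the table holds only the two values quoted on the slide, and no mapping from pre-TAIR IDs to AGI is attempted because the slides do not give one. The pattern follows the standard AGI locus format (AT, chromosome 1-5 or C/M, G, five digits, optional gene-model suffix).

```python
import re

# AGI locus IDs such as AT1G33930, optionally with a gene-model suffix (.1).
AGI_PATTERN = re.compile(r"^AT[1-5CM]G\d{5}(\.\d+)?$", re.IGNORECASE)

# Hypothetical lookup: which field a legacy "tissue" value really belongs to.
# Only the two values quoted on the slide are listed.
LEGACY_TISSUE_VALUES = {"Leaf": "tissue", "NaCl": "condition"}


def split_accessions(raw):
    """Split a comma- or whitespace-separated accession string into tokens."""
    return [tok for tok in re.split(r"[,\s]+", raw.strip()) if tok]


def classify_accessions(raw):
    """Return (agi_ids, other_ids); other_ids still need translation to AGI."""
    agi, other = [], []
    for tok in split_accessions(raw):
        (agi if AGI_PATTERN.match(tok) else other).append(tok.upper())
    return agi, other


def normalize_tissue(value):
    """Place a legacy "tissue" value under the field it actually describes."""
    kind = LEGACY_TISSUE_VALUES.get(value)
    if kind is None:
        raise ValueError(f"unknown legacy tissue value: {value!r}")
    return {kind: value}


if __name__ == "__main__":
    print(classify_accessions("AT1G33930.1, AT.CHR4.7.322"))
    print(normalize_tissue("NaCl"))
```

Keeping the untranslated IDs in a separate list lets a service report them back to the caller instead of silently dropping them.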
19. Araport to the Rescue!
• Arabidopsis Information Portal (AIP)
  – A 5-year project funded by NSF (US) and BBSRC (UK) at end of 2013
    • First two years: build a prototype to prove feasibility
    • Next three years: provide production quality services for Arabidopsis community
  – Mission to build a sustainable community web portal
    • Sustainability is based on community-contributed modules
    • Module = your data + your code + AIP infrastructure
    • AIP implements data federation, not data warehousing
    • Infrastructure that is lightweight, scalable, reproducible
• Araport.org
  – Went live in 2014 with 2 main apps: ThaleMine, JBrowse
  – Now exposing web services
    • AIP services backed by Araport apps
    • External services registered, exposed, and mediated through AIP ADAMA
  – Ready to provide app services
    • app registry and hosting for developers
    • app store and workspaces for users
20. Moving RT-qPCR to Araport
Modify JCVI CGI to...
• accept GET, parameters in the URL
• return JSON
Use Araport to...
• Install a Python mediator
• Design an AIP-compliant API
• Install a JavaScript science app
[Diagram labels: JCVI, DB, WebService at JCVI, REST API; Araport, WebService at Araport, REST Mediator API, Science App, App Store.]
21. Developer Choices
• Wrap legacy system
  – Entire system must remain a black box
  – Use it as back end for new service
    • An Araport mediator might convert new URL to old HTTP POST
    • An Araport mediator might extract new JSON from old HTML
• Re-engineer the legacy system
  – Use an all new database (e.g. Oracle -> mySQL) or
  – Use an all new server technology (e.g. CGI -> EJB)
• Update legacy system
  – Legacy databases are still workable and
  – Legacy code is available and can be extended
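
For the "wrap legacy system" option, a mediator that extracts new JSON from old HTML could look roughly like the following Python sketch. It uses only the standard library and assumes the legacy page returns a single table whose first row is the header; the class and function names are hypothetical, and this is not the Adama mediator interface.

```python
import json
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the cell text of every <tr> in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row being parsed, or None
        self._cell = None  # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # discard a cell left open by malformed HTML


def html_table_to_json(html):
    """Treat the first row as the header and return the other rows as JSON records."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return "[]"
    header, *body = parser.rows
    return json.dumps([dict(zip(header, row)) for row in body])


if __name__ == "__main__":
    legacy_page = (
        "<table><tr><th>gene</th><th>tissue</th></tr>"
        "<tr><td>AT1G33930.1</td><td>Leaf</td></tr></table>"
    )
    print(html_table_to_json(legacy_page))
```

Scraping of this kind breaks whenever the legacy page layout changes, which is one reason to prefer updating the legacy code when it is available.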
22. Our Choice
• Modify the legacy server
  – Leave the HTML-based system at JCVI
  – Add a RESTful URL-to-JSON web service at JCVI
• Register an Adama mediator at Araport
  – Expose a documented, AIP-compliant web service
  – Use the Araport.org base URL
• Submit a science app to Araport
  – Use JavaScript, jQuery, DataTables
23. Web Service at JCVI
• HTML form
  – Documentation only
  – Form accepts...
    • one locus ID
    • one condition
    • one output format
  – Form submits
    • HTTP GET
    • URL exposed parameters
• Deployed at JCVI
http://www.jcvi.org/arabidopsis/qpcr/MinimalForm_ExpressionPerGenePerTissue.html
24. Web Service at JCVI
• New URL
  – endpoint replaces forms
  – parameters in the URL
• New return type: JSON
  – This addition required adding a few lines of code to the server-side perl
• Support legacy returns:
  – HTML
  – Text as csv
• Deployed at JCVI
http://www.jcvi.org/cgi-bin/arabidopsis/qpcr/ExpressionPerGenePerTissue?gene=AT1G33930.1&tissue=Leaf&format=json
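
A client can assemble the URL above from its three parameters rather than by string concatenation. The Python sketch below uses the endpoint and parameter names shown on the slide; the function names are hypothetical, and the sketch stops short of fetching or parsing a response because the slides do not show the JSON layout.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint as shown on the slide.
BASE_URL = "http://www.jcvi.org/cgi-bin/arabidopsis/qpcr/ExpressionPerGenePerTissue"


def build_query_url(gene, tissue, fmt="json"):
    """Encode the three parameters into the query string of a GET request."""
    return BASE_URL + "?" + urlencode({"gene": gene, "tissue": tissue, "format": fmt})


def parse_query_url(url):
    """Recover the parameters from a URL, much as a server-side script would."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


if __name__ == "__main__":
    url = build_query_url("AT1G33930.1", "Leaf")
    print(url)
    print(parse_query_url(url))
```

Because every parameter is visible in the URL, the same request can be bookmarked, scripted with cURL, or registered behind a mediator without knowledge of the HTML form.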
30. Web devel technology stack
• HTTP, HTML, DOM, JavaScript, AJAX
• jQuery/jQueryUI/DataTables/Dojo/Moo
  – programming libraries written in JavaScript
• AngularJS/EmberJS/BackBone/GWT
  – application frameworks to help write big web apps
• Bower
  – dependency manager
• Yeoman
  – scaffolding tools help developers generate web apps
• Grunt
  – interactive development environment
• Git
  – version control and publishing
34. Araport Architecture
[Diagram labels, in source order:]
CLI clients, Scripts, 3rd party applications
Agave Enterprise Service Bus
Agave Services: systems, apps, jobs, profile, meta, files
Physical resources: HPC | Files | DB
Araport API Manager: manage, enroll
a b c d e f
AIP & 3rd party data providers
• Single sign-on
• Throttling
• Unified logging
• API versioning
• Automatic HTTPS
API Mediators
• Simple proxy
• Mediator
• Aggregator
• Filter
REST*, REST-like, SOAP, POX, Cambrian CGI
35. AIP Web Services
• Backed by ThaleMine (InterMine software)
  – Arabidopsis genome & released annotation (TAIR10)
  – General purpose API, unauthenticated
  – User-configurable API, authenticated
    • Expose your ThaleMine Template Queries
• Backed by Tripal (Drupal software)
  – Stock center data
  – Community annotation pre-release
• JBrowse tracks
  – Many already exposed as Web Services by EPIC-CoGe
  – AIP tracks could be exposed via an InterMine/JBrowse adapter
• Backed by the Community
  – Provide organization, documentation, uniformity
36. Attribution & Provenance
• Attribution mechanisms
  – Attribution on science apps
  – Recursive attribution
• Provenance mechanisms
  – Provenance of data displayed
  – Recursive provenance
• Recognition mechanisms
  – Recognize new, exciting, and widely used apps
  – Promote creativity, diligence, sagacity
37. Further Reading
• Portals
  – I.A.I.C. (2012) Taking the Next Step: Building an Arabidopsis Information Portal. The Plant Cell.
  – Lamesch et al. (2012) The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Research.
  – Joshi et al. (2011) MASCP Gator: An Aggregation Portal for the Visualization of Arabidopsis Proteomics Data. Plant Physiology.
• Software
  – Westesson et al. (2012) Visualizing next-generation sequencing data with JBrowse. Briefings in Bioinformatics.
  – Smith et al. (2012) InterMine: a flexible data warehouse system for the integration and analysis of heterogeneous biological data. Bioinformatics.
  – Lee et al. (2013) Web Apollo: a web-based genomic annotation editing platform. Genome Biology.
  – Kohl et al. (2010) Cytoscape: Software for Visualization and Analysis of Biological Networks. Data Mining in Proteomics.
• Databases
  – Eilbeck et al. (2005) The Sequence Ontology: a tool for the unification of genome annotations. Genome Biology.
  – G.O. Consortium. (2008) The Gene Ontology project in 2008. Nucleic Acids Research.
  – Avraham et al. (2008) The Plant Ontology Database: a community resource for plant structure and developmental stages controlled vocabulary and annotations. Nucleic Acids Research.
  – Lyons et al. (2008) How to usefully compare homologous plant genes and chromosomes as DNA sequences. The Plant Journal. [EPIC-CoGe]
  – Brady et al. (2009) Web-Queryable Large-Scale Data Sets for Hypothesis Generation in Plant Biology. The Plant Cell. [BAR]
  – Kerrien et al. (2007) IntAct—open source resource for molecular interaction data. Nucleic Acids Research.
38. Arabidopsis Information Portal
Presentation by Jason Miller, JCVI
Funding by NSF (USA), BBSRC (UK)
Contributing institutes
Plant Genomics group at J. Craig Venter Institute
Texas Advanced Computing Center, University of Texas at Austin
InterMine group at University of Cambridge
TAIR group at Phoenix Technologies
and YOU