18. Chart Editor
- Visualize multiple data points in a single view
- Time series data
- Multiple GO terms
- Chart types: Bar, Box, Pie, Heat Map, Ring
- Part of the standard Visual Style Editor
- Everything is saved in session files
27. User Type I
- Average computing skills
- Use Excel as their primary workbench for data analysis
- For them, bioinformatics means using a few NCBI/EBI web tools or DAVID
- Have lots of data that has not yet been analyzed or visualized
- Excel is my friend.
28. User Type II
- Advanced computing skills
- Use Python + SciPy/NumPy, R + Bioconductor, or MATLAB every day
- Write their own packages if necessary
- Use HPC technologies a lot
- Manual operation is evil.
29. Both of them are Important!
- Type I: “Bench Biologists”
- Domain experts
- Data producers
- Type II: Computational Biologists
- Experts in large-scale data analysis
- Especially important for genome-scale data analysis
They have been ignored for a long time in the Cytoscape world…
31. Requests from Type II Users
- I have 200 networks in my session and need to create one PDF per view. How can I do this with Cytoscape?
- I need to use igraph for network analysis, but its visualization features are limited. I want to use Cytoscape as an external visualization engine for R.
- I usually record my work in IPython Notebook. How can I integrate Cytoscape into my workflow?
- I want to generate a Style for each time point and create small multiples of networks.
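The first request (one PDF per view for 200 networks) is the kind of repetitive job a short script handles well. The sketch below is a minimal illustration, assuming Cytoscape exposes a REST interface on `localhost:1234` (as the CyREST app does); the endpoint paths `/v1/networks` and `/v1/networks/{id}/views/first.pdf` are assumptions to check against the CyREST documentation, not something this deck specifies. The HTTP call is injected so the loop can be tried without a running Cytoscape.

```python
import json
import urllib.request
from pathlib import Path
from typing import Callable

# Assumption: a Cytoscape REST interface (e.g. CyREST) listens on this address.
BASE_URL = "http://localhost:1234/v1"


def http_get(url: str) -> bytes:
    """Fetch a URL and return the raw response body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def export_all_views(out_dir: str, fetch: Callable[[str], bytes] = http_get) -> list[Path]:
    """Write one PDF per network view and return the paths written."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Assumed endpoint: returns a JSON list of network SUIDs in the session.
    network_ids = json.loads(fetch(f"{BASE_URL}/networks"))
    written = []
    for suid in network_ids:
        # Assumed endpoint: renders the first view of this network as PDF.
        pdf_bytes = fetch(f"{BASE_URL}/networks/{suid}/views/first.pdf")
        path = target / f"network_{suid}.pdf"
        path.write_bytes(pdf_bytes)
        written.append(path)
    return written
```

Injecting `fetch` keeps the batch logic separate from the transport, so the same loop could run from a notebook, a cron job, or a test.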
48. 2014
- Cytoscape 3.2.0: (Modularized) Java Application
- Client applications are migrating to web browsers
- “Pure” desktop applications are dying slowly…
- Even desktop applications depend on external services
- JavaScript everywhere
- Cloud Computing
- Scale-out over scale-up
49. Trends in Software Design
- An application is a collection of smaller services
- JavaScript is a first-class citizen in the world of programming languages
- Design applications with cloud services in mind
51. In the modern era, software is commonly delivered as a service: called web apps, or software-as-a-service. The twelve-factor app is a methodology for building software-as-a-service apps that:
• Use declarative formats for setup automation, to minimize time and cost for new developers joining the project
• Have a clean contract with the underlying operating system, offering maximum portability between execution environments
• Are suitable for deployment on modern cloud platforms, obviating the need for servers and systems administration
• Minimize divergence between development and production, enabling continuous deployment for maximum agility
• And can scale up without significant changes to tooling, architecture, or development practices.
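One concrete practice behind the "clean contract with the operating system" point is the methodology's factor III (Config): deploy-specific settings live in environment variables rather than in code. The sketch below shows that pattern for a small analysis service; the variable names `ENRICH_PORT`, `ENRICH_DATA_DIR`, and `ENRICH_DEBUG` and their defaults are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ServiceConfig:
    port: int
    data_dir: str
    debug: bool


def load_config(env: Mapping[str, str] = os.environ) -> ServiceConfig:
    """Read deploy-specific settings from the environment (hypothetical names)."""
    return ServiceConfig(
        port=int(env.get("ENRICH_PORT", "5000")),
        data_dir=env.get("ENRICH_DATA_DIR", "./data"),
        debug=env.get("ENRICH_DEBUG", "false").lower() in {"1", "true", "yes"},
    )
```

The same code then runs unchanged on a laptop, a private server, or a cloud platform; only the environment differs.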
53. This MANIFESTO counters current trends in bioinformatics where institutes and companies are creating monolithic software solutions aimed mostly at end-users.
57. What I Have Learned…
- Python is becoming the standard language for “Data Scientists”
- Pure Python code is slow for heavy numeric loops, but Python is a perfect glue language
- Lots of tools are made by scientists (e.g., Anaconda by Continuum Analytics)
- They understand the current problems in modern scientific computing and are trying to solve them
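The "slow language, perfect glue" point can be seen in a few lines: Python describes the work, and NumPy's compiled routines do the arithmetic. This is a minimal illustration using a hypothetical toy expression matrix; for large matrices the vectorized version is typically much faster, though the exact speedup depends on data size and hardware.

```python
import numpy as np


def mean_rows_loop(matrix: list[list[float]]) -> list[float]:
    """Row means in pure Python: every addition runs in the interpreter."""
    return [sum(row) / len(row) for row in matrix]


def mean_rows_numpy(matrix: list[list[float]]) -> np.ndarray:
    """Row means via NumPy: Python only dispatches to compiled code."""
    return np.asarray(matrix, dtype=float).mean(axis=1)


expression = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]]
print(mean_rows_loop(expression))  # [2.0, 20.0]
print(mean_rows_numpy(expression))
```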
59. What I Have Learned…
- Data visualization
- Visualization needs vary, especially for complex data sets such as those from the life science domain
- For that purpose, Java is not the best language for implementing applications
- Even large-scale data visualization applications are moving to web browsers
- Canvas (Cytoscape.js), WebGL (Three.js), SVG (D3.js)
- Most of the talented hackers are working in web browsers, i.e., with JavaScript
62. Problems in Scientific Computing
- No more free lunch
- Even if you buy expensive machines, you cannot get free performance gains anymore. You have to design your code for massively distributed environments (from scale-up to scale-out).
- Complex data analysis pipelines
- Need to build pipelines by connecting multiple resources or services
- Need for complex, customized data visualization
- Reproducibility
➡ But building, deploying, and maintaining reproducible pipelines is not straightforward
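Designing for scale-out mostly means splitting work into independent units that any worker can process. The sketch below is one minimal way to express that in Python, assuming records can be processed independently; the steps `normalize` and `flag_dominant` are hypothetical placeholders. It runs on a `ThreadPoolExecutor`, but because it only depends on the `concurrent.futures.Executor` interface, a `ProcessPoolExecutor` or another executor with the same interface could be swapped in.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from functools import partial
from typing import Callable, Iterable, Sequence

Step = Callable[[dict], dict]


def normalize(record: dict) -> dict:
    """Hypothetical step: turn raw counts into relative frequencies."""
    total = sum(record["counts"])
    return {**record, "freqs": [c / total for c in record["counts"]]}


def flag_dominant(record: dict) -> dict:
    """Hypothetical step: mark records where one category exceeds half."""
    return {**record, "dominant": max(record["freqs"]) > 0.5}


def run_record(record: dict, steps: Sequence[Step]) -> dict:
    """Pass one record through every pipeline step in order."""
    for step in steps:
        record = step(record)
    return record


def run_pipeline(records: Iterable[dict], steps: Sequence[Step], executor: Executor) -> list[dict]:
    """Process records independently; results keep the input order."""
    # partial() of a module-level function stays picklable, unlike a lambda,
    # so the same call also works with a ProcessPoolExecutor.
    return list(executor.map(partial(run_record, steps=steps), records))


records = [{"id": "a", "counts": [3, 1]}, {"id": "b", "counts": [1, 1]}]
with ThreadPoolExecutor(max_workers=2) as pool:
    results = run_pipeline(records, [normalize, flag_dominant], pool)
print([(r["id"], r["dominant"]) for r in results])  # [('a', True), ('b', False)]
```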
63. What does this mean to biologists?
- “Omics-Scale” Data Analysis
- Need computing power beyond your workstation
- Need to build pipelines by connecting multiple resources or services
➡ But developing, deploying, and maintaining reproducible, or “portable”, pipelines is not straightforward
64. What does this mean to biologists?
- Collaboration between scientists and software engineers will become more important
- Scientists should spend their time on science, not on the details of JavaScript syntax or how to build large-scale pipelines
- In other words, building a bioinformatics computing environment is itself a research project
65. What does this mean to the Cytoscape team?
- Cytoscape should work nicely with other tools
- All bioinformatics tools should work as building blocks of large workflows
- In the long term, Cytoscape should be a collection of services
70. Srivas, Rohith et al. “Assembling Global Maps of Cellular Function through
Integrative Analysis of Physical and Genetic Networks.” Nature Protocols
6.9 (2011): 1308–1323. PMC. Web. 1 Dec. 2014.
71. Figure: History of PanGIA Application. Stages shown: Biological Problem; Core algorithms 1, 2, …, n as Python; Java Implementation of Algorithms (Cytoscape 2.x Plugin); Cytoscape 3.x App; PanGIA Service (Implement in Python again…?). Contributors named: Sourav; Greg, Rohith; Greg, Rohith and Cytoscape Team; David.
75. NeXO Web
- Term Enrichment Analysis
- From a list of genes, perform a hypergeometric test over the set of machine-generated ontology (NeXO) terms and display the terms with their p-values
- It is independent of all other parts of the NeXO Web application
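The enrichment step described above is a one-sided hypergeometric test per term. The sketch below shows that test with the standard library only; it is not the NeXO Web implementation, and the background set, the skipping of terms with zero overlap, and the absence of multiple-testing correction are simplifying assumptions. With SciPy installed, `scipy.stats.hypergeom.sf(k - 1, M, n, N)` gives the same tail probability.

```python
from math import comb


def hypergeom_pvalue(k: int, M: int, n: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population M, n term genes, N query genes)."""
    upper = min(n, N)
    return sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1)) / comb(M, N)


def term_enrichment(
    query: set[str], terms: dict[str, set[str]], background: set[str]
) -> list[tuple[str, int, float]]:
    """Return (term, overlap, p-value) for each term hit by the query, smallest p first."""
    M = len(background)
    query = query & background
    N = len(query)
    results = []
    for term, genes in terms.items():
        genes = genes & background
        k = len(query & genes)
        if k == 0:
            continue  # assumption: terms with no overlap are not reported
        results.append((term, k, hypergeom_pvalue(k, M, len(genes), N)))
    return sorted(results, key=lambda r: r[2])
```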
76. NeXO Web RESTful API
Figure: Overview of NeXO Term Enrichment Service. Components: Term Enrichment Service API by Flask; Python Core; SciPy; NumPy.
78. Option 1: As a Cytoscape App
- Re-implement this algorithm as a Cytoscape App (Java application)
- Pros:
- Easy to install
- Cons:
- A lot of work…
- Must be written in Java
- Does not scale out!
79. Option 2: As a Service
- Wrap existing applications and deploy them to the platform of the user's choice:
- Laptops, private servers, and commercial cloud services (AWS, Google Cloud Platform, etc.)
- Pros:
- Scales out
- Client-independent
- Workflow-friendly
- Cons:
- Need to adapt to a new way of software design
- Relatively more complex deployment
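Option 2 amounts to putting a thin HTTP layer around code that already works. The NeXO service uses Flask for this; to stay dependency-free, the sketch below uses a plain WSGI callable (the interface Flask apps also implement). The function `core_analysis`, the `/analysis` path, and the `{"genes": [...]}` payload are hypothetical.

```python
import json
from typing import Callable


def core_analysis(genes: list[str]) -> dict:
    """Hypothetical stand-in for an existing algorithm."""
    return {"n_genes": len(genes), "genes": sorted(set(genes))}


def make_service(analysis: Callable[[list[str]], dict]):
    """Wrap an analysis function as a WSGI app answering POST /analysis with JSON."""

    def app(environ, start_response):
        if environ.get("PATH_INFO") != "/analysis" or environ.get("REQUEST_METHOD") != "POST":
            start_response("404 Not Found", [("Content-Type", "application/json")])
            return [b'{"error": "not found"}']
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
        body = json.dumps(analysis(payload.get("genes", []))).encode("utf-8")
        start_response(
            "200 OK",
            [("Content-Type", "application/json"), ("Content-Length", str(len(body)))],
        )
        return [body]

    return app
```

For a local trial, `wsgiref.simple_server.make_server("127.0.0.1", 8000, make_service(core_analysis)).serve_forever()` serves it; a production deployment would typically sit behind a dedicated WSGI server.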
80. Summary
- Best practice: implement future applications as services, then call them from Cytoscape, IPython, RStudio, and other tools
- To make your algorithms available to both Type I (domain experts) and Type II (hardcore computational biologists) users, it is better to deploy them as a service instead of an App
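On the client side, calling such a service from IPython takes only the standard library. The sketch assumes a hypothetical service accepting `POST /analysis` with a JSON body `{"genes": [...]}`; R or Cytoscape would make the same request with their own HTTP clients.

```python
import json
import urllib.request


def build_request(base_url: str, genes: list[str]) -> urllib.request.Request:
    """Build a JSON POST request for a hypothetical /analysis endpoint."""
    data = json.dumps({"genes": genes}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/analysis",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_service(base_url: str, genes: list[str], timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(build_request(base_url, genes), timeout=timeout) as resp:
        return json.loads(resp.read())
```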
88. Software Distribution Problem
- “It-worked-on-my-machine” syndrome
- This is a serious problem, especially when you want to share your workflow with collaborators.
91. What is Docker?
- Container to run applications in an isolated
environment
- Application = stack of image layers
- Shareable environments
- Environments as code
95. We (the NIH) Are Working On, But As Yet Do Not Have Good Answers To:
1. Today, how much are we actually spending on data- and software-related activities?
2. How much should we be spending to achieve the maximum benefit to biomedical science relative to what we spend in other areas?
Biomedical Research as an Open Digital Enterprise by Philip E. Bourne Ph.D.
Associate Director for Data Science (NIH)
96. Reproducibility
- Most of the 27 Institutes and Centers of the NIH are currently reviewing the ability to reproduce research they are funding
- The NIH recently convened a meeting with publishers to discuss the issue; a set of guiding principles arose
97. The Cytoscape to a Cytoscape
- Shares core concepts
- Graph model
- Tables associated with the graph
- Style (a collection of visual mappings)
- Implemented as different collections of services
- Desktop Cytoscape
- Interactive network data visualizer on the web
- Optimized for ontology browsing (i.e., future version of NeXO Web)
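The shared concepts (graph, tables attached to it, and a Style built from visual mappings) are simple enough to sketch as plain data structures, which is part of why different front ends can share them. The toy model below is a simplification under stated assumptions: it mirrors Cytoscape's three mapping kinds (passthrough, discrete, continuous), reuses visual property names like `NODE_SIZE` loosely, and reduces continuous mappings to a single clamped linear range.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

VisualMapping = Callable[[dict], Any]


def passthrough(column: str) -> VisualMapping:
    """Use the column value directly, e.g. a gene name as the node label."""
    return lambda row: row[column]


def discrete(column: str, table: dict, default: Any) -> VisualMapping:
    """Look the column value up in a fixed table of visual values."""
    return lambda row: table.get(row[column], default)


def continuous(column: str, lo: float, hi: float, out_lo: float, out_hi: float) -> VisualMapping:
    """Linearly interpolate a numeric column into a visual range, clamped at both ends."""

    def mapping(row: dict) -> float:
        t = min(1.0, max(0.0, (row[column] - lo) / (hi - lo)))
        return out_lo + t * (out_hi - out_lo)

    return mapping


@dataclass
class Style:
    name: str
    mappings: dict[str, VisualMapping] = field(default_factory=dict)

    def apply(self, node_table: dict[str, dict]) -> dict[str, dict]:
        """Compute every mapped visual property for every row of a node table."""
        return {
            node: {prop: fn(row) for prop, fn in self.mappings.items()}
            for node, row in node_table.items()
        }
```

Because a Style here is just data plus pure functions, the same definition could in principle be evaluated by a desktop client, a web visualizer, or a script generating one Style per time point.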