3. • Microarray data analysis support
– Load public microarray data from GEO
– Store and retrieve saved analyses
– Search on gene name, disease name etc.
• Genomic variants and VCF support
– Load TCGA studies we have access to
– Load 1000 Genomes data
9. Not Invented Here Syndrome
Image from Rob Hooft, CTO Netherlands Bioinformatics Centre
http://nothinkingbeyondthispoint.blogspot.nl/2011/11/decision-tree-for-scientific.html
13. Phenotype Database
Written in Grails, supports several types of
omics data, provides data integration and
visualization, has R, Groovy and PHP APIs.
Sounds familiar?
http://phenotypefoundation.org
16. So far…
• TranSMART has a huge business potential. It’s
no silver bullet though.
• Scientists sometimes have trouble reusing each other’s work, especially when it
comes to open source software.
21. Governance of R community
Brian Ripley: “The R Project is governed by a self-perpetuating oligarchy, a group with a lot of
power. R was principally developed for the
benefit of the core team.”
As cited on http://blog.revolutionanalytics.com/2011/08/brian-ripley-onthe-r-development-process.html
23. Galaxy is the most widely used open
source bioinformatics web interface AFAIK.
Probably in no small amount thanks to
their continuous dedication to
improving the UI.
But there’s something else.
25. • An open source CMS (Content Management
System) written in Python, nowadays backing
thousands of production-grade websites
• Started by 2 developers in 2000, now an active
open source project with hundreds of
active developers
• In 2004, the Plone Foundation was formed to
formalize IP and secure the future of Plone
• Plone Collective has hundreds of plugins
26.
27. What do all these success stories
have in common?
Bioconductor Packages
Galaxy Tool Shed
Plone Collective
Drupal Modules
30. TranSMART Contributions - Pharma
• Janssen
– Initial version of tranSMART
– Genomics viewer using IGV and GenePattern
– Faceted Search interface (results browsing)
• Millennium
– Loading TCGA and many GEO studies
– R interface for interacting with data directly in R
– Several R analyses available directly in GUI
31. TranSMART Contributions - Pharma
• Sanofi
– Cleaner user interface
– Added metadata layer for all concepts
– Study/Program categorization & file management
• Pfizer
– GWAS upload (VCF), data storage and analysis
– Enhanced data export capabilities
32.
33. This is a mess.
Another reason why we need that
core.
34. Start the Core: I2B2 Refactoring
1. I2B2 was integrated with tranSMART, but the
I2B2 API abstractions were leaked all over the
place in the tranSMART application.
2. We agreed in the London meeting that all
parties would set some time apart
for working on the core.
3. Taken together, it made sense to start working on
the clinical data API, properly using the I2B2
API where possible, and to re-implement all I2B2
functionality in a new ‘core-db’ plugin.
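The idea behind this refactoring can be illustrated with a minimal sketch. It is written in Python for brevity, not in the Groovy that tranSMART itself uses, and every name in it (`ClinicalDataResource`, `InMemoryCoreDb`, `Observation`, `mean_value`) is hypothetical rather than taken from the actual core API. The in-memory dictionary merely stands in for the i2b2 tables; the point is only that application code depends on the interface and never on the storage behind it.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Observation:
    """One clinical fact: a value for a concept, for one patient (hypothetical shape)."""
    patient_id: int
    concept_path: str
    value: float


class ClinicalDataResource(ABC):
    """Hypothetical core API: the only thing application code is allowed to see."""

    @abstractmethod
    def observations(self, study: str, concept_path: str) -> List[Observation]:
        """Return all observations of one concept within one study."""


class InMemoryCoreDb(ClinicalDataResource):
    """Stand-in for a 'core-db' style implementation.

    A real implementation would query the i2b2 schema; a dict plays that role here.
    """

    def __init__(self, rows: Dict[str, List[Observation]]) -> None:
        self._rows = rows

    def observations(self, study: str, concept_path: str) -> List[Observation]:
        return [o for o in self._rows.get(study, []) if o.concept_path == concept_path]


def mean_value(api: ClinicalDataResource, study: str, concept_path: str) -> float:
    """Application code: knows the API, knows nothing about i2b2 tables."""
    obs = api.observations(study, concept_path)
    if not obs:
        raise ValueError("no observations for %s in %s" % (concept_path, study))
    return sum(o.value for o in obs) / len(obs)
```

Swapping the storage backend then means writing another `ClinicalDataResource` subclass; `mean_value` and similar callers stay unchanged, which is what keeps the abstraction from leaking.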
35. The first version of core-integration
was completed mid-April.
By then, all web service calls to what formerly
was an outdated version of the I2B2 Ontology
and CRC cells were handled by the
newly implemented core-db plugin.
Also, a set of tests was written in the
process and API documentation was generated.
36. In the long run, I believe forming a
good distributed working group on the
core API is a more important
deliverable of this workshop
than cranking out a stable 1.1
version.
That’s how we write that history.
39. TranSMART’s Strong Points
• Powerful, ready-to-go user interface
for common analyses (survival analysis, gene
expression heatmaps etc.)
• Leverages the i2b2 data model for clinical data and
offers a unified view over different studies
• Uses a lot of good open
source technology under the hood (Grails, R,
SOLR, Pentaho),
leveraging existing community developments
40. TranSMART Building Blocks
• R: open source statistics package with CRAN,
an active repository in which many algorithms
and statistical packages are published
• Grails: a rapid application development
framework in Groovy leveraging Java
technology such as Hibernate, Spring, Quartz
• I2b2: domain-specific open source package for
storing and querying clinical data
• GenePattern, maybe soon: Galaxy, KNIME?
41. TranSMART’s Weaknesses
• Large monolithic codebase
with little modularization beyond the
standard Grails MVC setup
• Code quality is problematic, especially the JavaScript
• Test coverage is low: no functional / web tests and
few unit and integration tests
• No clear internal APIs, only a service level that
does the plumbing.
• I2b2 integration violates i2b2 abstractions
42. tranSMART Plans
• Use a clearly modularized architecture with
separation of clinical, high dimensional, search
and metadata storage; workflow execution
engines and knowledge repository
• Define clear API and rewrite current
implementations with good test coverage
• Use i2b2 data model, re-harmonize with latest
i2b2 APIs, and don’t use i2b2 binaries directly
• Separate analysis definitions and abstract from
workflow execution engine
http://prezi.com/t6twshyctdsk/transmart-core-refactoring
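The last plan item, separating analysis definitions from the workflow execution engine, could look roughly like the sketch below. This is an illustration under stated assumptions, not the tranSMART design: `AnalysisDefinition`, `WorkflowEngine`, `LocalEngine` and `register` are hypothetical names, and the only engine shown runs a plain Python function in-process.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol, Tuple


@dataclass(frozen=True)
class AnalysisDefinition:
    """Declarative description of an analysis: what to run, not how (hypothetical)."""
    name: str
    required_params: Tuple[str, ...]


class WorkflowEngine(Protocol):
    """Anything that can execute a definition; callers depend only on this."""

    def run(self, definition: AnalysisDefinition, params: Dict[str, object]) -> object:
        ...


class LocalEngine:
    """Hypothetical engine that runs analyses in-process.

    Another engine could hand the same definition to R or an external workflow tool.
    """

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[..., object]] = {}

    def register(self, name: str, handler: Callable[..., object]) -> None:
        self._handlers[name] = handler

    def run(self, definition: AnalysisDefinition, params: Dict[str, object]) -> object:
        # Validate against the definition before touching any handler.
        missing = [p for p in definition.required_params if p not in params]
        if missing:
            raise ValueError("missing parameters: %s" % ", ".join(missing))
        if definition.name not in self._handlers:
            raise LookupError("no handler registered for %s" % definition.name)
        return self._handlers[definition.name](**params)


# Usage: the definition is data; only the engine knows how to execute it.
MEAN = AnalysisDefinition(name="mean", required_params=("values",))
engine = LocalEngine()
engine.register("mean", lambda values: sum(values) / len(values))
result = engine.run(MEAN, {"values": [1.0, 2.0, 3.0]})
```

Because the definition carries no execution logic, the same `MEAN` object could be passed to a different engine without being rewritten.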
44. Further reading
• Description of core API efforts:
http://thehyve.nl/rewiring-transmart
• In-depth description of i2b2 refactoring:
http://thehyve.nl/inital-work-on-transmarts-core
• Overview of tranSMART Core API so far:
http://thehyve.github.io/transmart-core-api/
• Example of continuous integration test suite
(of core-db): https://ci.ctmmtrait.nl/browse/TMCOREDB-JOB1-51/test