Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis In the Cloud


Speakers: Smitha Gudur and Manoj Joshi
Cancer and life science drug research models are rich in relationships, relationship heterogeneity and entity inter-dependencies. Most entity metadata is dynamic and unpredictable, which makes it difficult to fit such models into a traditional relational landscape. Redbasin Networks uses a hybrid NoSQL strategy that supports composite, rich document metadata that is pervasively interconnected. Cancer and life science data is deeply nested. You will find this useful if you are building complex engineering and/or scientific applications and need insights on how to merge data from many diverse data sets and map it to an intuitive and effective graph database model.
We will show, using code examples, how complex metadata can be engineered using Spring, Neo4J and Mongo to create useful drug insights for the drug researcher, and also to provide a platform for technologists to build sophisticated life science applications.


  1. Redbasin Networks: Researching Cancer In the Cloud - Using Spring, Neo4J, Mongo and Redis. By Smitha Kulkarni Gudur and Manoj Joshi. © 2013 SpringOne 2GX. All rights reserved. Do not distribute without permission. Friday, September 6, 2013
  2. Introduction
      • Smitha Kulkarni Gudur, CEO
      • Manoj Joshi, CTO
      • Allan Grimes, VP Business
      • Neeta Potdar, HR & Admin
  3. Redbasin Networks Overview
      Redbasin Networks provides a cloud-based platform for cancer drug researchers in pharma and biotech. Redbasin is a scalable technology and platform that allows life science researchers to gain insights about viable drug molecules and pathways.
  4. Cancer Ecosystem Today (It’s complex!)
      [Diagram of participants] Drug Labs; FDA Certification, Approval; EPA; Legal; Patients; Instrument vendors; Hospitals, Treatment Centers; Insurance; Biotech Labs; Pharma; Lab tests; Contract Research Organization; CDC; NIH/NLM; Universities
  5. Cancer Market Research
      • US cancer spending $108b
      • 1.5m new cancer deaths
      • 89m deaths 2005-2015
      • 10% of top 200 drugs cancer related, generate $1b/yr
  6. Spring Data: Redbasin Cancer Research
      [Diagram labels] Entities: Protein, Gene, Disease, Drug, Antibody, Ligand, Complex, Epigenetics. Data technologies: Spring Data, MongoDB, Neo4j, Redis, HBase, Lucene. Access: Cloud, REST API, XML, JSON
  7. Typical Drug Life Cycle Costs [figure]
  8. Why Not Go Relational?
      • Oncological metadata is multi-dimensional
      • Unpredictable schemas during mining
      • Pervasive joins are a drag on performance
      • Temporality is difficult to represent
  9. Redbasin Core Data Technologies
      • Mongo
      • Neo4J
      • Redis
      • Lucene
      • HBase/Hadoop
  10. Why Mongo?
      • Lots of XML and JSON documents
      • Very easy to use
      • High performance and scalability
      • Strong Java & REST support
  11. Why Neo4j?
      • Neo4j is a modern graph database
      • Very easy to use
      • Complex features that are used less often have been dropped
      • Strong Java & REST support
  12. How does Redbasin use Neo4J?
      • We have 225 oncology dimensions
      • Everything is either a node, a relationship, or a property
      • We use indexes liberally
  13. Numerous dimensions and sub-dimensions in Redbasin’s big data
      [Diagram labels] Protein, Epigenetics, Gene, Ontology, Disease, Structure, Drug, Antibody, PD/PK, Instrument, Ligand, Method, Pathway, Enzyme, Amino acid, Organism, Interaction, Physicochemical, FDA, Research, Experiment, Pharma, ClinicalTrial, Researcher, Institute, Time, Location
  14. Dimensions have sub-dimensions
      • Principal dimension: Pharmacodynamics (what does the drug do to the body?)
      • Sub-dimensions: Absorption, Distribution, Metabolism, Elimination, Toxicity
  15. Data is Logical. But Big Data is not.
      • Data is more than just data
      • Real-time lookups
      • Impressive computational abilities
      • Logical
      • Asymptotic convergence to human
      • Understands human idiosyncrasies
  16. No enterprise! Just plain cloud... [figure]
  17. Perhaps a Nebula(e), but why?
      • Contextual correlation
      • Ontology driven
      • Multi-dimensional
      • Hierarchical
      • Fractal like
      • Clustering
      • Dynamic/Evolving
      • Stars (facts) are born
      • Zoom for details
      • Humongous
      • Transparency
      • Dynamic metadata*
      • Interconnected
      • Graph like
      • Complexity
  18. How does Redbasin use Spring Data?
      • Redbasin Cloud connects to hundreds of cancer data sources
      • Redbasin uses contextual mining to create dynamic models
      • We map nodes, relationships and attributes to the Redbasin Object Model
      • We separate analytics from queries
  19. Neo4J Node Index Example
          IndexHits<Node> pNodeHits = drugIdIndex.get(DRUG_ID, drugConceptCode);
          if (pNodeHits != null && pNodeHits.size() > 0) { // if node already exists
              drugNode = pNodeHits.getSingle();
              if (drugNode != null) {
                  if (!drugNode.hasProperty(DRUG_CONCEPT_CODE)) {
                      drugNode.setProperty(DRUG_CONCEPT_CODE, drugConceptCode);
                  }
                  if (!drugNode.hasProperty(BioEntityTypes.NODE_TYPE)) {
                      drugNode.setProperty(BioEntityTypes.NODE_TYPE, BioEntityTypes.RB_DRUG);
                  }
              }
          }
  20. Spring Stack: Spring Data with Mongo JSON
          "@molecule_type" : "complex",
          "@id" : "208314",
          "Name" : {
              "@name_type" : "PF",
              "@long_name_type" : "preferred symbol",
              "@value" : "TXA2/TP beta/beta Arrestin3/RAB11/GDP"
          },
          "ComplexComponentList" : [
              { "@molecule_idref" : "202489" },
              { "@molecule_idref" : "202493",
                "PTMExpression" : [
                    { "@protein" : "O75228",
                      "@position" : "239",
                      "@aa" : "C",
                      "@modification" : "palmitoylation" }
          ...
  21. Spring Data with Mongo Objects
      • Redbasin data grows and changes over time
      • Collections are ideal for Redbasin’s unstructured data
      • Retrieve nested objects with ease: "participantList.experimentalRoleList.experimentalRole.xref.secondaryRef.@db" : "pubmed"
      • DBObject utilities are well suited for mapping to BioEntities
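The nested retrieval on slide 21 relies on dotted paths into a document. The sketch below shows that idea in plain Java, with nested `java.util.Map` objects standing in for a Mongo `DBObject`. The class `NestedPath`, its `resolve` and `sample` methods, and the shortened path `xref.secondaryRef.@db` are assumptions made for illustration and do not come from the slides. It does not handle arrays, which a real Mongo query would.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: resolves a dotted path over nested maps.
// Plain maps stand in for Mongo documents; no MongoDB driver is used here.
final class NestedPath {

    // Walks the document one path segment at a time; returns null when a segment is missing.
    static Object resolve(Map<String, Object> doc, String dottedPath) {
        Object current = doc;
        for (String part : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // cannot descend into a non-document value
            }
            current = ((Map<?, ?>) current).get(part);
        }
        return current;
    }

    // Builds a small document shaped like the tail of the path shown on the slide.
    static Map<String, Object> sample() {
        Map<String, Object> secondaryRef = new HashMap<>();
        secondaryRef.put("@db", "pubmed");
        Map<String, Object> xref = new HashMap<>();
        xref.put("secondaryRef", secondaryRef);
        Map<String, Object> doc = new HashMap<>();
        doc.put("xref", xref);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(resolve(sample(), "xref.secondaryRef.@db"));
    }
}
```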
  22. Spring Data: Redis
      [Diagram] Key / Value
      Usage:
      • Ontologies and taxonomy, for unique key-value pairs
      • Auto-completion, as our data is “N” column based
  23. Redis - Ontology Lookups
      Ontology lookups can be very handy
  24. Redis - Analytics Cache
      MineBot and multi-entity analytics are nifty
  25. Redis - Managing Aliases
      Gene aliases, for instance, are numerous
  26. Redis - Key Value Pairs
      Key: ATP. Value: Adenosine triphosphate
      Large number of key-value pairs
  27. Redis - Slaves
      Redis slaves simply work
  28. Redis - Monitor
      https://github.com/nkrode/RedisLive
  29. Redis - Subgraph Caching
      • Subgraph similarity analytics
      • Pathway rules cache
  30. Redis - Spring Data
      • Using the Jedis connection package
      • Spring’s data access exceptions for the Redis driver
      • Built-in abstraction: Redis template
      • Not using pub/sub support
      • Using our own JSON/XML mapping serializers
      • Atomic counter for Redis: useful
      • Sorting (using) and pipelining (not using)
      • Not using the Spring 3.1 cache abstraction
  31. Spring Data: Redis Usage
      Key type                          Key      Value
      NCBI_TAXONOMY_ID                  9606     Homo sapiens
      DISEASE_CODE                      x46859   Metastases from colorectal carcinoma
      HGNC_ID (Human Gene Identifier)   1817     CEACAM5
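One way to picture the lookups on slide 31 is a key built from the identifier type plus the identifier. The sketch below uses an in-memory `HashMap` as a stand-in for Redis, so it runs without a server or the Jedis driver. The `TYPE:id` key format, the class `OntologyLookup` and all of its methods are assumptions for illustration; the slides do not show how Redbasin composes its keys.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: namespaced key-value lookups, with a HashMap standing in for Redis.
final class OntologyLookup {

    private final Map<String, String> store = new HashMap<>();

    // Assumed key convention: "<identifier type>:<identifier>".
    static String key(String idType, String id) {
        return idType + ":" + id;
    }

    void put(String idType, String id, String value) {
        store.put(key(idType, id), value);
    }

    // Returns null when the key is absent, much like a GET on a missing Redis key.
    String get(String idType, String id) {
        return store.get(key(idType, id));
    }

    // Loads the three example rows from the slide.
    static OntologyLookup withSlideExamples() {
        OntologyLookup lookup = new OntologyLookup();
        lookup.put("NCBI_TAXONOMY_ID", "9606", "Homo sapiens");
        lookup.put("DISEASE_CODE", "x46859", "Metastases from colorectal carcinoma");
        lookup.put("HGNC_ID", "1817", "CEACAM5");
        return lookup;
    }

    public static void main(String[] args) {
        System.out.println(withSlideExamples().get("HGNC_ID", "1817"));
    }
}
```

Prefixing the identifier type keeps `9606` as a taxonomy ID distinct from a `9606` that might appear in another identifier scheme.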
  32. Redbasin vs Other BioModels
      Redbasin                                 Other BioModels
      Focused on Oncology                      No focus on any specific Disease
      Commercial/public domain correlations    Focused on academic knowledge
      Information density is “infinite”        Information size is “infinite”
      Temporality/pathway dependent            No time element
      Hybrid vendor strategy                   No co-existence scenario
      One cloud for all Oncology               Typically downloadable software
  33. Neo4J Node Validation
      [Diagram] Nodes: Bcl-2 Protein, Beclin 1 Gene, Apoptosis. Relationships: binds-to, inhibits
      Biologically aware nodes and relationships
  34. Spring Data Relationship Entity
      • Metadata for recognition of a relationship class
      • Annotation for @RelationshipEntity
          @RelationshipEntity
          public class BioRelation {
          }
      • Convenient relationship abstraction
  35. Relationships always have start/end nodes
          @RelationshipEntity
          public class BioRelation {
              @EndNode
              private Object endNode;
              @StartNode
              private Object startNode;
          }
      • A unique field must be marked as @EndNode
      • A unique field must be marked as @StartNode
      • Field can be any variable name
      • Flexibility for the programmer
      • Must be a @BioEntity class
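The rule that a relationship class needs exactly one `@StartNode` field and one `@EndNode` field can be checked reflectively. The sketch below is one possible check and is not Redbasin's actual validator. The annotations are re-declared locally as hypothetical stand-ins, and `RelationshipCheck`, `isValid` and `BrokenRelation` are names invented for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;

// Hypothetical sketch: verifies the start/end node rule on a relationship class.
final class RelationshipCheck {

    // Local stand-ins for the annotations named on the slides.
    @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.TYPE)
    @interface RelationshipEntity {}

    @Retention(RetentionPolicy.RUNTIME) @Target({ElementType.FIELD, ElementType.METHOD})
    @interface StartNode {}

    @Retention(RetentionPolicy.RUNTIME) @Target({ElementType.FIELD, ElementType.METHOD})
    @interface EndNode {}

    @RelationshipEntity
    static class BioRelation {
        @EndNode private Object endNode;
        @StartNode private Object startNode;
    }

    @RelationshipEntity
    static class BrokenRelation {
        @StartNode private Object from;
        @StartNode private Object alsoFrom; // two start nodes and no end node
    }

    // True only for a @RelationshipEntity class with exactly one field of each kind.
    static boolean isValid(Class<?> type) {
        if (!type.isAnnotationPresent(RelationshipEntity.class)) {
            return false;
        }
        int starts = 0;
        int ends = 0;
        for (Field field : type.getDeclaredFields()) {
            if (field.isAnnotationPresent(StartNode.class)) starts++;
            if (field.isAnnotationPresent(EndNode.class)) ends++;
        }
        return starts == 1 && ends == 1;
    }

    public static void main(String[] args) {
        System.out.println(isValid(BioRelation.class));    // prints "true"
        System.out.println(isValid(BrokenRelation.class)); // prints "false"
    }
}
```

The check depends only on the annotations and not on field names, which matches the point that the field can be any variable name.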
  36. Spring Data Relationship Entity
          @RelationshipEntity
          public class BioRelation {
              .....
              @GraphId
              private Long id;
              .....
          }
      • Id of the relationship
      • This is an unreliable field
      • But we have it here for reference
  37. Spring Data Relationship Entity
          @RelationshipEntity
          public class BioRelation {
              .....
              @RelProperty
              private String name;
              ....
          }
      • @RelProperty marks a field as a relationship property
      • There could be non-property fields
      • The property here is “name”
      • It’s always a String
  38. Spring Data Relationship Entity
          @RelationshipEntity
          public class BioRelation {
              ....
              @RelType
              private String relType;
              @RelProperty
              private String message;
          }
      • @RelType is the actual relation
      • message is a default @RelProperty
  39. Spring Data Relationship Entity
          @RelationshipEntity
          public class BioRelation {
              @EndNode
              private Object endNode;
              @StartNode
              private Object startNode;
              @GraphId
              private Long id;
              @RelProperty
              private String name;
              @RelType
              private String relType;
              @RelProperty
              private String message;
          }
  40. Spring Data-isms
          @Retention(RetentionPolicy.RUNTIME)
          public @interface BioEntity {
              public BioTypes bioType();
          }

          @Retention(RetentionPolicy.RUNTIME)
          public @interface RelationshipEntity {
          }
  41. Spring Data-isms Neo4j
          @Retention(RetentionPolicy.RUNTIME)
          public @interface RelatedTo {
              public Direction direction() default Direction.BOTH;
              BioRelTypes relType() default BioRelTypes.DEFAULT_RELATION;
              public Class<?> elementClass() default Object.class;
              public BioTypes endNodeBioType() default BioTypes.UNKNOWN;
              public BioTypes startNodeBioType() default BioTypes.UNKNOWN;
          }
  42. Bio Entity
          @Retention(RetentionPolicy.RUNTIME)
          public @interface BioEntity {
              public BioTypes bioType();
          }
      • This is usually a node in Neo4J
      • @Retention: how long to retain annotations?
          • CLASS: annotations are recorded in the class file by the compiler but need not be retained by the VM at run time.
          • RUNTIME: annotations are recorded in the class file by the compiler and retained by the VM at run time, so they may be read reflectively.
          • SOURCE: annotations are discarded by the compiler.
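The practical effect of the three retention policies shows up when annotations are read by reflection. The sketch below declares one annotation per policy and checks which of them survive to run time. `RetentionDemo`, `KeptAtRuntime`, `ClassFileOnly`, `SourceOnly` and `Gene` are hypothetical names made up for this example.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical sketch: only RUNTIME-retained annotations are visible through reflection.
final class RetentionDemo {

    @Retention(RetentionPolicy.RUNTIME) @interface KeptAtRuntime {}
    @Retention(RetentionPolicy.CLASS)   @interface ClassFileOnly {}
    @Retention(RetentionPolicy.SOURCE)  @interface SourceOnly {}

    @KeptAtRuntime @ClassFileOnly @SourceOnly
    static class Gene {}

    // Counts the annotations that reflection can see on the given class.
    static int visibleAnnotationCount(Class<?> type) {
        return type.getAnnotations().length;
    }

    public static void main(String[] args) {
        System.out.println(Gene.class.isAnnotationPresent(KeptAtRuntime.class)); // prints "true"
        System.out.println(Gene.class.isAnnotationPresent(ClassFileOnly.class)); // prints "false"
        System.out.println(visibleAnnotationCount(Gene.class));                  // prints "1"
    }
}
```

A mapping layer that discovers entity classes reflectively would need RUNTIME retention for this reason.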
  43. End Node annotation
          package com.redbasin.bio.meta;

          @Retention(RetentionPolicy.RUNTIME)
          @Target({ ElementType.ANNOTATION_TYPE, ElementType.FIELD })
          public @interface Reference { }

          @Retention(RetentionPolicy.RUNTIME)
          @Target({ ElementType.FIELD, ElementType.METHOD })
          @Reference
          public @interface EndNode { }
      • There is no concept of start and end nodes in Neo4J
      • This is a Redbasin abstraction
      • The @Reference can be used by annotation types and fields only
      • The annotation @EndNode can be used by methods and fields only
      • It cannot be used by classes or other elements
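Because `@EndNode` is itself annotated with `@Reference`, code can find all reference-style fields without naming each annotation. The sketch below shows one way to do that with the JDK reflection API. The `@StartNode` declaration is assumed to mirror `@EndNode`, and `ReferenceScan` and `referenceFields` are hypothetical names; none of this is confirmed by the slides as Redbasin's implementation.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: discovers fields whose annotation is meta-annotated with @Reference.
final class ReferenceScan {

    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.ANNOTATION_TYPE, ElementType.FIELD})
    @interface Reference {}

    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.FIELD, ElementType.METHOD})
    @Reference
    @interface EndNode {}

    // Assumed to be declared the same way as @EndNode.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.FIELD, ElementType.METHOD})
    @Reference
    @interface StartNode {}

    static class BioRelation {
        @StartNode private Object startNode;
        @EndNode private Object endNode;
        private String name; // not a reference field
    }

    // Returns the names of fields that carry @Reference directly or via a meta-annotation.
    static List<String> referenceFields(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field field : type.getDeclaredFields()) {
            for (Annotation annotation : field.getAnnotations()) {
                Class<? extends Annotation> kind = annotation.annotationType();
                if (kind == Reference.class || kind.isAnnotationPresent(Reference.class)) {
                    names.add(field.getName());
                    break;
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(referenceFields(BioRelation.class));
    }
}
```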
  44. Redbasin Open Doc Share
      https://github.com/redbasin/redbasin-org
      • It’s our “social taxonomy” for scientific documents
      • GitHub community project
      • Scientists can collaborate over zillions of documents and media
      • Downloadable code, can run in cloud mode
      • Can be modified to support any data access
      • Redbasin.org uses it for collaboration in schools
      • A Spring champion cause, underprivileged schools
  45. What can developers do?
      • Help us with development of our public domain API
      • We support jQuery, D3.js, JSON/XML, REST and more
      • We support Android and iOS on mobiles/tablets
      • Spring Data integration: developer plugins
  46. Redbasin Cloud Projects
      • OpenStack Project
      • Cloud Foundry Integration
      • AWS Project
  47. Why have Java developers chosen Spring?
      [Diagram labels] J(2)EE usability; Deployment Flexibility; Powerful Service Abstractions; Application Portability; Core Model: DI, AOP, TX; Testable, lightweight model for programming
  48. Spring
      [Diagram labels] Deploy to Cloud or on premise; Big, Fast, Flexible Data; GemFire; Core Model; Web, Integration, Batch
  49. Spring Stack [diagram labels]
      • Spring Data: Redis, HBase, GemFire, JPA, QueryDSL, MongoDB, Neo4j, Solr, JDBC, Splunk
      • Spring for Apache Hadoop: HDFS, MapReduce, Hive, Pig, Cascading, SI/Batch
      • Cloud platforms: Google App Eng., AWS Beanstalk, Heroku, Cloud Foundry, OpenShift
      • Spring Social: Twitter, Facebook, LinkedIn
      • Spring Batch, Spring Integration, Spring AMQP, Spring Web Services, Spring XD, Spring Web Flow, Spring Security, Spring Security OAuth
      • Spring Framework: DI, AOP, TX, JMS, JDBC, ORM, OXM, Scheduling, MVC, REST, HATEOAS, JMX, Testing, Caching, Profiles, Expression
      • JTA, JDBC 4.1, JMX 1.0+, Tomcat 5+, GlassFish 2.1+, WebLogic 9+, WebSphere 6.1+, Java EE 1.4+/SE5+, JPA 2.0, JSF 2.0, JSR-250, JSR-330, JSR-303
  50. Learn More. Stay Connected.
      • Contact Redbasin: bit.ly/redbasin
      • <related sessions>
      • Talk to us on Twitter: @springcentral
      • Find session replays on YouTube: spring.io/video
