The introduction of Instant JChem and underlying ChemAxon technologies, along with a new data infrastructure designed with analytics in mind, has provided a platform with significantly more flexibility in bringing chemistry and data to the scientist’s desktop. We will discuss the architecture we evolved toward and the many new use cases it supports, through an improved data flow and new ways of looking at the data that have improved decision making, design, and collaboration in drug discovery.
USUGM 2014 - Dana Vanderwall (Bristol-Myers Squibb): Instant JChem
1. Instant JChem: enabling new ways of working with data and access to new data to work with
Dana Vanderwall
Bristol-Myers Squibb
Research Information Technology & Automation
Chemaxon US UGM, Sept 2014
2. Initial State in Chemistry Analytics
• CDR → SI Forms: compound structures, IDs
• SI Forms → Master Spreadsheet (Excel, Word): export, manual copy & paste
• Additional data (e.g., Rat Pct Bound): manual copy & paste, typing
• Knowledge: annotation, folksonomies
• Excel: visualization, export
• Additional chemical structure analyses:
  o SAR R-group analysis
  o Clustering (CADD and in-house solutions)
  o Predictive models (hERG, Solubility, Permeability; FACT)
[Figure: Excel scatter plot "HPLC log P vs. rat Vds"; x-axis Vds (0 to 40), y-axis logP (2.0 to 5.5); linear trendline y = 0.0344x + 3.886, R² = 0.2737]
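The scatter plot above carries an Excel linear trendline (slope, intercept, R²). As a hedged illustration of what such a trendline reports, the sketch below computes an ordinary least-squares fit and the coefficient of determination in plain Python. The function name `linear_fit` and the sample points are hypothetical; they are not the BMS data behind the plot.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept, plus R^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - SS_res / SS_tot; if all y are equal the line fits exactly,
    # so this sketch reports 1.0 instead of dividing by zero.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared


# Hypothetical (rat Vds, HPLC logP) pairs, for illustration only.
vds = [1.0, 4.0, 9.0, 15.0, 22.0, 30.0]
logp = [3.6, 4.3, 3.9, 4.8, 3.7, 5.1]
slope, intercept, r2 = linear_fit(vds, logp)
print(f"y = {slope:.4f}x + {intercept:.3f}, R^2 = {r2:.4f}")
```

A low R² like the one on the slide means Vds explains only a small share of the variance in logP; the fit is still well defined.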
3. The DARE Project (Data & Analytics for Research)
Simplify
• Replace legacy app/workflow… with integrated tools for analytics
• Decrease stand-alone docs/reports
• Put any needed calculations & predicted properties where they’re needed
Modernize
• A new product that maintains the functionality of form view…
• Plus a richer set of views, tables, in-place conditional formatting, graphs, & more chemistry functionality
• Learn by doing; established a base camp in the first year, then ramped up
Phased approach to development & migration
• Gradually phasing in IJC over 2013-2014
4. DARE technology map
• User interface
• Drill down: web service for conc. response curves & secondary results
• SOLR index for text queries (IBM Patent DB only)
• Data Alerts
• Annotations
• Data Marts: new data layer for access & integration
  o Data common to most DWGs: Lead Evaluation (PAMPA, Cellular, CYP Inh); Profiling (Enzymes, MetStab, CYP Induction)
  o Data unique to a DWG: DWG A (Enzyme, Cellular, Selectivity); DWG B (Receptor, Cellular, Selectivity)
• Informatica (ETL)
• Central Data Repository (CDR), Operational Screening, BioBook: chemical structures, properties, calculated fields
• Metadata, Annotation, Web services
6. Start with the basics & build up
Foundation
• Program-specific forms and use cases
• Universal forms (profiling platforms or compilations of commonly used data)
Extended use cases
• Use cases requiring bespoke data structures, scripting, or visualization
• Unique data sources, combinations of data, all biological data
Extended functionality
• Hooks into internal web services: drill down for curves/secondary data
• Query to SOLR index
• Data Alerts
7. Datamart Infrastructure
General
• ETL primarily from the CDR, plus some additional sources
• Provides an environment to create tables & other data structures for IJC
• Tables in IJC not enormously popular with users
• Users are comfortable with, and oriented by, data in a text box fixed in position on a form
‘Cell Factory’
• Cell = entity in Oracle that effectively provides the data for one assay; CDR queries sometimes require a complex set of conditions
• Captures metadata associated with cell creation, keeps them unique, etc.
Incremental updates
• Via Informatica PowerCenter, 15-30 min incremental updates
• Gentle failure in the face of long-running jobs
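The incremental updates above run in Informatica PowerCenter, which is not shown here. As a minimal sketch of the general idea, assuming a high-water-mark scheme, the Python below loads only rows newer than the last successful load and fails gently: a cycle that finds the previous one still running is skipped rather than stacked, and a failed write leaves the watermark unchanged so the next cycle retries. `IncrementalLoader`, `fetch_since`, and `write_rows` are hypothetical names, not Informatica or IJC APIs.

```python
import threading
from datetime import datetime


class IncrementalLoader:
    """Hypothetical incremental refresh loop; not Informatica PowerCenter."""

    def __init__(self, fetch_since, write_rows):
        self.fetch_since = fetch_since  # callable(watermark) -> [(modified_at, row), ...]
        self.write_rows = write_rows    # callable(rows) -> None
        self.watermark = datetime.min   # latest modified_at successfully loaded
        self._running = threading.Lock()

    def run_cycle(self):
        """Run one scheduled cycle (e.g. every 15-30 min); return a status string."""
        # Gentle failure: if the previous cycle is still running, skip this one
        # instead of starting a second, overlapping job.
        if not self._running.acquire(blocking=False):
            return "skipped: previous cycle still running"
        try:
            changes = self.fetch_since(self.watermark)
            if not changes:
                return "no new data"
            self.write_rows([row for _, row in changes])
            # Advance the watermark only after the write succeeded, so a failed
            # cycle is retried from the same point next time.
            self.watermark = max(ts for ts, _ in changes)
            return f"loaded {len(changes)} row(s)"
        except Exception as exc:  # report and keep the scheduler alive
            return f"failed, will retry: {exc}"
        finally:
            self._running.release()


rows_in_mart = []
source = [
    (datetime(2014, 9, 1, 10, 0), {"cell": "assay_1", "value": 1.2}),
    (datetime(2014, 9, 1, 10, 20), {"cell": "assay_2", "value": 0.4}),
]
loader = IncrementalLoader(
    fetch_since=lambda wm: [c for c in source if c[0] > wm],
    write_rows=rows_in_mart.extend,
)
print(loader.run_cycle())  # loaded 2 row(s)
print(loader.run_cycle())  # no new data
```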
8. Data management v1
• BA catalogs the data required for a new project team
• Passes it to a DB developer to define new ETL
• Manually:
  o New tables/entities promoted to IJC
  o New data tree created
  o Build edges
  o Build form
  o Add new cells/columns to form
Diagram: CDR → ETL → Data mart → IJC Schema → IJC Forms (DARE)
• Manual coding/scripting is the rate-determining step
• DB development not self-documenting
9. Automated data management
Diagram: User → Metadata UI → Metadata Repository → Promote Queue → Auto Promotion → IJC Schema / IJC Forms (DARE); CDR → ETL → Data mart
User
• UI to search/define/create cells, tables, calc. fields
• Consumes metadata & creates the metadata ‘cell’ definition, which creates the cell/table definition
Data mart
• Creates tables & columns immediately upon cell ‘activation’
Auto promotion
• Promotes the new table / new fields into IJC
• If it’s a new entity, then:
  o Creates a new data tree using a data tree template
  o Adds the new table to the new data tree
  o Creates a new form on the new data tree
• Creates edges
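The auto-promotion steps above can be sketched as a queue worker. This is a minimal illustration under stated assumptions, not the BMS implementation or the IJC API: `CellDefinition`, `IjcModel`, `promote`, and the `template_root` parameter are hypothetical, and the "data tree template" is simplified to a tree rooted at a compound table with one edge to the new entity.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CellDefinition:
    """Hypothetical metadata 'cell' definition created through the metadata UI."""
    entity: str      # IJC entity (table) the cell's columns belong to
    columns: list    # new fields to expose in IJC
    active: bool = False


@dataclass
class IjcModel:
    """In-memory stand-in for IJC schema objects; not the real IJC API."""
    tables: dict = field(default_factory=dict)      # entity -> set of columns
    data_trees: dict = field(default_factory=dict)  # tree name -> list of tables
    forms: dict = field(default_factory=dict)       # form name -> data tree name
    edges: list = field(default_factory=list)       # (parent, child) relationships


def promote(queue, model, template_root="compound"):
    """Drain the promote queue, mirroring the steps listed on the slide."""
    log = []
    while queue:
        cell = queue.popleft()
        if not cell.active:
            log.append(f"skip {cell.entity}: not activated")
            continue
        if cell.entity not in model.tables:
            # New entity: data tree from the (simplified) template, table, form, edge.
            tree = f"{cell.entity}_tree"
            model.tables[cell.entity] = set()
            model.data_trees[tree] = [template_root, cell.entity]
            model.forms[f"{cell.entity}_form"] = tree
            model.edges.append((template_root, cell.entity))
            log.append(f"new entity {cell.entity}")
        # New or existing entity: promote the new fields into IJC.
        model.tables[cell.entity].update(cell.columns)
        log.append(f"{cell.entity}: +{len(cell.columns)} field(s)")
    return log


queue = deque([
    CellDefinition("mutagenesis_ec50", ["EC50", "WTRATIO"], active=True),
    CellDefinition("mutagenesis_ec50", ["KBWTRATIO"], active=True),
    CellDefinition("draft_assay", ["IC50"]),
])
model = IjcModel()
log = promote(queue, model)
for line in log:
    print(line)
```

Queuing definitions, rather than promoting inside the UI request, keeps the metadata UI responsive and lets promotion be retried independently.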
10. Scale
Instant JChem
• 1455 forms + grids
• 288 saved queries
• 474 saved lists
• 10 scripts
• 8 schemas
Data
• 211 data trees
• 526 ‘entities’
• 2400 assays
• 41,571 ‘fields’
Traffic
• 631 users (to date)
• 1000-2000 db connections daily
11. The flexibility of [datamart + IJC] has enabled solutions well beyond the standard ‘program’ form
Solutions built on IJC + datamarts:
• IBM Patent DB (external data source)
• Metabolite DB
• HT Mutagenesis DB
• Chiral Separations
• Alliance Data Access
• Drug Safety Warehouse
Enabling:
• Novel & multiple data structures & presentations
• Visualizations; integration of custom scripts & calculations
• Integration of active & historical BMS data
• Integration of BMS data not in the CDR
12. High-Throughput Mutagenesis: SAR, but different
• Lead Evaluation, Applied Genomics, Research IT & Automation, and Computer-Aided Drug Design designed & built the cloning and screening platform
• >150 mutants, testing >30 compounds
13. A Different Data Scale
For each compound, compare the wild type (WT) to 150 mutants, for 30 compounds
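The data-scale point above is that every compound is compared with its own wild-type result across all mutants: 30 compounds × 150 mutants gives 4,500 comparisons per result type. A minimal sketch of that comparison, assuming a fold-change defined as mutant EC50 divided by WT EC50 (an assumption for illustration; the slides name a WTRATIO result type but do not define it), might look like this. `wt_ratios` and the toy values are hypothetical.

```python
def wt_ratios(ec50):
    """Return {(compound, mutant): mutant EC50 / WT EC50}.

    `ec50` maps (compound, variant) -> EC50, where variant "WT" is the
    wild type. The ratio definition is an assumption for illustration.
    """
    ratios = {}
    for (compound, variant), value in ec50.items():
        if variant == "WT":
            continue
        wt = ec50.get((compound, "WT"))
        if wt:  # skip compounds with no (or zero) wild-type measurement
            ratios[(compound, variant)] = value / wt
    return ratios


# Scale on the slide: 30 compounds, each compared against 150 mutants.
print(30 * 150)  # 4500 WT-vs-mutant comparisons per result type

# Toy data (hypothetical values).
ec50 = {
    ("cmpd-1", "WT"): 10.0, ("cmpd-1", "M1"): 20.0, ("cmpd-1", "M2"): 5.0,
    ("cmpd-2", "WT"): 4.0, ("cmpd-2", "M1"): 4.0,
}
print(wt_ratios(ec50))
# {('cmpd-1', 'M1'): 2.0, ('cmpd-1', 'M2'): 0.5, ('cmpd-2', 'M1'): 1.0}
```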
14. Endpoint variation over mutants by compound
Datamart
• Mutagenesis datamart created, drawing on data from 2 operational data sources
• Mart generation automated & refreshed as new data become available
• Datamart structure heavily augmented based on the needs of Instant JChem (IJC)
• Uses IJC’s flexible entity-relationship model & charting functions to aid data visualization
15. All compounds per endpoint: variation over mutants
• Offered a summary bird’s-eye view of all compounds by each individual result type (EC50, WTRATIO, KBWTRATIO, etc.) to identify trends
• Compound as column header: a novel pivot
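The "compound as column header" pivot above turns long-format results into one table per result type, with mutants as rows and compounds as columns. A minimal sketch in plain Python, assuming records arrive as `(compound, mutant, result_type, value)` tuples (a hypothetical layout, not the datamart schema):

```python
def pivot_by_compound(records, result_type):
    """Pivot long-format results so each compound becomes a column.

    records: iterable of (compound, mutant, result_type, value) tuples.
    Returns (header, rows): header = ["mutant", cmpd1, cmpd2, ...] and each
    row holds one mutant's values, with None where nothing was measured.
    """
    cells = {}
    compounds, mutants = [], []  # keep first-seen order for columns and rows
    for compound, mutant, rtype, value in records:
        if rtype != result_type:
            continue
        if compound not in compounds:
            compounds.append(compound)
        if mutant not in mutants:
            mutants.append(mutant)
        cells[(mutant, compound)] = value
    header = ["mutant"] + compounds
    rows = [[m] + [cells.get((m, c)) for c in compounds] for m in mutants]
    return header, rows


records = [
    ("cmpd-1", "M1", "WTRATIO", 2.0),
    ("cmpd-2", "M1", "WTRATIO", 1.0),
    ("cmpd-1", "M2", "WTRATIO", 0.5),
    ("cmpd-1", "M1", "EC50", 20.0),
]
header, rows = pivot_by_compound(records, "WTRATIO")
print(header)  # ['mutant', 'cmpd-1', 'cmpd-2']
for row in rows:
    print(row)
# ['M1', 2.0, 1.0]
# ['M2', 0.5, None]
```

With 30 compounds this yields a 31-column table per result type, which is what makes trends across compounds visible at a glance.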
16. Shift workload from query & discovering to alerting & reporting
• Define what the teams want to monitor
• Automate the delivery of new data packages
Base case: go find the data & construct the analysis
Open SI Forms → open form query → select data → export data → import data → table or visualization, repeated while asking "Is my data there yet?"
New capability: push data alerts
User data alert parameters → query data source (datamart) → automated email to the user when new data arrive, with a spreadsheet & a link to open the project-form-list in IJC → Instant JChem with the new data
17. Alert manager (internal GWT), 2-way integration with IJC
• Grab the active data tree ID and bring it to the alert tool
• Take all the assays under the data tree as the selection source for the data alert
• Store this information and match it against DMART
• Create a hit list using compound ID/lot ID as a ‘permanent list’
• Send the link to subscribers
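The alert steps above (assays under a data tree, match against the datamart, compound ID/lot ID hit list, link to subscribers) can be sketched as a single check function. This is a hedged illustration, not the internal GWT alert manager: `AlertSubscription`, `check_alert`, the row dictionary keys, the `saved_lists` dict standing in for an IJC permanent list, and the `/list/<id>` URL scheme are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AlertSubscription:
    """Hypothetical stored alert: assays under one IJC data tree, plus subscribers."""
    data_tree_id: int
    assay_ids: set
    subscribers: list
    last_checked: datetime = datetime.min


def check_alert(sub, datamart_rows, now, saved_lists, link_base):
    """Match new datamart rows against a subscription; return (subscriber, message) pairs.

    datamart_rows: iterable of dicts with 'assay_id', 'compound_id', 'lot_id', 'loaded_at'.
    """
    # Hit list keyed on compound ID / lot ID, restricted to the tree's assays
    # and to rows loaded since the previous check.
    hits = sorted({
        (r["compound_id"], r["lot_id"])
        for r in datamart_rows
        if r["assay_id"] in sub.assay_ids and r["loaded_at"] > sub.last_checked
    })
    sub.last_checked = now
    if not hits:
        return []
    list_id = len(saved_lists) + 1
    saved_lists[list_id] = hits           # stand-in for an IJC 'permanent list'
    link = f"{link_base}/list/{list_id}"  # hypothetical URL scheme
    message = f"{len(hits)} compound lot(s) with new data: {link}"
    return [(user, message) for user in sub.subscribers]


sub = AlertSubscription(data_tree_id=42, assay_ids={"A1", "A2"},
                        subscribers=["chemist@example.com"])
rows = [
    {"assay_id": "A1", "compound_id": "CMPD-1", "lot_id": "01",
     "loaded_at": datetime(2014, 9, 2, 9, 0)},
    {"assay_id": "A9", "compound_id": "CMPD-2", "lot_id": "01",
     "loaded_at": datetime(2014, 9, 2, 9, 0)},
]
lists = {}
first = check_alert(sub, rows, datetime(2014, 9, 2, 9, 30), lists, "https://ijc.example.com")
print(first)   # one notification for CMPD-1 lot 01
second = check_alert(sub, rows, datetime(2014, 9, 2, 10, 0), lists, "https://ijc.example.com")
print(second)  # [] because nothing new arrived since 9:30
```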
18. What do the users think about all this?
• Change is never easy
• Sub-populations are attracted to new capabilities and adopt new tools and practices
• Others need more encouragement; stability is critical
• Maintaining the capabilities of the familiar and well understood in the new environment is a prerequisite for complete migration
• We’re getting there
19. Legacy application usage vs. Instant JChem
[Figure: unique users per month, March to June 2014, for DARE and SI; y-axis 0 to 900; annotated with the announcement of SI Forms retirement]
20. Reduced number of data sets exported for analysis
[Figure: number of data exports per month, March to June 2014; y-axis 1900 to 2700]
21. Monitor URL Sharing in IJC
[Figure: launched URLs per month, months 1 to 6 of 2014; series: total, form URLs, list URLs, query URLs; y-axis 0 to 80]
22. A moment for reflection
Cause for dancing
• Conditional formatting!
• Grid view
• Query builder
• Query/browse performance!
• Login/start-up performance!
• Tabbed panes
• URL sharing*
• List query result retains original order
• Cleaner Excel export, keep structure orientation
• Help from CXN!!
• Web services
• Plexus!
What we learned
• Train just in time; coaching
• Listen; listen some more
• STOP the presses if it’s not right: they’d rather wait
• Simple >> rich
• More thorough regression testing
• Clearer release notes
• Provide a thread of continuity to lead through new tools
• Don’t disrupt the workflow, let it evolve
• More conversations!
23. The DARE team
Acknowledgements: Scientific Computing Core Team; Scientific Computing Services
Heather Artman, Dawn Cohen, John Duncan, Dong Li, Mark Manfredi, Minimol Mathew, Ray Reichard, Padma Vellanki, Ramesh Durvasula, James Ewen, Lisa Johnson, Christa Musial, Matthias Nolte, Anusha Ramanathan, Thomas Curneal, Mike Beluch, Sangeet Khullar, David Vanderbrooke, Dana Vanderwall, Nelly Masias, Mahesh Nawade
BMS Internal
24. End user support
• Support email group
• User Community SharePoint
  o Training and reference, FAQs, external links, contact info
  o All reported issues and status: open, in progress, scheduled fix/improvement, resolved
• Internal BMS User Group Meeting
  o 1 hr. monthly session covering demonstration & training for 2-3 special topics or features
  o Topics drawn from suggestions and requests for more info or training; topics covered to date:
    – IJC: Query Builder; Visualization; Sharing by URL; exporting; working list (pick list); R-group decomposition; Markush draw/search
    – JChem4XL: patent doc creation
    – IBM Patent Database; Metabolite Database
25. Assay metadata
• Describe assay protocol & conditions in a controlled vocabulary
• Protocols would have a minimum set of fields that would have to be populated before going into production
• Opportunity for business rules that guide protocol registration
• All downstream systems would use the same framework & metadata
• Propose adopting an established standard, aligning/collaborating with the NIH BARD Project & BioAssay Ontology (BAO)
• Requires process & roles for maintaining up-to-date dictionaries and governance
Metadata fields:
• Biological description: Target; Gene name (look-up, and capture LocusLink); Species; Cell type
• Assay description: Assay type; Assay mode; Detection method
• Results: Result type; Modifier; Units; etc.
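The proposal above, a minimum set of populated fields plus business rules enforced before a protocol goes into production, can be sketched as a validation function. Assumptions: every field listed on the slide is treated as required, and the small `VOCABULARY` sets are hypothetical placeholders; a real system would load governed dictionaries (for example, terms aligned with BAO) rather than hard-code them.

```python
REQUIRED_FIELDS = (
    "target", "gene_name", "species", "cell_type",
    "assay_type", "assay_mode", "detection_method",
    "result_type", "modifier", "units",
)

# Hypothetical controlled vocabularies, for illustration only.
VOCABULARY = {
    "species": {"human", "rat", "mouse"},
    "units": {"nM", "uM", "%"},
}


def validate_protocol(protocol):
    """Return a list of problems; an empty list means the protocol may be registered."""
    problems = []
    for name in REQUIRED_FIELDS:
        value = protocol.get(name)
        if value is None or str(value).strip() == "":
            problems.append(f"missing required field: {name}")
        elif name in VOCABULARY and value not in VOCABULARY[name]:
            problems.append(f"{name}={value!r} is not in the controlled vocabulary")
    return problems


protocol = {
    "target": "Kinase X", "gene_name": "GENE1", "species": "human",
    "cell_type": "HEK293", "assay_type": "cell-based", "assay_mode": "inhibition",
    "detection_method": "luminescence", "result_type": "IC50",
    "modifier": "=", "units": "micromolar",
}
print(validate_protocol(protocol))
# ["units='micromolar' is not in the controlled vocabulary"]
```

Returning every problem at once, rather than stopping at the first, lets the registration UI show the scientist a complete checklist.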
26. BAO scope and purpose
• BAO describes assays and screening results
• Defines relevant assays and result annotations
• Provides controlled terminology
• Formalizes knowledge of assays and screening results
• Describes and formalizes screening campaigns, i.e. relationships between assays in terms of their use
• BAO addresses problems with using data and facilitates:
  o Leveraging existing data in discovery projects
  o Global analysis across diverse data sets
  o Integration of data from different resources
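To make "integration of data from different resources" concrete, here is a hedged sketch of the controlled-terminology idea: local result-type labels from different sources are mapped onto shared terms, so values can be grouped for global analysis, and anything unmapped is set aside for curation. The synonym table, the source names `cdr` and `alliance`, and the shared term keys are hypothetical placeholders, not real BAO identifiers; unit normalization, which a real integration also needs, is omitted.

```python
from collections import defaultdict

# Hypothetical synonym table: (source, local label) -> shared term.
# The shared keys are illustrative placeholders, NOT real BAO identifiers.
LOCAL_TO_SHARED = {
    ("cdr", "IC50"): "ic50",
    ("cdr", "EC50"): "ec50",
    ("alliance", "IC-50"): "ic50",
    ("alliance", "pIC50"): "pic50",
}


def integrate(results):
    """Group values from several sources under shared result-type terms.

    results: iterable of (source, local_label, compound_id, value).
    Returns (grouped, unmapped): grouped maps term -> [(source, compound, value)].
    Units are assumed to be normalized already; that step is not shown.
    """
    grouped = defaultdict(list)
    unmapped = []
    for source, label, compound, value in results:
        term = LOCAL_TO_SHARED.get((source, label))
        if term is None:
            unmapped.append((source, label))  # needs curation before comparison
        else:
            grouped[term].append((source, compound, value))
    return dict(grouped), unmapped


grouped, unmapped = integrate([
    ("cdr", "IC50", "CMPD-1", 0.12),
    ("alliance", "IC-50", "CMPD-7", 0.30),
    ("alliance", "Ki", "CMPD-7", 0.05),
])
print(grouped)   # {'ic50': [('cdr', 'CMPD-1', 0.12), ('alliance', 'CMPD-7', 0.3)]}
print(unmapped)  # [('alliance', 'Ki')]
```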
27. What do we need to describe assays?