1. Online chemical modeling environment: models
Iurii Sushko, Sergey Novotarskiy
Thursday, August 13, 2009
2. Existing alternatives
Classical approach: Weka, R, Mathematica
Advantages:
1. Most flexible
2. Suitable for research and deep analysis
Disadvantages:
1. Complex: suited to mathematicians, computer scientists,
and statisticians, but not to chemists and biologists
2. Very tedious data preparation
5. Collaboration in QSAR
Possibilities for collaboration in QSAR:
1. Use others' data
a. Build models based on others' data
b. Validate your models against others' data
2. Use others' models
a. Validate your data against published models
b. Use the output of published models as input for new ones
c. Compare the performance of published models with your own
All existing modeling tools lack means of collaboration
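As a rough illustration of item 1b, validating a model against someone else's data comes down to applying the model to their structures and scoring the predictions against their measurements. The sketch below assumes RMSE as the score; `toy_model`, `validate`, and the values in `others_data` are hypothetical stand-ins, not part of OCHEM and not real measurements.

```python
import math
from typing import Callable, Mapping, Sequence


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between predictions and measurements."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(observed))


def validate(model: Callable[[str], float],
             external_data: Mapping[str, float]) -> float:
    """Apply `model` to each structure (a SMILES string here) and score it."""
    structures = list(external_data)
    predictions = [model(s) for s in structures]
    return rmse(predictions, [external_data[s] for s in structures])


def toy_model(smiles: str) -> float:
    """Hypothetical stand-in for a published model: 0.5 per carbon atom."""
    return 0.5 * smiles.count("C")


# Made-up values for illustration only, not real measurements.
others_data = {"CC": 1.0, "CCC": 1.5, "CCCC": 3.0}
error = validate(toy_model, others_data)
```

The same `validate` call works in the other direction (item 2a): pass a published model and your own data set.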
6. OCHEM advantages
Collaboration-targeted features:
1. Tight connection between database and
modeling tools
2. Wiki, discussion, comments, tags
Simplified modeling workflow:
1. Sensible defaults for most parameters
2. Only necessary parameters requested
3. Data representation is tailored to chemists
4. Fine-tuning remains possible for experts
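One way to picture "sensible defaults" combined with expert fine-tuning is a settings object in which every field has a default and an expert overrides only what they need. This is a minimal sketch; the class name, field names, and default values are all hypothetical and do not describe OCHEM's actual parameters.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ModelSettings:
    # Every name and default value below is a hypothetical example.
    method: str = "ANN"
    descriptors: str = "E-State"
    ensemble_size: int = 100
    validation_folds: int = 5


# A chemist accepts the defaults without being asked for anything.
default_settings = ModelSettings()

# An expert changes only the parameters they care about.
expert_settings = replace(default_settings, method="KNN", validation_folds=10)
```

Because the object is frozen, the defaults cannot be modified by accident; an override always produces a new settings object.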
7. Modeling workflow
1. Data preparation
2. Building a model
3. Analysing the model
AD
4. Application of the model
8. Stage 1 – Data preparation
[Diagram: the information system around a data point]
Property: logP = 0.5; Melting Point = 100 °C
Condition: temperature, pH, species, tissue, method
Tags: Toxicology, Biology, Partition coefficient
Introducer: Bill G., Sergey B.
Date of modification
Structure: Benzene, Urea, ...
Article: Garberg, P, “In vitro models for …”
Operations: filtering, manipulation, editing, organization, working sets
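The attributes attached to a data point on this slide can be sketched as a simple record, with filtering as the operation that selects a working set. The class and function names are hypothetical, and the structure/value pairings below are arbitrary placeholders built from the slide's example labels, not real measurements.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataPoint:
    structure: str                      # e.g. a compound name or SMILES
    prop: str                           # measured property
    value: float
    unit: Optional[str] = None
    conditions: Dict[str, str] = field(default_factory=dict)
    tags: List[str] = field(default_factory=list)
    introducer: Optional[str] = None
    article: Optional[str] = None


def filter_points(points: List[DataPoint],
                  prop: Optional[str] = None,
                  tag: Optional[str] = None) -> List[DataPoint]:
    """Keep the points that match every criterion that was given."""
    return [p for p in points
            if (prop is None or p.prop == prop)
            and (tag is None or tag in p.tags)]


# Placeholder records: the pairings are arbitrary, not real measurements.
points = [
    DataPoint("Benzene", "logP", 0.5,
              tags=["Partition coefficient"], introducer="Bill G."),
    DataPoint("Urea", "Melting Point", 100.0, unit="degC",
              conditions={"method": "unspecified"},
              tags=["Toxicology"], introducer="Sergey B."),
]
working_set = filter_points(points, prop="logP")
```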
22. Stage 4: Application of the model
Prediction results
[Table columns: target compound, prediction, accuracy assessment]
23. Stage 4: Application of the model
Assessment of accuracy of predictions
Target compound
24. Need for distribution of calculations
Fact: QSAR modeling is calculation-intensive
Examples of calculations:
• Training of neural network ensembles
• Computing 3D conformations
• Computing complex molecular descriptors
Solution:
• Distributed calculation network
• Users can postpone or cancel a task, or fetch its results later
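The task lifecycle described above (submit now, cancel or fetch later) can be sketched with a small in-memory queue. This is only a local stand-in for a distributed network: `TaskQueue`, its method names, and the `run_pending` step that plays the role of the calculation servers are assumptions for illustration.

```python
import itertools
from enum import Enum
from typing import Any, Callable, Dict


class Status(Enum):
    PENDING = "pending"
    DONE = "done"
    CANCELLED = "cancelled"


class TaskQueue:
    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._work: Dict[int, Callable[[], Any]] = {}
        self._status: Dict[int, Status] = {}
        self._results: Dict[int, Any] = {}

    def submit(self, work: Callable[[], Any]) -> int:
        """Register a task and return immediately with its id."""
        task_id = next(self._ids)
        self._work[task_id] = work
        self._status[task_id] = Status.PENDING
        return task_id

    def cancel(self, task_id: int) -> bool:
        """Cancel a task that has not run yet; return True on success."""
        if self._status[task_id] is Status.PENDING:
            del self._work[task_id]
            self._status[task_id] = Status.CANCELLED
            return True
        return False

    def run_pending(self) -> None:
        """Stand-in for the calculation servers: run all pending tasks."""
        for task_id in list(self._work):
            self._results[task_id] = self._work.pop(task_id)()
            self._status[task_id] = Status.DONE

    def status(self, task_id: int) -> Status:
        return self._status[task_id]

    def fetch(self, task_id: int) -> Any:
        """Return the result, or None if the task has no result."""
        return self._results.get(task_id)


queue = TaskQueue()
training = queue.submit(lambda: sum(range(10)))   # placeholder workload
conformers = queue.submit(lambda: "never runs")
queue.cancel(conformers)       # the user changes their mind
queue.run_pending()            # servers work while the user is away
result = queue.fetch(training)  # the user comes back later
```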
25. Automatic updates and testing
Calculation servers are updated automatically when a
new release becomes available
Servers are tested automatically after each update
Tasks that fail their tests are disabled, so the
server stays functional
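The idea of disabling only the failing tasks can be sketched as a self-test pass that returns the set of task types a server may still accept. The function name and the task names are hypothetical; a test counts as failed if it returns a false value or raises.

```python
from typing import Callable, Dict, Set


def run_self_tests(task_tests: Dict[str, Callable[[], bool]]) -> Set[str]:
    """Run one self-test per task type and return the types that passed."""
    enabled: Set[str] = set()
    for name, test in task_tests.items():
        try:
            passed = bool(test())
        except Exception:
            # A crashing test disables its task but not the whole server.
            passed = False
        if passed:
            enabled.add(name)
    return enabled


def broken_descriptor_test() -> bool:
    raise RuntimeError("descriptor binary missing")  # simulated failure


# Hypothetical task types with simulated test outcomes.
tests = {
    "ann_training": lambda: True,
    "conformer_3d": lambda: False,
    "descriptors": broken_descriptor_test,
}
enabled_tasks = run_self_tests(tests)
```

A server in this state would keep serving `ann_training` requests while declining the other two task types until a later update passes its tests.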
26. Backend - distributed calculation
Central metaserver, distributed calculation servers
Automatic server updates, on-the-fly server testing
27. Basic facts
About 50,000 experimental measurements of
285 physicochemical properties, published in
about 2,000 articles
Implemented modeling methods:
ANN, KNN, MLR, Kernel ridge regression
Integrated descriptors: Dragon, E-State,
Fragments
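For readers unfamiliar with the listed methods, KNN is the simplest to show: a compound's property is predicted as the mean of the values of its k nearest training compounds in descriptor space. The sketch below is a generic textbook version with Euclidean distance, not OCHEM's implementation, and the descriptor vectors are made-up one-dimensional placeholders.

```python
import math
from typing import Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (descriptor vector, property value)


def knn_predict(train: Sequence[Sample],
                query: Sequence[float],
                k: int = 3) -> float:
    """Predict a value as the mean over the k nearest training samples."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the training set size")
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    return sum(value for _, value in nearest) / k


# Made-up placeholder data: one descriptor per compound.
training = [([0.0], 0.0), ([1.0], 1.0), ([2.0], 2.0), ([10.0], 10.0)]
prediction = knn_predict(training, [1.0], k=3)
```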
28. Backend - basic facts
Platform: Java EE
Database: MySQL
Server: Tomcat
ORM: Hibernate
MVC: Spring framework
Client side: AJAX, HTML + JavaScript