Online chemical modeling environment: models

Iurii Sushko, Sergey Novotarskiy
Thursday, August 13, 2009
Existing alternatives
Classical approach: Weka, R, Mathematica

Advantages:

       1. Most flexible
       2. Suitable for research and deep analysis

Disadvantages:

       1. Complex: suited to mathematicians, computer
          scientists and statisticians, but not to
          chemists and biologists
       2. Very tedious data preparation
Community-driven source vs. authority-driven source
Collaboration in QSAR
Possibilities for collaboration in QSAR:

 1. Use others' data
      a. build models based on others' data
      b. validate your models against others' data
 2. Use others' models
      a. validate your data against published models
      b. use the output of published models
         as input for new ones
      c. compare the performance of published models
         with your own

 All existing modeling tools lack means of collaboration
OCHEM advantages
Collaboration-targeted features:
    1. Tight connection between database and
       modeling tools
    2. Wiki, discussion, comments, tags



Simplified modeling workflow:
    1.   Sensible defaults for most parameters
    2.   Only necessary parameters requested
    3.   Data representation is targeted at chemists
    4.   Fine-tuning remains possible for experts
Modeling workflow

1. Data preparation
2. Building a model
3. Analysing the model (AD: applicability domain)
4. Application of the model
Stage 1 – Data preparation

Each data point links the following records:
   Property:             logP = 0.5; Melting Point = 100 °C
   Condition:            temperature, pH, species, tissue, method
   Structure:            Benzene, Urea, ...
   Article:              Garberg, P, “In vitro models for …”
   Introducer:           Bill G., Sergey B.
   Date of modification, information system
   Tags:                 Toxicology, Biology, Partition coefficient

Operations on data points:
   Filtering:            Toxicology, Biology, Partition coefficient
   Manipulation:         editing, organization, working sets
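The data point record described on this slide could be sketched as a small data structure. This is only an illustration: the class name, field names and the `filter_by_tag` helper are assumptions, not OCHEM's actual schema, and the example values are placeholders built from the slide's labels rather than real measurements.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DataPoint:
    """One experimental measurement (hypothetical field names)."""
    property_name: str   # e.g. "logP"
    value: float         # e.g. 0.5
    structure: str       # e.g. "Benzene"
    article: str         # source publication
    introducer: str      # user who entered the record
    modified: date       # date of modification
    conditions: dict = field(default_factory=dict)  # e.g. {"pH": 7.4}
    tags: set = field(default_factory=set)          # e.g. {"Toxicology"}


def filter_by_tag(points, tag):
    """Return the data points that carry the given tag."""
    return [p for p in points if tag in p.tags]


# Placeholder records; the pairing of values and structures is made up.
points = [
    DataPoint("logP", 0.5, "Urea", "Garberg, P", "Bill G.",
              date(2009, 8, 13), tags={"Partition coefficient"}),
    DataPoint("Melting Point", 100.0, "Benzene", "Garberg, P", "Sergey B.",
              date(2009, 8, 13), tags={"Toxicology", "Biology"}),
]

# A "working set" here is simply the filtered list.
working_set = filter_by_tag(points, "Toxicology")
```

Keeping tags as a set makes tag filtering a simple membership test, which is presumably all that tag-based filtering needs at this level.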
Stage 1 – Data preparation
   Tags:                 Toxicology, Biology, Partition coefficient
   Filtering:            Toxicology, Biology, Partition coefficient
   Manipulation:         editing, organization, working sets
Stage 1: Data preparation
Stage 2: Model building - input data
Stage 2: Model building - descriptors (I)
Stage 2: Model building - descriptors (II)
Stage 2: Model building – descriptors (manual)
Stage 3: Analysing the model (I)
Basic model statistics
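The slide does not say which statistics are reported. As a rough sketch, assuming the common regression measures RMSE, MAE and R², they could be computed as follows (the function names are illustrative):

```python
import math


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Made-up numbers, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
stats = (rmse(observed, predicted),
         mae(observed, predicted),
         r_squared(observed, predicted))
```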
Stage 3: Analysing the model (II)
Applicability domain assessment
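The slides do not state which applicability domain method is used. One common distance-based approach, assumed here purely for illustration, treats a compound as inside the domain when its nearest training compound in descriptor space lies within a chosen threshold:

```python
import math


def nearest_distance(query, training):
    """Euclidean distance from the query to its closest training vector."""
    return min(math.dist(query, t) for t in training)


def in_domain(query, training, threshold):
    """True if the query is close enough to at least one training compound."""
    return nearest_distance(query, training) <= threshold


# Hypothetical 2-D descriptor vectors and threshold.
training = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
close_query = in_domain((0.1, 0.1), training, threshold=0.5)
far_query = in_domain((3.0, 4.0), training, threshold=0.5)
```

Predictions for compounds outside the domain would then be flagged as less reliable rather than discarded.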
Stage 4: Application of the model
Selection of the model of interest:
  • Newly created model
  • Model published by another user
Stage 4: Application of the model
Provide target compounds
Stage 4: Application of the model
Prediction results
Columns: target compound, prediction, accuracy assessment
Stage 4: Application of the model
Assessment of the accuracy of predictions, per target compound
Need for distribution of calculations
Fact: QSAR modeling is calculation-intensive

Examples of calculations:
• Training of neural network ensembles
• Computing 3D conformations
• Computing complex molecular descriptors

Solution:
• Distributed calculation network
• Users can postpone or cancel a task, or fetch its results later
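The task lifecycle described above (postpone, cancel, fetch later) can be sketched with a small in-process queue. This is a simplified illustration, not the actual metaserver: the `TaskQueue` class, its method names and the status values are all assumptions, and a real system would dispatch work to remote calculation servers instead of running it locally.

```python
from enum import Enum


class Status(Enum):
    QUEUED = "queued"
    POSTPONED = "postponed"
    CANCELLED = "cancelled"
    DONE = "done"


class TaskQueue:
    """Hypothetical task registry keyed by task id."""

    def __init__(self):
        self._tasks = {}
        self._next_id = 1

    def submit(self, func):
        """Register a calculation and return its id."""
        task_id = self._next_id
        self._next_id += 1
        self._tasks[task_id] = {"status": Status.QUEUED, "func": func, "result": None}
        return task_id

    def postpone(self, task_id):
        if self._tasks[task_id]["status"] is Status.QUEUED:
            self._tasks[task_id]["status"] = Status.POSTPONED

    def resume(self, task_id):
        if self._tasks[task_id]["status"] is Status.POSTPONED:
            self._tasks[task_id]["status"] = Status.QUEUED

    def cancel(self, task_id):
        if self._tasks[task_id]["status"] in (Status.QUEUED, Status.POSTPONED):
            self._tasks[task_id]["status"] = Status.CANCELLED

    def run_pending(self):
        """Run every queued task; postponed and cancelled tasks are skipped."""
        for task in self._tasks.values():
            if task["status"] is Status.QUEUED:
                task["result"] = task["func"]()
                task["status"] = Status.DONE

    def fetch(self, task_id):
        """Return (status, result); result is None until the task is done."""
        task = self._tasks[task_id]
        return task["status"], task["result"]
```

A user would call `submit`, leave, and come back later to `fetch` the result by task id.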
Automatic updates and testing




  Calculation servers are updated automatically when a
  new release becomes available
  Servers are tested automatically after each update
  Task types that fail their tests are disabled, so the
  server itself stays functional
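The self-test step could look roughly like the sketch below: each task type carries its own test, and any task type whose test fails or raises is disabled while the rest stay available. The function name and the task names are hypothetical.

```python
def self_test(task_tests):
    """Run each task type's test and split names into (enabled, disabled).

    task_tests maps a task-type name to a zero-argument callable that
    returns a truthy value on success. A failing or crashing test only
    disables its own task type.
    """
    enabled, disabled = [], []
    for name, test in task_tests.items():
        try:
            ok = bool(test())
        except Exception:
            ok = False
        (enabled if ok else disabled).append(name)
    return enabled, disabled


# Hypothetical task types: one passes, one crashes, one reports failure.
task_tests = {
    "ann_training": lambda: True,
    "conformer_3d": lambda: 1 / 0,
    "descriptors": lambda: False,
}
enabled, disabled = self_test(task_tests)
```

Catching exceptions per task type is what keeps one broken component from taking the whole server offline.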
Backend - distributed calculation
Central metaserver, distributed calculation servers
Automatic server updates, on-the-fly server testing
Basic facts

  About 50,000 experimental measurements of 285 physicochemical
  properties, published in about 2,000 articles
  Implemented modeling methods: ANN, KNN, MLR, kernel ridge regression
  Integrated descriptors: Dragon, E-State, Fragments
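To make one of the listed methods concrete, here is a minimal k-nearest-neighbours (KNN) regression sketch. It is not OCHEM's implementation; the function name, the unweighted averaging and the toy data are assumptions chosen for brevity.

```python
import math


def knn_predict(query, training, k=3):
    """Predict a property as the mean over the k nearest training compounds.

    training is a list of (descriptor_vector, property_value) pairs;
    distances are Euclidean in descriptor space.
    """
    neighbours = sorted(training, key=lambda row: math.dist(query, row[0]))[:k]
    return sum(value for _, value in neighbours) / len(neighbours)


# Toy one-dimensional descriptors with made-up property values.
training = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 3.0), ((10.0,), 20.0)]
prediction = knn_predict((1.0,), training, k=3)
```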
Backend - basic facts

 Platform: Java EE
 Database: MySQL
 Server: Tomcat
 ORM: Hibernate
 MVC: Spring framework
 Client side: AJAX, HTML + JavaScript
