1. The RSC & e-Science: Reflecting the Change in the World We Live In
Valery Tkachenko
RSC-OSDD Consultative Workshop on Cheminformatics
Delhi, September 28th, 2013
3. The World we live in
Internet World
20+ years into the Internet Revolution
Web 2.0 -> Web 3.0
Connected World
Social Networks
Real-time Communications
Big Data World
Semantic content
New Interfaces
4. Pillars of the World
Data
Data (knowledge) is king
Dataflow
Navigation
Domain-specific search and navigation
Navigate inside and link out - federation
Interfaces
HCI (human-computer interface)
M2M (machine-to-machine)
14. • 29 million chemicals and growing
• Data sourced from >500 different sources
• Crowdsourced curation and annotation
• Ongoing deposition of data from our journals and our collaborators
• A structure-centric hub for web searching
33. DERA and Text Mining
The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6, thionyl chloride (5 ml) and benzene (50 ml) were charged into a glass reaction vessel equipped with a mechanical stirrer, thermometer and reflux condenser.
The reaction mixture was heated at reflux with stirring, for a period of about one-half hour.
After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea as a solid residue.
34. Text Mining
The N-(β-hydroxyethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea prepared in Example 6 , thionyl chloride ( 5 ml ) and benzene ( 50 ml ) were charged into a glass reaction vessel equipped with a mechanical stirrer , thermometer and reflux condenser .
The reaction mixture was heated at reflux with stirring , for a period of about one-half hour .
After this time the benzene and unreacted thionyl chloride were stripped from the reaction mixture under reduced pressure to yield the desired product N-(β-chloroethyl)-N-methyl-N'-(2-trifluoromethyl-1,3,4-thiadiazol-5-yl)urea as a solid residue
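A minimal sketch of the kind of extraction text mining performs on a passage like this one, assuming a tiny hand-written lexicon and a regular expression for quantities in parentheses. The names `LEXICON`, `Mention` and `find_mentions` are hypothetical and only for illustration; the slides do not describe how DERA is implemented, and a production system would rely on large chemical dictionaries and name-to-structure conversion for the systematic names.

```python
import re
from typing import List, NamedTuple, Optional, Sequence


class Mention(NamedTuple):
    name: str
    amount: Optional[float]
    unit: Optional[str]


# Hypothetical mini-lexicon (assumption): only the two simple reagent names.
LEXICON = ("thionyl chloride", "benzene")

# Abridged from the tokenized passage on the slide.
SENTENCES = (
    "thionyl chloride ( 5 ml ) and benzene ( 50 ml ) were charged into a "
    "glass reaction vessel . After this time the benzene and unreacted "
    "thionyl chloride were stripped from the reaction mixture ."
)


def find_mentions(text: str, lexicon: Sequence[str] = LEXICON) -> List[Mention]:
    """Find lexicon terms and any quantity given in parentheses right after."""
    # Longest names first so multi-word terms win over shorter ones.
    names = "|".join(re.escape(n) for n in sorted(lexicon, key=len, reverse=True))
    pattern = re.compile(
        r"\b(?P<name>" + names + r")\b"
        r"(?:\s*\(\s*(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>ml|mg|g)\s*\))?",
        re.IGNORECASE,
    )
    found = []
    for m in pattern.finditer(text):
        amount = float(m.group("amount")) if m.group("amount") else None
        found.append(Mention(m.group("name").lower(), amount, m.group("unit")))
    return found


for mention in find_mentions(SENTENCES):
    print(mention)
```

The first two mentions carry the quantities (5 ml, 50 ml); the later mentions of the same reagents have no quantity attached, which is the sort of distinction needed to separate "charged" amounts from work-up steps.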
35. It is so difficult to navigate…
What’s the structure?
Are they in our file?
What’s similar?
What’s the target?
Pharmacology data?
Known pathways?
Working on now?
Connections to disease?
Expressed in right cell type?
Competitors?
IP?
36. Digitally Enabling RSC Archive
Text, PDF, XML
Structures
Reactions
Spectra
Materials
Chemistry Validation and Standardization Platform (CVSP)
DERA (Text Mining)
Biological Activities
37. Data quality issue and CVSP
Robochemistry
Proliferation of errors in public and private databases
Automated quality control system
41. 2 records where SMILES, InChI, and name did not match the structure
DB00611 DB01547
42. ~40 records where InChIs did not match the structure
DrugBank ID: DB00755
InChI=1S/C20H28O2/c1-15(8-6-9-16(2)14-19(21)22)11-12-18-17(3)10-7-13-20(18,4)5/h6,8-9,11-12,14H,7,10,13H2,1-5H3,(H,21,22)/b9-6+,12-11+,15-8+,16-14+
DrugBank ID: DB00614
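As a rough illustration of a cross-field consistency check, the sketch below splits the InChI quoted on this slide into its layers and compares the formula layer with a formula stored elsewhere in a record. The functions `inchi_layers`, `atom_counts` and `formula_matches` are hypothetical helpers, not part of CVSP or any toolkit. The sketch assumes a single-component InChI without an isotopic layer, and it only detects formula-level mismatches; a real check would regenerate the InChI from the structure with a cheminformatics toolkit and compare the full strings.

```python
import re
from typing import Dict

INCHI = (
    "InChI=1S/C20H28O2/c1-15(8-6-9-16(2)14-19(21)22)11-12-18-17(3)10-7-13-"
    "20(18,4)5/h6,8-9,11-12,14H,7,10,13H2,1-5H3,(H,21,22)"
    "/b9-6+,12-11+,15-8+,16-14+"
)


def inchi_layers(inchi: str) -> Dict[str, str]:
    """Split an InChI into version, formula, and single-letter-prefixed layers."""
    version, formula, *rest = inchi.split("/")
    layers = {"version": version.replace("InChI=", "", 1), "formula": formula}
    for layer in rest:
        # "c" = connectivity, "h" = hydrogens, "b" = double-bond stereo, ...
        layers[layer[0]] = layer[1:]
    return layers


def atom_counts(formula: str) -> Dict[str, int]:
    """Count atoms in a simple formula such as 'C20H28O2'."""
    counts: Dict[str, int] = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def formula_matches(inchi: str, expected_formula: str) -> bool:
    """True if the InChI formula layer agrees with the expected formula."""
    return atom_counts(inchi_layers(inchi)["formula"]) == atom_counts(expected_formula)


print(inchi_layers(INCHI)["formula"])
print(formula_matches(INCHI, "C20H28O2"))
```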
43. 7 records with 2 stereo bonds at chiral atoms
DB08128 DB06287
J. Brecher, IUPAC, Graphical Representation of Stereochemical Configuration, Section ST-1.1.10
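The rule named on this slide can be sketched as a simple count, assuming bonds are given as (begin atom, end atom, stereo flag) tuples with the V2000 molfile flags 1 (wedge) and 6 (hash), where the narrow end of the bond sits at the begin atom. The function name and the example data are hypothetical; how CVSP actually implements the rule is not described in the slides.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (begin_atom, end_atom, stereo_flag); flags follow the V2000 molfile
# convention: 0 = plain, 1 = wedge (up), 6 = hash (down).
Bond = Tuple[int, int, int]
STEREO_FLAGS = {1, 6}


def atoms_with_multiple_stereo_bonds(bonds: Iterable[Bond]) -> List[int]:
    """Return atoms that are the narrow end of two or more stereo bonds."""
    counts = Counter(begin for begin, _end, flag in bonds if flag in STEREO_FLAGS)
    return sorted(atom for atom, n in counts.items() if n >= 2)


# Hypothetical centre (atom 1) drawn with one wedge and one hash bond;
# atom 5 starts only a single wedge bond and is not flagged.
example = [(1, 2, 1), (1, 3, 6), (1, 4, 0), (1, 5, 0), (5, 6, 1)]
print(atoms_with_multiple_stereo_bonds(example))  # prints [1]
```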
44. CVSP validation of ChEMBL 16 (~1.3 million records)
• Overall 0.7% of records had validation issues
• Stereo problems (~82%)
  • Directions of bonds do not make sense (~63%)
  • Ambiguous stereo: 2 stereo bonds at chiral center (~19%)
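To put these percentages into approximate record counts, the sketch below assumes that the stereo percentages are shares of the flagged records rather than of all records (the slide does not say so explicitly, but 63% + 19% = 82% is consistent with that reading). All inputs are the rounded figures from the slide, so the results are estimates only.

```python
TOTAL_RECORDS = 1_300_000   # "~1.3 million" on the slide
FLAGGED_PER_MILLE = 7       # 0.7% of records had validation issues


def share(count: int, percent: int) -> int:
    """Integer share of `count`, avoiding floating-point rounding noise."""
    return count * percent // 100


flagged = TOTAL_RECORDS * FLAGGED_PER_MILLE // 1000

# Assumption: the percentages below are taken relative to flagged records.
estimates = {
    "flagged records": flagged,
    "stereo problems (~82%)": share(flagged, 82),
    "bond directions (~63%)": share(flagged, 63),
    "ambiguous stereo (~19%)": share(flagged, 19),
}

for label, value in estimates.items():
    print(f"{label}: ~{value}")
```

Under these assumptions roughly 9,100 records were flagged, of which about 7,500 had stereo problems.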
62. National Data Repository
University 1: Data Hub, Workstations
University 2: Data Hub, Workstations
Company 3: Data Hub, Workstations
Data Repository: indexed storage
Data Repository provided data storage
Chemically intelligent services
Indexes
Data
External clients, Publishers, Scientists, Funding bodies
63. http://www.openphacts.org
Open PHACTS is an Innovative Medicines Initiative (IMI) project aiming to reduce the barriers to drug discovery in industry, academia and for small businesses.
The Semantic Web is one of the cornerstones.
64. What does e-Science do in Open PHACTS?
• ChemSpider provides many of the physicochemical properties within the Open PHACTS Discovery Platform
• e-Science develops tools to check and standardise chemical structures
• e-Science is creating the Open PHACTS chemical registration system