BioCreative IV workshop, DoubleTree by Hilton Hotel, Washington DC, USA 8th October 2013
In grammars we trust: LeadMine,
a knowledge driven solution
Daniel Lowe and Roger Sayle
NextMove Software
Cambridge, UK
Approaches to Entity
recognition
• Dictionary based
• Grammar based
• Machine Learning
LeadMine
Optional
Normalization
Input → Normalized
œstradiol → oestradiol
5` or 5’ or 5′ (backtick/quotation mark/prime) → 5'
<p>H<sub>2</sub>O</p> → H2O
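The normalization table above can be sketched as a small preprocessing function. This is a minimal Python sketch, not LeadMine's implementation: the character map and the tag-stripping regular expression are assumptions that cover only the three examples shown.

```python
import re

# Assumed character map: ligatures and prime-like characters from the table.
_CHAR_MAP = str.maketrans({
    "\u0153": "oe",   # œ ligature
    "\u0152": "OE",   # Œ ligature
    "`": "'",         # backtick
    "\u2019": "'",    # right single quotation mark
    "\u2032": "'",    # prime
})

# Assumed, deliberately naive pattern for markup tags such as <sub>.
_TAG = re.compile(r"<[^>]+>")


def normalize(text):
    """Strip markup tags, then map variant characters to a canonical form."""
    return _TAG.sub("", text).translate(_CHAR_MAP)
```

Stripping tags before mapping characters keeps the map from touching attribute values inside markup.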
Blue: Grammars
Green: Traditional dictionaries
Orange: Blocking dictionaries
Advantages of grammars
• Don’t require annotated corpora
• Encode knowledge about the domain
• Very fast recognition
• Allow spelling correction if an entity is a near
match to one recognized by the grammar
Simple grammar Example
Digit1to9 : '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
Digit : Digit1to9 | '0'
Cid : 'CID:' Digit1to9 Digit*
[Diagram: state machine for Cid: 'C', 'I', 'D', ':', then one digit 1..9, then zero or more digits 0..9]
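The Cid rule can be checked by hand with a few lines of Python. This is an illustrative sketch of a recognizer for that one rule only; the function name `is_cid` is hypothetical, and it says nothing about how LeadMine compiles its grammars.

```python
def is_cid(s):
    """Return True if s matches: 'CID:' Digit1to9 Digit*"""
    prefix = "CID:"
    if not s.startswith(prefix):
        return False
    digits = s[len(prefix):]
    # The first digit must be 1..9, so leading zeros are rejected.
    if not digits or digits[0] not in "123456789":
        return False
    # Any remaining characters must be ASCII digits 0..9.
    return all(ch in "0123456789" for ch in digits[1:])
```

For example, `is_cid("CID:2244")` is accepted, while `is_cid("CID:0123")` and `is_cid("CID:")` are rejected.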
Grammar for IUPAC names
• Grammar for complete molecules: 485 rules
– trivialRing : 'aceanthren'|'aceanthrylen'|'acenaphthen'...
– ringGroup : trivialRing | hantzschWidmanRing | vonBaeyerSystem ...
• Generally aims to match a superset of the
nomenclature covered by IUPAC
• Specifically, this is the superset that can
theoretically be converted to structures
Grammar inheritance
• Molecule grammar serves as a good starting
point for a substituent grammar or generic
chemical grammar
– Inherit rules rather than duplicate them
– Allow overriding of rules
pluralizedChemical : chemical 's'
elementaryMetalAtom : 'lanthanide' | 'lanthanoid' | 'transition metal' | 'transuranic element' | _elementaryMetalAtom
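One way to picture rule inheritance with overriding is a dictionary merge. The sketch below is an assumption-laden illustration: it reads `_elementaryMetalAtom` as "the inherited definition of this rule", which the slide suggests but does not state, and the parent rule contents in the usage note are hypothetical.

```python
def inherit(parent, overrides):
    """Build a child grammar from a parent grammar plus overriding rules.

    Grammars are dicts mapping a rule name to a list of alternatives.
    An alternative spelled '_' + rule name is replaced by the parent's
    alternatives for that rule (an assumed convention).
    """
    child = {name: list(alts) for name, alts in parent.items()}
    for name, alternatives in overrides.items():
        expanded = []
        for alt in alternatives:
            if alt == "_" + name:
                expanded.extend(parent.get(name, []))
            else:
                expanded.append(alt)
        child[name] = expanded
    return child
```

With a hypothetical parent rule `{"elementaryMetalAtom": ["'lithium'", "'sodium'"]}`, overriding it with `["'lanthanide'", "_elementaryMetalAtom"]` yields the new alternative followed by the inherited ones, and the parent grammar is left unchanged.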
Dictionaries… bigger is better
• For high recall of trivial names, dictionaries
with high coverage are required.
• The largest publicly available dictionary is
PubChem with over 94 million terms
• However most of these terms are either not
useful or actually detrimental to text mining
Aggressive filtering
• “what you don't see won't hurt you”
• Hence remove terms that are also English words or that start with an
English word
– Accomplished using a large English dictionary with
chemistry terms removed
• Remove internal identifiers used by depositors
• Remove terms that are matched by our grammars
• Ultimate result: 94 million → 2.94 million
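The filtering steps above can be sketched as a single pass over the dictionary. This is a minimal Python sketch under stated assumptions: the English word set, the depositor identifier test, and the grammar test are all supplied by the caller, and the function name `filter_terms` is hypothetical.

```python
def filter_terms(terms, english_words, is_internal_id, grammar_matches):
    """Keep only terms that survive the three filters described above."""
    kept = []
    for term in terms:
        tokens = term.split()
        if not tokens:
            continue
        # Drop terms that are, or start with, an English word.
        if tokens[0].lower() in english_words:
            continue
        # Drop internal identifiers used by depositors.
        if is_internal_id(term):
            continue
        # Drop terms the grammars already recognize.
        if grammar_matches(term):
            continue
        kept.append(term)
    return kept
```

A single-word term counts as starting with itself, so the first check covers both cases in the bullet.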
Structure Aware filtering
• “Do not tag proteins, polypeptides (> 15aa),
nucleic acid polymers, polysaccharides,
oligosaccharides [tetrasaccharide or longer] and other
biochemicals.”
• About 40,000 polypeptides and
oligosaccharides excluded from PubChem
using these criteria
Entity Extension
• Even PubChem is far from comprehensive hence it can be
useful to extend the start and/or end of entities to avoid
partial hits
– α-santalol can be recognized from santalol in the
dictionary
• Extension is bracketing aware and blocked by English words
• Entity trimming also performed to comply with the
annotation guidelines
– ‘Allura Red AC dye’ → ‘Allura Red AC’
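Leftward extension of a dictionary hit might look like the following. This is a simplified Python sketch, not the actual procedure: it assumes extension stops at whitespace, treats "bracketing aware" as a balanced parenthesis count, and takes the English word list as a parameter.

```python
def extend_left(text, start, end, english_words):
    """Return a new start offset for the entity text[start:end].

    The entity is extended back to the previous whitespace unless the
    added prefix is an English word or leaves parentheses unbalanced.
    """
    new_start = start
    while new_start > 0 and not text[new_start - 1].isspace():
        new_start -= 1
    prefix = text[new_start:start]
    candidate = text[new_start:end]
    # Blocked by English words, e.g. "non-" in "non-santalol".
    if prefix.strip("-()[]").lower() in english_words:
        return start
    # Bracketing aware: do not swallow an unmatched "(".
    if candidate.count("(") != candidate.count(")"):
        return start
    return new_start
```

Applied to "isolated α-santalol from oil" with the dictionary hit "santalol", the returned offset covers "α-santalol".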
Entity Merging
• Adjacent entities may actually be part of one
entity
– Ethyl ester → one entity
– (+)-limonene epoxide → one entity
BUT
– Hexane-benzene → two entities
Using an ontology to determine
when terms add information
• Genistein isoflavone → two entities
• Glycine ester → one entity
Genistein showing isoflavone core structure
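The merge decision can be illustrated with a toy class lookup. In this Python sketch the `IS_A` table is a tiny hand-written stand-in for a real ontology, and `should_merge` is a hypothetical name: a following term is merged only when it is not already a known class of the preceding entity, that is, when it adds information.

```python
# Toy stand-in for an ontology: entity -> classes it already belongs to.
IS_A = {
    "genistein": {"isoflavone", "flavonoid"},
    "glycine": {"amino acid"},
}


def should_merge(first, second):
    """Merge 'first second' into one entity only if second adds information."""
    known_classes = IS_A.get(first.lower(), set())
    return second.lower() not in known_classes
```

Genistein is already an isoflavone, so "isoflavone" adds nothing and the terms stay separate; glycine is not an ester, so "glycine ester" names something new and is merged.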
Abbreviation detection
• Based on the Hearst and Schwartz algorithm
• Detects abbreviations of the following forms:
– Tetrahydrofuran (THF)
– THF (tetrahydrofuran)
– Tetrahydrofuran (THF;
– Tetrahydrofuran (THF,
– (tetrahydrofuran, THF)
– THF = tetrahydrofuran
Schwartz, A.; Hearst, M. Proceedings of the Pacific Symposium on Biocomputing 2003.
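The core of the cited algorithm is a right-to-left character match between the short form and the text preceding it. The Python sketch below is a simplified rendering of that long-form search only; choosing the candidate window and validating the result are omitted, and it is not LeadMine's code.

```python
def find_long_form(short_form, candidate):
    """Return the shortest suffix of candidate that explains short_form.

    Characters are matched right to left; the first character of the
    short form must match at the start of a word. Returns None if no
    match exists.
    """
    l_idx = len(candidate) - 1
    for s_idx in range(len(short_form) - 1, -1, -1):
        ch = short_form[s_idx].lower()
        if not ch.isalnum():
            continue
        while l_idx >= 0 and (
            candidate[l_idx].lower() != ch
            or (s_idx == 0 and l_idx > 0 and candidate[l_idx - 1].isalnum())
        ):
            l_idx -= 1
        if l_idx < 0:
            return None
        l_idx -= 1
    # Start the long form at the beginning of the word that was reached.
    begin = candidate.rfind(" ", 0, l_idx + 1) + 1
    return candidate[begin:]
```

Given the short form "THF" and the preceding text "in dry tetrahydrofuran", the search returns "tetrahydrofuran".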
Domain-specific abbreviations
• Some abbreviations are not acronyms
• Can use string replacements to recognize
them e.g.
– Sodium → Na
– Estradiol → E2
Hence can recognize: 17α-ethinylestradiol → EE2
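The string-replacement idea can be sketched as a rewrite of the long form followed by an ordinary letter match. In this Python sketch the replacement table holds only the two pairs from the slide, and the in-order subsequence test is an assumed, much looser stand-in for a real abbreviation matcher.

```python
import re

# The two replacement pairs shown above.
REPLACEMENTS = {"sodium": "Na", "estradiol": "E2"}


def apply_replacements(long_form):
    """Rewrite domain-specific names to their conventional symbols."""
    result = long_form
    for name, symbol in REPLACEMENTS.items():
        result = re.sub(re.escape(name), symbol, result, flags=re.IGNORECASE)
    return result


def is_subsequence(abbrev, text):
    """True if the characters of abbrev occur in text in the same order."""
    remaining = iter(text.lower())
    return all(ch in remaining for ch in abbrev.lower())
```

"17α-ethinylestradiol" becomes "17α-ethinylE2", in which the characters of "EE2" can be found in order; without the replacement there is no "2" to match.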
Non-entity abbreviation
removal
• Finds entities detected as abbreviations of
unrecognized entities
– Can mean a common chemical abbreviation has
been redefined in the scope of the document
Example: a document defines current good manufacturing practice (cGMP),
so cGMP should not be tagged as cyclic guanosine monophosphate
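A sketch of this removal step follows. It is a minimal Python illustration with hypothetical names: `abbreviations` maps each short form detected in the document to its long form, and `is_entity` stands in for whatever check decides that a long form is a recognized chemical.

```python
def drop_redefined_abbreviations(entities, abbreviations, is_entity):
    """Remove entity mentions whose abbreviation expands to a non-entity.

    entities: list of recognized mention strings in one document.
    abbreviations: dict of short form -> long form found in that document.
    """
    blocked = {
        short
        for short, long_form in abbreviations.items()
        if not is_entity(long_form)
    }
    return [mention for mention in entities if mention not in blocked]
```

The block list is built per document, so cGMP is still tagged in documents that do not redefine it.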
Making the most of the
knowledge provided
• Use training data to identify:
– Terms that are not currently recognized (whitelist)
– Terms that are often false positives (blacklist)
• Each false positive and false negative is placed
into such a list if its inclusion increased F-score
(harmonic mean of precision and recall)
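The inclusion test can be written down directly from the F-score definition. The Python sketch below is an illustration rather than the procedure actually used: the function names are hypothetical, and it assumes that blacklisting a term removes all of that term's hits, true and false positives alike.

```python
def f_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def blacklist_helps(tp, fp, fn, term_tp, term_fp):
    """True if removing every hit of a term raises the F-score.

    The term's true positives become false negatives; its false
    positives disappear.
    """
    before = f_score(tp, fp, fn)
    after = f_score(tp - term_tp, fp - term_fp, fn + term_tp)
    return after > before
```

A whitelist check is symmetric: the term's missed mentions move from false negatives to true positives, and any spurious matches it introduces are added to the false positives.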
CEM Task Results
(on development set)
Configuration          Precision  Recall  F-score
Baseline               0.87       0.82    0.84
WhiteList              0.86       0.85    0.86
BlackList              0.88       0.80    0.84
WhiteList + BlackList  0.87       0.83    0.85
CDI task ranking
• Uses precision of entities when running
against the development set with the results
broken down by:
– Title vs abstract?
– Which dictionary matched?
– Were the entity’s bounds modified?
– Did the entity occur more than once in the
document?
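Ranking by development-set precision can be sketched as a lookup table keyed on the features listed above. This Python sketch is an assumption about how such a scheme could be arranged: the feature tuples, the function names, and the default score for unseen feature combinations are all hypothetical.

```python
from collections import defaultdict


def bucket_precision(dev_hits):
    """Estimate precision for each feature combination.

    dev_hits: iterable of (features, is_correct) pairs, where features is
    a hashable tuple such as (section, dictionary, bounds_modified, repeated).
    """
    counts = defaultdict(lambda: [0, 0])  # features -> [correct, total]
    for features, is_correct in dev_hits:
        counts[features][0] += int(is_correct)
        counts[features][1] += 1
    return {f: correct / total for f, (correct, total) in counts.items()}


def rank_entities(entities, precision, default=0.0):
    """Sort (name, features) pairs by the precision of their bucket."""
    return sorted(
        entities,
        key=lambda item: precision.get(item[1], default),
        reverse=True,
    )
```

Entities whose feature combination was more often correct on the development set are ranked first.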
Conclusions
• Grammars complement dictionaries to allow recognition
of novel entities
• Both the coverage and quality of dictionaries are
important
• The meaning of novel abbreviations can be determined
algorithmically
• Entities can be classified based on the resource that
recognized them
Thank you for your time!
http://nextmovesoftware.com
http://nextmovesoftware.com/blog
daniel@nextmovesoftware.com

 
A de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILESA de facto standard or a free-for-all? A benchmark for reading SMILES
A de facto standard or a free-for-all? A benchmark for reading SMILES
 
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs RevolutionRecent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
Recent Advances in Chemical & Biological Search Systems: Evolution vs Revolution
 
Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...Can we agree on the structure represented by a SMILES string? A benchmark dat...
Can we agree on the structure represented by a SMILES string? A benchmark dat...
 
Comparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule ImplementationsComparing Cahn-Ingold-Prelog Rule Implementations
Comparing Cahn-Ingold-Prelog Rule Implementations
 
Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...Eugene Garfield: the father of chemical text mining and artificial intelligen...
Eugene Garfield: the father of chemical text mining and artificial intelligen...
 
Recent improvements to the RDKit
Recent improvements to the RDKitRecent improvements to the RDKit
Recent improvements to the RDKit
 
Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...Pharmaceutical industry best practices in lessons learned: ELN implementation...
Pharmaceutical industry best practices in lessons learned: ELN implementation...
 
Digital Chemical Representations
Digital Chemical RepresentationsDigital Chemical Representations
Digital Chemical Representations
 
PubChem as a Biologics Database
PubChem as a Biologics DatabasePubChem as a Biologics Database
PubChem as a Biologics Database
 
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
CINF 17: Comparing Cahn-Ingold-Prelog Rule Implementations: The need for an o...
 
Building on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfilesBuilding on Sand: Standard InChIs on non-standard molfiles
Building on Sand: Standard InChIs on non-standard molfiles
 
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
Chemical Structure Representation of Inorganic Salts and Mixtures of Gases: A...
 
Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)Advanced grammars for state-of-the-art named entity recognition (NER)
Advanced grammars for state-of-the-art named entity recognition (NER)
 
Challenges in Chemical Information Exchange
Challenges in Chemical Information ExchangeChallenges in Chemical Information Exchange
Challenges in Chemical Information Exchange
 
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
RDKit: Six Not-So-Easy Pieces [RDKit UGM 2016]
 
RDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical DepictionsRDKit UGM 2016: Higher Quality Chemical Depictions
RDKit UGM 2016: Higher Quality Chemical Depictions
 

Dernier

Sample sample sample sample sample sample
Sample sample sample sample sample sampleSample sample sample sample sample sample
Sample sample sample sample sample sampleCasey Keith
 
Bhubaneswar Call Girls 8250077686 Service Offer VIP Hot Model
Bhubaneswar Call Girls 8250077686 Service Offer VIP Hot ModelBhubaneswar Call Girls 8250077686 Service Offer VIP Hot Model
Bhubaneswar Call Girls 8250077686 Service Offer VIP Hot ModelDeiva Sain Call Girl
 
Alipore Call Girls - 📞 8617697112 🔝 Top Class Call Girls Service Available
Alipore Call Girls - 📞 8617697112 🔝 Top Class Call Girls Service AvailableAlipore Call Girls - 📞 8617697112 🔝 Top Class Call Girls Service Available
Alipore Call Girls - 📞 8617697112 🔝 Top Class Call Girls Service AvailableNitya salvi
 
Hire 💕 8617697112 Surat Call Girls Service Call Girls Agency
Hire 💕 8617697112 Surat Call Girls Service Call Girls AgencyHire 💕 8617697112 Surat Call Girls Service Call Girls Agency
Hire 💕 8617697112 Surat Call Girls Service Call Girls AgencyNitya salvi
 
Genuine 8250077686 Hot and Beautiful 💕 Bhavnagar Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Bhavnagar Escorts call GirlsGenuine 8250077686 Hot and Beautiful 💕 Bhavnagar Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Bhavnagar Escorts call GirlsDeiva Sain Call Girl
 
Tamluk ❤CALL GIRL 8617697112 ❤CALL GIRLS IN Tamluk ESCORT SERVICE❤CALL GIRL
Tamluk ❤CALL GIRL 8617697112 ❤CALL GIRLS IN Tamluk ESCORT SERVICE❤CALL GIRLTamluk ❤CALL GIRL 8617697112 ❤CALL GIRLS IN Tamluk ESCORT SERVICE❤CALL GIRL
Tamluk ❤CALL GIRL 8617697112 ❤CALL GIRLS IN Tamluk ESCORT SERVICE❤CALL GIRLNitya salvi
 
Genuine 9332606886 Hot and Beautiful 💕 Bilaspur Escorts call Girls
Genuine 9332606886 Hot and Beautiful 💕 Bilaspur Escorts call GirlsGenuine 9332606886 Hot and Beautiful 💕 Bilaspur Escorts call Girls
Genuine 9332606886 Hot and Beautiful 💕 Bilaspur Escorts call GirlsDeiva Sain Call Girl
 
Top places to visit, top tourist destinations
Top places to visit, top tourist destinationsTop places to visit, top tourist destinations
Top places to visit, top tourist destinationsswarajdm34
 
ITALY - Visa Options for expats and digital nomads
ITALY - Visa Options for expats and digital nomadsITALY - Visa Options for expats and digital nomads
ITALY - Visa Options for expats and digital nomadsMarco Mazzeschi
 
Papi kondalu Call Girls 8250077686 Service Offer VIP Hot Model
Papi kondalu Call Girls 8250077686 Service Offer VIP Hot ModelPapi kondalu Call Girls 8250077686 Service Offer VIP Hot Model
Papi kondalu Call Girls 8250077686 Service Offer VIP Hot ModelDeiva Sain Call Girl
 
Hire 💕 8617697112 Reckong Peo Call Girls Service Call Girls Agency
Hire 💕 8617697112 Reckong Peo Call Girls Service Call Girls AgencyHire 💕 8617697112 Reckong Peo Call Girls Service Call Girls Agency
Hire 💕 8617697112 Reckong Peo Call Girls Service Call Girls AgencyNitya salvi
 
Genuine 8250077686 Hot and Beautiful 💕 Chennai Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Chennai Escorts call GirlsGenuine 8250077686 Hot and Beautiful 💕 Chennai Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Chennai Escorts call GirlsDeiva Sain Call Girl
 
2k Shots ≽ 9205541914 ≼ Call Girls In Tagore Garden (Delhi)
2k Shots ≽ 9205541914 ≼ Call Girls In Tagore Garden (Delhi)2k Shots ≽ 9205541914 ≼ Call Girls In Tagore Garden (Delhi)
2k Shots ≽ 9205541914 ≼ Call Girls In Tagore Garden (Delhi)Delhi Call girls
 
WhatsApp Chat: 📞 8617697112 Hire Call Girls Cooch Behar For a Sensual Sex Exp...
WhatsApp Chat: 📞 8617697112 Hire Call Girls Cooch Behar For a Sensual Sex Exp...WhatsApp Chat: 📞 8617697112 Hire Call Girls Cooch Behar For a Sensual Sex Exp...
WhatsApp Chat: 📞 8617697112 Hire Call Girls Cooch Behar For a Sensual Sex Exp...Nitya salvi
 
WhatsApp Chat: 📞 8617697112 Suri Call Girls available for hotel room package
WhatsApp Chat: 📞 8617697112 Suri Call Girls available for hotel room packageWhatsApp Chat: 📞 8617697112 Suri Call Girls available for hotel room package
WhatsApp Chat: 📞 8617697112 Suri Call Girls available for hotel room packageNitya salvi
 
Genuine 8250077686 Hot and Beautiful 💕 Visakhapatnam Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Visakhapatnam Escorts call GirlsGenuine 8250077686 Hot and Beautiful 💕 Visakhapatnam Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Visakhapatnam Escorts call GirlsDeiva Sain Call Girl
 
Genuine 8250077686 Hot and Beautiful 💕 Hosur Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Hosur Escorts call GirlsGenuine 8250077686 Hot and Beautiful 💕 Hosur Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Hosur Escorts call GirlsDeiva Sain Call Girl
 
Hire 8617697112 Call Girls Udhampur For an Amazing Night
Hire 8617697112 Call Girls Udhampur For an Amazing NightHire 8617697112 Call Girls Udhampur For an Amazing Night
Hire 8617697112 Call Girls Udhampur For an Amazing NightNitya salvi
 
WhatsApp Chat: 📞 8617697112 Independent Call Girls in Darjeeling
WhatsApp Chat: 📞 8617697112 Independent Call Girls in DarjeelingWhatsApp Chat: 📞 8617697112 Independent Call Girls in Darjeeling
WhatsApp Chat: 📞 8617697112 Independent Call Girls in DarjeelingNitya salvi
 

Dernier (20)

Sample sample sample sample sample sample
Sample sample sample sample sample sampleSample sample sample sample sample sample
Sample sample sample sample sample sample
 
Bhubaneswar Call Girls 8250077686 Service Offer VIP Hot Model
Bhubaneswar Call Girls 8250077686 Service Offer VIP Hot ModelBhubaneswar Call Girls 8250077686 Service Offer VIP Hot Model
Bhubaneswar Call Girls 8250077686 Service Offer VIP Hot Model
 
Alipore Call Girls - 📞 8617697112 🔝 Top Class Call Girls Service Available
Alipore Call Girls - 📞 8617697112 🔝 Top Class Call Girls Service AvailableAlipore Call Girls - 📞 8617697112 🔝 Top Class Call Girls Service Available
Alipore Call Girls - 📞 8617697112 🔝 Top Class Call Girls Service Available
 
Hire 💕 8617697112 Surat Call Girls Service Call Girls Agency
Hire 💕 8617697112 Surat Call Girls Service Call Girls AgencyHire 💕 8617697112 Surat Call Girls Service Call Girls Agency
Hire 💕 8617697112 Surat Call Girls Service Call Girls Agency
 
Genuine 8250077686 Hot and Beautiful 💕 Bhavnagar Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Bhavnagar Escorts call GirlsGenuine 8250077686 Hot and Beautiful 💕 Bhavnagar Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Bhavnagar Escorts call Girls
 
Tamluk ❤CALL GIRL 8617697112 ❤CALL GIRLS IN Tamluk ESCORT SERVICE❤CALL GIRL
Tamluk ❤CALL GIRL 8617697112 ❤CALL GIRLS IN Tamluk ESCORT SERVICE❤CALL GIRLTamluk ❤CALL GIRL 8617697112 ❤CALL GIRLS IN Tamluk ESCORT SERVICE❤CALL GIRL
Tamluk ❤CALL GIRL 8617697112 ❤CALL GIRLS IN Tamluk ESCORT SERVICE❤CALL GIRL
 
Genuine 9332606886 Hot and Beautiful 💕 Bilaspur Escorts call Girls
Genuine 9332606886 Hot and Beautiful 💕 Bilaspur Escorts call GirlsGenuine 9332606886 Hot and Beautiful 💕 Bilaspur Escorts call Girls
Genuine 9332606886 Hot and Beautiful 💕 Bilaspur Escorts call Girls
 
Top places to visit, top tourist destinations
Top places to visit, top tourist destinationsTop places to visit, top tourist destinations
Top places to visit, top tourist destinations
 
ITALY - Visa Options for expats and digital nomads
ITALY - Visa Options for expats and digital nomadsITALY - Visa Options for expats and digital nomads
ITALY - Visa Options for expats and digital nomads
 
Papi kondalu Call Girls 8250077686 Service Offer VIP Hot Model
Papi kondalu Call Girls 8250077686 Service Offer VIP Hot ModelPapi kondalu Call Girls 8250077686 Service Offer VIP Hot Model
Papi kondalu Call Girls 8250077686 Service Offer VIP Hot Model
 
Hire 💕 8617697112 Reckong Peo Call Girls Service Call Girls Agency
Hire 💕 8617697112 Reckong Peo Call Girls Service Call Girls AgencyHire 💕 8617697112 Reckong Peo Call Girls Service Call Girls Agency
Hire 💕 8617697112 Reckong Peo Call Girls Service Call Girls Agency
 
Genuine 8250077686 Hot and Beautiful 💕 Chennai Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Chennai Escorts call GirlsGenuine 8250077686 Hot and Beautiful 💕 Chennai Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Chennai Escorts call Girls
 
2k Shots ≽ 9205541914 ≼ Call Girls In Tagore Garden (Delhi)
2k Shots ≽ 9205541914 ≼ Call Girls In Tagore Garden (Delhi)2k Shots ≽ 9205541914 ≼ Call Girls In Tagore Garden (Delhi)
2k Shots ≽ 9205541914 ≼ Call Girls In Tagore Garden (Delhi)
 
WhatsApp Chat: 📞 8617697112 Hire Call Girls Cooch Behar For a Sensual Sex Exp...
WhatsApp Chat: 📞 8617697112 Hire Call Girls Cooch Behar For a Sensual Sex Exp...WhatsApp Chat: 📞 8617697112 Hire Call Girls Cooch Behar For a Sensual Sex Exp...
WhatsApp Chat: 📞 8617697112 Hire Call Girls Cooch Behar For a Sensual Sex Exp...
 
WhatsApp Chat: 📞 8617697112 Suri Call Girls available for hotel room package
WhatsApp Chat: 📞 8617697112 Suri Call Girls available for hotel room packageWhatsApp Chat: 📞 8617697112 Suri Call Girls available for hotel room package
WhatsApp Chat: 📞 8617697112 Suri Call Girls available for hotel room package
 
Genuine 8250077686 Hot and Beautiful 💕 Visakhapatnam Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Visakhapatnam Escorts call GirlsGenuine 8250077686 Hot and Beautiful 💕 Visakhapatnam Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Visakhapatnam Escorts call Girls
 
Genuine 8250077686 Hot and Beautiful 💕 Hosur Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Hosur Escorts call GirlsGenuine 8250077686 Hot and Beautiful 💕 Hosur Escorts call Girls
Genuine 8250077686 Hot and Beautiful 💕 Hosur Escorts call Girls
 
CYTOTEC DUBAI ☎️ +966572737505 } Abortion pills in Abu dhabi,get misoprostal ...
CYTOTEC DUBAI ☎️ +966572737505 } Abortion pills in Abu dhabi,get misoprostal ...CYTOTEC DUBAI ☎️ +966572737505 } Abortion pills in Abu dhabi,get misoprostal ...
CYTOTEC DUBAI ☎️ +966572737505 } Abortion pills in Abu dhabi,get misoprostal ...
 
Hire 8617697112 Call Girls Udhampur For an Amazing Night
Hire 8617697112 Call Girls Udhampur For an Amazing NightHire 8617697112 Call Girls Udhampur For an Amazing Night
Hire 8617697112 Call Girls Udhampur For an Amazing Night
 
WhatsApp Chat: 📞 8617697112 Independent Call Girls in Darjeeling
WhatsApp Chat: 📞 8617697112 Independent Call Girls in DarjeelingWhatsApp Chat: 📞 8617697112 Independent Call Girls in Darjeeling
WhatsApp Chat: 📞 8617697112 Independent Call Girls in Darjeeling
 

In grammars we trust: LeadMine, a knowledge driven solution

  • 1. In grammars we trust: LeadMine, a knowledge driven solution
      Daniel Lowe and Roger Sayle, NextMove Software, Cambridge, UK
      BioCreative IV workshop, DoubleTree by Hilton Hotel, Washington DC, USA, 8th October 2013
  • 2. Approaches to entity recognition
      • Dictionary based
      • Grammar based
      • Machine learning
      [Figure label: LeadMine]
  • 3. [Figure: label "Optional"]
  • 4. Normalization
      Input → Normalized
      œstradiol → oestradiol
      5` or 5’ or 5′ (backtick/quotation mark/prime) → 5'
      <p>H<sub>2</sub>O</p> → H2O
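The three normalization rows can be reproduced with a few lines of Python. This is a minimal sketch, not LeadMine's normalizer: the `REPLACEMENTS` table and the `normalize` function are hypothetical names, and the table covers only the cases shown on the slide.

```python
import re

# Character replacements taken from the three rows of the slide; a real
# normalizer would presumably cover many more cases (assumption).
REPLACEMENTS = {
    "\u0153": "oe",  # oe ligature
    "`": "'",        # backtick
    "\u2019": "'",   # right single quotation mark
    "\u2032": "'",   # prime
}

# Naive tag stripper: adequate for simple inline markup such as <sub>.
TAG = re.compile(r"<[^>]+>")


def normalize(text: str) -> str:
    """Strip markup, then map ligatures and prime-like characters."""
    text = TAG.sub("", text)
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    return text


print(normalize("<p>H<sub>2</sub>O</p>"))
print(normalize("\u0153stradiol"))
```

Normalizing before matching means the dictionaries and grammars only need to list one spelling of each name.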
  • 5. [Figure legend]
      Blue: Grammars
      Green: Traditional dictionaries
      Orange: Blocking dictionaries
  • 6. Advantages of grammars
      • Don't require annotated corpora
      • Encode knowledge about the domain
      • Very fast recognition
      • Allow spelling correction if an entity is a near match to one recognized by the grammar
  • 7. Simple grammar example
      Digit1to9 : '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9'
      Digit : Digit1to9 | '0'
      Cid : 'CID:' Digit1to9 Digit*
      [Figure: state diagram accepting C, I, D, ':', one digit 1..9, then any number of digits 0..9]
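Because the Cid rule is regular, it has a direct regular-expression equivalent. The sketch below assumes that Digit1to9 is meant to cover every digit from 1 to 9; `CID` and `is_cid` are illustrative names, not part of LeadMine.

```python
import re

# Cid : 'CID:' Digit1to9 Digit*
# A leading zero is rejected because the first digit must come from 1..9.
CID = re.compile(r"CID:[1-9][0-9]*")


def is_cid(token: str) -> bool:
    """Return True if the whole token is matched by the Cid rule."""
    return CID.fullmatch(token) is not None


for token in ["CID:2244", "CID:0123", "CID:"]:
    print(token, is_cid(token))
```

The grammar form scales better than a hand-written regex once rules start referring to each other, as the IUPAC grammar on the next slide does.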
  • 8. Grammar for IUPAC names
      • Grammar for complete molecules: 485 rules
        – trivialRing : 'aceanthren'|'aceanthrylen'|'acenaphthen'...
        – ringGroup : trivialRing | hantzschWidmanRing | vonBaeyerSystem ...
      • Generally aims to match a superset of the nomenclature covered by IUPAC
      • Specifically, this is the superset that can theoretically be converted to structures
  • 9. Grammar inheritance
      • Molecule grammar serves as a good starting point for a substituent grammar or generic chemical grammar
        – Inherit rules rather than duplicate them
        – Allow overriding of rules
      pluralizedChemical : chemical 's'
      elementaryMetalAtom : 'lanthanide' | 'lanthanoid' | 'transition metal' | 'transuranic element' | _elementaryMetalAtom
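One way to picture rule inheritance is as a dictionary of rules copied from a parent grammar and then selectively overridden. The sketch below is an illustration only: it assumes that `_elementaryMetalAtom` refers to the inherited definition of the rule being overridden, which the slide suggests but does not state. The `inherit` function and the parent alternatives ('sodium', 'iron', 'benzene') are placeholders.

```python
from typing import Dict, List

Grammar = Dict[str, List[str]]


def inherit(parent: Grammar, overrides: Grammar) -> Grammar:
    """Build a child grammar from a parent plus overriding rules.

    Inside an override of rule 'name', the alternative '_name' is expanded
    to the parent's alternatives for that rule (assumed meaning of the
    underscore prefix).
    """
    child = {name: list(alts) for name, alts in parent.items()}
    for name, alts in overrides.items():
        expanded: List[str] = []
        for alt in alts:
            if alt == "_" + name:
                expanded.extend(parent.get(name, []))
            else:
                expanded.append(alt)
        child[name] = expanded
    return child


# Placeholder parent grammar; the real molecule grammar has 485 rules.
molecule = {"elementaryMetalAtom": ["sodium", "iron"], "chemical": ["benzene"]}
generic = inherit(molecule, {
    "elementaryMetalAtom": ["lanthanide", "lanthanoid", "transition metal",
                            "transuranic element", "_elementaryMetalAtom"],
})
print(generic["elementaryMetalAtom"])
```

Rules that are not overridden, such as `chemical` here, are shared with the parent rather than duplicated in the child's definition.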
  • 10. Dictionaries… bigger is better
      • For high recall of trivial names, dictionaries with high coverage are required.
      • The largest publicly available dictionary is PubChem, with over 94 million terms
      • However, most of these terms are either not useful or actually detrimental to text mining
  • 11. Aggressive filtering
      • “What you don't see won't hurt you”
      • Hence remove terms that are also English words or start with an English word
        – Accomplished using a large English dictionary with chemistry terms removed
      • Remove internal identifiers used by depositors
      • Remove terms that are matched by our grammars
      • Ultimate result: 94 million → 2.94 million
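The filtering steps above can be sketched as a single pass over the term list. Everything named here is hypothetical: `filter_terms`, the tiny English word set, the `XYZ-` depositor identifier pattern and the use of the CID rule as the only grammar. "Starts with an English word" is interpreted as "the first whitespace-delimited token is an English word", which is an assumption.

```python
import re
from typing import Callable, Iterable, List, Set


def filter_terms(terms: Iterable[str],
                 english_words: Set[str],
                 is_depositor_id: Callable[[str], bool],
                 matches_grammar: Callable[[str], bool]) -> List[str]:
    """Keep only terms that survive all three filters from the slide."""
    kept = []
    for term in terms:
        words = term.lower().split()
        if not words:
            continue
        # Drop terms that are, or start with, an English word.
        if term.lower() in english_words or words[0] in english_words:
            continue
        # Drop depositor identifiers and anything a grammar already covers.
        if is_depositor_id(term) or matches_grammar(term):
            continue
        kept.append(term)
    return kept


english = {"orange", "can"}  # stand-in for a large English dictionary
terms = ["santalol", "can", "orange oil", "XYZ-000123", "CID:2244", "aspirin"]
kept = filter_terms(
    terms,
    english,
    is_depositor_id=lambda t: re.fullmatch(r"XYZ-\d+", t) is not None,
    matches_grammar=lambda t: re.fullmatch(r"CID:[1-9][0-9]*", t) is not None,
)
print(kept)
```

Removing grammar-matched terms keeps the dictionary small without losing recall, since the grammar recognizes those names anyway.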
  • 12. Structure-aware filtering
      • “Do not tag proteins, polypeptides (> 15aa), nucleic acid polymers, polysaccharides, oligosaccharides [tetrasaccharide or longer] and other biochemicals.”
      • About 40,000 polypeptides and oligosaccharides excluded from PubChem using these criteria
  • 13. Entity extension
      • Even PubChem is far from comprehensive, hence it can be useful to extend the start and/or end of entities to avoid partial hits
        – α-santalol can be recognized from santalol in the dictionary
      • Extension is bracketing aware and blocked by English words
      • Entity trimming is also performed to comply with the annotation guidelines
        – ‘Allura Red AC dye’ → ‘Allura Red AC’
  • 14. Entity merging
      • Adjacent entities may actually be part of one entity
        – Ethyl ester → one entity
        – (+)-limonene epoxide → one entity
        BUT
        – Hexane-benzene → two entities
  • 15. Using an ontology to determine when terms add information
      • Genistein isoflavone → two entities
      • Glycine ester → one entity
      [Figure: Genistein showing isoflavone core structure]
  • 16. Abbreviation detection
      • Based on the Hearst and Schwartz algorithm
      • Detects abbreviations of the following forms:
        – Tetrahydrofuran (THF)
        – THF (tetrahydrofuran)
        – Tetrahydrofuran (THF;
        – Tetrahydrofuran (THF,
        – (tetrahydrofuran, THF)
        – THF = tetrahydrofuran
      Schwartz, A.; Hearst, M. Proceedings of the Pacific Symposium on Biocomputing 2003.
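The core of the cited Schwartz and Hearst method is a right-to-left alignment of the short form against the preceding text. The sketch below is a Python port of that alignment step plus a simplified driver that handles only the "long form (short form" patterns from the list above; the function names, the short-form validity check and the driver are assumptions, not LeadMine's code.

```python
import re
from typing import List, Optional, Tuple


def find_best_long_form(short_form: str, long_form: str) -> Optional[str]:
    """Align short_form against long_form from right to left.

    Every alphanumeric character of the short form must appear, in order,
    in the long form; the first character must start a word.
    """
    s = len(short_form) - 1
    l = len(long_form) - 1
    while s >= 0:
        ch = short_form[s].lower()
        if not ch.isalnum():
            s -= 1
            continue
        while (l >= 0 and long_form[l].lower() != ch) or \
              (s == 0 and l > 0 and long_form[l - 1].isalnum()):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    start = long_form.rfind(" ", 0, l + 1) + 1
    return long_form[start:]


def find_abbreviations(sentence: str) -> List[Tuple[str, str]]:
    """Find (short form, long form) pairs written as 'long form (short ...)'."""
    pairs = []
    for m in re.finditer(r"\(([^()]+)\)", sentence):
        # Keep the text before ';' or ',' so '(THF; 5 mL)' yields 'THF'.
        short = re.split(r"[;,]", m.group(1))[0].strip()
        if not (2 <= len(short) <= 10) or not short[0].isalnum():
            continue
        words = sentence[:m.start()].split()
        # Candidate window size as described by Schwartz and Hearst.
        max_words = min(len(short) + 5, len(short) * 2)
        long_form = find_best_long_form(short, " ".join(words[-max_words:]))
        if long_form:
            pairs.append((short, long_form))
    return pairs


print(find_abbreviations("The solid was dissolved in tetrahydrofuran (THF) and stirred."))
```

The reversed patterns, such as "THF (tetrahydrofuran)" and "THF = tetrahydrofuran", would need the same alignment applied with the roles of the two spans swapped.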
  • 17. Domain-specific abbreviations
      • Some abbreviations are not acronyms
      • Can use string replacements to recognize them, e.g.
        – Sodium → Na
        – Estradiol → E2
      Hence can recognize: 17α-ethinylestradiol → EE2
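The string-replacement idea can be illustrated by rewriting the long form before testing whether the abbreviation fits it. This sketch uses a plain in-order subsequence test, which is a deliberate simplification of a real abbreviation alignment; the names `DOMAIN_REPLACEMENTS`, `rewrite`, `is_subsequence` and `could_abbreviate` are hypothetical, and the table holds only the two replacements from the slide.

```python
import re

# The two replacements shown on the slide; a real table would be larger.
DOMAIN_REPLACEMENTS = {"sodium": "Na", "estradiol": "E2"}


def rewrite(name: str) -> str:
    """Replace domain words with their conventional symbols."""
    for word, symbol in DOMAIN_REPLACEMENTS.items():
        name = re.sub(re.escape(word), symbol, name, flags=re.IGNORECASE)
    return name


def is_subsequence(abbr: str, text: str) -> bool:
    """True if the characters of abbr occur in order in text (case-insensitive)."""
    remaining = iter(text.lower())
    return all(ch in remaining for ch in abbr.lower())


def could_abbreviate(abbr: str, name: str) -> bool:
    """Accept abbr if it fits the name directly or after domain rewriting."""
    return is_subsequence(abbr, name) or is_subsequence(abbr, rewrite(name))


print(rewrite("17\u03b1-ethinylestradiol"))
print(could_abbreviate("EE2", "17\u03b1-ethinylestradiol"))
```

Without the rewrite step, "EE2" cannot be aligned to the name because the name contains no "2" after the two letters e.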
  • 18. Non-entity abbreviation removal
      • Finds entities detected as abbreviations of unrecognized entities
        – Can mean a common chemical abbreviation has been redefined in the scope of the document
      Example: “current good manufacturing practice (cGMP)”, where cGMP would otherwise be read as cyclic guanosine monophosphate
      [Figure: structure of cyclic guanosine monophosphate]
  • 19. Making the most of the knowledge provided
      • Use training data to identify:
        – Terms that are not currently recognized (whitelist)
        – Terms that are often false positives (blacklist)
      • Each false positive and false negative is placed into such a list if its inclusion increased F-score (harmonic mean of precision and recall)
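The blacklist half of this procedure can be sketched as a greedy loop: a term is blocked only if doing so raises the F-score, bearing in mind that blocking a term removes its true positives as well as its false positives. The data layout (mentions as `(doc_id, start, end, text)` tuples) and the names `f_score` and `build_blacklist` are assumptions for illustration; the whitelist would be built analogously but needs the tagger to be re-run, so it is omitted here.

```python
from collections import Counter
from typing import Set, Tuple

Mention = Tuple[int, int, int, str]  # (doc_id, start, end, text)


def f_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when nothing is correct."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def build_blacklist(predicted: Set[Mention], gold: Set[Mention]) -> Set[str]:
    """Greedily block terms whose removal increases F-score on training data."""
    tp = len(predicted & gold)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    tp_by_text = Counter(m[3] for m in predicted & gold)
    fp_by_text = Counter(m[3] for m in predicted - gold)
    best = f_score(tp, fp, fn)
    blacklist = set()
    for text, n_fp in fp_by_text.most_common():
        n_tp = tp_by_text[text]
        # Blocking the term turns its true positives into false negatives.
        candidate = f_score(tp - n_tp, fp - n_fp, fn + n_tp)
        if candidate > best:
            blacklist.add(text)
            tp, fp, fn, best = tp - n_tp, fp - n_fp, fn + n_tp, candidate
    return blacklist


gold = {(1, 0, 7, "aspirin"), (1, 10, 13, "THF"), (2, 0, 4, "lead")}
predicted = gold | {(1, 20, 24, "lead"), (3, 0, 4, "lead"),
                    (4, 0, 4, "lead"), (2, 5, 8, "was")}
print(sorted(build_blacklist(predicted, gold)))
```

In this toy data "lead" is blocked even though one of its mentions is correct, because its three false positives cost more than the single true positive is worth.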
  • 20. CEM task results (on development set)
      Configuration          Precision  Recall  F-score
      Baseline               0.87       0.82    0.84
      WhiteList              0.86       0.85    0.86
      BlackList              0.88       0.80    0.84
      WhiteList + BlackList  0.87       0.83    0.85
  • 21. CDI task ranking
      • Uses precision of entities when running against the development set, with the results broken down by:
        – Title vs abstract?
        – Which dictionary matched?
        – Were the entity's bounds modified?
        – Did the entity occur more than once in the document?
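A ranking of this kind can be sketched by measuring precision per feature combination on the development set and sorting new entities by the precision of the combination they fall into. Whether the four questions were combined into one joint bucket, as done here, is an assumption; `bucket_precision`, `rank` and the feature tuples are illustrative names and values.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

# Features: (section, dictionary, bounds_modified, occurs_more_than_once)
Features = Tuple[str, str, bool, bool]


def bucket_precision(dev_mentions: Iterable[Tuple[Hashable, bool]]) -> Dict[Hashable, float]:
    """Precision of each feature bucket, measured on development data."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [correct, total]
    for features, is_correct in dev_mentions:
        counts[features][0] += int(is_correct)
        counts[features][1] += 1
    return {f: correct / total for f, (correct, total) in counts.items()}


def rank(entities: List[Tuple[str, Hashable]],
         precision: Dict[Hashable, float],
         default: float = 0.0) -> List[Tuple[str, Hashable]]:
    """Order entities so that those from high-precision buckets come first."""
    return sorted(entities, key=lambda e: precision.get(e[1], default), reverse=True)


a: Features = ("title", "grammar", False, True)
b: Features = ("abstract", "pubchem", True, False)
dev = [(a, True), (a, True), (b, True), (b, False)]
precision = bucket_precision(dev)
print(rank([("foo", b), ("bar", a)], precision))
```

Buckets never seen in the development set fall back to `default`, which places their entities at the bottom of the ranking.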
  • 22. Conclusions
      • Grammars complement dictionaries to allow recognition of novel entities
      • Both the coverage and the quality of dictionaries are important
      • The meaning of novel abbreviations can be determined algorithmically
      • Entities can be classified based on the resource that recognized them
  • 23. Thank you for your time!
      http://nextmovesoftware.com
      http://nextmovesoftware.com/blog
      daniel@nextmovesoftware.com