TechInvestLab.ru is starting a research program on the automation of formal modelling. The first project is being carried out together with ABBYY, a leading linguistic software company. The project studies the possibility of building a Gellish-like formal model of a natural-language technical document, for further transformation into an ISO 15926 compliant data model with the TabLan.15926 engine. This presentation shows preliminary comparisons between the syntactic and semantic structures parsed by ABBYY Compreno and manually prepared formal text models.
Ontology Modelling of an Engineering Document – Perspectives of Linguistics Analysis
1. Ontology Modelling of an Engineering Document – Perspectives of Linguistics Analysis
26.08.2012
2. First Step: Requirements Modelling
ROSENERGOATOM project, July 2011
– Manual processing methodology for Technical Requirements document
– Special software for ISO 15926 data model transformation
– Sample Nuclear Power Plant requirements processing:
• Sample size: 12 paragraphs of text
• Content identified: 16 requirements, 3 classifiers
• Resulting model: 96 items, 35 relationships
3. Technical Document Semantic Modelling
TabLan methodology, March 2012
– Manual processing methodology for technical documents (English)
– Using a subset of Gellish: http://sourceforge.net/apps/trac/gellish/
– Mapping to the enhanced Initial Template Set
– .15926 Editor for ISO 15926 data model transformation
– Download free from http://techinvestlab.ru/files/TabLan/TabLan.rar
4. Document Modelling Lessons
• Technical document modelling promise:
– Requirements verification
– Project IT systems customisation (classifiers for CAD/CAM/PLM/ERP/etc.)
– Data integration support (reference data library content generation)
– Tracing design decisions to requirements
– Design decisions verification
• Formal modelling problems:
– Labour-intensive process of manual modelling
– Large volume of «dumb» preparatory work
– Need for professional engineering verification in a new formalism unfamiliar to engineers
– Fragmented architecture of the project IT environment: an obstacle to model reuse
5. Preconditions for Automation of Technical Document Modelling
• Restricted and relatively formal engineering subset of natural language
• Contemporary developments in computer-based natural language processing
• Contemporary developments in ontology extraction from natural language texts
• Controlled language for engineering (Gellish)
• Gellish to ISO 15926 mapping development
6. Experimenting with ABBYY Compreno
Technology That Translates from Human into Computer Language
http://www.abbyy.ru/science/technologies/business/compreno
7. ABBYY Compreno
ABBYY Compreno is ABBYY’s innovative technology that performs full semantic and syntactic analysis for comprehensive handling of natural language texts.
ABBYY Compreno is the first ever practical implementation of fundamental linguistic research carried out internationally over the past fifty years. A result of seventeen years of intensive R&D, ABBYY Compreno offers robust solutions to many long-standing language processing problems of the information age, such as:
• Intelligent search and retrieval
– Intelligent semantic search
– Multilingual search
– Semantic tagging of documents for more powerful searching
• Comprehensive text analysis
– Information monitoring
– Controlling access to confidential information
– Summarizing and annotating documents
– Sentiment analysis
• Efficient handling of text documents
– Document classification and filtering
– Text comparison
• High quality machine translation
8. Research Plan
• Starting point – comparison between:
– syntactic and semantic structure (parsed by ABBYY Compreno)
– formal text model (manually prepared)
• Rule development for mapping between linguistic and engineering ontologies (current)
• Customisation with domain thesauri (plans)
• Testing on a corpus of engineering texts (plans)
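The comparison and rule-mapping steps of the plan can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the simplified clause (verb, subject, objects) is not the ABBYY Compreno output format, and `Statement`, `VERB_TO_RELATION`, `map_clause` and `compare` are hypothetical names. Only the part-whole statements of the containment example used in this deck are compared.

```python
from typing import Dict, List, NamedTuple, Set


class Statement(NamedTuple):
    """One Gellish-like statement: left object, relation phrase, right object."""
    left: str
    relation: str
    right: str


# Hypothetical rule table (an assumption for illustration):
# verb lemma found by a parser -> relation phrase used in the formal model.
VERB_TO_RELATION: Dict[str, str] = {"include": "is a whole for"}


def map_clause(verb: str, subject: str, objects: List[str]) -> Set[Statement]:
    """Apply the rule table to one simplified clause; unknown verbs yield nothing."""
    relation = VERB_TO_RELATION.get(verb)
    if relation is None:
        return set()
    return {Statement(subject, relation, obj) for obj in objects}


def compare(derived: Set[Statement], manual: Set[Statement]) -> Dict[str, Set[Statement]]:
    """Split statements into those both sides share and those only one side has."""
    return {
        "matched": derived & manual,
        "missing": manual - derived,  # in the manual model, not derived by rules
        "extra": derived - manual,    # derived by rules, absent from the manual model
    }


# Simplified clause for "The containment system shall include a primary
# containment and a secondary containment."
derived = map_clause("include", "Containment system",
                     ["Primary containment", "Secondary containment"])
manual = {
    Statement("Containment system", "is a whole for", "Primary containment"),
    Statement("Containment system", "is a whole for", "Secondary containment"),
}
report = compare(derived, manual)
print({name: len(items) for name, items in report.items()})
```

Keeping statements in sets makes the comparison order-independent, which suits a corpus-level test where rule output and manual models are produced separately.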
9. «The containment system shall include a primary containment and a secondary containment.»
ABBYY Compreno parser results: text view
11. «The containment system shall include a primary containment and a secondary containment.»
Formal model:
Containment system
A: is a whole for Primary containment
B: is a whole for Secondary containment
A is classified as a Requirement
B is classified as a Requirement
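One possible way to hold this formal model in code is sketched below, assuming that the labels A and B name the two part-whole statements so that further statements can classify them. The `Fact` class and the `requirements` helper are hypothetical names for illustration and are not part of Gellish, TabLan or the .15926 Editor.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Fact:
    left: str
    relation: str
    right: str
    # "A" or "B" on the slide; lets other facts refer to this fact itself.
    label: Optional[str] = None


MODEL: List[Fact] = [
    Fact("Containment system", "is a whole for", "Primary containment", label="A"),
    Fact("Containment system", "is a whole for", "Secondary containment", label="B"),
    Fact("A", "is classified as a", "Requirement"),
    Fact("B", "is classified as a", "Requirement"),
]


def requirements(model: List[Fact]) -> List[Fact]:
    """Return the labelled facts that another fact classifies as a Requirement."""
    by_label = {fact.label: fact for fact in model if fact.label is not None}
    return [
        by_label[fact.left]
        for fact in model
        if fact.relation == "is classified as a"
        and fact.right == "Requirement"
        and fact.left in by_label
    ]


for fact in requirements(MODEL):
    print(f"{fact.left} {fact.relation} {fact.right}")
```

The point of the label is that a requirement here is a statement about a statement: the text does not say that the containment system is a requirement, but that its having each part is one.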
12. «Inner surfaces should be smooth to prevent corrosion residue and to simplify decontamination.»
ABBYY Compreno parser: tree view
13. «Inner surfaces should be smooth to prevent corrosion residue and to simplify decontamination.»
Formal model:
Inner surfaces
is a specialization of Surface
is a specialization of Inner
Inner surfaces
A: is a specialization of Smooth
A
is classified as a Requirement
is intended to achieve To prevent corrosion residue and to simplify decontamination
To prevent corrosion residue and to simplify decontamination
is a whole for To prevent corrosion residue
has as subject Corrosion residue
is a whole for To simplify decontamination
has as subject Decontamination
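The model above can be written out as explicit triples and queried, as in the following rough sketch. It rests on one assumption that the flattened slide text does not state: each "has as subject" line is taken to belong to the purpose named on the line directly above it. The names `TRIPLES`, `build_index` and `purposes_of` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

PURPOSE = "To prevent corrosion residue and to simplify decontamination"

TRIPLES: List[Tuple[str, str, str]] = [
    ("Inner surfaces", "is a specialization of", "Surface"),
    ("Inner surfaces", "is a specialization of", "Inner"),
    ("Inner surfaces", "is a specialization of", "Smooth"),  # labelled A on the slide
    ("A", "is classified as a", "Requirement"),
    ("A", "is intended to achieve", PURPOSE),
    (PURPOSE, "is a whole for", "To prevent corrosion residue"),
    # Assumption: the subject belongs to the purpose listed just before it.
    ("To prevent corrosion residue", "has as subject", "Corrosion residue"),
    (PURPOSE, "is a whole for", "To simplify decontamination"),
    ("To simplify decontamination", "has as subject", "Decontamination"),
]


def build_index(triples: List[Tuple[str, str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """Group right-hand objects by (left-hand object, relation phrase)."""
    index: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for left, relation, right in triples:
        index[(left, relation)].append(right)
    return index


def purposes_of(requirement: str, index: Dict[Tuple[str, str], List[str]]) -> Dict[str, List[str]]:
    """Map each atomic purpose of a requirement to its subjects."""
    result: Dict[str, List[str]] = {}
    for goal in index.get((requirement, "is intended to achieve"), []):
        # A compound purpose is split into its parts; a simple one stands alone.
        parts = index.get((goal, "is a whole for"), []) or [goal]
        for part in parts:
            result[part] = list(index.get((part, "has as subject"), []))
    return result


INDEX = build_index(TRIPLES)
for purpose, subjects in purposes_of("A", INDEX).items():
    print(purpose, "->", subjects)
```

Splitting the compound purpose into two parts is what would let each purpose be traced and verified separately, in line with the requirements-tracing goal listed among the modelling promises.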