Extracting static and dynamic model elements from textual specifications in humanities
1. Extracting static and dynamic model elements from textual specifications in humanities
Patricia Martín-Rodilla
César González-Pérez
Institute of Heritage Sciences, Spanish National Research Council
Santiago de Compostela, Spain.
2. Index
Research Context & Problem
Goal(s)
Related Work
Proposal:
  - Proposal Overview
  - Proposal Phases
Case Study in Cultural Heritage Information Systems
Discussion & Open Issues
3. Research Context
Information Systems are composed
of different information dimensions:
…
Structural (STATIC)
Architectural
Behavioral
Methodological (DYNAMIC)
…
BUT, IS support human activities
SOFTWARE ANALYST
Software Textual Specifications
Documents about practices
…
Structural (STATIC) MODEL
Architectural MODEL
Behavioral MODEL
Methodological (DYNAMIC) MODEL
…
BUT, in Humanities information…
Narrative-based domains
Importance of the methodological context of information
(Very pronounced link between the static and dynamic dimensions)
Software analysts must put considerable effort into each information dimension
Software analysts are far from DH expertise
4. Goal(s)
• To study how other works deal with the different information dimensions from a holistic point of view, in particular:
  • For humanities IS
  • Directly from textual specifications in the early stages of software conception
• To propose a pipeline method as a tentative semi-automatic approach for our needs in humanities domains
5. Related work
Works in modelling and automatic
extraction of DIFFERENT information
dimensions
Methods (Domain rules)
Processes (Process Mining)
Notations (BPMN, Topic maps,
Mind Maps, Concept Maps, i*,…)
Practices (Scenarios)
Works in HOLISTIC modelling and
automatic extraction of information
dimensions
Open/METIS
ISO/IEC 24744
Requirements: Cross-cutting concerns
NEED: Starting from early-stage textual specifications?
NEED: More than a conceptual bridge… Semi-supervised?
6. Proposal: pipeline approach based on previous work, the TextProcessMiner tool (Epure, Martin-Rodilla et al. 2015)
Initial dynamic information -> Process Mining algorithms: Activity Logs
Initial static information -> Identification of domain key concepts: Concept Map
7. Phase I: TextProcessMiner
• Natural Language Processing approach
• TextProcessMiner extracts activities from
historical and archaeological official reports.
• Previously tested at CSIC, ADS…, in different
languages, and validated by the reports' authors.
• Locality principle in the activity identification:
tree-based syntactic structure.
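The locality principle can be illustrated with a minimal sketch that pairs each verb only with the objects it directly governs in the dependency tree. It assumes sentences have already been lemmatized, POS-tagged and parsed; the `Token` type, the `mine_activities` function and the hand-built parse are hypothetical, and the relation names follow the older Stanford Dependencies scheme (`dobj`, `nsubjpass`). It is not TextProcessMiner's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Token:
    idx: int    # position in the sentence
    lemma: str  # lemmatized form (assumed produced by an earlier cleaning step)
    pos: str    # Penn Treebank tag, e.g. "VBN", "NN"
    head: int   # index of the syntactic head (-1 for the root)
    rel: str    # dependency relation to the head


# Relations treated as "what the activity acts on" (assumption of this sketch):
# objects of active verbs and subjects of passive clauses.
OBJECT_RELS = {"dobj", "nsubjpass"}


def mine_activities(sentence: list[Token]) -> list[str]:
    """Return 'verb object' strings, pairing each verb only with dependents
    it directly governs (a simple locality constraint)."""
    activities = []
    for verb in sentence:
        if not verb.pos.startswith("VB"):
            continue
        for dep in sentence:
            if dep.head == verb.idx and dep.rel in OBJECT_RELS:
                activities.append(f"{verb.lemma} {dep.lemma}")
    return activities


# Hand-built parse of "The trench was excavated using a toothed bucket."
sent = [
    Token(0, "the", "DT", 1, "det"),
    Token(1, "trench", "NN", 3, "nsubjpass"),
    Token(2, "be", "VBD", 3, "auxpass"),
    Token(3, "excavate", "VBN", -1, "root"),
    Token(4, "use", "VBG", 3, "xcomp"),
    Token(5, "a", "DT", 7, "det"),
    Token(6, "toothed", "JJ", 7, "amod"),
    Token(7, "bucket", "NN", 4, "dobj"),
]
print(mine_activities(sent))  # ['excavate trench', 'use bucket']
```

Treating the passive subject as the activity's object is what turns "The trench was excavated" into `excavate trench`, the first entry of the log in the case study.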
Historical and Archaeological Methodological Textual Specifications -> TextCleaner (lemmatization, automatic cleaning, activity recognition) -> ActivityMiner -> ActivityRelationshipMiner -> Discovered Log (DYNAMIC INFO DIMENSION)
Phase II: Preliminary Concept Map
Input: Discovered Log (DYNAMIC INFO DIMENSION)
• Automatic identification of domain key concepts
• Part-of-speech (POS) tagging techniques to decouple:
  - action verbs (activity candidates)
  - countable nouns (key concept candidates)
• Why concept maps?
  - Intermediate degree of formalization
  - Learning potential
  - Iterative methodology in concept map creation
• Why semi-automatic?
  - Semi-automatic annotation approaches give better results in the humanities
Entity decoupling (POS tagging) + Activity decoupling (POS tagging) -> Cross-link matching (tree-based syntactic structure) -> Preliminary Concept Map
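A minimal sketch of the decoupling and cross-link steps, assuming each discovered log entry has been POS-tagged with Penn Treebank tags (assigned by hand here) and that multiword terms are already joined with underscores. Common-noun tags (`NN`, `NNS`) stand in for countable nouns, and cross-links are restricted to words of the same entry on the assumption that each entry already comes from one syntactic subtree; `preliminary_concept_map` is a hypothetical name.

```python
from collections import defaultdict

VERB_TAGS = ("VB", "VBD", "VBG", "VBN", "VBP", "VBZ")
NOUN_TAGS = ("NN", "NNS")  # common nouns, used as a proxy for countable nouns


def preliminary_concept_map(tagged_entries):
    """Build concept-map parts from POS-tagged log entries.

    Each entry is a list of (lemma, tag) pairs for one discovered activity.
    Returns (activity candidates, concept candidates, cross-links)."""
    activities, concepts = set(), set()
    links = defaultdict(set)  # verb -> concepts it is linked to
    for entry in tagged_entries:
        verbs = [w for w, t in entry if t in VERB_TAGS]
        nouns = [w for w, t in entry if t in NOUN_TAGS]
        activities.update(verbs)
        concepts.update(nouns)
        # Only words from the same entry are cross-linked (locality).
        for v in verbs:
            links[v].update(nouns)
    return activities, concepts, dict(links)


entries = [
    [("excavate", "VB"), ("trench", "NN")],
    [("inspect", "VB"), ("side_of_trench", "NN")],
    [("inspect", "VB"), ("spoil", "NN")],
    [("number", "VB"), ("layer", "NN")],
]
acts, concs, links = preliminary_concept_map(entries)
print(sorted(acts))              # ['excavate', 'inspect', 'number']
print(sorted(links["inspect"]))  # ['side_of_trench', 'spoil']
```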
8. Phase III: Supervised Phase
Preliminary Concept Map -> Iterative Phase:
• Verification of concept and activity names: terminology, synonyms
• Verification of order and dependence cross-links
• Learning of domain key concepts
-> Supervised Concept Map + Activity Log sequence
The pipeline offers software analysts:
- Identification of the most important domain concepts, in a learning environment
- Identification of activities and activity logs
- A preliminary link between static and dynamic elements in the domain terminology
The pipeline is currently used:
- As a preliminary tool for extracting a holistic information view from early-stage
textual specifications.
- As a tool for improving model quality in terms of humanities terminology.
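The order and dependence check of the supervised phase could be supported by flagging precedence constraints that the discovered log contradicts. In this sketch the analyst's constraints, the `order_violations` function and the short reordered log are all hypothetical; only the first occurrence of each activity is compared.

```python
def order_violations(log, must_precede):
    """Return the (before, after) constraints that the activity log violates.

    log: activities in discovered order.
    must_precede: analyst-declared pairs (a, b) meaning a should occur before b.
    Constraints mentioning an activity absent from the log are skipped."""
    first = {}
    for pos, act in enumerate(log):
        first.setdefault(act, pos)  # keep the first occurrence only
    violations = []
    for a, b in must_precede:
        if a in first and b in first and first[a] > first[b]:
            violations.append((a, b))
    return violations


# Hypothetical log in which "number layer" was placed before "reveal deposit".
log = ["excavate trench", "number layer", "reveal deposit"]
rules = [("excavate trench", "number layer"), ("reveal deposit", "number layer")]
print(order_violations(log, rules))  # [('reveal deposit', 'number layer')]
```

Reported pairs are candidates for review rather than automatic corrections: the analyst decides whether the log or the constraint is wrong, and the next iteration runs on the revised output.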
10. Phase I: Extracting models in Cultural Heritage IS
“The trench was excavated using a toothed bucket using the
back actor of a small excavating machine. The watching brief
archaeologist inspected the sides of the trench for any past
cultural remains below the overburden. The removed spoil was
inspected in order to recover any past cultural artefacts.
Where archaeological deposits were revealed, each layer, fill
and cut was individually numbered and described in terms of soil
detail, stratigraphic position, dimensions, artefact content,
environmental samples and interpretation. The context system
was cross-referenced to other records. Registers were maintained
for all photographs, levels, plans, section, finds and samples
taken, made or gathered in the field.”
(From ADS Archaeological Report, Gerry Martin Associates Ltd. Glasgow)
Discovered log (from the textual specification above):
- excavate trench
- use bucket
- use back_actor_of_machine
- inspect side_of_trench
- inspect spoil
- recover artefact
- reveal deposit
- number layer
- number fill
- number cut
- describe layer
- describe fill
- describe cut
- cross_referenced context_system
- maintain register
- take photograph
- take level
- take plan
- take section
- take find
- make photograph
- make level
- make plan
- make section
- make find
- gather photograph
- gather level
- gather plan
- gather section
- gather find
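Fifteen of the log entries come from the excerpt's final sentence: each of the coordinated verbs taken, made and gathered is paired with photograph, level, plan, section and find. The sketch below reproduces that distribution, assuming the coordinated verbs and objects have already been isolated and lemmatized; `expand_coordination` is a hypothetical helper illustrating the shape of the output, not how TextProcessMiner computes it.

```python
from itertools import product


def expand_coordination(verbs, objects):
    """Distribute every coordinated verb over every coordinated object, verb-major."""
    return [f"{v} {o}" for v, o in product(verbs, objects)]


entries = expand_coordination(
    ["take", "make", "gather"],
    ["photograph", "level", "plan", "section", "find"],
)
print(len(entries))  # 15
print(entries[:2])   # ['take photograph', 'take level']
print(entries[-1])   # gather find
```

The same call with `["number", "describe"]` and `["layer", "fill", "cut"]` yields the six number/describe entries in the order they appear in the log.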
11. Phase II: Extracting models in Cultural Heritage IS
(Figure: string concept map with concept nodes Trench, Bucket, Back_actor_of_machine, Layer, Side_of_trench, Level, Cut, Find, Plan, Section, Fill, Deposit, Artefact, Spoil, Photograph, Register, Context system; the discovered log of slide 10 is repeated alongside as the activity list.)
Discovered Log -> String Concept Map + Activity List
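One simple way to read the string concept map for candidate key concepts is to count how many distinct activities act on each concept in the discovered log. This sketch assumes every log line has the form `verb object`; the degree ranking is an illustrative heuristic, not the criterion the pipeline uses, and `concept_degrees` is a hypothetical name.

```python
from collections import defaultdict


def concept_degrees(log_lines):
    """Map each concept (the object part of 'verb object') to the verbs acting on it."""
    verbs_by_concept = defaultdict(set)
    for line in log_lines:
        verb, concept = line.split(maxsplit=1)
        verbs_by_concept[concept].add(verb)
    return verbs_by_concept


# The discovered log of the case study (recording entries generated compactly).
log = [
    "excavate trench", "use bucket", "use back_actor_of_machine",
    "inspect side_of_trench", "inspect spoil", "recover artefact",
    "reveal deposit", "number layer", "number fill", "number cut",
    "describe layer", "describe fill", "describe cut",
    "cross_referenced context_system", "maintain register",
] + [f"{v} {o}" for v in ("take", "make", "gather")
     for o in ("photograph", "level", "plan", "section", "find")]

degrees = concept_degrees(log)
# Most-connected concepts first; ties broken alphabetically.
ranking = sorted(degrees, key=lambda c: (-len(degrees[c]), c))
print(ranking[:3])  # ['find', 'level', 'photograph']
```

Here the recording concepts (find, level, photograph, plan, section) rank highest with three activities each, followed by layer, fill and cut with two.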
12. Phase II: Extracting models in Cultural Heritage IS
Preliminary Concept Map
(Figure: nodes Eastgate, Hexham, Trench, Bucket, Back_actor_of_machine, Layer, Side_of_trench, Level, Cut, Find, Plan, Section, Fill, Deposit, Artefact, Spoil, Photograph, Register, Context system; link labels USE, INSPECT, REVEAL, RECOVER, NUMBER, DESCRIBE, CROSS-REFERENCE, GATHER, MAINTAIN, EXCAVATE.)
13. Phase III: Extracting models in Cultural Heritage IS
Supervised Concept Map
(Figure: same nodes as the Preliminary Concept Map of slide 12; link labels revised to USES, INSPECTS, ALLOWS REVEALING, ALLOWS RECOVERING, NUMBERS, DESCRIBES, CROSS-REFERENCE, TO GATHER, HAS TO MAINTAIN, EXCAVATES.)
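The change from the preliminary to the supervised map on slides 12 and 13 is mostly a relabeling of links into domain phrasing. A minimal sketch of applying such analyst decisions, assuming links are stored as (label, concept) pairs taken from the log and that the labels on the two slides correspond in the order they are listed (an inference, not stated on the slides); `SUPERVISED_LABELS` and `apply_supervision` are hypothetical.

```python
# Preliminary label -> supervised label, paired by listing order on slides 12 and 13
# (the pairing is an assumption of this sketch).
SUPERVISED_LABELS = {
    "USE": "USES",
    "INSPECT": "INSPECTS",
    "REVEAL": "ALLOWS REVEALING",
    "RECOVER": "ALLOWS RECOVERING",
    "NUMBER": "NUMBERS",
    "DESCRIBE": "DESCRIBES",
    "CROSS-REFERENCE": "CROSS-REFERENCE",
    "GATHER": "TO GATHER",
    "MAINTAIN": "HAS TO MAINTAIN",
    "EXCAVATE": "EXCAVATES",
}


def apply_supervision(links, labels):
    """Replace each link label with the analyst-verified one; labels without a
    decision stay unchanged so they remain visible in the next iteration."""
    return [(labels.get(label, label), concept) for label, concept in links]


preliminary = [("REVEAL", "Deposit"), ("MAINTAIN", "Register"), ("EXCAVATE", "Trench")]
print(apply_supervision(preliminary, SUPERVISED_LABELS))
# [('ALLOWS REVEALING', 'Deposit'), ('HAS TO MAINTAIN', 'Register'), ('EXCAVATES', 'Trench')]
```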
14. Discussion & Open Issues
Work-in-progress proposal: a holistic static and dynamic approach to information modelling
Software analysts do not need prior domain knowledge to start creating models
Preserves the semantic link between static and dynamic elements in humanities domain terminology
Semi-supervised approach: software analysts can gradually learn the domain's key concepts and
practices
Iterative pipeline: incremental improvement of the outputs
Tested and evaluated by experts on historical and archaeological textual specifications
Technological dependencies: TextProcessMiner (NLP toolkit by Stanford) -> TOWARDS A METAMODEL
Limitations of the locality principle and of synonym handling -> WordNet, CILI INTEGRATION
Adaptation to humanities sub-domains: CH thesauri, ontologies
Need for rigorous validation with a large corpus of CH textual specifications
From activity list to Process Models (integration of Process Mining tools: DISCO, etc.)
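Moving from the activity list to process models requires an event log that process mining tools can import. A sketch that writes one case as CSV rows, assuming a `case_id`, `position`, `activity` column layout (hypothetical; tools such as Disco usually let the user assign CSV columns to case, activity and timestamp roles on import, so check the tool's options) and using the position as a stand-in for timestamps, which the textual specifications do not provide. The case identifier `eastgate_hexham` is also hypothetical.

```python
import csv
import io


def to_event_log_csv(case_id, activities):
    """Write one case's ordered activities as CSV rows: case_id, position, activity."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["case_id", "position", "activity"])
    for position, activity in enumerate(activities, start=1):
        writer.writerow([case_id, position, activity])
    return buf.getvalue()


print(to_event_log_csv("eastgate_hexham", ["excavate trench", "use bucket"]))
```

Each textual specification becomes one case, so a corpus of reports yields a multi-case log in which recurring activity orderings can be mined.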
15. Extracting static and dynamic model elements from textual specifications in humanities
Thank you for your attention
Patricia Martín-Rodilla
patricia.martin-rodilla@incipit.csic.es
Institute of Heritage Sciences
Spanish National Research Council
Santiago de Compostela, Spain.