1. Identifying attributes
TMRA 2009 open space session
Peter-Paul Kruijsen
Morpheus
<p.kruijsen@mssm.nl>
2. Problem statement
! Domain: Merge external data into topic map
! Usual solution: add PSIs (Published Subject Identifiers) to both topic maps to enable merging
! Consequence: a PSI must be added to almost every topic
! Cumbersome
! Tricky for customers to grasp
! Solution: Merge without hand-coded PSIs
3. Hand-coded PSIs
! PSIs are usually added by a Topic Maps expert, based on identifying attributes
! http://example.org/people/ssn/12345789
! http://example.org/keywords/topic_maps
! http://example.org/system/IPK719
! Not everyone is able to define perfect PSIs
! Unique
! Stable
4. Solution
! Compare topics based on fingerprints
! SSN
! Codes
! Topic name
! Auto-generate PSIs based on these uniquely identifying attributes
! http://psi.mssm.nl/random/1258041512117–030586nsZN5Gs6Tq
! Apply these PSIs to topics before merge
! Configuration can be stored in topic map
! k:identifying-attribute(i:person : k:topic-type, i:ssn : k:attribute)
! k:identifying-attribute(i:system : k:topic-type, i:code : k:attribute)
! k:identifying-attribute(i:keyword : k:topic-type, k:untyped-name : k:attribute)
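
The configuration and PSI generation on this slide could be sketched roughly as follows. This is an illustration only: the dictionary form of the k:identifying-attribute associations, the function names, and the "millisecond timestamp plus random token" layout of the generated PSI are assumptions read off the example URL, not the actual implementation.

```python
import secrets
import time

# Hypothetical in-memory form of the k:identifying-attribute associations:
# topic type -> attributes that uniquely identify topics of that type.
IDENTIFYING_ATTRIBUTES = {
    "person": ["ssn"],
    "system": ["code"],
    "keyword": ["untyped-name"],
}

# Prefix copied from the example PSI on the slide.
PSI_BASE = "http://psi.mssm.nl/random/"


def generate_random_psi(base=PSI_BASE):
    """Return a PSI built from a millisecond timestamp and a random token.

    The exact layout is an assumption; only uniqueness matters here.
    """
    millis = int(time.time() * 1000)
    token = secrets.token_urlsafe(12)
    return f"{base}{millis}-{token}"


def fingerprints(topic_type, attributes, config=IDENTIFYING_ATTRIBUTES):
    """Return (type, attribute, value) triples that identify a topic."""
    return [
        (topic_type, name, attributes[name])
        for name in config.get(topic_type, [])
        if name in attributes
    ]
```

Two topics with an equal fingerprint triple are treated as the same subject, so only the configuration has to be written by the customer, not the PSIs themselves.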
6. Algorithm
! For two topic maps and a configuration
! For each topic in source topic map
! For each identifying attribute for topic type
! Look up the attribute value in the target topic map
! If no PSI present: randomly generate PSI
! Apply PSIs from one topic to the other
! After this loop: merge topic maps
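
The steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the actual implementation: topics are modelled as plain dictionaries with hypothetical "type", "attrs", and "psis" keys, the configuration maps a topic type to its identifying attributes, the example.org base URL is a placeholder, and the merge step is a simplified stand-in for a real Topic Maps engine's merge.

```python
import uuid


def fingerprints(topic, config):
    """Return (type, attribute, value) triples that identify a topic."""
    attrs = topic["attrs"]
    return [
        (topic["type"], name, attrs[name])
        for name in config.get(topic["type"], [])
        if name in attrs
    ]


def apply_psis(source, target, config, base="http://example.org/random/"):
    """Give matching topics in both maps a shared PSI; return the match count."""
    # Index target topics by fingerprint for the lookup step.
    index = {}
    for topic in target:
        for fp in fingerprints(topic, config):
            index.setdefault(fp, topic)

    matched = 0
    for topic in source:
        for fp in fingerprints(topic, config):
            twin = index.get(fp)
            if twin is None:
                continue
            # If neither topic has a PSI yet, randomly generate one.
            if not topic["psis"] and not twin["psis"]:
                topic["psis"].add(base + uuid.uuid4().hex)
            # Apply the PSIs of each topic to the other.
            shared = topic["psis"] | twin["psis"]
            topic["psis"].update(shared)
            twin["psis"].update(shared)
            matched += 1
    return matched


def merge_topic_maps(source, target):
    """Simplified merge: topics sharing at least one PSI become one topic."""
    merged = [
        {"type": t["type"], "attrs": dict(t["attrs"]), "psis": set(t["psis"])}
        for t in target
    ]
    for topic in source:
        twin = next((m for m in merged if m["psis"] & topic["psis"]), None)
        if twin is None:
            merged.append(
                {"type": topic["type"], "attrs": dict(topic["attrs"]),
                 "psis": set(topic["psis"])}
            )
        else:
            # On conflicting attribute values the target's value is kept.
            twin["attrs"] = {**topic["attrs"], **twin["attrs"]}
            twin["psis"] |= topic["psis"]
    return merged


# Usage with made-up data: one person present in both maps, one keyword only in the source.
config = {"person": ["ssn"], "keyword": ["untyped-name"]}
source = [
    {"type": "person", "attrs": {"ssn": "12345789", "name": "A. Example"}, "psis": set()},
    {"type": "keyword", "attrs": {"untyped-name": "topic maps"}, "psis": set()},
]
target = [
    {"type": "person", "attrs": {"ssn": "12345789", "email": "a@example.org"}, "psis": set()},
]
matched = apply_psis(source, target, config)
merged = merge_topic_maps(source, target)
```

After the PSI pass the two person topics carry the same generated PSI, so the ordinary PSI-based merge joins them, while the unmatched keyword topic is simply copied over.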
8. Ups/Downs
! Benefits
! Merging no longer requires mastering PSIs, only describing uniquely identifying attributes
! Customers write their own XSLT to generate TM/XML
! Applicable even after large imports
! Merge locally based on fingerprints
! Downsides
! Randomly generated PSIs are unreadable
! Possibility to ‘correct’ afterwards
! Enhancement: remove random PSI after merge
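
The proposed enhancement, removing the random PSIs once the merge is done, might look like the sketch below. It assumes, hypothetically, that topics are dictionaries with a "psis" set and that generated PSIs can be recognised by the prefix shown in the earlier example URL.

```python
# Prefix of the auto-generated PSIs, taken from the example URL in this deck.
RANDOM_PSI_PREFIX = "http://psi.mssm.nl/random/"


def strip_random_psis(topics, prefix=RANDOM_PSI_PREFIX):
    """Remove auto-generated PSIs from merged topics; return how many were removed."""
    removed = 0
    for topic in topics:
        random_psis = {psi for psi in topic["psis"] if psi.startswith(prefix)}
        topic["psis"] -= random_psis
        removed += len(random_psis)
    return removed
```

Hand-coded PSIs are left untouched, since only identifiers under the random prefix are dropped.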