Competitive intelligence (CI) supports decision makers in understanding the competitive environment by means of textual reports prepared from public resources. CI is particularly demanding in the context of larger business clusters. We report on a long-term project featuring large-scale manual semantic annotation of CI reports with respect to business clusters in several industries. The underlying ontologies are the result of collaborative editing by multiple student teams. The results of annotation are finally merged into CI maps that allow easy access to both the original documents and the knowledge structures.
Building and Integrating Competitive Intelligence Reports Using the Topic Map Technology
1. Building and Integrating Competitive Intelligence Reports Using the Topic Map Technology. Vojtěch Svátek, Tomáš Kliegr, Jan Nemrava, Martin Ralbovský, Vojtěch Roček, Jan Rauch (University of Economics, Winston Churchill Sq. 4, Prague, Czech Republic); Jiří Šplíchal, Tomáš Vejlupek (Tovek s.r.o., Chrudimská 1418, Prague, Czech Republic)
2. CI and Business Clusters. Competitive intelligence (CI) is a sub-field of business intelligence that supports decision makers in understanding the competitive environment by means of reports prepared from (public) resources. A cluster is a set of companies in related fields operating in the same geographical area. Problem: how to link and search multiple CI reports? Envisaged solution: create a complementary topic map that puts the important facts into context.
3. The Topic Map. 1) Ontology: putting concepts into context through topic types, instances, and associations. 2) Annotation: important bits of text are annotated with ontology concepts.
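The two ingredients named on this slide can be pictured with a small data model. This is only a minimal sketch: the class names and the sample company, product, and sentence are hypothetical illustrations, not the data model of TTM or OKS.

```python
from dataclasses import dataclass, field


@dataclass
class TopicType:
    """A concept of the ontology, e.g. 'Company' (hypothetical example)."""
    name: str


@dataclass
class Topic:
    """An instance of a topic type, optionally carrying annotated text."""
    name: str
    topic_type: TopicType
    occurrences: list = field(default_factory=list)


@dataclass
class Association:
    """A typed relationship between topic instances."""
    association_type: str
    members: tuple


# 1) Ontology: topic types, instances, and an association between them.
company = TopicType("Company")
product = TopicType("Product")
acme = Topic("Acme", company)
widget = Topic("Widget", product)
produces = Association("produces", (acme, widget))

# 2) Annotation: a bit of report text is attached to an ontology concept.
acme.occurrences.append("Acme is the largest producer in the cluster.")
```

Keeping the annotated text on the topic itself corresponds to the internal occurrences used in the first two scenarios of the deck.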
5. S1: Individual ontologies, merge. Each team wrote its CI report in a text editor and then obtained a copy of a startup ontology. Students extended the ontology with new topic types using Tovek Topic Mapper (TTM), a desktop ontology editor and annotation tool. Students also used TTM to annotate bits of text with a topic type; each annotated text became an internal occurrence in the topic map. The ontologies enriched with new topic types and annotations were collected from all teams, and we used OKS to merge the topic maps. Workflow shown in the diagram: start from the startup ontology, extend the ontology, annotate the report (DOC, HTML); the result is a linking file (XTM) connecting the document with the shared topic map.
6. Topic Maps Merging. Merging of: the business cluster topic map, all unstructured documents, and the linking files. Diagram: CI reports (DOC, HTML), linking files (XTM), shared industry topic map.
7. Issues. Annotated text is fragmented, since each fragment is stored as an internal occurrence. The process is laborious. Duplicate topic types appear. Effective merging requires unique identifiers, which was achieved only for companies (registration numbers used in subject indicators).
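The role of unique identifiers in merging can be illustrated with a simplified sketch. It is not the merging algorithm of OKS: it only unifies topics that share a subject identifier, and the `Topic` class, the `merge_topic_maps` function, and the example URL are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    subject_identifiers: set = field(default_factory=set)
    occurrences: list = field(default_factory=list)


def merge_topic_maps(*topic_maps):
    """Merge lists of topics; topics sharing a subject identifier become one.

    Simplification: a topic whose identifiers match two different, already
    merged topics is attached to the first match only.
    """
    merged = []          # topics of the resulting map
    by_identifier = {}   # subject identifier -> topic in `merged`
    for topic_map in topic_maps:
        for topic in topic_map:
            target = None
            for sid in topic.subject_identifiers:
                if sid in by_identifier:
                    target = by_identifier[sid]
                    break
            if target is None:
                # No shared identifier: the topic stays separate.
                target = Topic(topic.name)
                merged.append(target)
            target.subject_identifiers |= topic.subject_identifiers
            target.occurrences.extend(topic.occurrences)
            for sid in target.subject_identifiers:
                by_identifier[sid] = target
    return merged


# Hypothetical identifier built from a company registration number.
ACME_ID = "http://example.org/company/12345678"
team_a = [Topic("Acme s.r.o.", {ACME_ID}, ["report-a"]), Topic("Supplier")]
team_b = [Topic("ACME", {ACME_ID}, ["report-b"]), Topic("Suppliers")]
merged = merge_topic_maps(team_a, team_b)
```

In this toy run the two company topics collapse into one that carries both occurrences, while "Supplier" and "Suppliers" survive as duplicates because nothing identifies them as the same subject.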
8. S2: Collaborative Ontology Population. Goal: remove duplicate topic types. The startup ontology was placed on a PostgreSQL server. Student teams collaboratively enriched the ontology with the topic types, association types and occurrence types they expected to use during annotation in Topic Mapper. The ontology was then frozen and each team got its own copy. TTM was used only for annotation, and OKS then for merging. Diagram: students create the ontology collaboratively in a remote repository (shared topic map); each team imports the ontology, annotates only, and delivers topic maps for merging.
9. Issues. Separating ontology enrichment from document annotation is not natural and requires an experienced annotator. Annotations are still kept as internal occurrences. Multiple concurrent instances of OKS servers resulted in corruption of the topic map, probably due to caching in OKS. Two topic map tools are used, and the original documents are not easily accessible.
10. S3: Annotation by linking. Goal: move annotation fully to the web. All students used one instance of the OKS server. CI reports were placed into a CMS (Joomla!). Each structural unit was assigned an id (via HTML's <a name>). Annotation was done via external occurrences. An external occurrence points at a specific bookmark in the document, where the annotated fragment starts. The annotated fragment is assumed to span up to the nearest following bookmark.
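The bookmark convention described on this slide can be sketched with Python's standard `html.parser`. The report markup, the URL, and the names `BookmarkSplitter` and `fragment_for` are hypothetical; the sketch only shows how a fragment could be recovered from an external occurrence URL, assuming bookmarks are `<a name="...">` anchors.

```python
from html.parser import HTMLParser


class BookmarkSplitter(HTMLParser):
    """Collect the text that follows each <a name="..."> bookmark."""

    def __init__(self):
        super().__init__()
        self.fragments = {}   # bookmark name -> list of text chunks
        self.current = None   # bookmark whose fragment is being collected

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            name = dict(attrs).get("name")
            if name:
                # A new bookmark closes the previous fragment.
                self.current = name
                self.fragments[name] = []

    def handle_data(self, data):
        if self.current is not None:
            self.fragments[self.current].append(data)


def fragment_for(html, occurrence_url):
    """Return the text from the URL's bookmark up to the next bookmark."""
    bookmark = occurrence_url.split("#", 1)[1]
    splitter = BookmarkSplitter()
    splitter.feed(html)
    splitter.close()
    # Join chunks and normalise whitespace (inline tags become spaces).
    return " ".join(" ".join(splitter.fragments[bookmark]).split())


report = (
    '<h2><a name="s1"></a>Overview</h2><p>Acme leads the cluster.</p>'
    '<h2><a name="s2"></a>Suppliers</h2><p>Beta supplies Acme.</p>'
)
first = fragment_for(report, "http://cms.example.org/report.html#s1")
second = fragment_for(report, "http://cms.example.org/report.html#s2")
```

Because the fragment is located by bookmark name and not by character offsets, text between two bookmarks can be edited in the CMS without invalidating the occurrence, as long as the bookmarks themselves are kept.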
11. Issues … and finally advantages. Issues: OKS Ontopoly was not stable enough in a concurrent setting. XPointer technology, which could be used to mark spans in the document, is not supported by current browsers. Advantages: The text with full content (including figures and links) in the CMS is more intelligible than fragments in internal occurrences. Further editing of an article is possible in the CMS without invalidating the annotation. The full-text search feature of the CMS can be exploited. The approach brings together the best of the CMS world and OKS.
12. Summary & Plans. On the competitive intelligence use case, we tested several approaches to collaborative ontology design and document annotation with some 500 users altogether. OKS is a great tool, which gains an additional edge by being web-based. We deem the last approach taken (documents stored in a CMS and linked through external occurrences with OKS) usable, contingent on improvements in Ontopoly and Joomla!. Ontopoly wishes: greater stability under concurrent user access; user management and versioning, both of which we missed. Joomla! wishes: support for "tagging" arbitrary bits of text; a tool for creating XPointer URLs based on user selection; functionality that would highlight part of the document based on a URL containing an XPointer span.