1. Semantifying
Your CMS
Semantic CMS Community
Lecturer
Organization
Date of presentation
Co-funded by the European Union
Copyright IKS Consortium
2. Page:
Part I: Foundations
(1) Introduction of Content Management
(2) Foundations of Semantic Web Technologies
Part II: Semantic Content Management
(3) Knowledge Interaction and Presentation
(4) Knowledge Representation and Reasoning
(5) Semantic Lifting
(6) Storing and Accessing Semantic Data
Part III: Methodologies
(7) Requirements Engineering for Semantic CMS
(8) Designing Semantic CMS
(9) Semantifying your CMS
(10) Designing Interactive Ubiquitous IS
www.iks-project.eu Copyright IKS Consortium
3. Page: 3
What is this Lecture about?
We have introduced ...
... an RE approach for semantic CMS.
... a component-based reference architecture for the design of semantic CMS.
What's next?
A systematic method that can be used by developers to extend “traditional” CMS with semantic capabilities.
Part III: Methodologies
(7) Requirements Engineering for Semantic CMS
(8) Designing Semantic CMS
(9) Semantifying your CMS
(10) Designing Interactive Ubiquitous IS
5. Page:
Content Management Systems
Content management systems (CMS) are designed to
support a content management cycle:
analysis of content
creation and collection of content
publication of content for access by users and/or other
systems
management of this content
6. Page:
Standardized API
Each CMS provides an API to interact with the
repository which can be used within content-oriented
applications
To prevent each CMS vendor from providing its own
proprietary API,
two main specifications are used in the community:
JCR: Content Repository API for Java
CMIS: Content Management Interoperability Services
7. Page:
What Is JCR?
Abbreviation of Content Repository API for Java (JCR)
It is a specification for a Java platform API for accessing
content repositories in a uniform manner.
JSRs: Java Specification Requests
JSR 283: Content Repository for Java™ Technology API
Version 2.0
8. Page:
What Is JCR?
Provides a functional view and a common vocabulary
over the content repository
One does not need to learn dozens of proprietary APIs
Encourages code portability
Prevents content lock in isolated silos by providing a
standardized repository model and access
10. Page:
Repository Model In JCR
Each node has a node type definition
Each node type can have
Property definitions specifying the properties that can be
used by instances of the node type
Child definitions specifying the node types of the child nodes
that instances of the current node type can have
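The node type rules above can be sketched in a few lines of Python. This is only an illustration and not the JCR API: the classes NodeType and Node, the function is_valid, and the example types news:article and news:folder are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class NodeType:
    name: str
    property_names: set = field(default_factory=set)    # property definitions
    child_type_names: set = field(default_factory=set)  # child node definitions


@dataclass
class Node:
    node_type: NodeType
    properties: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def is_valid(node: Node) -> bool:
    """Check a node and its subtree against the node type definitions."""
    # Every property must be declared by the node type.
    if not set(node.properties) <= node.node_type.property_names:
        return False
    for child in node.children:
        # Every child must be of a node type allowed by the parent type.
        if child.node_type.name not in node.node_type.child_type_names:
            return False
        if not is_valid(child):
            return False
    return True


# Hypothetical node types and nodes, for illustration only.
article_type = NodeType("news:article", {"title", "body"})
folder_type = NodeType("news:folder", {"title"}, {"news:article", "news:folder"})

article = Node(article_type, {"title": "Swine flu update"})
folder = Node(folder_type, {"title": "Articles"}, [article])
```

Here is_valid(folder) holds, while a node that carries an undeclared property or a disallowed child type is rejected.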
11. Page:
What Is CMIS?
Abbreviation of Content Management Interoperability
Services
Defines a domain model and bindings that are designed
to be layered on top of existing Content Management
systems and their existing programmatic interfaces.
12. Page:
What Is CMIS?
A standard repository model and binding interface allow:
reduction of the work for integration of multi-vendor, multi-
repository content management environments
removing the need to maintain proprietary code
developing independent business units without
infrastructure considerations
13. Page:
Repository Model In CMIS
The entities managed by CMIS are modeled as
typed Objects
CMIS comes with four types of base objects
Document object
Folder object
Relationship object
Policy object
Every CMIS object has a set of properties
14. Page:
Repository Model In CMIS
All CMIS objects are strongly typed
An object type defines a fixed and non-hierarchical set of properties that all
objects of that type have
CMIS has four base object types corresponding to the four base objects:
cmis:document
cmis:folder
cmis:relationship
cmis:policy
Each object type has its own set of property definitions, as in the JCR
specification.
16. Page:
Comparison of JCR and CMIS
Both provide
A high-level domain model to represent the content in the
repository
A way to avoid the proprietary API of each content repository
18. Page:
Comparison of JCR and CMIS
Both JCR and CMIS define a hierarchical repository model.
JCR calls the building blocks Nodes
CMIS calls the building blocks Objects
Both JCR and CMIS specify type definitions that
Restrict properties
Restrict hierarchical structure
Content items in both JCR and CMIS have properties
that are defined by their type definitions
19. Page:
Metadata Management In CMS
Organizing the content as hierarchies
Through properties/parameters of nodes/objects/documents
Free-format values, or values selected from a constrained vocabulary
(which can be a taxonomy)
Can be used as content categories
By representing relationships between nodes/objects/documents
Taxonomies can be represented as tag hierarchies (as a
hierarchy of nodes)
20. Page:
Generic Repository Model
Considering the JCR and CMIS repository models, to
semantify a CMS
we need a generic repository model
The generic repository model should be able to represent
CMS objects from both specifications
22. Page:
Generic Repository Model
In the generic repository model
Object entity corresponds to JCR node and CMIS object
Object type entity corresponds to JCR node types and
CMIS object types
Property and property definition notions are also
represented in the generic repository model.
23. Page:
Generic Repository Model
The Classification Object and Content Object notions are
introduced on top of the representation that covers
the JCR and CMIS models
They differentiate data and metadata
Content objects are used to represent repository items
that contain actual data.
Classification Objects represent hierarchical taxonomies of
CMSs which are used to classify “content objects”
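A minimal sketch of the generic repository model described above, written in Python for illustration. The class names follow the slide (Object, Object type, Content Object, Classification Object), but the fields, the helper split_data_and_metadata and the example paths are assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectType:
    """Corresponds to a JCR node type or a CMIS object type."""
    name: str
    property_definitions: tuple = ()


@dataclass
class RepositoryObject:
    """Corresponds to a JCR node or a CMIS object."""
    path: str
    object_type: ObjectType
    properties: dict = field(default_factory=dict)


class ContentObject(RepositoryObject):
    """A repository item that contains actual data."""


class ClassificationObject(RepositoryObject):
    """A taxonomy entry that is used to classify content objects."""


def split_data_and_metadata(objects):
    """Separate content objects (data) from classification objects (metadata)."""
    content = [o for o in objects if isinstance(o, ContentObject)]
    classification = [o for o in objects if isinstance(o, ClassificationObject)]
    return content, classification


# Hypothetical repository items, for illustration only.
category_type = ObjectType("category")
article_type = ObjectType("article", ("title", "classifiedBy"))
cancer = ClassificationObject("/NewsSubjectCodes/Health/Disease/Cancer", category_type)
report = ContentObject("/NewsArticles/LiverCancerReport", article_type,
                       {"classifiedBy": cancer.path})

content, classification = split_data_and_metadata([cancer, report])
```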
24. Page:
Strength of Semantic
Technologies
An ontology consists of the following artifacts:
A vocabulary to describe a domain
A specification of the intended meaning of the vocabulary,
including how concepts are classified
Constraints providing additional knowledge about the
domain
Thus, an ontology represents a formal and machine-manipulable
model of a domain
25. Page:
Strength of Semantic
Technologies
A machine-manipulable model of a domain enables
reasoning on it
Reasoning provides
Recognising semantic similarity in spite of syntactic
differences
Recognising implicit consequences given explicitly
stated facts
26. Page:
Enhancing CMS With Semantic
Technologies
[Figure: functionalities provided on a domain ontology and their benefits to CMSs]
27. Page:
Extracting Semantics From CMSs
as Ontologies
Content repositories already provide a certain amount of
semantics for content items
Through content
hierarchies, properties, taxonomies, node/object types
However, these semantics are not “machine understandable”;
they cannot be reasoned on
28. Page:
Need For A Methodology
There is a need for an “integrated semantic engineering
method”
Enabling CMS developers to easily utilize the semantic
functionalities provided by ontologies and reasoners, without a
major change in their systems
29. Page:
Extracting Semantics From CMSs
as Ontologies
Node types/object types/document types can be
automatically converted into OWL classes
Properties as object and datatype properties
Restrictions when necessary
Nodes of these node types can be created as instances…
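The conversion above can be sketched as a function that emits plain (subject, predicate, object) triples. This is an illustrative sketch, not the project's implementation: the function lift_types, the input dictionaries, and the convention that a property of kind "reference" becomes an object property are assumptions.

```python
RDF_TYPE = "rdf:type"


def lift_types(object_types, objects):
    """Return triples for type definitions, their properties and their instances."""
    triples = []
    for type_name, spec in object_types.items():
        # Each node type / object type becomes an OWL class.
        triples.append((type_name, RDF_TYPE, "owl:Class"))
        for prop, kind in spec["properties"].items():
            # Links to other items become object properties,
            # literal-valued properties become datatype properties.
            owl_kind = "owl:ObjectProperty" if kind == "reference" else "owl:DatatypeProperty"
            triples.append((prop, RDF_TYPE, owl_kind))
            triples.append((prop, "rdfs:domain", type_name))
    for name, type_name in objects.items():
        # Each node / object becomes an instance of its type's class.
        triples.append((name, RDF_TYPE, type_name))
    return triples


# Hypothetical type definition and content item, for illustration only.
object_types = {"Article": {"properties": {"title": "string", "relatedItem": "reference"}}}
objects = {"Article1": "Article"}
triples = lift_types(object_types, objects)
```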
31. Page:
What About Resources Having
Semantic Worth?
How should other resources be treated?
Links between content items
Taxonomies
Content hierarchies
There should be configurable bridges from the CMS to the
ontology
32. Page:
Bridges
Should provide
Extracting certain CMS objects as ontology classes
Extracting certain CMS objects as ontology individuals
Extracting hierarchical structure through certain properties
between CMS objects
Extracting certain properties of CMS objects indicating a
semantic value
Treating extracted properties differently according to
their annotations
33. Page:
Concept Bridge
Takes a query specifying the target CMS objects
Transforms the target objects to ontology classes
together with the possible hierarchical relations
Is able to include Subsumption Bridges to build the
hierarchy through certain properties
Is able to include Property Bridges to extract
certain properties of the target objects and set appropriate
annotations in the ontology
34. Page:
Subsumption Bridge
Takes a query specifying the target CMS objects
Takes a predicate name
Forms subclass/superclass relations between the target
CMS objects through the specified predicate
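A Subsumption Bridge could be sketched as follows. The sketch makes several assumptions that the slide does not state: the query is treated as a path prefix, CMS objects are plain dictionaries, and the hypothetical predicate "narrower" points from an object to its more specific objects.

```python
def subsumption_bridge(cms_objects, query_prefix, predicate):
    """Return (subclass, superclass) pairs for the objects selected by the query."""
    pairs = []
    for obj in cms_objects:
        if not obj["path"].startswith(query_prefix):
            continue  # not a target of the query
        for target in obj.get(predicate, []):
            # Assumption: the predicate value names a more specific object.
            pairs.append((target, obj["name"]))
    return pairs


# Hypothetical CMS objects, for illustration only.
cms_objects = [
    {"path": "/Workspace/NewsSubjectCodes/Health", "name": "Health",
     "narrower": ["Disease", "Illness"]},
    {"path": "/Workspace/NewsArticles/Article1", "name": "Article1",
     "narrower": ["Ignored"]},
]
pairs = subsumption_bridge(cms_objects, "/Workspace/NewsSubjectCodes/", "narrower")
```

With a predicate that points in the opposite direction (from the specific object to the general one), the two elements of each pair would simply be swapped.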
35. Page:
Instance Bridge
Takes a query to select target CMS objects
Transforms the selected CMS objects into ontology
individuals
Like a Concept Bridge, it can include Property
Bridges to treat properties of CMS objects differently
based on their annotations
36. Page:
Property Bridge
Selectively lifts some of the properties of CMS objects
into the ontological representation
This enables lifting only the properties that have semantic value
It can be included in a Concept Bridge or an Instance
Bridge
37. Page:
Backend Knowledge Base For
CMSs
As a result of the semantic lifting mechanism, we have the
ontological representation of the content repository
semantics
The ontological representation should be kept in a backend
knowledge base
and kept synchronized with the changes in the repository
A reasoner should be used together with the
knowledge base
to recognize implicit facts from the explicit ones in the ontology
38. Page:
Backend Knowledge Base For
CMSs
Existing triple stores
Some, like Jena and Sesame, provide a built-in reasoner
While Sesame supports only RDFS reasoning, Jena provides
RDFS, OWL and rule-based reasoners
It is also possible to integrate an external reasoner with a triple
store
Considering the pros and cons of different triple stores, we need
a generic interface to communicate with triple stores
so that the knowledge base can be hosted on different triple stores through the
generic interface
and the semantic lifting mechanism can feed and query
the hosted ontologies.
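The idea of a generic interface can be sketched with an abstract base class. Everything here is hypothetical: the TripleStore interface, the InMemoryStore stand-in and the feed helper are not taken from Jena or Sesame; a real adapter for either store would implement the same two methods.

```python
from abc import ABC, abstractmethod


class TripleStore(ABC):
    """Generic interface the semantic lifting mechanism talks to."""

    @abstractmethod
    def add(self, subject, predicate, obj):
        ...

    @abstractmethod
    def query(self, subject=None, predicate=None, obj=None):
        ...


class InMemoryStore(TripleStore):
    """Stand-in backend; an adapter for a real triple store would go here."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def query(self, subject=None, predicate=None, obj=None):
        # None acts as a wildcard for that position of the triple.
        pattern = (subject, predicate, obj)
        return sorted(t for t in self._triples
                      if all(p is None or p == v for p, v in zip(pattern, t)))


def feed(store: TripleStore, triples):
    """Push lifted triples into whichever store sits behind the interface."""
    for triple in triples:
        store.add(*triple)


store = InMemoryStore()
feed(store, [("Article1", "rdf:type", "Health"),
             ("Disease", "rdfs:subClassOf", "Health")])
```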
39. Page:
Using the Extracted Semantics in
Content Discovery
After extracting the semantics of a CMS into an
ontology, the ontology can be used to provide semantic
functionalities on top of it.
Semantic search
It can be further enhanced by aligning/merging external
domain ontologies
40. Page:
Initial CMS Structure
[Figure: Content Management System Structure. A Workspace with two branches. NewsSubjectCodes holds the subject code hierarchy, with nodes including Disaster/Accident, Health, Education, Economy Business Finance, HealthTreatment, Disease, Illness, ViralDiseases, SwineFlu, Cancer, Neurological Disease, Obesity and Eating Disorder. NewsArticles holds Article1, Article2 and Article3, each linked to a subject code through classifiedBy.]
41. Page:
Ontological Representation Of
CMS
Represent the CMS structure in the previous slide
ontologically
Represent the “news subject codes” branch as an
ontology class hierarchy
Represent the “news articles” branch as a set of ontology
individuals
42. Page:
Ontological Representation Of
CMS
[Figure: The news subject codes are represented as hierarchical ontology classes: NewsSubjectCodes, ArtsCultureEntertainment, DisasterAccident, EconomyBusinessFinance, Education, EnvironmentalIssues, Health, Disease, ViralDisease, SwineFlu, Cancer, HealthTreatment, Illness, Medicine, SocialIssues, ... The news articles Article1, Article2 and Article3 are represented as ontology individuals, each connected by instanceOf to a class. Individual types are set with the corresponding ontology class.]
43. Page:
Make a Search
Find me articles categorized by “Health” …
The answer contains: Article1, Article2 and Article3 due
to subsumption relation between the ontology classes.
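The search above can be sketched with a small subsumption lookup. The class hierarchy is a fragment of the example, and the article classifications are assumptions read off the figures (Article2 under HealthTreatment is stated in the notes; Article1 under Health and Article3 under SwineFlu are inferred). The function names are hypothetical.

```python
# Fragment of the subject code hierarchy: subclass -> superclass.
SUBCLASS_OF = {
    "Disease": "Health",
    "HealthTreatment": "Health",
    "Illness": "Health",
    "ViralDisease": "Disease",
    "SwineFlu": "ViralDisease",
}

# Assumed classifications of the example articles.
INSTANCE_OF = {
    "Article1": "Health",
    "Article2": "HealthTreatment",
    "Article3": "SwineFlu",
}


def ancestors(cls):
    """Yield the class itself and every superclass above it."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)


def articles_categorized_by(category):
    """Return articles whose class is the category or one of its subclasses."""
    return sorted(article for article, cls in INSTANCE_OF.items()
                  if category in ancestors(cls))
```

A query for "Health" returns all three articles, although only one of them is classified by Health directly.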
44. Page:
Make a Rule Based Search
Rule: If a Disease isCausedBy PathogenicAgent
Then it is an InfectiousDisease.
Facts: Virus is a PathogenicAgent.
Fungi is a PathogenicAgent.
ViralDisease isCausedBy Virus.
Find me InfectiousDisease articles…
The answer is: Article3
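The rule and facts above can be written out as a small hand-coded inference. This is a sketch, not a rule engine: it assumes that a subclass inherits the isCausedBy statement of its superclass, and the article classifications are the same assumed ones as in the example structure (Article3 under SwineFlu, a ViralDisease).

```python
SUBCLASS_OF = {"SwineFlu": "ViralDisease", "ViralDisease": "Disease", "Cancer": "Disease"}
IS_A = {"Virus": "PathogenicAgent", "Fungi": "PathogenicAgent"}      # facts
IS_CAUSED_BY = {"ViralDisease": "Virus"}                             # fact
INSTANCE_OF = {"Article1": "Health", "Article2": "HealthTreatment", "Article3": "SwineFlu"}


def superclasses(cls):
    """Return the class followed by all of its superclasses."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return chain


def is_infectious(cls):
    """Rule: a Disease that isCausedBy a PathogenicAgent is an InfectiousDisease."""
    chain = superclasses(cls)
    if "Disease" not in chain:
        return False
    # Assumption: isCausedBy stated on a superclass also applies to the subclass.
    return any(IS_A.get(IS_CAUSED_BY.get(c)) == "PathogenicAgent" for c in chain)


infectious_articles = sorted(a for a, c in INSTANCE_OF.items() if is_infectious(c))
```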
45. Page:
Go Back To Example
To represent “news subject codes” as a class hierarchy
in the ontological representation, we need a Concept
Bridge.
Having a query which targets the CMS objects under
“/Workspace/NewsSubjectCodes”
46. Page:
Go Back To Example
To represent “news articles” as individuals in the
ontological representation, we need an Instance Bridge
Having a query which targets the CMS objects under
“/Workspace/NewsArticles”
Having an inner Property Bridge which has “classifiedBy”
as predicate name
This sets the type of each individual to the
ontology class corresponding to the value of its “classifiedBy”
property
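The Instance Bridge with its inner “classifiedBy” Property Bridge can be sketched as below. The function instance_bridge, the dictionary layout of the CMS objects and the treatment of the query as a path prefix are assumptions made for illustration.

```python
def instance_bridge(cms_objects, query_prefix, type_predicate):
    """Turn the selected CMS objects into (individual, rdf:type, class) assertions."""
    assertions = []
    for obj in cms_objects:
        if obj["path"].startswith(query_prefix):
            # The last path segment names the individual and the class.
            individual = obj["path"].rsplit("/", 1)[-1]
            class_name = obj[type_predicate].rsplit("/", 1)[-1]
            assertions.append((individual, "rdf:type", class_name))
    return assertions


# Hypothetical CMS objects, for illustration only.
cms_objects = [
    {"path": "/Workspace/NewsArticles/Article2",
     "classifiedBy": "/Workspace/NewsSubjectCodes/Health/HealthTreatment"},
    {"path": "/Workspace/NewsSubjectCodes/Health"},
]
assertions = instance_bridge(cms_objects, "/Workspace/NewsArticles/", "classifiedBy")
```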
47. Page:
Aligning External Ontologies
It is possible to align external domain ontologies with
the ontology representing the structure of CMS to be
able to use semantics in the external ontology
48. Page:
Go Over An Example
Initially, assume that we have the following ontology representation of
the CMS
[Figure: The news subject codes are represented as hierarchical ontology classes: NewsSubjectCodes, ArtsCultureEntertainment, EnvironmentalIssues, Health, Disease, NeurologicalDisease, HealthTreatment, Illness, EatingDisorder, Obesity, Medicine, SocialIssues. Two of the news articles are represented as individuals, each connected to a class by instanceOf: MotorNeuroneDiseaseGeneClue (“… Professor Christopher Shaw, from the Institute of Psychiatry at Kings College London, said …”) and GeneticCluesToEatingDisorders (“…Doctors studying the causes of the eating disorders anorexia and bulimia believe it has less to do with media images of slim-figured models and more to do with biological and genetic factors…”).]
49. Page:
Align CMS Representation With
External Ontology
[Figure: On the left, the news subject codes as hierarchical ontology classes: NewsSubjectCodes, ArtsCultureEntertainment, DisasterAccident, EconomyBusinessFinance, Education, EnvironmentalIssues, Health, Disease, HealthTreatment, Illness, EatingDisorder, Obesity, Medicine, SocialIssues. On the right, the MeSH biomedical ontology: MeSH, Anatomy, Diseases, Organisms, Psychiatry, BehaviorMechanisms, BehaviorDisciplines, MentalDisorders, AnxietyDisorders, EatingDisorders, SleepingDisorders, SomotoformDisorders. An equivalentTo link connects EatingDisorder with EatingDisorders.]
50. Page:
Align CMS Representation With
External Ontology
[Figure: The aligned ontologies. The article GeneticCluesToEatingDisorders (“…Doctors studying the causes of the eating disorders anorexia and bulimia believe it has less to do with media images of slim-figured models and more to do with biological and genetic factors…”) is shown next to the CMS classes (Education, EnvironmentalIssues, Health, Disease, HealthTreatment, Illness, EatingDisorder, Obesity, Medicine, SocialIssues) and the external classes (Organisms, Psychiatry, BehaviorMechanisms, BehaviorDisciplines, MentalDisorders, AnxietyDisorders, EatingDisorders, SleepingDisorders). EatingDisorder is equivalentTo EatingDisorders, and three instanceOf links are drawn from the article.]
51. Page:
Make A Search
Find me articles related to “psychiatry”
Search results will not only include the article
“MotorNeuroneDiseaseGeneClue” but also the article
“GeneticCluesToEatingDisorders”
The keyword “psychiatry” will be matched with the
ontology class “Psychiatry”.
Through reasoning, it will be inferred that the
“GeneticCluesToEatingDisorders” is an indirect instance
of “Psychiatry” class.
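A sketch of how the aligned ontology could answer this search. Several points are assumptions: the exact MeSH nesting (EatingDisorders under MentalDisorders under Psychiatry), the classification of MotorNeuroneDiseaseGeneClue under NeurologicalDisease, and the idea that this article is found because the keyword occurs in its text. Equivalence is followed in one direction only to keep the sketch short.

```python
SUBCLASS_OF = {
    "EatingDisorder": "Illness", "Illness": "Health",
    "NeurologicalDisease": "Disease", "Disease": "Health",
    # Assumed nesting inside the external ontology:
    "EatingDisorders": "MentalDisorders", "MentalDisorders": "Psychiatry",
}
EQUIVALENT_TO = {"EatingDisorder": "EatingDisorders"}  # the alignment
INSTANCE_OF = {
    "MotorNeuroneDiseaseGeneClue": "NeurologicalDisease",
    "GeneticCluesToEatingDisorders": "EatingDisorder",
}
TEXT = {
    "MotorNeuroneDiseaseGeneClue":
        "Professor Christopher Shaw, from the Institute of Psychiatry at Kings College London, said",
    "GeneticCluesToEatingDisorders":
        "Doctors studying the causes of the eating disorders anorexia and bulimia",
}


def inferred_classes(cls):
    """Collect every class reachable through subclass and equivalence links."""
    seen, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        for nxt in (SUBCLASS_OF.get(current), EQUIVALENT_TO.get(current)):
            if nxt is not None:
                todo.append(nxt)
    return seen


def search(keyword):
    """Match the keyword against inferred classes and against the article text."""
    wanted = keyword.lower()
    hits = set()
    for article, cls in INSTANCE_OF.items():
        by_class = wanted in {c.lower() for c in inferred_classes(cls)}
        by_text = wanted in TEXT[article].lower()
        if by_class or by_text:
            hits.add(article)
    return sorted(hits)
```

Without the EQUIVALENT_TO entry, GeneticCluesToEatingDisorders would not be found, since neither its text nor its CMS classes mention psychiatry.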
The figure is originally adapted from: http://intentionaldesign.ca/www/pmh3472/public_html/wp-content/uploads/2010/04/Content-Lifecycle-Management1.png
It shows the content lifecycle in a content management system. The lifecycle determines the current and future status of the content in the content management system, e.g. whether it is controlled, whether it will be translated or deleted, etc. The cycle starts with the analysis phase. In this phase the strategy for the lifecycle of the content is determined: how it will be produced, controlled, translated, etc. In the collect phase, actual content is obtained, modified and versioned, and metadata is created if available. In the manage phase, the content is modeled and structured according to standard approaches, and stored. In the publishing phase, it is transformed if needed and published.
CMIS is an open standard that uses web protocols to provide a generic abstraction layer on top of content management systems. The web protocols specified by CMIS are Web Services (SOAP) and AtomPub. The specification is hosted by the OASIS consortium, which offers many other standards related to the information society. The CMIS specification mainly contains service descriptions for storing content objects in, and retrieving them from, the underlying persistent store. The specification also introduces type and property definitions for the content objects. It includes services for version management and access control mechanisms as well.
The CMIS v1.0 specification offers 4 base objects (corresponding to their object types explained in the next slide). All content objects of a content repository that is compliant with the CMIS specification should have a type. The type definition of a content object determines the properties that it can have. Objects that hold the actual data are documents. They are the elementary entities managed by a CMIS repository. Folder objects contain the file-able objects, e.g. documents and folders. Relationship objects represent a directional relationship between two objects. Policy objects specify administrative policies that can be applied to objects.
CMIS v1.0 specifies 4 base object types, namely cmis:document, cmis:folder, cmis:relationship and cmis:policy, associated with the 4 base objects explained in the previous slide. All these object types include a list of predefined properties that instances of these types can have. Any new object type should extend only one of these 4 object types, and any content object in the CMIS repository should have an object type.
That all CMIS objects are strongly typed means that a content object cannot have a property that is not defined by its object type. Properties are not hierarchical, so an object type defines only a list of properties. For example, these are some of the properties defined in the cmis:document object type:
cmis:name
cmis:objectId
cmis:createdBy
cmis:creationDate
…
The whole specification can be found at: http://docs.oasis-open.org/cmis/CMIS/v1.0/cs01/cmis-spec-v1.0.html
Figure obtained from: Getting Started with CMIS: Examples using Content Management Interoperability Services, Abdera, & Chemistry. Jeff Potts, November 2009.
The figure is adapted from http://dev.day.com/content/ddc/blog/2009/05/jcrcmiscomparison.html
JCR is a content repository model with Java language API bindings.
CMIS is a document management model with Web Service and AtomPub protocol bindings.
Based on the above statements, the complementarity of JCR and CMIS is similar to the complementarity of the Servlet API in Java and the HTTP protocol.
The repository models specified by the JCR and CMIS specifications are similar from a high-level perspective. Both have a hierarchical structure and specify elementary content items, i.e. objects and nodes. Elementary content items are restricted by their type definitions. However, CMIS offers a more specialized model where an object may be a folder, document, relationship or policy object, or an instance of an object type derived from one of the 4 default base types, whereas JCR does not impose such an obligation. Its API allows node types to be created from scratch.
In the figure, the elements in boxes with straight line borders represent the common hierarchy of JCR and CMIS content repositories. However, content repositories do not differentiate between actual data and metadata. So, to specify which objects keep metadata and which objects keep actual data, the Object element is extended with two new items: Content Object and Classification Object.
For example, the content repository structure in slide-37 contains only classification objects that are used to classify other documents. In other words, they do not hold actual data. We could, for instance, have a document about liver cancer that is classified by the Cancer node.
Legacy content management systems are mostly built on top of the LAMP stack and they do not implement recent advancements in semantic technologies. All CMSs exhibit, in some form, three main categories of content management, namely content modeling, content creation and search on content items. Applying semantic functionalities (there are several possible procedures on domain ontologies) to these categories introduces several improvements to CMS capabilities.
Content Modeling: In the content modeling phase of a CMS, ontology browsing, ontology generation and ontology alignment procedures lead to ontology-guided modeling and automatic lifting to a knowledge base. Ontology-guided modeling enables content types and content metadata that are in line with ontological concepts, so that inference and reasoning engines can process these models. A domain ontology helps the user while modeling the content metadata by restricting and suggesting according to the implicit and explicit knowledge that it exhibits. These functionalities are provided through ontology browsing and alignment procedures, which might be provided through strong semantic features. On the other hand, automatic ontology generation can be done from the CMS’s already existing data models. This may lead to the creation of an ontology that reflects the as-is data models of the legacy CMS. In addition, aligning the extracted ontology/ontologies with a domain ontology yields much more concrete ontological concepts, merging the CMS’s as-is knowledge with the knowledge of a domain ontology in a semantic way.
Content Creation/Editing: In the content creation/editing phase, a CMS user makes use of the data models generated in the previous step (content/metadata modeling). Various functionalities provided on a domain ontology lead to semantic features such as auto-categorization of content items, suggestions for annotation, consistency checking, etc. For example, while a CMS user is creating a text-based content item, semantic functions might analyze the text and suggest the most appropriate concepts from the domain ontology for annotation. Or the content item might be categorized under the most fitting concept automatically. If the user has defined semantic rules to be applied to the content items, automatic inference and semantic consistency checking mechanisms might be available.
Search: Most existing CMSs cannot go beyond ordinary keyword-based search. However, understanding the meaning of the keywords that users enter and comparing them with the meanings of the words inside the content items requires high-level semantic capabilities. The meanings of words can change based on the domain ontology and even based on the user context. For example, when searching with the keyword “Jersey”, a user might be interested in the island in the English Channel, in New Jersey in the U.S., or in Jersey (JAX-RS) from the web programming domain. Structural search can be combined with text-based search to come up with hybrid methods that benefit from the advantages of the different approaches.
Most of the time, content repository structures host implicit semantics. For example, content is organized hierarchically according to different categories at different levels, standard taxonomies are used to annotate content items, or content items contain semantic information within their properties. However, these implicit semantics cannot be parsed by machines. Therefore, there is a need for a methodology that is compatible with existing content repository models. After extracting the semantics of a content repository, more intelligent operations, e.g. automatic annotation, automatic classification, reasoning, etc., can be performed.
The methodology to extract the semantics of content management systems should not interfere with the CMS itself. CMS developers should be able to use the provided services.
Considering the domain model of content repositories, the structure can be mapped to the OWL model as follows:
The first step is to represent the type definitions as classes. By the same logic, content objects are turned into individuals, as instances of the associated ontology class. Properties of content repository objects can also be represented as datatype or object properties, i.e. relationships between content items are represented with object properties, and literal-valued properties are transformed into datatype properties. If necessary, restrictions can also be defined, e.g. in class hierarchies.
This table shows the content repository model to OWL model mapping.
For mapping any content repository resources that have semantic information, there is a need for customizable bridges from content repository to ontology resources.
<ConceptBridge>
  <Query>/NewsSubjectCodes/%</Query>
  <PropertyBridge>
    <PredicateName>equiClass</PredicateName>
    <PropertyAnnotation>
      <Annotation>equivalentClass</Annotation>
    </PropertyAnnotation>
  </PropertyBridge>
</ConceptBridge>
According to this Concept Bridge, all content repository items under NewsSubjectCodes will be transformed into OWL classes in the ontology, preserving the hierarchical structure. Furthermore, the target values referred to from the processed content repository item through the equiClass property will be created as equivalent classes of the processed content repository item.
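As an illustration, a bridge definition like the one above could be read with Python's standard xml.etree.ElementTree module. The function parse_concept_bridge and the dictionary it returns are hypothetical; only the element names come from the example.

```python
import xml.etree.ElementTree as ET

BRIDGE_XML = """
<ConceptBridge>
  <Query>/NewsSubjectCodes/%</Query>
  <PropertyBridge>
    <PredicateName>equiClass</PredicateName>
    <PropertyAnnotation>
      <Annotation>equivalentClass</Annotation>
    </PropertyAnnotation>
  </PropertyBridge>
</ConceptBridge>
"""


def parse_concept_bridge(xml_text):
    """Read the query and the inner property bridges of a Concept Bridge."""
    root = ET.fromstring(xml_text.strip())
    bridge = {"query": root.findtext("Query"), "property_bridges": []}
    for pb in root.findall("PropertyBridge"):
        bridge["property_bridges"].append({
            "predicate": pb.findtext("PredicateName"),
            # Collect every annotation nested below this property bridge.
            "annotations": [a.text for a in pb.iter("Annotation")],
        })
    return bridge


bridge = parse_concept_bridge(BRIDGE_XML)
```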
<InstanceBridge>
  <Query>/NewsArticles/%</Query>
  <PropertyBridge>
    <PredicateName>relatedItem</PredicateName>
    <PropertyAnnotation>symmetric</PropertyAnnotation>
  </PropertyBridge>
</InstanceBridge>
According to this Instance Bridge, content repository items under the NewsArticles path are transformed into ontology individuals. Furthermore, the relatedItem properties of the content repository objects will be added as assertions to the corresponding individuals. Annotations can be used to specify the detailed semantics of properties. For instance, when there is an assertion like contentItem1 -> relatedItem -> contentItem2, the symmetric annotation indicates that contentItem2 -> relatedItem -> contentItem1 is also true.
These annotations can be augmented with other OWL property annotations such as transitive, functional, inverse, etc.
This is an example structure of a content repository that will be used in the examples in the rest of this slide set. On the left-hand side, it contains the type hierarchy that is used to annotate the actual content items in the repository. For instance, Article2 is classified by the HealthTreatment category.