SCEI (Semantic Communication Engine Innsbruck)
pronounced SKY
Technical Whitepaper
Dieter Fensel, Michael Fried, Christoph Fuchs, Iker Larizgoitia, Alex Oberhauser, Stefan Thaler, Ioan Toma
v 0.6
19.06.2012
Abstract
1. Introduction
2. Problem definition
3. Reference architecture
3.1. Semantic layer/domain knowledge
3.2. Separation of components
3.3. Data and content storage
3.4. The weaving process in general
4. Reference implementation
4.1. Content Management System
4.1.1. Domain and task specific UI
4.1.2. Workflow engine and communication patterns
4.1.3. Export of RDF data (OWLIM Integration)
4.1.4. The Weaving Process within the CMS
4.1.4.1. Publication in CMS
4.1.4.2. Feedback collection in CMS
4.1.4.3. Statistics collection in CMS
4.2. dacodi
4.2.1. The Weaving Process within dacodi
4.2.1.1. Common Weaver Model
4.2.1.2. Publication in dacodi
4.2.1.3. Feedback Collection in dacodi
4.2.1.4. Statistics Collection in dacodi
4.2.3. Adapters
Abstract
The Semantic Communication Engine Innsbruck (SCEI) is a fully fledged online communication
software suite. It supports users in communicating online, gathering feedback and measuring
online impact. The software provides workflow assistance as well as communication patterns
that support the planning and execution of online campaigns. Furthermore, we are looking into
ways of integrating crowdsourcing support, for example for the translation of texts into
foreign languages. In particular, we enable fast and easy one-click publishing and collection
of content on a multitude of marketing channels, hiding technological complexity behind a
user-friendly interface and directly reflecting the impact within online communities and on the
web presence. The core idea of our approach is to introduce a semantic layer on top of the various
Internet-based communication channels that is domain specific (e.g. tourism, hotels, marketing,
agencies, etc.) and not channel specific. This document describes the overall technical
architecture of the multi-channel management and communication software. Furthermore, we
present the motivation, some use cases and architectural diagrams that outline the implementation
details.
Note: In this version of the document we mainly focus on the publication, feedback and
statistics core functionality of SCEI together with an overview of the semantic layer. In
later versions we will also cover workflow capabilities, communication patterns as well
as the crowdsourcing components.
1. Introduction
Today’s online world is driven more than ever by the fast-paced exchange of information.
The rise of Facebook, YouTube and others has resulted in a notable shift in how companies and
individuals share and exchange information. These social media platforms and online services
enable everyone to interact with a huge, already established user base. While a “traditional”
online presence in the form of a company or personal web page is still relevant, the inherent
recommendation mechanics of social media platforms help to reach potential
customers. However, online information is not exclusively for human consumption. Using
semantic technologies, information can be enriched with metadata, making it readable for
machines as well.
This inherent difference between the traditional, social and machine-readable ways of making
information available is essential with regard to how the information is treated.
While a traditional web page has many advantages in terms of content ownership and freedom
of presentation, there are usually only limited metrics that indicate whether the presented
information was appreciated by a visitor. Unless special mechanisms (such as a rating or feedback
system) are implemented, visitor numbers and geographic data are the only metrics available.
Social media platforms, on the other hand, provide very simple feedback mechanisms which
are usually unobtrusive. Further, communication is encouraged by providing an easy way to
exchange messages between users. The emphasis on feedback and interaction is the main
difference in how information is treated on social media platforms as opposed to traditional
web pages. Analyzing the accumulated feedback is a useful indicator of whether the published
information was received well by the audience or not. More broadly, this enables the
steering of brand perception as well as holistic online reputation management and
customer relationship management.
Traditional web pages and social media platforms concentrate on humans as their main data
consumers. The rise of various services (such as web services or mobile applications) and
publishing methods such as linked data provides an incentive to present the data in
machine-readable form as well.
Our goal is to develop a set of tools that combines the traditional, social and machine-readable
ways of interacting with information and makes this process easier than it is with existing
tools. To reach this goal we will develop a unified layer - dacodi - which is able to interact with
social media platforms, and extend existing content management systems (starting with Drupal)
to incorporate the social and machine-readable aspects into existing solutions. Additionally, we
will develop support for defining workflows and identify communication patterns that help
in the planning, execution and controlling of online campaigns. What differentiates our solution
is the introduction of a semantic layer that abstracts information items and underlying concepts
from the concrete channels that the user wants to manage. This semantic layer is specific to
a domain (e.g. Hotels, Restaurants, Doctors, Event managers, ...), which enables users to
work from a conceptual view rather than a channel view. Throughout this paper those software
components are referred to as the Semantic Communication Engine Innsbruck, or SCEI.
Figure 1: SCEI conceptual overview
The aforementioned differentiation (traditional, social, machine readable) allows us to separate
responsibilities of our software components, making the whole SCEI modular, which results in
higher efficiency, robustness and scalability. Obviously, this separation is not strict and the three
variants can overlap. Certain types of web pages combine various technologies and paradigms
which do not allow a strict classification. However, this three-fold separation is not meant as a
classification of current online information. Its purpose is to define the types of information with
which the SCEI interacts.
The aim of this document is to introduce our technical solution to the problem of online
multi-channel management. Before defining the general problem in detail, we define a number of
terms to establish a clear terminology. After the problem description in Section 2, we
present the high-level approach of our solution, i.e. the reference architecture, in Section
3, followed by a more detailed and technical look into the software components, i.e. the reference
implementation, in Section 4. This comprises the separation of the system into two major parts, the
CMS and dacodi components, the introduction of our approach to merging content and channels,
which we call the weaving process, and its impact on publication,
feedback and statistics collection. We also introduce a Common Weaver Model, which
enables scalability of the system by exploiting the fact that similar channels have common
characteristics.
Terminology
We define the following terms in order to establish a common understanding of the topic:
● Communication [1] is, according to Wikipedia, the activity of conveying information.
Communication requires a sender, a message (an object of communication, information
or a form of information), a channel (the medium) and an intended recipient. Bidirectional
communication follows a broad process model that often starts with a publication
or broadcasting activity, which can be followed by feedback, which in turn often triggers
the exchange of further information or even leads to engagement in a long
conversation.
● Dissemination [2] is the act of broadcasting content to the public without direct feedback
from the audience.
● Content Management is the set of processes and technologies that support the
collection, managing, and publishing of information in any form and medium. Digital
content may take the form of text, multimedia files or any other file type which follows a
content life cycle that requires management.
● RDFa is a W3C Recommendation that allows embedding of RDF statements into
XHTML documents, HTML4 and HTML5.
● Microdata is a similar approach to RDFa. It allows semantics to be embedded into existing
HTML content. Microdata aims to be simpler than RDFa and plays a major role in search
engine optimization (SEO).
● A Channel is a means of transporting a message, and therefore a medium. In our
definition, an online channel does NOT equal a full communication platform. Potentially
every URI is a channel. For example, an HTML page within a larger website can be a
channel.
● A Platform is a collection or a group of channels. For example Facebook is not one
channel, but a collection of multiple channels e.g. the Facebook wall being one of them.
A Platform allows access to more than one communication channel (e.g. video, text,
image).
● Pull channels are channels that actively gather data from a predefined source. A
homepage (single HTML page) or Wiki page, for example, requests data from a server.
These data sources can be manifold (e.g. a semantic repository), but the procedure is
always the same: the system pulls information from a source. For example, a Linked
Data endpoint can also be queried using SPARQL, and the extracted information can be
transformed, reused, etc. If we have direct control over the underlying data, we can
semantically annotate it using technologies like RDFa and microdata.
● Push channels are channels to which information has to be sent explicitly. These
channels include email, bulletin boards and Web 2.0 platforms. None of these channels
actively gathers information from external data sources. This means that if we want to
distribute information to such channels, we have to actively push it to the correct one.
Also, because the user usually does not have full control over the data pool
and storage, it is not possible to control the semantic annotation of, for example, a tweet or
Facebook post.
● Information Item is the entity to be published. An information item may be semantically
enriched and thus described by an underlying concept. Viewed from the syntactic side,
an information item is represented as XHTML with the possibility of RDFa annotations.
● User in our terminology is an agent (human or software solution) that executes a task
related to online communication.
● Adapters in dacodi are used to provide uniform access to all communication channels.
They are the link between the actual communication channel (e.g. Facebook API
for wall posts) and dacodi. We distinguish between two types of adapters: publishing and
retrieval.
[1] http://en.wikipedia.org/w/index.php?title=Communication&oldid=480484048
[2] http://en.wikipedia.org/w/index.php?title=Dissemination&oldid=458980901
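The adapter concept defined in the terminology can be illustrated with a small sketch. It is not dacodi's actual code: the class names `PublishingAdapter`, `RetrievalAdapter` and `InMemoryChannelAdapter`, as well as their method signatures, are assumptions made for illustration. The in-memory class stands in for a real channel (such as a Facebook wall) so that the example runs without network access.

```python
from abc import ABC, abstractmethod


class PublishingAdapter(ABC):
    """Hypothetical interface: pushes content to one channel."""

    @abstractmethod
    def publish(self, text: str) -> str:
        """Publish text and return a channel-specific post identifier."""


class RetrievalAdapter(ABC):
    """Hypothetical interface: fetches feedback from one channel."""

    @abstractmethod
    def fetch_feedback(self, post_id: str) -> list:
        """Return the feedback (e.g. comments) attached to a post."""


class InMemoryChannelAdapter(PublishingAdapter, RetrievalAdapter):
    """Stand-in for a real channel API; keeps everything in a dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._posts = {}

    def publish(self, text: str) -> str:
        post_id = f"{self.name}-{len(self._posts) + 1}"
        self._posts[post_id] = {"text": text, "comments": []}
        return post_id

    def add_comment(self, post_id: str, comment: str) -> None:
        # Simulates a visitor commenting on the channel itself.
        self._posts[post_id]["comments"].append(comment)

    def fetch_feedback(self, post_id: str) -> list:
        return list(self._posts[post_id]["comments"])


wall = InMemoryChannelAdapter("wall")
post_id = wall.publish("New spa offer this weekend")
wall.add_comment(post_id, "Sounds great!")
print(post_id, wall.fetch_feedback(post_id))
```

Code that works against the two abstract interfaces does not need to know which channel it is talking to, which is the uniform access the definition describes.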
2. Problem definition
After introducing the topic, we now focus on defining the resulting problems that have to be
overcome in order to achieve scalable and efficient online communication. On a high level,
the general problem is the following: A user has content that he or she wants to make accessible to
others. This content can either be published as static content on a traditional web page, as
a “status update” or something comparable on a social media platform, or as RDF triples in a
triple store or the Linked Open Data cloud. If the user wishes to use multiple outlets - to reach
a wider and more diverse audience for example - the content has to be published in multiple
places. However, publication in multiple places currently results in duplicated effort and manual
labor.
While the problem may be very similar conceptually, publication on different channels works
differently at the level of technical details. These are not mere technicalities or minor
differences, however. When it comes to, for example, ownership of the published data, the
differences between a traditional web page and a social media platform are major. This shift of
ownership/responsibility implies further differences with regard to what operations are possible
on published data, e.g. modification and deletion of already published content.
A homepage is in most cases intended to be world readable without restrictions, whereas
social networks can be quite restrictive and make content consumable only for registered
users. Additionally, a homepage should be structured well in order to enable the user to
quickly discover the information needed. In most social networks, recent content is automatically
delivered to users in their stream.
The traditional Web and especially the Web 2.0 (Social Web) are becoming an inseparable
part of identity creation and represent a key medium for companies to communicate with
existing and potential customers. However, the opportunities for companies to leverage
Web technologies for attracting more site visitors and reaching more target-group users are
accompanied by a number of challenges. These include, as stated before, technical difficulties
but, more importantly, handling the growing number and diversity of social platforms, specialized
news web pages, blogs, discussion forums and messaging services. We address these
hindrances by providing innovative marketing communication and impact-measurement
solutions. In particular, we offer the first product that employs semantics for creating a level of
abstraction over all communication channels, thus supporting the recommendation of suitable
channels and simultaneous publishing of content. We base the tool development
on four main approaches for handling complexity and reducing the amount of manual effort:
● Description of the communication channels’ capabilities, which is given implicitly through
the clustering of channels into groups with similar functionality
● Semantic representation of the customer’s domain information. SCEI makes use of
semantic annotations, which can refer to a domain ontology. These annotations are
useful for other services, as well as for the publication component of the tool (i.e. dacodi),
and play an important role in search engine optimisation.
● Channel recommendation
● Content transformation to fit a particular channel
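The first and third approaches in the list above can be sketched in a few lines. This is only an illustration under stated assumptions: the capability names and the `CHANNEL_CAPABILITIES` table are invented for the example, and in SCEI such descriptions would live in the semantic layer rather than in a Python dictionary.

```python
from collections import defaultdict

# Hypothetical capability descriptions (illustrative names only).
CHANNEL_CAPABILITIES = {
    "twitter": frozenset({"short_text", "link"}),
    "facebook_wall": frozenset({"text", "link", "image"}),
    "youtube": frozenset({"video", "text"}),
    "vimeo": frozenset({"video", "text"}),
}


def cluster_by_capabilities(channels):
    """Group channels that offer exactly the same set of capabilities."""
    clusters = defaultdict(list)
    for name, capabilities in channels.items():
        clusters[capabilities].append(name)
    return {caps: sorted(names) for caps, names in clusters.items()}


def recommend_channels(required, channels):
    """Return channels whose capabilities cover everything the content needs."""
    needed = frozenset(required)
    return sorted(name for name, caps in channels.items() if needed <= caps)


clusters = cluster_by_capabilities(CHANNEL_CAPABILITIES)
print(clusters[frozenset({"video", "text"})])
print(recommend_channels({"link"}, CHANNEL_CAPABILITIES))
```

Grouping by identical capability sets is the simplest possible clustering; a real system might instead group channels that share most, but not all, capabilities.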
Content distribution and feedback monitoring in various channels is a manual and labor
intensive task. Take, for example, video upload. A user has to upload a video to potentially
multiple platforms (YouTube, Vimeo, Facebook video), copy and paste the video title/description
and enter tags manually. After the upload process, the user may want to notify his or her clients
via social networks about the new video. Thus, the video link has to be copied and posted as a
status update. Further, a short description alongside the link would be beneficial, which
again has to be written or copied. Our tool aims to eliminate all manual labor in this and
similar processes that can be automated.
The resulting software product saves time and hides technology specifics behind an easy-to-use
interface, enabling a flexible and scalable multi-channel communication strategy. Furthermore,
the tool uses different metrics to statistically capture and analyze the online reach and
impact, providing a means of evaluating the online marketing strategy and of conducting
reputation management by reacting in a timely manner, especially to negative posts and feedback.
3. Reference architecture
In order to solve the problems mentioned above, this section explains our solution on a
conceptual level.
The central element of our approach is the separation of content and online channels. This
allows reusing the same content for various communication means. Through this reuse we
want to achieve scalability of multi-channel communication. The explicit modeling of content
independent from specific channels also adds a second element of reuse: Similar operational
entities active in the same domain can reuse significant parts of such a content model.
Separating content from channels also requires the explicit alignment of both. This is achieved
through a weaving process.
Figure 2 shows the high-level SCEI reference architecture. The following sections give more
details about the reference architecture, its components, where the content generated by the user is
stored, and the weaving process in general.
Figure 2: SCEI reference architecture
3.1. Semantic layer/domain knowledge
In order to abstract the domain specific communication from the actual channels, thus lifting the
distribution and data collection in channels to a higher conceptual layer, we need semantics
on top of our solution. This layer on the one hand captures data in domain specific ontologies and
on the other hand describes the various communication channels. In order to interweave the
domain specific concepts with the underlying communication channels we propose a weaving
process, which is explained in more detail later in this document. In the end this semantic
layer will decide which kind of content is distributed to which channel in which form.
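A minimal sketch of such a decision table is given here, assuming that the link between domain concepts and channels can be written down as simple rules. The `Weaving` class, the concept names (`Offer`, `Event`) and the form names are hypothetical and only illustrate the "which content, which channel, which form" idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weaving:
    """One rule: instances of a domain concept go to a channel in a given form."""
    concept: str
    channel: str
    form: str


# Hypothetical rules for a hotel domain; all names are illustrative.
WEAVINGS = [
    Weaving("Offer", "homepage", "full_page"),
    Weaving("Offer", "twitter", "short_teaser"),
    Weaving("Event", "facebook_wall", "status_update"),
]


def targets_for(concept, weavings=WEAVINGS):
    """Return (channel, form) pairs for one domain concept."""
    return [(w.channel, w.form) for w in weavings if w.concept == concept]


print(targets_for("Offer"))
```

Because the rules mention concepts rather than concrete items, a new channel can be supported by adding rules without touching the content itself.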
Let’s take for example a hotelier who wants to build up or extend the online presence of his or
her business. First of all, the hotelier needs to know all relevant channels that reach the target
group. This list can include things like a homepage, mailing lists, fora or social networks. Once
the available channels are known, accounts have to be created on selected platforms.
In addition, a hotelier has to be present on various rating and review sites in order to
maximize business opportunities. These channels can be manifold, and it is extremely hard to
keep an overview of what is going on in them without technical assistance, i.e. a tool
that distributes to and aggregates from all channels in a single interface. Moreover, technical
details should stay transparent to the user, and emerging channels should be integrated quickly.
So the end user needs to work on a level he or she understands, i.e. a domain specific layer with
concepts well known in the industry sector, instead of handling each channel separately.
3.2. Separation of components
The STI online communication tool is split into a set of components that can be conceptually
grouped in two major parts, namely:
1. The content management system (CMS), together with the domain and task specific
interfaces and the workflow engine and communication patterns component
2. The data and content distribution component, responsible mainly for the Web 2.0
communication (i.e. data distribution to and feedback collection in push channels).
Obviously this separation is needed for satisfying the different requirements of push and pull
channels due to the contradicting nature of these two approaches and their application by
different existing channels. One must however note that both paradigms have in common
that multi-directional communication (conversations between multiple users) can occur and
often statistical information can be extracted from the channels. Also through this component
separation we guarantee maximum scalability, allow easy adaptation to multiple use cases and
simplify the integration with the seekda hotel booking solution as well as other 3rd party apps.
Another main motivation for this separation is to have a single layer which unifies social media
platforms (Web 2.0 channels) - namely the data and content distribution component. This
enables easy integration by providing a single common interface, as well as the possibility
of external use, as mentioned above. The alternative would be to integrate everything
into the CMS of choice, which would forgo loose coupling, reuse and a
component-based architecture.
On each data change in the CMS, a dedicated module sends the newly created/updated content to
the data and content distribution component API. This loosely coupled approach allows the CMS
part to be exchanged easily and makes the data and content distribution component independent
of current content management solutions. The data and content distribution component API
also makes it possible to create use case specific interfaces for the data and content distribution
component (e.g. to enable white labeling) and to integrate it quickly into 3rd party applications in
use cases where the “heavy weight” CMS part, including things like content hosting, is not needed.
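The notification sent on each data change could look roughly like the following sketch. The whitepaper does not specify the API, so the URL, the JSON payload layout and the field names are all assumptions; the request is only built here, not sent.

```python
import json
from urllib.request import Request

# Hypothetical endpoint; the actual dacodi API is not specified in this document.
DACODI_URL = "https://dacodi.example.org/api/items"


def build_publish_request(item_id, html, channels):
    """Build (but do not send) an HTTP request announcing a content change."""
    payload = {"id": item_id, "html": html, "channels": list(channels)}
    return Request(
        DACODI_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_publish_request(
    "offer-42",
    '<div property="name">Winter special</div>',
    ["twitter", "facebook_wall"],
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it. Since the CMS side only needs to produce such a request, any other CMS or third-party application could do the same, which is the point of the loose coupling.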
The following use cases were defined to show the advantage of such a flexible architecture:
● A hotelier does not want to change his existing homepage infrastructure and CMS but
nevertheless wants to profit from addressing multiple Web 2.0, e-mail and rating channels via
the data and content distribution component. Setup and usage of the software must be
easy enough to be performed by a user with average skills. Here the communication with
the customer, including engagement in conversations via the tool, is the primary focus
of the user. Such a tool can be offered with a low pricing scheme since the data and
content distribution component mainly handles content distribution and does not have to
care about site hosting and content per se.
● The dissemination partner of an EU project needs a fully fledged, out-of-the-box solution
to address all important channels at once. The CMS with semantic data export in
combination with the data and content distribution component enables this. The initial
setup is, however, a non-trivial task since the homepage structure and the links to LOD
vocabularies and other ontologies must be created. However, we can expect a more
technically skilled user to operate this full package.
● A marketing agency with a multitude of different customers faces several other
problems. Each customer wants a fully fledged offline and online presence in multiple
channels. SCEI is very flexible in this regard. It is possible for the agency to maintain the
full Web 2.0 presence of a customer via the data and content distribution component and,
if needed, also to provide its customers with a state-of-the-art CMS solution.
In Figure 2 we provide a high level overview of all SCEI components and actors which are the
following:
● User: Person that operates the software and works on the level of information items
rather than channels. We distinguish between several specific user roles, namely:
○ Content creator: Person that generates the content of the items to be
disseminated.
○ Workflow designer: Person that defines communication patterns and workflows
involving communication, multi-channel publishing and social media monitoring
within an organization (e.g. a hotel business)
● Information Item: Content that traverses the system and is stored, distributed and
transformed within the process.
● CMS: The content management system (in our case Drupal 7.x) exposes the user
interface to the user as well as HTML in the form of a website, accessible via the Web.
○ Domain and Task specific UI: Depending on the application domain, user, task
and role, the user interface adapts itself and shows all relevant information in an
easily accessible way.
○ Workflow Engine/Communication patterns: In order to support publication
and controlling workflows, SCEI contains a workflow engine. Well-known
communication patterns help in planning and executing these workflows.
○ RDFa annotation: Enriches the information item with semantic metadata in
order to export it to an RDF repository and to ease the distribution via the
data and content distribution component, because the tool understands the
meaning of the information item instead of just its structure.
○ Scheduling: Contains rules about delayed or recurrent publication of the
information item.
○ DB: Database which stores the actual content.
○ RDF export plugin: Exports all information items from the DB to an external
repository.
● Semantic repository: External RDF repository which exposes all information items via
a SPARQL endpoint. It also contains the domain and channel models and makes this
information accessible to both the CMS and data and content distribution component.
● Data and content distribution: Distributes content in and aggregates information from
all push channels.
○ API: Makes the data and content distribution component accessible to the CMS
as well as to other 3rd party applications, so that this part can be integrated into
external software solutions. Receives HTML which can be additionally enriched
with RDFa annotations.
○ DB: Database stores references to the information items and their representation
in the different channels.
○ Publishing Module: Is responsible for distributing the information item in
different channels.
■ Content Extractor: Analyzes the HTML coming from the API and
extracts all relevant information.
■ Concept to channel mapper: Decides which part of the information item
will be pushed to which channel.
■ Content Transformer: Transforms the content in order to fit the channels
e.g. shortens a text to 140 characters for Twitter publication.
■ Scheduling: Contains rules about delayed or recurrent publication of the
information item.
○ Statistics module: Collects and stores all valuable statistical information coming
from the various channels e.g. site visits, number of views, and such.
■ Item Analyzer: Handles statistics of an information item in various
channels.
■ Channel Analyzer: Handles statistics coming from a specific channel
regarding all information items published within.
○ Engagement Module: Is responsible for direct interaction in various channels.
■ Feedback Collector: Gathers feedback from all channels in order to
present it centrally. This can be for example comments or reviews.
■ Interaction Component: Enables the user to react to the gathered feedback, for
example to reply to online comments.
○ Impact Analyser Module: Determines what impact publications have had.
■ Impact Analyser: Specialized form of statistics that tries to determine
how to efficiently make an impact in the online world. We differentiate here
between real impact, based on active publications, and potential
impact, meaning how many people the user can potentially reach in the
various channels, given a limited number of e.g. friends or subscribers.
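Two of the Publishing Module parts listed above, the Content Extractor and the Content Transformer, can be sketched as follows. This is an illustration and not the dacodi implementation: the class `RdfaPropertyExtractor` and the function `shorten` are hypothetical names, the extractor only looks at the RDFa `property` attribute, and it assumes well-formed XHTML in which every element is explicitly closed.

```python
from html.parser import HTMLParser


class RdfaPropertyExtractor(HTMLParser):
    """Collects the text of elements that carry an RDFa `property` attribute."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._stack = []  # property name (or None) for each open element

    def handle_starttag(self, tag, attrs):
        self._stack.append(dict(attrs).get("property"))

    def handle_endtag(self, tag):
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute the text to the innermost enclosing element with a property.
        for prop in reversed(self._stack):
            if prop:
                self.properties[prop] = self.properties.get(prop, "") + data
                break


def shorten(text, limit=140):
    """Shorten text to at most `limit` characters, cutting at a word boundary."""
    text = " ".join(text.split())
    if len(text) <= limit:
        return text
    cut = text[: limit - 3].rsplit(" ", 1)[0]
    return cut + "..."


html = (
    '<div><h1 property="name">Winter special</h1>'
    '<p property="description">'
    + "Three nights with breakfast and ski pass. " * 5
    + "</p></div>"
)
extractor = RdfaPropertyExtractor()
extractor.feed(html)
extractor.close()
tweet = shorten(f"{extractor.properties['name']}: {extractor.properties['description']}")
print(len(tweet), tweet)
```

The 140-character limit is the Twitter example given in the component list; other channels would pass a different `limit` or use a different transformation altogether.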
3.3. Data and content storage
The CMS actually stores data and content (for example pictures) in its internal database.
References, meaning links, to these data items can be found in the website’s HTML code as
well as the exported RDF triples. The data and content distribution component, on the other
hand, is not meant as a content hosting solution and therefore does not store all content and
data. The only exception where the data and content distribution component stores data like
images and videos, although temporarily, is when publishing is delayed by the scheduling
mechanism.
We distinguish between dynamic and static information publishing. With static information we
refer to a “distributed profile” across all Web 2.0 channels that can be changed at once. Such a
profile contains things like contact information or a representative picture, in short things that
should not change frequently and are valid without temporal constraints. Dynamic information
consists of things that will be pushed to e.g. news feeds and represent information at a certain
point in time. For such publications the data and content distribution component only stores a
reference to the channels to which the content was distributed and a textual description of the
content, so that it can later be identified by the user and specific feedback can be assigned to it.
Acting mainly as a mouthpiece, the data and content distribution component provides a
lightweight and scalable solution.
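The storage rule for dynamic publications can be pictured as a small record type. The sketch below is an assumption about how such a record might look; the class `PublicationRecord` and its field names do not come from the dacodi code base.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class PublicationRecord:
    """Hypothetical record kept for one dynamic publication."""
    item_uri: str                           # reference to the item, not the item itself
    description: str                        # short text so the user can recognise it
    channels: list = field(default_factory=list)
    publish_at: Optional[datetime] = None   # set only for delayed publication
    pending_media: Optional[bytes] = None   # held only while publication is delayed

    def mark_published(self):
        """Drop temporarily stored media once the item has been pushed."""
        self.pending_media = None
        self.publish_at = None


record = PublicationRecord(
    item_uri="http://example.org/node/42",
    description="Winter special offer",
    channels=["twitter", "facebook_wall"],
    publish_at=datetime(2012, 12, 1, 9, 0),
    pending_media=b"...image bytes...",
)
record.mark_published()
print(record)
```

Only the reference, the description and the channel list survive publication, which mirrors the statement that images and videos are stored only temporarily while publishing is delayed.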
3.4. The weaving process in general
As described in Section 2, the general problem is one of content distribution and
feedback collection. We define a “weaving” process to formalize the steps necessary to solve
this problem. In general, this process can be broken down as follows:
1. Content input
2. Selection of publication channels
3. Content adaptation
4. Publication
5. Collection of feedback
6. Collection of statistics
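The six steps can be read as a simple pipeline. The following sketch wires them together with placeholder functions; the function `weave` and its parameters are hypothetical, and it runs all steps back to back, whereas in practice feedback and statistics would be collected repeatedly after publication.

```python
def weave(content, select, adapt, publish, collect_feedback, collect_stats):
    """Run the weaving steps for one piece of content (step 1 is the input)."""
    report = {}
    for channel in select(content):                          # step 2
        adapted = adapt(content, channel)                    # step 3
        post_id = publish(channel, adapted)                  # step 4
        report[channel] = {
            "post_id": post_id,
            "feedback": collect_feedback(channel, post_id),  # step 5
            "statistics": collect_stats(channel, post_id),   # step 6
        }
    return report


# Placeholder implementations so that the sketch runs without any real channel.
report = weave(
    "Winter special: three nights with breakfast",
    select=lambda content: ["twitter", "facebook_wall"],
    adapt=lambda content, channel: content[:20] if channel == "twitter" else content,
    publish=lambda channel, text: f"{channel}:{len(text)}",
    collect_feedback=lambda channel, post_id: [],
    collect_stats=lambda channel, post_id: {"views": 0},
)
print(report["twitter"]["post_id"])
```

Passing the steps in as functions keeps the process itself independent of any particular channel.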
4. Reference implementation
We provide a reference implementation for the reference architecture presented in Section
3. The reference implementation is outlined in Figure 3 and is split into two major parts:
a content management system (CMS) part based on Drupal 7.x, and the in-house implementation
of the data and content distribution component, called dacodi. Additionally, an external semantic
repository, namely OWLIM, is used to save content. The rest of this section provides the technical
details of the reference implementation.
Figure 3: SCEI reference implementation
4.1. Content Management System
As basis for our CMS solution we use Drupal 7.x. The reason is its native RDF support, the
availability of additional semantic modules, such as a SPARQL endpoint or microdata export
and the possibility of third-party module development.
The publication of new or the updating of existing content (information item) starts with the
responsible person creating or changing one piece of information. This process is handled
by the underlying CMS. If necessary, scheduling information could be provided to postpone
the publication. After a successful change the content is saved to the external OWLIM
repository and sent to the dacodi API. The CMS utilizes dacodi to extend its content distribution
capabilities. Likewise the CMS acts as a specialized kind of user interface from the dacodi
viewpoint. In the following we will further outline how RDF data is exported by the CMS and
how the first part of the weaving process works. The second part of the weaving process will be
described in the dacodi Section of this document.
4.1.1. Domain and task specific UI
The domain and task specific user interfaces are the components through which users
interact directly with our system. They are sub-components of the CMS and are implemented
directly in it.
The design and look-and-feel of these components are adapted to the mindset
of the users, supporting them in specifying content in a terminology that is familiar to them. For
example, hoteliers will specify content items that they want to be disseminated in terms of offers,
touristic packages, etc. The domain and task specific user interfaces thus support an abstraction
of information dissemination that is based on the concrete domain and independent of the
channel(s) of dissemination.
The domain and task specific user interfaces also allow the user to carry out task
specific activities including yield, brand and reputation management, customer relationship
management and online advertising.
4.1.2. Workflow engine and communication patterns
In order to support the user we offer a workflow engine together with support for communication
patterns. This component enables users to define and manage complex workflows on top of
the underlying SCEI components for communication, multi-channel publishing and social media
monitoring. Such workflows usually have a long lifespan and involve multiple employees
working together on improving the visibility, reputation and communication of an entity.
The workflow engine and communication patterns component can be used to manage the
communication workflow, including assigning, tracking and responding to user feedback. Using
this component one can define and manage steps and protocols to be activated when certain
events related to the published information occur, e.g. when a bad comment on a Facebook post
is written. Take for example a hotelier. Using the workflow engine and communication patterns
component, the hotelier can specify and manage when and which of his employees, depending
on their availability, take care of responding to customers' posts about his hotel on various
channels, or engage with customers to present them with new hotel offers.
4.1.3. Export of RDF data (OWLIM Integration)
The export of the CMS content to an external triplestore repository allows the publication of
the website data as a bubble in the linked data cloud. Consistency between the two databases
is ensured by hooks that the CMS triggers on each add, update and delete operation. Hooks
are functions that make it possible to intercept the internal workflow of the CMS. After an
operation has been executed successfully, the RDF export plug-in creates triples and uses
the Sesame REST API to add or change the content in OWLIM. For semantically annotating
(RDFa and microdata) the content on the homepage exposed by the underlying CMS we use
the Drupal internal database, since available Drupal modules already provide this annotation
functionality. The OWLIM repository mainly serves as a linked data SPARQL endpoint.
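The hook-based export described above could be sketched as follows. This is a minimal
illustration in Python, not the actual Drupal plug-in. The node layout, the
http://example.org/ namespace, the direct use of field names as Dublin Core terms, the
repository name and the choice of N-Triples with a text/plain content type are all
assumptions made for the example. The request is only constructed, not sent.

```python
from urllib.request import Request

BASE_URI = "http://example.org/node/"   # assumed namespace for CMS nodes
DC = "http://purl.org/dc/terms/"        # Dublin Core terms vocabulary


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n"))


def node_to_ntriples(node: dict) -> str:
    """Serialize a (hypothetical) CMS node as N-Triples, one triple per field."""
    subject = f"<{BASE_URI}{node['id']}>"
    lines = [
        f'{subject} <{DC}{name}> "{escape_literal(str(value))}" .'
        for name, value in sorted(node["fields"].items())
    ]
    return "\n".join(lines) + "\n"


def on_node_saved(node: dict, server: str, repository: str) -> Request:
    """Hook body: build the HTTP request that adds the node's triples."""
    url = f"{server}/repositories/{repository}/statements"
    return Request(
        url,
        data=node_to_ntriples(node).encode("utf-8"),
        headers={"Content-Type": "text/plain"},  # assumed N-Triples MIME type
        method="POST",
    )
```

A CMS hook for add and update operations would call on_node_saved after the operation
succeeded and hand the request to an HTTP client; a delete hook would issue a
corresponding request that removes the statements of the node.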
As seen in Figure 3, the changes to the CMS are not intrusive, since the added functionality is
provided by plug-ins. Two additional plug-ins need to be written: One for the OWLIM integration,
and one for dacodi.
4.1.4. The Weaving Process within the CMS
With regard to a content management system, the weaving process looks as follows:
1. Content input
The content is entered in the CMS, directly by the user of the CMS.
2. Selection of publication channels
This step determines where the document is published with regard to the internal document
tree of the CMS. If distribution to social media platforms via dacodi is desired, Web 2.0
channels can be selected as well.
3. Content adaptation
Content adaptation is not necessary for the CMS, since there are no content restrictions.
4. Publication
The document is published in the CMS and - if desired - as triples in the LOD cloud. The
information item will be passed to dacodi during the publication phase, along with the
previously selected Web 2.0 channels.
5. Collection of feedback
Direct user feedback, like comments, shares, retweets, etc. is gathered by dacodi
and can be queried by the CMS using the dacodi API (see more in Section Feedback
Collection in dacodi).
6. Collection of statistics
Collection of visitor numbers and demographic data can be done via a tool like Google
Analytics or the open source solution PIWIK.
The publication as triples in the LOD cloud (or to any external triplestore), as mentioned in step
4 of the weaving process, is done by a plug-in which integrates OWLIM in the CMS.
4.1.4.1. Publication in CMS
The CMS component enables the publication of content on a homepage. It also provides
functionality to annotate the website’s HTML with RDF data and export these RDF data as a
whole in order to make it machine understandable.
4.1.4.2. Feedback collection in CMS
Feedback in the CMS can come from various sources, for example an internal
commenting or rating system.
4.1.4.3. Statistics collection in CMS
Statistics within the CMS can come from various sources like Google Analytics or PIWIK for
analyzing page visits or internal comment and feedback systems.
In the following Section we will explain how dacodi is able to distribute content in multiple
channels.
4.2. dacodi
The dacodi component is used to distribute information in various Web 2.0 and email
channels, to collect and analyze feedback from those channels, and to actively engage
in conversations (e.g. by replying to comments). Central to dacodi is the weaving process,
which enables channel selection based on the semantics of the information item to be
distributed and content transformation based on these channels. If manual effort is
necessary, for example for placing content at a specific location in a wiki system, the
content can be sent to the responsible webmaster via e-mail. We will describe how the weaving
process works within dacodi and how the component interacts with online channels using
Adapters for publication, feedback collection and statistics collection.
4.2.1. The Weaving Process within dacodi
The ultimate goal of the weaving process is the semi-automated publication of the information
item in fitting channels, including necessary transformations, based on the information type.
Thus, the weaving process can be broken down in the following steps:
1. Content input
In the case of dacodi, this equals the acquisition of the information item; either through
the API (coming e.g. from the CMS) or a dedicated user interface.
2. Selection of publication channels
Selection of appropriate Web 2.0 channels based on the information type.
3. Content adaptation
Transformation of the information item into a Common Weaver Model (CWM) instance.
4. Publication
Publication of the (transformed) information item in the selected channels.
5. Collection of feedback
Feedback collection via the APIs of the used channels.
6. Collection of statistics
Statistics collection via the APIs of the used channels.
In the following we discuss the two steps that are central to the weaving process, channel
selection and content transformation. Afterwards we explain in detail the Common Weaver
Model, which is part of the content transformation component and specific to dacodi, not
to the CMS.
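As a rough orientation, steps 2 to 4 of the process can be sketched as a small
orchestration function. The function name weave and its three parameters are hypothetical
and chosen for this illustration only; channel selection, transformation and publication
are passed in as callables so that each can be replaced independently.

```python
from typing import Callable, Dict, List


def weave(item: dict,
          select_channels: Callable[[str], List[str]],
          transform: Callable[[dict, str], str],
          publishers: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Run steps 2-4 for one information item.

    Returns a mapping from channel name to the publication id reported
    by that channel's publisher.
    """
    results: Dict[str, str] = {}
    for channel in select_channels(item["type"]):   # step 2: channel selection
        publish = publishers.get(channel)
        if publish is None:                         # no adapter: channel is skipped
            continue
        content = transform(item, channel)          # step 3: content adaptation
        results[channel] = publish(content)         # step 4: publication
    return results
```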
Channel Selection
Based on the type of the information item (e.g. a business event), fitting channels for the
item are selected (e.g. a business event is announced on LinkedIn but not on Facebook). The
central component of the channel selection process is the (Concept-to-Channel) Mapper, which
maps each concept to the appropriate channels.
Consequently, the Mapper takes a concept as input and returns a list of channels that
are relevant for the concept. The Mapper of the prototype implementation uses a static
mapping which maps every concept to a list of channels. Due to the modular architecture of the
application, the Mapper component can easily be replaced with a more sophisticated, dynamic
approach. It would be possible, for example, to implement a Mapper that incrementally learns
from user adjustments and thus alters the channel mappings based on the user's needs.
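A static Mapper of this kind amounts to a lookup table. The sketch below assumes concept
and channel names are plain strings; the class names StaticMapper and AdjustableMapper are
hypothetical. The second class is only a simple stand-in for the dynamic approach: it
records explicit user adjustments and does not implement any learning algorithm.

```python
from typing import Dict, List


class StaticMapper:
    """Concept-to-Channel Mapper backed by a fixed lookup table."""

    def __init__(self, mapping: Dict[str, List[str]]):
        # Copy the lists so later adjustments do not alter the caller's data.
        self._mapping = {concept: list(channels)
                         for concept, channels in mapping.items()}

    def channels_for(self, concept: str) -> List[str]:
        # Unknown concepts map to no channels instead of raising an error.
        return list(self._mapping.get(concept, []))


class AdjustableMapper(StaticMapper):
    """Variant whose mapping can be altered by user adjustments."""

    def add_channel(self, concept: str, channel: str) -> None:
        channels = self._mapping.setdefault(concept, [])
        if channel not in channels:
            channels.append(channel)

    def remove_channel(self, concept: str, channel: str) -> None:
        channels = self._mapping.get(concept, [])
        if channel in channels:
            channels.remove(channel)
```

Because both classes expose the same channels_for method, the rest of the weaving process
does not need to know which Mapper is in use.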
Transformation
For every channel in which the information item is to be published, a transformation is
necessary. For example: A business event might include fields such as short title, long title,
description, start date, end date, location and venue. Further, there might be an accompanying
image which represents the event, like a poster. A channel which only takes short text messages
(Twitter, for example) can’t handle all those fields. Thus, the information item has to be
transformed into what we call a Common Weaver Model (CWM) instance.
To expand on the previous example, one could think of combining the most important
information of a business event (short title, start date, end date, location and venue) into a string
which fits the channel’s restrictions - Twitter’s 140 characters, for example. The transformer
component defines what transformations are necessary to go from Information Item to Common
Weaver Model instance.
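The business event example can be sketched as a small transformation function. The field
names, the date format and the truncation strategy (shortening only the title and marking
the cut with an ellipsis) are assumptions made for this illustration, as is the simple
character count, which may differ from how a channel counts characters itself.

```python
from datetime import date

# Hypothetical business event used for illustration.
EXAMPLE_EVENT = {
    "short_title": "Semantic Web Meetup",
    "start_date": date(2012, 10, 1),
    "end_date": date(2012, 10, 2),
    "venue": "Congress Hall",
    "location": "Innsbruck",
}


def event_to_text(event: dict, limit: int = 140) -> str:
    """Combine the key fields of a business event into one string of at most
    `limit` characters. Dates and place are kept; the title is shortened."""
    dates = f"{event['start_date']:%d.%m.%Y}-{event['end_date']:%d.%m.%Y}"
    suffix = f" ({dates}, {event['venue']}, {event['location']})"
    room = limit - len(suffix)          # characters left for the title
    if room < 1:
        raise ValueError("dates and place alone exceed the channel limit")
    title = event["short_title"]
    if len(title) > room:
        # Reserve one character for the ellipsis that marks the cut.
        title = title[: room - 1].rstrip() + "…"
    return title + suffix
```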
4.2.1.1. Common Weaver Model
The Common Weaver Model [3] (CWM) exploits the fact that similar channels have common
characteristics. For example: Facebook status updates and Twitter enable the user to share
short text messages in form of status updates. YouTube, Vimeo and Facebook video enable the
user to upload and share videos. After looking at various Web 2.0 channels, we have identified
the following Common Weaver Models:
● Text: A String of varying length. Online communication as it is today relies heavily on
the exchange of short text messages. In essence, those messages are simply Strings
of varying length. Depending on the platform, such text messages can be between 140
(Twitter) and many thousand characters (63,206 in the case of Facebook).
● Link: A common hyperlink denoted by the <link /> or <a /> HTML element.
● Image: A two-dimensional image denoted by the <img /> HTML element. While support
may differ depending on the Channel, possible Internet media types include: gif, jpeg,
png, svg, tiff.
● Video: A video file. While support may differ depending on the Channel, possible
Internet media types include MPEG-1 video with multiplexed audio, MP4 video, Ogg
Theora video, QuickTime video, WebM Matroska-based open media format, Matroska
open media format, Windows Media Video.
● Presentation: A presentation file. We want to support this type, and thus related Web
2.0 platforms like SlideShare, in a future version. Not supported in the prototype.
● Audio: An audio file. Not supported in the prototype.
During the weaving process, instances of those models are extracted from the information
item and sent to the selected publishing adapters. Each Common Weaver Model instance is
stored internally using a unique identifier and grouped by the information item to which it is
related.
These CWM instances are extracted from an information item. The granularity of the extraction
depends on the information item which is to be published. For example: if the user simply wants
to publish a single link in various channels, it makes sense to extract the link and publish it. On
the other hand, if a more complex information item contains dozens of links, it does not make
sense to extract and publish every link (this would equal annoying spamming), unless the user
explicitly wants to do so.
3 The model in Common Weaver Model refers to a model from a software engineering point of view, as in
MVC (Model-View-Controller). A model manages the behavior and data of the application domain.
Figure 4: Extraction of Common Weaver Model (CWM) instances from an information item.
When an information item is published, e.g. a business event, CWM instances are extracted.
Expanding on the business event example introduced in the Transformation section: If the
business event includes an image, it will be extracted and published to fitting image channels
like Flickr. The essential information about the event can be combined in a string format and
published via text channels, such as Twitter, Xing or Facebook. Since every CWM instance
knows from which information item it was extracted, a link to the original information item (in this
example: the business event) can be embedded, e.g. in the description of the image.
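The extraction step, including the point about granularity, could look roughly like the
sketch below. The dictionary layout of the information item, the max_links threshold and
the publish_all_links flag are hypothetical; they only illustrate that links are extracted
when there are few of them or when the user explicitly asks for it.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import List

_next_id = count(1)  # source of unique instance identifiers


@dataclass
class CWMInstance:
    kind: str      # "text", "link", "image" or "video"
    payload: str   # the text, URL or file reference
    item_id: str   # information item this instance was extracted from
    instance_id: int = field(default_factory=lambda: next(_next_id))


def extract_instances(item: dict, max_links: int = 1) -> List[CWMInstance]:
    """Extract CWM instances from an information item."""
    instances: List[CWMInstance] = []
    if item.get("text"):
        instances.append(CWMInstance("text", item["text"], item["id"]))
    if item.get("image"):
        instances.append(CWMInstance("image", item["image"], item["id"]))
    links = item.get("links", [])
    # Publishing dozens of links would amount to spamming, so links are
    # extracted only if there are few of them or the user insists.
    if len(links) <= max_links or item.get("publish_all_links"):
        instances.extend(CWMInstance("link", url, item["id"]) for url in links)
    return instances
```

Since every instance carries the item_id of its origin, a publishing adapter can embed a
link back to the original information item.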
4.2.1.2. Publication in dacodi
The publisher module takes care of two things: publication of the information item (in this stage
of the weaving process represented as Common Weaver Model instances) using adapters, and
scheduling.
We plan to support scheduling in two ways: delayed publication and repeated publication. For
example: Delayed publication can be used to announce an event or a special offer at a specific
time, whereas repeated publication may be used to send reminders (e.g. for a call for papers) in
all channels.
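Both scheduling modes can be expressed with a single priority queue ordered by due time. The
sketch below is one possible design and not the scheduler of the prototype; the class name,
the repeats counter and the explicit now argument (used instead of the system clock to keep
the example deterministic) are assumptions.

```python
import heapq
from datetime import datetime, timedelta
from typing import Callable, List, Optional, Tuple


class PublicationScheduler:
    """Queue of pending publications, ordered by due time."""

    def __init__(self) -> None:
        # Entries: (due, sequence number, item id, repeat interval, repeats left)
        self._queue: List[Tuple[datetime, int, str, Optional[timedelta], int]] = []
        self._seq = 0  # tie-breaker so entries with equal due times stay comparable

    def schedule(self, item_id: str, due: datetime,
                 repeat_every: Optional[timedelta] = None, repeats: int = 0) -> None:
        heapq.heappush(self._queue, (due, self._seq, item_id, repeat_every, repeats))
        self._seq += 1

    def run_due(self, now: datetime,
                publish: Callable[[str, datetime], None]) -> None:
        """Publish everything that is due at `now`; re-queue repeated items."""
        while self._queue and self._queue[0][0] <= now:
            due, _, item_id, interval, remaining = heapq.heappop(self._queue)
            publish(item_id, due)
            if interval is not None and remaining > 0:
                self.schedule(item_id, due + interval, interval, remaining - 1)
```

Delayed publication corresponds to a single entry with a due time in the future; repeated
publication corresponds to an entry with repeat_every set and repeats greater than zero.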
4.2.1.3. Feedback Collection in dacodi
Every Information Item that is published by dacodi is tracked by the system, to provide statistical
information and a per-channel impact analysis. This feature allows the user to see how well the
published information item was received, without having to check every channel individually.
Feedback
There are four forms of feedback that are supported by various Web 2.0 platforms
and thus relevant for dacodi:
● Unary feedback. Any predefined feedback that can only be positive. Examples: “like”
on Facebook, “retweet” on Twitter, “favourite” on Flickr, “favourite” on YouTube, etc.
● Binary feedback. Any predefined feedback that is either positive or negative.
Example: thumbs up/down on YouTube.
● Rating/ranking. Feedback that can be quantified on a discrete scale. Example: star
rating on a hotel review platform.
● Textual feedback. Any feedback that is user-created, in the form of replies, comments or
any other form of written feedback. NLP techniques can be used to analyze the textual
feedback to provide the user with additional information, e.g. whether the comment/reply
was a positive or a negative one. A user can directly react to textual feedback within
dacodi if the underlying channel allows this functionality.
4.2.1.4. Statistics Collection in dacodi
There are several statistical metrics that are relevant for the user. While the unary, binary,
rating and textual feedback is centered on the information item, statistics are relevant on a
per-item and per-channel basis. Examples:
● Number of unary, binary, rating and textual feedback entries per information item (this
includes features like “most discussed information item”, i.e. the information item with the
most textual feedback).
● Number of information items published in each channel over a certain amount of time
(day, week, month, year).
● Calculation of a combined impact metric per channel, based on feedback analysis of the
information items published in the channel.
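The metrics listed above can be computed from a flat list of feedback records. The record
layout, the numeric encoding of feedback values and in particular the weighted sum used as
combined impact metric are assumptions made for this sketch; the whitepaper does not
prescribe a formula.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class FeedbackKind(Enum):
    UNARY = "unary"
    BINARY = "binary"
    RATING = "rating"
    TEXTUAL = "textual"


@dataclass(frozen=True)
class Feedback:
    item_id: str
    channel: str
    kind: FeedbackKind
    value: float = 1.0  # assumed encoding, e.g. binary feedback as +1.0 / -1.0


def counts_per_item(feedback: List[Feedback]) -> Dict[str, Counter]:
    """Number of feedback entries of each kind per information item."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for entry in feedback:
        counts[entry.item_id][entry.kind] += 1
    return dict(counts)


def most_discussed(feedback: List[Feedback]) -> Optional[str]:
    """Information item with the most textual feedback, if there is any."""
    textual = Counter(e.item_id for e in feedback if e.kind is FeedbackKind.TEXTUAL)
    return textual.most_common(1)[0][0] if textual else None


def channel_impact(feedback: List[Feedback],
                   weights: Dict[FeedbackKind, float]) -> Dict[str, float]:
    """Combined impact per channel as a weighted sum of feedback values."""
    impact: Dict[str, float] = defaultdict(float)
    for entry in feedback:
        impact[entry.channel] += weights.get(entry.kind, 0.0) * entry.value
    return dict(impact)
```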
4.2.3. Adapters
As mentioned before, we distinguish between two types of adapters: publishing and retrieval.
The purpose of a publishing adapter is to publish an information item in a certain channel.
Retrieval adapters are used to gather information about already published information items.
Since the APIs as well as the offered functionality differ from channel to channel (e.g. the
APIs of Twitter and Facebook differ), a separate adapter needs to be written for each channel.
In our prototype we intend to create publishing and retrieval adapters for the following platforms:
YouTube, Facebook and Twitter. All three of them have a Web API and cover a majority of the
features we want to realize, such as publishing videos, texts, images and links. This is a starting
point for implementing new adapters that provide similar functionality.
We have identified the following, channel specific features that each adapter has to be able to
handle:
● Mapping CWM properties to appropriate properties in the published
communication channel. For example, the text property of a tweet is called ‘text’,
whereas the text property of a Facebook post is named ‘message’. Since we have to
implement an adapter for each communication channel we want to address, we will
implement this by a simple mapping routine in each adapter.
● Authentication and authorization to the communication channel. Most Web
2.0 communication channels rely on OAuth / OAuth2 [4] to realize authorization and
authentication. However, some of them rely on OpenID [5], basic HTTP authentication or
form-based authentication mechanisms to restrict user access. The adapter has to
be able to deal with these individual mechanisms and has to store and load the credentials
of each user.
● Publish a specific CWM instance. As mentioned above, the publishing process varies
from platform to platform, so this functionality has to be abstracted. This also holds
true for retrieving feedback from different platforms.
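The property mapping named in the first point can be kept as a small table per adapter. In
the sketch below the class names and the to_channel_payload method are hypothetical; the
two property names ‘text’ and ‘message’ are the ones given above. Authentication and the
actual publishing call are left out.

```python
from typing import Dict


class TextPublishingAdapter:
    """Base class: translates CWM property names into channel property names."""

    # Maps a CWM property name to the name the channel expects.
    property_map: Dict[str, str] = {}

    def to_channel_payload(self, cwm: Dict[str, str]) -> Dict[str, str]:
        # CWM properties without a mapping are dropped for this channel.
        return {self.property_map[name]: value
                for name, value in cwm.items()
                if name in self.property_map}


class TwitterTextAdapter(TextPublishingAdapter):
    property_map = {"text": "text"}


class FacebookTextAdapter(TextPublishingAdapter):
    property_map = {"text": "message"}
```

With this layout, supporting a further text channel only requires a new subclass with its
own property_map.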
Adapter loading and naming conventions
We designed our adapters and adapter loading mechanisms to achieve the following three
goals:
● No adapter duplication. The same functionality should be achieved with the same
code. (Minimize codebase, achieve simplest possible code base).
● A common adapter structure for all platforms. Platforms are structured differently.
However, for the sake of clarity we want adapters to be integrated into dacodi in a uniform
way. Adding the same functionality (e.g. adding an image channel) should be
achieved in a similar manner on all platforms.
● Automatic adapter loading and execution. There should be no manual effort involved
in adding a new adapter to the system, except for programming the adapter.
To achieve these three goals, we designed the system carefully, introduced some naming
conventions and loading conventions for our adapters. These are described in the following
sections.
Figure 5 depicts the motivation for our design. The illustration sketches that social media
platforms offer more than one way to publish information. Additionally, each user account
on a platform gives access to another, similar set of communication channels. For example,
someone with two accounts on Twitter has every available Twitter communication channel
twice, say two text channels, two image channels and so on. The only difference between
those channels is the user credentials that are used to authenticate the post. Different
platforms, in turn, allow posting similar common weaver items. Adhering to the SRP software
development principle [6], we chose to write an individual adapter for each explicit
communication channel in each platform. This, together with a file naming convention, also
allows us to automatically load and execute adapter classes without having to change
configuration files or invest any additional manual effort. If no adapter class is found for
a channel, that channel is simply not supported.
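Convention-based loading of this kind could be sketched as follows. The concrete convention
used here (a module named platform_channeltype inside a package called adapters, containing
a class named PlatformChanneltypeAdapter) is an assumption made for illustration and not
necessarily the convention used in dacodi.

```python
import importlib
from typing import Any, Optional


def adapter_class_name(platform: str, channel_type: str) -> str:
    """E.g. ("twitter", "text") -> "TwitterTextAdapter" (assumed convention)."""
    return f"{platform.title()}{channel_type.title()}Adapter"


def load_adapter(platform: str, channel_type: str,
                 package: str = "adapters") -> Optional[Any]:
    """Locate and instantiate an adapter purely by naming convention.

    Returns None if no matching module or class exists, which means that the
    channel is not supported.
    """
    module_name = f"{package}.{platform.lower()}_{channel_type.lower()}"
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    adapter_class = getattr(module, adapter_class_name(platform, channel_type), None)
    return adapter_class() if adapter_class is not None else None
```

Adding support for a new channel then only requires dropping a correctly named module into
the package; no configuration file has to be changed.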
4 http://en.wikipedia.org/wiki/Oauth
5 http://en.wikipedia.org/wiki/Openid
6 http://en.wikipedia.org/wiki/Single_responsibility_principle
Figure 5: Platform as channel groups offering multiple ways to publish information
A communication channel in dacodi is defined by three components: a channel type, a
platform and user credentials. In detail, they are:
● Channel Type: A virtual grouping of channels that allow publishing the same common
weaver items, e.g. image, video or text. This is depicted in Figure 5. It is virtual because
it is split up into many different adapters for many different platforms but is nonetheless
accessed in a uniform way.
● Platform: A grouping of channels that share the same user credentials. An example
of a platform, or channel group, is Facebook. The notion of the channel group has been
introduced because a platform such as Twitter or Facebook actually allows access to more
than one communication channel (e.g. video, text, image).
● User credentials: The information needed to authenticate/authorize a client to
a certain platform and to associate it with a certain account. In dacodi, user credentials
contain the following information: an account id, which associates a channel group with
a user (e.g. the Facebook account id 1234 with the dacodi user 27); the authorization
token and secret, which store information that is required for completing actions on a
platform such as posting (e.g. an OAuth2 token associated with the account, or a
password); and the consumer key and consumer secret, which contain information about the
application that is about to publish (one can think of it as the authentication of dacodi,
assuring the platform that it is actually dacodi that is publishing on the user's behalf).
The notion of user credentials has been introduced because a user may have multiple
accounts on one platform.
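The credential fields listed above map naturally onto a small record type. The sketch below
assumes an in-memory store keyed by dacodi user and platform; the class and field names are
hypothetical, and a real deployment would have to store tokens and secrets securely.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class UserCredentials:
    dacodi_user_id: int    # e.g. 27
    platform: str          # channel group, e.g. "facebook"
    account_id: str        # account on the platform, e.g. "1234"
    token: str             # authorization token (or password)
    token_secret: str
    consumer_key: str      # identifies the publishing application (dacodi)
    consumer_secret: str


class CredentialStore:
    """Keeps the credentials of each user, allowing several accounts per platform."""

    def __init__(self) -> None:
        self._by_user_platform: Dict[Tuple[int, str], List[UserCredentials]] = {}

    def add(self, credentials: UserCredentials) -> None:
        key = (credentials.dacodi_user_id, credentials.platform)
        self._by_user_platform.setdefault(key, []).append(credentials)

    def accounts(self, dacodi_user_id: int, platform: str) -> List[UserCredentials]:
        # Returns a copy so callers cannot modify the stored list.
        return list(self._by_user_platform.get((dacodi_user_id, platform), []))
```

Every entry returned by accounts corresponds to one set of communication channels on the
platform, which reflects the duplication of channels per account described above.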
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...ZurliaSoop
 
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...David Celestin
 
lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lodhisaajjda
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoKayode Fayemi
 
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdfSOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdfMahamudul Hasan
 
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...amilabibi1
 
My Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle BaileyMy Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle Baileyhlharris
 
Zone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptxZone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptxlionnarsimharajumjf
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar TrainingKylaCullinane
 
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityUnlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityHung Le
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatmentnswingard
 
Introduction to Artificial intelligence.
Introduction to Artificial intelligence.Introduction to Artificial intelligence.
Introduction to Artificial intelligence.thamaeteboho94
 
Digital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of DrupalDigital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of DrupalFabian de Rijk
 
Dreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIIDreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIINhPhngng3
 

Dernier (17)

AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
 
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
Jual obat aborsi Jakarta 085657271886 Cytote pil telat bulan penggugur kandun...
 
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
Proofreading- Basics to Artificial Intelligence Integration - Presentation:Sl...
 
lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.
 
ICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdfICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdf
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
 
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdfSOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
SOLID WASTE MANAGEMENT SYSTEM OF FENI PAURASHAVA, BANGLADESH.pdf
 
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
 
My Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle BaileyMy Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle Bailey
 
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait Cityin kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
in kuwait௹+918133066128....) @abortion pills for sale in Kuwait City
 
Zone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptxZone Chairperson Role and Responsibilities New updated.pptx
Zone Chairperson Role and Responsibilities New updated.pptx
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar Training
 
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven CuriosityUnlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
Unlocking Exploration: Self-Motivated Agents Thrive on Memory-Driven Curiosity
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatment
 
Introduction to Artificial intelligence.
Introduction to Artificial intelligence.Introduction to Artificial intelligence.
Introduction to Artificial intelligence.
 
Digital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of DrupalDigital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of Drupal
 
Dreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIIDreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio III
 

Scei technical whitepaper-19.06.2012

SCEI (Semantic Communication Engine Innsbruck)
pronounced SKY
Technical Whitepaper
Dieter Fensel, Michael Fried, Christoph Fuchs, Iker Larizgoitia, Alex Oberhauser, Stefan Thaler, Ioan Toma
v 0.6
19.06.2012

Abstract
1. Introduction
2. Problem definition
3. Reference architecture
3.1. Semantic layer/domain knowledge
3.2. Separation of components
3.3. Data and content storage
3.4. The weaving process in general
4. Reference implementation
4.1. Content Management System
4.1.1. Domain and task specific UI
4.1.2. Workflow engine and communication patterns
4.1.3. Export of RDF data (OWLIM Integration)
4.1.4. The Weaving Process within the CMS
4.1.4.1. Publication in CMS
4.1.4.2. Feedback collection in CMS
4.1.4.3. Statistics collection in CMS
4.2. dacodi
4.2.1. The Weaving Process within dacodi
4.2.1.1. Common Weaver Model
4.2.1.2. Publication in dacodi
4.2.1.3. Feedback Collection in dacodi
4.2.1.4. Statistics Collection in dacodi
4.2.3. Adapters
Abstract

The Semantic Communication Engine Innsbruck (SCEI) is a fully fledged online communication software suite. It supports users in online communication, gathering feedback and measuring online impact. The software contains workflow assistance as well as communication patterns supporting the planning and execution of online campaigns. Furthermore, we look into possibilities of integrating crowdsourcing support, for example for the translation of texts into foreign languages. In particular, we enable fast and easy one-click publishing and collection of content on a multitude of marketing channels, hiding technology complexity behind a user-friendly interface, and directly reflecting on the impact within online communities and web presence. The core idea of our approach is to introduce a semantic layer on top of the various Internet based communication channels that is domain specific (e.g. tourism, hotels, marketing, agencies, etc.) and not channel specific.

This document describes the overall technical architecture of the multi channel management and communication software. Furthermore, we present the motivation, some use cases and architectural diagrams that outline the implementation details.

Note: In this version of the document we mainly focus on the publication, feedback and statistics core functionality of SCEI together with an overview of the semantic layer. In later versions we will also cover workflow capabilities, communication patterns as well as the crowdsourcing components.
1. Introduction

Today’s online world is more than ever driven by the fast-paced exchange of information. The rise of Facebook, YouTube and others resulted in a notable shift in how companies and individuals share and exchange information. These social media platforms and online services enable everyone to interact with a huge, already established user base. While a “traditional” online presence in the form of a company or personal web page is still relevant, the inherent recommendation mechanics of social media platforms are beneficial for reaching potential customers.

However, online information is not exclusively for human consumption. Using semantic technologies, information can be enriched with metadata, making it readable for machines as well. This inherent difference between the traditional, social, and machine readable ways of making information available is essential with regard to how the information is treated.

While a traditional web page has many advantages in terms of content ownership and freedom of presentation, there are usually limited metrics which indicate whether the presented information was appreciated by a visitor. Unless special mechanisms (such as a rating or feedback system) are implemented, visitor numbers and geographic data are the only metrics available.

Social media platforms, on the other hand, provide very simple feedback mechanisms which are usually unobtrusive. Further, communication is encouraged by providing an easy way to exchange messages between users. The emphasis on feedback and interaction is the main difference between how information is treated on social media platforms and on traditional web pages. Analyzing the accumulated feedback is a useful indicator of whether the published information was received well by the audience. More broadly, this enables the steering of brand perception as well as holistic online reputation management and customer relationship management.
Traditional web pages and social media platforms concentrate on humans as their main data consumers. The rise of various services (such as web services or mobile applications) and publishing methods such as linked data provides an incentive to present the data in machine readable form as well.

Our goal is to develop a set of tools which combines the traditional, social and machine readable ways to interact with information and makes this process easier than it is with existing tools. To reach this goal we will develop a unified layer - dacodi - which is able to interact with social media platforms, and extend existing content management systems (starting with Drupal) to incorporate the social and machine readable aspects into existing solutions. Additionally, we will develop support for defining workflows and identify communication patterns that help in the planning, execution and controlling of online campaigns. What differentiates our solution is the introduction of a semantic layer that abstracts information items and underlying concepts from the concrete channels that the user wants to manage. This semantic layer is specific to a domain (e.g. Hotels, Restaurants, Doctors, Event managers, ...), which shall enable users to work from a conceptual view rather than a channel view. Throughout this paper those software components are referred to as the Semantic Communication Engine Innsbruck, or SCEI.

Figure 1: SCEI conceptual overview

The aforementioned differentiation (traditional, social, machine readable) allows us to separate the responsibilities of our software components, making the whole SCEI modular, which results in higher efficiency, robustness and scalability. Obviously, this separation is not strict and the three variants can overlap. Certain types of web pages combine various technologies and paradigms which do not allow a strict classification. However, this three-fold separation is not meant as a classification of current online information. Its purpose is to define the types of information with which the SCEI interacts.

The aim of this document is to introduce our technical solution to the problem of online multi channel management. Before defining the general problem in detail, we define certain terms to establish a clear terminology. After the problem description in Section 2 we present the high level approach of our solution, i.e. the reference architecture, in Section 3, followed by a more detailed and technical look into the software components, i.e. the reference implementation, in Section 4. This comprises the separation of the system into two big parts, the CMS and dacodi components, the introduction of our approach to merging content and channels, which we call the weaving process, and its impact on publication, feedback and statistics collection. We also introduce a Common Weaver Model which enables scalability of the system by exploiting the fact that similar channels have common characteristics.

Terminology

We define the following terms in order to establish a common understanding of the topic:
● Communication [1] is, according to Wikipedia, the activity of conveying information. Communication requires a sender, a message (an object of communication, information or a form of information), a channel (the medium) and an intended recipient. Bidirectional communication follows a broad process model that often starts with a publication or broadcasting activity which can be followed by feedback, which in turn often triggers the exchange of further information or even leads to engagement in a long conversation.
● Dissemination [2] is the act of broadcasting content to the public without direct feedback from the audience.
● Content Management is the set of processes and technologies that support the collection, managing, and publishing of information in any form and medium. Digital content may take the form of text, multimedia files or any other file type which follows a content life cycle that requires management.
● RDFa is a W3C Recommendation that allows embedding of RDF statements into XHTML, HTML4 and HTML5 documents.
● Microdata is a similar approach to RDFa. It allows semantics to be embedded into existing HTML content. Microdata aims to be simpler than RDFa and plays a major role in search engine optimization (SEO).
● A Channel is a means of transporting a message, therefore a medium. In our definition an online channel does NOT equal a full communication platform. Potentially every URI is a channel. For example, an HTML page within a larger website can be a channel.
● A Platform is a collection or a group of channels. For example, Facebook is not one channel, but a collection of multiple channels, e.g. the Facebook wall being one of them.
  A Platform allows access to more than one communication channel (e.g. video, text, image).
● Pull channels are channels that actively gather data from a predefined source. A homepage (single HTML site) or Wiki page, for example, requests data from a server. These data sources can be manifold (e.g. a semantic repository), but the procedure is always the same: the system pulls information from a source. For example, a Linked Data endpoint can be queried using SPARQL and the extracted information can be transformed, reused, etc. If we have direct control over the underlying data we can semantically annotate it using technologies like RDFa and microdata.
● Push channels are channels to which information has to be explicitly sent. These channels include email, bulletin boards and Web 2.0 platforms. None of these channels actively gather information from external data sources. This means that if we want to distribute information to such channels we have to actively push it to the correct one. Also, because the user usually does not have full control over the data pool and storage, it is not possible to control the semantic annotation of, for example, a tweet or Facebook post.
● Information Item is the entity to be published. An information item may be semantically enriched and thus described by an underlying concept. Viewed from the syntactic side, an information item is represented as XHTML with the possibility of RDFa annotations.
● User in our terminology is an agent (human or software solution) who executes a task related to online communication.
● Adapters in dacodi are used to provide uniform access to all communication channels. They are the link between the actual communication channel (e.g. the Facebook API for wall posts) and dacodi. We distinguish between two types of adapters: publishing and retrieval.

[1] http://en.wikipedia.org/w/index.php?title=Communication&oldid=480484048
[2] http://en.wikipedia.org/w/index.php?title=Dissemination&oldid=458980901

2. Problem definition

After introducing the topic we now focus on defining the problems that have to be overcome in order to reach scalable and efficient online communication. On a high level, the general problem is the following: A user has content that he wants to make accessible to others. This content can either be published as static content on a traditional web page, as a “status update” or something comparable on a social media platform, or as RDF triples in a triple store or the Linked Open Data cloud. If the user desires to utilize multiple outlets - to reach a wider and more diverse audience, for example - the content has to be published in multiple places. However, publication in multiple places currently results in duplicate effort and manual labor.

While the problem may be very similar conceptually, publication on different channels works differently when looking at the technical details. These are not mere technicalities or minor differences, however. When it comes to, for example, ownership of the published data, the differences between a traditional web page and a social media platform are major.
This shift of ownership/responsibility implies further differences with regard to what operations are possible on published data, e.g. modification and deletion of already published content. A homepage is in most cases intended to be world readable without restrictions, whereas social networks can be quite restrictive and make content consumable only for registered customers. Additionally, a homepage should be structured well in order to enable the user to quickly discover the information needed. In most social networks recent content is automatically delivered to a user in his stream.

The traditional Web and especially the Web 2.0 (Social Web) are becoming an inseparable part of identity creation and represent a key medium for companies to communicate with existing and potential customers. However, the opportunity for companies to leverage Web technologies for attracting more site visitors and reaching more target-group users is accompanied by a number of challenges. These include, as stated before, technical difficulties but, more importantly, handling the growing number and diversity of social platforms, specialized news web pages, blogs, discussion forums and messaging services. We address these hindrances by providing innovative marketing communication and impact-measurement solutions. In particular, we offer the first product that employs semantics for creating a level of abstraction over all communication channels, thus supporting the recommendation of suitable channels and simultaneous publishing of content. We base the tool development on four main approaches for handling complexity and reducing the amount of manual effort:
● Description of communication channels’ capabilities, which is implicitly given through clustering of channels into groups with similar functionality
● Semantic representation of the customer’s domain information. SCEI makes use of semantic annotations, which can refer to a domain ontology. These annotations are useful for other services, as well as for the publication component of the tool (i.e. dacodi), and play an important role in search engine optimisation.
● Channel recommendation
● Content transformation to fit a particular channel

Content distribution and feedback monitoring in various channels is a manual and labor intensive task. Take, for example, video upload. A user has to upload a video on potentially multiple platforms (YouTube, Vimeo, Facebook video), copy and paste the video title/description and enter tags manually. After the upload process, the user may want to notify his clients via social networks about the new video. Thus, the video link has to be copied and posted as a status update. Further, a short description alongside the link would be beneficial, which again has to be written or copied. Our tool aims to eliminate any automatable manual labor in this and similar processes. The resulting software product saves time and hides technology specifics behind an easy-to-use interface, enabling a flexible and scalable multi-channel communication strategy.
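The "content transformation to fit a particular channel" step can be illustrated with a minimal sketch. It is not taken from the SCEI code base: `ChannelProfile` and `fit_to_channel` are hypothetical names, and the 140-character limit mirrors the Twitter example the paper gives later for the Content Transformer. The sketch shortens a description so that text plus link still fit the channel.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChannelProfile:
    """Hypothetical description of what a channel accepts."""
    name: str
    max_chars: Optional[int]  # None means the channel has no length limit


def fit_to_channel(text: str, link: str, channel: ChannelProfile) -> str:
    """Return text plus link, shortened so the whole message fits the channel."""
    message = f"{text} {link}"
    if channel.max_chars is None or len(message) <= channel.max_chars:
        return message
    # Reserve room for the link, one separating space and one ellipsis character.
    room = channel.max_chars - len(link) - 2
    if room <= 0:
        raise ValueError(f"link alone does not fit into {channel.name}")
    shortened = text[:room].rstrip()
    return f"{shortened}… {link}"


if __name__ == "__main__":
    microblog = ChannelProfile("microblog", 140)  # assumed limit
    print(fit_to_channel("New video about our hotel " * 10,
                         "http://example.org/v/1", microblog))
```

Keeping the link intact and cutting only the description reflects the video scenario above, where the link is the part that must survive in a status update.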
Furthermore, the tool also uses different metrics to statistically capture and analyze the online reach and impact, providing means to evaluate the online marketing strategy and to conduct reputation management by reacting in a timely manner, especially to negative posts and feedback.

3. Reference architecture

In order to solve the problems mentioned above, this section explains our solution on a conceptual level. The central element of our approach is the separation of content and online channels. This allows reusing the same content for various communication means. Through this reuse we want to achieve scalability of multi-channel communication. The explicit modeling of content independent of specific channels also adds a second element of reuse: similar operational entities active in the same domain can reuse significant parts of such a content model. Separating content from channels also requires the explicit alignment of both. This is achieved through a weaving process.

Figure 2 shows the SCEI high level reference architecture. The following sections give more details about the reference architecture, its components, where the content generated by the user is stored and the weaving process in general.

Figure 2: SCEI reference architecture

3.1. Semantic layer/domain knowledge

In order to abstract the domain specific communication from the actual channels, thus lifting the distribution and data collection in channels to an upper conceptual layer, we need semantics on top of our solution. This layer on the one hand captures data in domain specific ontologies and on the other hand describes the various communication channels. In order to interweave the domain specific concepts with the underlying communication channels we propose a weaving process which will be explained in more detail within this document. In the end this semantic layer will decide which kind of content is distributed to which channel in which form.

Let’s take for example a hotelier who wants to build up or extend the online presence of his or her business. First of all there is a need to know all relevant channels which reflect the target group. This list can include things like a homepage, mailing lists, fora or social networks. Once the available channels are known, accounts have to be created on some selected platforms. In addition, a hotelier has to be present on various rating and review sites in order to maximize business opportunities. These channels can be manifold and it is extremely hard to keep an overview of what is going on in them without technical assistance, i.e. a tool that distributes to and aggregates all channels in a single interface. However, technical details as well as emerging channels shall be integrated quickly and transparently. So the end user needs to work on a level he or she understands, i.e. a domain specific layer with concepts well known in the industry sector, instead of handling each channel separately.
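As a rough illustration of the idea that the semantic layer decides which content goes to which channel, the sketch below aligns domain concepts with channels. Everything in it is an assumption made for illustration: the class names `InformationItem` and `Weaver`, the concept `SpecialOffer` and the channel identifiers are hypothetical, and the real weaving process is described later in the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InformationItem:
    concept: str  # domain concept the item instantiates (hypothetical names)
    title: str
    body: str


@dataclass
class Weaver:
    """Holds the alignment between domain concepts and channels."""
    rules: Dict[str, List[str]] = field(default_factory=dict)

    def align(self, concept: str, channels: List[str]) -> None:
        # Record which channels receive items of the given concept.
        self.rules[concept] = list(channels)

    def channels_for(self, item: InformationItem) -> List[str]:
        # Items of an unknown concept are not distributed anywhere.
        return self.rules.get(item.concept, [])


if __name__ == "__main__":
    weaver = Weaver()
    weaver.align("SpecialOffer", ["homepage", "facebook_wall", "newsletter"])
    offer = InformationItem("SpecialOffer", "Ski week", "Seven nights, half board")
    print(weaver.channels_for(offer))
```

The hotelier in the example above would only create the `SpecialOffer` item; the alignment table, not the user, carries the knowledge about channels.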
3.2. Separation of components

The STI online communication tool is split into a set of components that can be conceptually grouped into two major parts, namely:
1. The content management system (CMS), together with the domain and task specific interfaces and the workflow engine and communication patterns component
2. The data and content distribution component, responsible mainly for the Web 2.0 communication (i.e. data distribution to and feedback collection in push channels).

This separation is needed to satisfy the different requirements of push and pull channels, which follow contradicting approaches and are applied by different existing channels. One must note, however, that both paradigms have in common that multi-directional communication (conversations between multiple users) can occur and that statistical information can often be extracted from the channels. Through this component separation we also aim for scalability, allow easy adaptation to multiple use cases and simplify the integration with the seekda hotel booking solution as well as other 3rd party apps.

Another main motivation for this separation is to have a single layer which unifies social media platforms (Web 2.0 channels) - namely the data and content distribution component. This enables easy integration by providing a single common interface, as well as the possibility of external use, as mentioned above. The alternative would be to integrate everything in the CMS of your choice, thus giving up loose coupling, reuse and a component-based architecture. In our approach, on each data change in the CMS a separate module sends the newly created or updated content to the data and content distribution component API. This loosely coupled approach allows an easy exchange of the CMS part and makes the data and content distribution component independent of current content management solutions.
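The loose coupling described above, where a CMS module forwards every change to the distribution component's API, might be sketched as follows. This is an assumption-laden illustration, not the Drupal module or the dacodi API: `DistributionApi`, `RecordingApi`, `Cms` and `connect` are hypothetical names, and the API is reduced to a single `submit` call that stores submissions in memory.

```python
from typing import Callable, Dict, List, Protocol, Tuple


class DistributionApi(Protocol):
    """The only thing the CMS side needs to know about the distribution component."""
    def submit(self, item_id: str, html: str) -> None: ...


class RecordingApi:
    """Stand-in for the real component; keeps submissions in memory."""
    def __init__(self) -> None:
        self.received: List[Tuple[str, str]] = []

    def submit(self, item_id: str, html: str) -> None:
        self.received.append((item_id, html))


class Cms:
    """Toy CMS that notifies registered listeners on every create or update."""
    def __init__(self) -> None:
        self._content: Dict[str, str] = {}
        self._listeners: List[Callable[[str, str], None]] = []

    def on_change(self, listener: Callable[[str, str], None]) -> None:
        self._listeners.append(listener)

    def save(self, item_id: str, html: str) -> None:
        self._content[item_id] = html
        for listener in self._listeners:
            listener(item_id, html)


def connect(cms: Cms, api: DistributionApi) -> None:
    # The "module" from the text: forwards each change to the API.
    cms.on_change(api.submit)


if __name__ == "__main__":
    cms, api = Cms(), RecordingApi()
    connect(cms, api)
    cms.save("offer-1", "<p>Ski week</p>")
    print(api.received)
```

Because the CMS depends only on the `submit` signature, either side can be exchanged without touching the other, which is the property the paragraph above argues for.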
The data and content distribution component API also makes it possible to create use case specific interfaces for the data and content distribution component (e.g. it enables white labeling) and to quickly integrate it into 3rd party applications in use cases where the “heavy weight” CMS part, including things like content hosting, is not needed. The following use cases were defined to show the advantage of such a flexible architecture:
● A hotelier does not want to change his existing homepage infrastructure and CMS but nevertheless profits from addressing multiple Web 2.0, e-mail and rating channels via the data and content distribution component. Setup and usage of the software must be easy enough to be performed by an averagely skilled user. Here the communication with the customer, including engagement in conversations via the tool, is the primary focus of the user. Such a tool can be offered with a low pricing scheme since the data and content distribution component mainly does the content distribution and does not have to care about site hosting and content per se.
● The dissemination partner of an EU project needs a fully fledged, out of the box solution to address all important channels at once. The CMS with semantic data export in combination with the data and content distribution component enables this. The initial setup is, however, a non-trivial task since the homepage structure and the links to LOD vocabularies and other ontologies must be created. However, we can expect a more technically skilled user to operate this full package.
● A marketing agency with a multitude of different customers faces several other problems. Each customer wants a fully fledged offline and online presence in multiple channels. SCEI is very flexible in this regard. It is possible for the agency to maintain the full Web 2.0 presence of a customer via the data and content distribution component and, if needed, to also provide its customers with a state-of-the-art CMS solution.

In Figure 2 we provide a high level overview of all SCEI components and actors, which are the following:
● User: Person that operates the software and works on the level of information items rather than channels. We distinguish between several specific user roles, namely:
  ○ Content creator: Person that generates the content of the items to be disseminated.
  ○ Workflow designer: Person that defines communication patterns and workflows involving communication, multi-channel publishing and social media monitoring within an organization (e.g. a hotel business)
● Information Item: Content that traverses the system and is stored, distributed and transformed within the process.
● CMS: The content management system (in our case Drupal 7.x) exposes the user interface to the user as well as HTML in the form of a website, accessible via the Web.
  ○ Domain and Task specific UI: Depending on the application domain, user, task and role, the user interface adapts itself and makes all relevant information easily accessible.
  ○ Workflow Engine/Communication patterns: SCEI contains an engine that supports publication and controlling workflows. Well known communication patterns help in planning and executing these workflows.
○ RDFa annotation: Enriches the information item with semantic metadata in order to export it to an RDF repository and to ease distribution via the data and content distribution component, because the tool then understands the meaning of the information item instead of just its structure.
○ Scheduling: Contains rules about delayed or recurrent publication of the information item.
○ DB: Database which stores the actual content.
○ RDF export plugin: Exports all information items from the DB to an external repository.
● Semantic repository: External RDF repository which exposes all information items via a SPARQL endpoint. It also contains the domain and channel models and makes this information accessible to both the CMS and the data and content distribution component.
● Data and content distribution: Distributes content to and aggregates information from all push channels.
○ API: Makes the data and content distribution component accessible via the CMS
as well as other third-party applications, which makes this part integrable into external software solutions. Receives HTML, which can additionally be enriched with RDFa annotations.
○ DB: Database which stores references to the information items and their representation in the different channels.
○ Publishing Module: Is responsible for distributing the information item in different channels.
■ Content Extractor: Analyzes the HTML coming from the API and extracts all relevant information.
■ Concept to channel mapper: Decides which part of the information item will be pushed to which channel.
■ Content Transformer: Transforms the content in order to fit the channels, e.g. shortens a text to 140 characters for Twitter publication.
■ Scheduling: Contains rules about delayed or recurrent publication of the information item.
○ Statistics module: Collects and stores all valuable statistical information coming from the various channels, e.g. site visits and number of views.
■ Item Analyzer: Handles statistics of an information item in various channels.
■ Channel Analyzer: Handles statistics coming from a specific channel regarding all information items published within it.
○ Engagement Module: Is responsible for direct interaction in various channels.
■ Feedback Collector: Gathers feedback from all channels in order to present it centrally. This can be, for example, comments or reviews.
■ Interaction Component: Enables the user to react to the gathered feedback, for example to reply to online comments.
○ Impact Analyser Module: Determines which impact publications have had.
■ Impact Analyser: Specialized form of statistics that tries to determine how to make an impact efficiently in the online world. We differentiate here between real impact, based on active publications, and potential impact, meaning how many people the user can potentially reach in the various channels, given a limited number of e.g. friends or subscribers.

3.3. Data and content storage
The CMS stores data and content (for example pictures) in its internal database. References, meaning links, to these data items can be found in the website’s HTML code as well as in the exported RDF triples. The data and content distribution component, on the other hand, is not meant as a content hosting solution and therefore does not store all content and data. The only exception, where the data and content distribution component stores data like images and videos, although temporarily, is when publishing is delayed by the scheduling mechanism.
We distinguish between dynamic and static information publishing. With static information we
refer to a “distributed profile” in all Web 2.0 channels that can be changed at once. Such a profile contains things like contact information or a representative picture, in short things that should not change frequently and are valid without temporal constraints. Dynamic information is content that is pushed to e.g. news feeds and represents information at a certain point in time. For such publications the data and content distribution component only stores a reference to the channels the content was distributed to and a textual description of the content, so that it can later be identified by the user and specific feedback can be assigned to it. Acting mainly as a speaking tube, the data and content distribution component provides a lightweight and scalable solution.

3.4. The weaving process in general
As mentioned in the previous Section, the general problem is one of content distribution and feedback collection. We define a “weaving” process to formalize the steps necessary to solve this problem. In general, this process can be broken down as follows:
1. Content input
2. Selection of publication channels
3. Content adaptation
4. Publication
5. Collection of feedback
6. Collection of statistics

4. Reference implementation
We provide a reference implementation for the reference architecture presented in Section 3. The reference implementation is outlined in Figure 3 and is split into two major parts: a content management system (CMS) part based on Drupal 7.x, and the in-house implementation of the data and content distribution component, called dacodi. Additionally, an external semantic repository, namely OWLIM, is used to save content. The rest of this section provides the technical details of the reference implementation.
Figure 3: SCEI reference implementation

4.1. Content Management System
As the basis for our CMS solution we use Drupal 7.x. The reasons are its native RDF support, the availability of additional semantic modules, such as a SPARQL endpoint or microdata export, and the possibility of third-party module development.
The publication of new content or the updating of existing content (an information item) starts with the responsible person creating or changing one piece of information. This process is handled by the underlying CMS. If necessary, scheduling information can be provided to postpone the publication. After a successful change the content is saved to the external OWLIM repository and sent to the dacodi API. The CMS utilizes dacodi to extend its content distribution capabilities. Likewise, the CMS acts as a specialized kind of user interface from the dacodi viewpoint.
In the following we will further outline how RDF data is exported by the CMS and how the first part of the weaving process works. The second part of the weaving process will be described in the dacodi Section of this document.

4.1.1. Domain and task specific UI
The domain and task specific user interfaces are the components through which content users directly interact with our system. They are sub-components of the CMS and are implemented directly with it. The design and look-and-feel of these components are adapted to the mindset of the users, supporting them in specifying content in a terminology that is familiar to them. For example, hoteliers will specify the content items they want to disseminate in terms of offers, touristic packages, etc. The domain and task specific user interfaces thus support an abstraction of information dissemination based on the concrete domain, independent of the channel(s) of dissemination. They also allow the user to manage and solve task specific activities including yield, brand and reputation management, customer relationship management and online advertising.

4.1.2. Workflow engine and communication patterns
In order to support the user we offer a workflow engine together with support for communication patterns. This component enables users to define and manage complex workflows on top of the underlying SCEI components for communication, multi-channel publishing and social media monitoring. Such workflows usually have a long lifespan and involve multiple employees working together on improving the visibility, reputation and communication of an entity. The workflow engine and communication patterns component can be used to manage the communication workflow, including assigning, tracking and responding to user feedback. Using this component one can define and manage steps and protocols to be activated when certain events related to the published information occur, e.g. when a negative comment is written on a Facebook post. Take for example a hotelier.
Using the workflow engine and communication patterns component, the hotelier can specify and manage when and which of his employees, depending on their availability, take care of responding to customers’ posts about his hotel on various channels, or engage with customers to present new hotel offers.

4.1.3. Export of RDF data (OWLIM Integration)
The export of the CMS content to an external triplestore repository allows the publication of the website data as a bubble in the linked data cloud. The consistency of the two databases is guaranteed with the help of hooks that are triggered by the CMS on each add, update and delete operation. Hooks are functions that allow intercepting the internal workflow of the CMS. After an operation has been executed successfully, the RDF export plug-in creates triples and uses the Sesame REST API to add or change the content in OWLIM. For semantically annotating (RDFa and microdata) the content on the homepage exposed by the underlying CMS we use the Drupal internal database, since available Drupal modules already provide this annotation functionality. The OWLIM repository mainly serves as a linked data SPARQL endpoint.
As seen in Figure 3, the changes to the CMS are not intrusive, since the added functionality is provided by plug-ins. Two additional plug-ins need to be written: one for the OWLIM integration, and one for dacodi.

4.1.4. The Weaving Process within the CMS
With regard to a content management system, the weaving process looks as follows:
1. Content input
The content is entered in the CMS, directly by the user of the CMS.
2. Selection of publication channels
This step determines where the document should be published with regard to the internal document tree of the CMS. If a distribution to social media platforms via dacodi is desired, Web 2.0 channels can be selected as well.
3. Content adaptation
Content adaptation is not necessary for the CMS, since there are no content restrictions.
4. Publication
The document is published in the CMS and - if desired - as triples in the LOD cloud. The information item is passed to dacodi during the publication phase, along with the previously selected Web 2.0 channels.
5. Collection of feedback
Direct user feedback, like comments, shares, retweets, etc., is gathered by dacodi and can be queried by the CMS using the dacodi API (see more in Section Feedback Collection in dacodi).
6. Collection of statistics
Collection of visitor numbers and demographic data can be done via a tool like Google Analytics or the open source solution PIWIK.
The publication as triples in the LOD cloud (or to any external triplestore), as mentioned in step 4 of the weaving process, is done by a plugin which integrates OWLIM into the CMS.

4.1.4.1. Publication in CMS
The CMS component enables the publication of content on a homepage. It also provides functionality to annotate the website’s HTML with RDF data and to export these RDF data as a whole in order to make them machine understandable.

4.1.4.2. Feedback collection in CMS
Feedback within the CMS can come from various sources, for example an internal commenting or rating system.

4.1.4.3. Statistics collection in CMS
Statistics within the CMS can come from various sources, like Google Analytics or PIWIK for analyzing page visits, or internal comment and feedback systems.
In the following Section we will explain how dacodi is able to distribute content in multiple channels.

4.2. dacodi
The dacodi component is used to distribute information in various Web 2.0 and e-mail channels, as well as to collect and analyze feedback from those channels and to actively engage in conversations (e.g. reply to comments). Central to dacodi is the weaving process, which enables channel selection based on the semantics of the information item to be distributed and content transformation based on these channels. If manual effort is necessary, for example for entering content at a certain spot in a wiki system, the content can be sent to the responsible webmaster via e-mail. We will describe how the weaving process works within dacodi and how the component interacts with online channels using dedicated Adapters for publication and for feedback and statistics collection.
4.2.1. The Weaving Process within dacodi
The ultimate goal of the weaving process is the semi-automated publication of the information item in fitting channels, including the necessary transformations, based on the information type. Thus, the weaving process can be broken down into the following steps:
1. Content input
In the case of dacodi, this equals the acquisition of the information item, either through the API (coming e.g. from the CMS) or through a dedicated user interface.
2. Selection of publication channels
Selection of appropriate Web 2.0 channels based on the information type.
3. Content adaptation
Transformation of the information item into a Common Weaver Model (CWM) instance.
4. Publication
Publication of the (transformed) information item in the selected channels.
5. Collection of feedback
Feedback collection via the APIs of the used channels.
6. Collection of statistics
Statistics collection via the APIs of the used channels.
In the following Subsections we discuss the two steps central to the weaving process, channel selection and content transformation. Afterwards we explain in detail the Common Weaver Model, which is part of the content transformation component and specific to dacodi, not to the CMS part.

Channel Selection
Based on the information item type (e.g. a business event), a fitting channel for the information item is selected (e.g. a business event is announced on LinkedIn but not on Facebook). The central component of the channel selection process is the (Concept-to-Channel) Mapper, which maps each concept to the appropriate channels. Consequently, the Mapper takes a concept as input and returns a list of channels which are relevant for the concept. The Mapper of the prototype implementation uses a static mapping which maps every concept to a list of channels. Due to the modular architecture of the application, the mapper component can easily be replaced with a more sophisticated, dynamic approach.
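The static mapping described above can be illustrated with a minimal sketch. It is not taken from the dacodi code base: the class name `StaticMapper`, the method `channels_for` and the concept and channel names are assumptions chosen for illustration only.

```python
from typing import Dict, List


class StaticMapper:
    """Maps a concept (information item type) to a fixed list of channels."""

    def __init__(self, mapping: Dict[str, List[str]]) -> None:
        # Copy the lists so later changes by the caller do not leak in.
        self._mapping = {concept: list(chs) for concept, chs in mapping.items()}

    def channels_for(self, concept: str) -> List[str]:
        # Unknown concepts are mapped to no channel at all.
        return list(self._mapping.get(concept, []))


# Hypothetical mapping table; concept and channel names are illustrative.
mapper = StaticMapper({
    "business_event": ["linkedin", "twitter"],
    "special_offer": ["facebook", "twitter"],
})

print(mapper.channels_for("business_event"))
```

Because callers only depend on `channels_for`, a dynamic mapper could be substituted without touching the rest of such a pipeline.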
It would be possible, for example, to implement a Mapper that incrementally learns from user adjustments and thus alters the channel mappings based on the users’ needs.

Transformation
For every channel the information item has to fit in, a transformation is necessary. For example, a business event might include fields such as short title, long title, description, start date, end date, location and venue. Further, there might be an accompanying image which represents the event, like a poster. A channel which only takes short text messages (Twitter, for example) cannot handle all those fields. Thus, the information item has to be transformed into what we call a Common Weaver Model (CWM) instance. To expand on the previous example, one could think of combining the most important information of a business event (short title, start date, end date, location and venue) into a string which fits the channel’s restrictions - Twitter’s 140 characters, for example. The transformer component defines what transformations are necessary to go from an Information Item to a Common Weaver Model instance.

4.2.1.1. Common Weaver Model
The Common Weaver Model³ (CWM) exploits the fact that similar channels have common characteristics. For example, Facebook status updates and Twitter enable the user to share short text messages in the form of status updates. YouTube, Vimeo and Facebook video enable the user to upload and share videos. After looking at various Web 2.0 channels, we have identified the following Common Weaver Models:
● Text: A String of varying length. Online communication as it is today relies heavily on the exchange of short text messages. In essence, those messages are simply Strings of varying length. Depending on the platform, the maximum length of such text messages ranges from 140 characters (Twitter) to many thousands (63,206 in the case of Facebook).
● Link: A common hyperlink denoted by the <link /> or <a /> HTML element.
● Image: A two-dimensional image denoted by the <img /> HTML element. While support may differ depending on the Channel, possible Internet media types include: gif, jpeg, png, svg, tiff.
● Video: A video file. While support may differ depending on the Channel, possible Internet media types include MPEG-1 video with multiplexed audio, MP4 video, Ogg Theora video, QuickTime video, WebM Matroska-based open media format, Matroska open media format, Windows Media Video.
● Presentation: A presentation file. We want to support this type - and thus related Web 2.0 platforms like slideshare - in a future version. Not supported in the prototype.
● Audio: An audio file. Not supported in the prototype.
During the weaving process, instances of those models are extracted from the information item and sent to the selected publishing adapters.
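The Text model and the business event example can be sketched as follows. This is an illustration under stated assumptions, not the dacodi transformer: the names `TextCWM` and `event_to_text`, the field keys of the event dictionary and the truncation strategy (cut and append an ellipsis) are all hypothetical. Only the 140-character limit and the chosen fields come from the text.

```python
from dataclasses import dataclass
from typing import Mapping

TWITTER_LIMIT = 140  # character limit quoted in the text for Twitter


@dataclass(frozen=True)
class TextCWM:
    """A Text instance that remembers the information item it came from."""
    item_id: str
    text: str


def event_to_text(item_id: str, event: Mapping[str, str],
                  limit: int = TWITTER_LIMIT) -> TextCWM:
    # Combine the most important fields of the business event into one string.
    summary = "{}: {} to {}, {}, {}".format(
        event["short_title"], event["start_date"], event["end_date"],
        event["location"], event["venue"],
    )
    # Assumed strategy: cut overlong text and mark the cut with an ellipsis.
    if len(summary) > limit:
        summary = summary[: limit - 3].rstrip() + "..."
    return TextCWM(item_id=item_id, text=summary)


# Hypothetical event data, for illustration only.
event = {
    "short_title": "Semantic Web Meetup",
    "start_date": "2012-06-19",
    "end_date": "2012-06-20",
    "location": "Innsbruck",
    "venue": "University of Innsbruck",
}
cwm = event_to_text("event-1", event)
print(cwm.text)
```

Keeping `item_id` on the instance reflects the idea that every CWM instance knows which information item it was extracted from.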
Each Common Weaver Model instance is stored internally using a unique identifier and grouped by the information item to which it is related.
The granularity of the extraction depends on the information item which is to be published. For example, if the user simply wants to publish a single link in various channels, it makes sense to extract the link and publish it. On the other hand, if a more complex information item contains dozens of links, it does not make sense to extract and publish every link (this would amount to annoying spamming), unless the user explicitly wants to do so.

3 The model in Common Weaver Model refers to a model from a software engineering point of view, as in MVC (Model-View-Controller). A model manages the behavior and data of the application domain.
Figure 4: Extraction of Common Weaver Model (CWM) instances from an information item.

When an information item is published, e.g. a business event, CWM instances are extracted. Expanding on the business event example introduced in the Transformation section: if the business event includes an image, it will be extracted and published to fitting image channels like Flickr. The essential information about the event can be combined in a string format and published via text channels, such as Twitter, Xing or Facebook. Since every CWM instance knows from which information item it was extracted, a link to the original information item (in this example, the business event) can be embedded, e.g. in the description of the image.

4.2.1.2. Publication in dacodi
The publisher module takes care of two things: publication of the information item (in this stage of the weaving process represented as Common Weaver Model instances) using adapters, and scheduling. We plan to support scheduling in two ways: delayed publication and repeated publication. For example, delayed publication can be used to announce an event or a special offer at a specific time, whereas repeated publication may be used to send reminders (e.g. for a call for papers) in all channels.

4.2.1.3. Feedback Collection in dacodi
Every Information Item that is published by dacodi is tracked by the system, to provide statistical information and a per-channel impact analysis. This feature allows the user to see how well the
published information item was received, without having to check every channel individually.

Feedback
Basically there are four forms of feedback that are supported by various Web 2.0 platforms and thus relevant for dacodi:
● Unary feedback. Any feedback that is a predefined, positive feedback. Examples: “like” on Facebook, “retweet” on Twitter, “favourite” on flickr, “favourite” on YouTube, etc.
● Binary feedback. Any feedback that is a predefined, positive or negative feedback. Example: thumbs up/down on YouTube.
● Rating/ranking. Feedback that can be quantified on a discrete scale. Example: star rating on a hotel review platform.
● Textual feedback. Any feedback that is user-created, in the form of replies, comments or any other form of written feedback. NLP techniques can be used to analyze the textual feedback to provide the user with additional information, e.g. whether the comment/reply was a positive or a negative one. A user can directly react to textual feedback within dacodi if the underlying channel allows this functionality.

4.2.1.4. Statistics Collection in dacodi
There are several statistical metrics that are relevant for the user. While the unary, binary, rating and textual feedback is centered on the information item, statistics are relevant on a per-item and per-channel basis. Examples:
● Number of unary, binary, rating and textual feedback entries per information item (this includes features like “most discussed information item”, i.e. the information item with the most textual feedback).
● Number of information items published in each channel over a certain period of time (day, week, month, year).
● Calculation of a combined impact metric per channel, based on feedback analysis of the information items published in the channel.

4.2.3. Adapters
As mentioned before, we distinguish between two types of adapters: publishing and retrieval. The purpose of a publishing adapter is to publish an information item in a certain channel.
Retrieval adapters are used to gather information about already published information items. Since the APIs as well as the offered functionality differ from channel to channel (e.g. Twitter’s and Facebook’s APIs differ), a separate adapter needs to be written for each channel. In our prototype we intend to create publishing and retrieval adapters for the following platforms: YouTube, Facebook and Twitter. All three of them have a Web API and cover a majority of the features we want to realize, such as publishing videos, texts, images and links. This is a starting point for implementing new adapters that provide similar functionality.
We have identified the following channel specific features that each adapter has to be able to handle:
● Mapping CWM properties to the appropriate properties in the target communication channel. For example, a Tweet’s text property is called ‘text’
whereas a Facebook post’s text property is named ‘message’. Since we have to implement an adapter for each communication channel we want to address, we will implement this by a simple mapping routine in each adapter.
● Authentication and authorization to the communication channel. Most Web 2.0 communication channels rely on OAuth / OAuth2⁴ to realize authorization and authentication. However, some of them rely on OpenId⁵, basic HTTP authentication or other form-based authentication mechanisms to restrict user access. The adapter has to be able to deal with these individual mechanisms and has to store and load the credentials of each user.
● Publish a specific CWM instance. As mentioned above, the publishing process varies from platform to platform, thus this functionality has to be abstracted. This also holds true for retrieving feedback from different platforms.

Adapter loading and naming conventions
We designed our adapters and adapter loading mechanisms to achieve the following three goals:
● No adapter duplication. The same functionality should be achieved with the same code (minimize the code base, achieve the simplest possible code base).
● A common adapter structure for all platforms. Platforms are structured differently. However, for the clarity of dacodi we want a uniform way in which adapters are integrated into the system. Adding the same functionality (e.g. adding an image channel) should be achieved in a similar manner on all platforms.
● Automatic adapter loading and execution. There should be no manual effort involved in adding a new adapter to the system, except for programming the adapter.
To achieve these three goals, we designed the system carefully and introduced some naming and loading conventions for our adapters. These are described in the following sections.
Figure 5 depicts the motivation for our design. The illustration sketches that social media platforms offer more than one way to publish information.
Additionally, each user account on a platform allows access to another, similar set of communication channels. For example, someone with two accounts on Twitter has a duplicate of all communication channels available on Twitter, say two text channels, two image channels and so on. The only difference between those channels is the user credentials that are used to authenticate for the post. Different platforms allow posting similar common weaver items, though.
Adhering to the SRP software development principle⁶, we chose to write an adapter for each explicit communication channel in each platform individually. This - together with a file naming convention - also allows us to automatically load and execute adapter classes, without having to change configuration files or any additional manual effort. If an adapter class is not found, the channel is simply not supported.

4 http://en.wikipedia.org/wiki/Oauth
5 http://en.wikipedia.org/wiki/Openid
6 http://en.wikipedia.org/wiki/Single_responsibility_principle
Figure 5: Platform as channel groups offering multiple ways to publish information

We named the components that define a communication channel in dacodi: a channel type, a platform and user credentials. In detail, they are:
● Channel Type: Is a virtual grouping of channels that allow publishing the same common weaver items, e.g. image, video or text. This is depicted in Figure 5. It is virtual because it is split up into many different adapters for many different platforms but is accessed in a uniform way nonetheless.
● Platform: Is a grouping of channels that have the same user credentials. An example of a platform or channel group is Facebook. The notion of the channel group has been introduced since a platform such as Twitter or Facebook actually allows access to more than one communication channel (e.g. video, text, image).
● User credentials: This is the information needed to authenticate/authorize a client to a certain platform and associate it with a certain account. In dacodi, user credentials contain the following information: an account id which associates a channel group with a user (e.g. the Facebook account id 1234 with the dacodi user 27); the authorization token and secret, which store information that is required for completing actions on a platform such as posting (e.g. an OAuth2 token associated with the account, or a password); and the consumer key and consumer secret, which contain information about the application that is about to publish (this can be thought of as the authentication of dacodi, assuring the platform that it is actually dacodi that publishes on the user’s behalf). The notion of user credentials has been introduced since a user may have multiple accounts on one platform.
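How the three components and the naming convention could fit together is sketched below. It makes several assumptions and is not the dacodi implementation. The text describes a file naming convention, whereas this sketch substitutes a class naming convention (`<Platform><ChannelType>Adapter`) resolved among the subclasses of a common base class. All class, function and field names are hypothetical, and no real platform API is called. Only the ‘text’ versus ‘message’ property names and the listed credential fields come from the text.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Type


@dataclass(frozen=True)
class UserCredentials:
    """Credential fields as listed in the text; field names are assumed."""
    account_id: str
    token: str
    token_secret: str
    consumer_key: str
    consumer_secret: str


class PublishingAdapter:
    """Base class: one subclass per platform and channel type."""

    def __init__(self, credentials: UserCredentials) -> None:
        self.credentials = credentials

    def build_payload(self, text: str) -> Dict[str, str]:
        raise NotImplementedError


class TwitterTextAdapter(PublishingAdapter):
    def build_payload(self, text: str) -> Dict[str, str]:
        return {"text": text}  # a Tweet's text property is called 'text'


class FacebookTextAdapter(PublishingAdapter):
    def build_payload(self, text: str) -> Dict[str, str]:
        return {"message": text}  # a Facebook post uses 'message'


def find_adapter(platform: str,
                 channel_type: str) -> Optional[Type[PublishingAdapter]]:
    # Assumed convention: class name is <Platform><ChannelType>Adapter.
    wanted = f"{platform.capitalize()}{channel_type.capitalize()}Adapter"
    for cls in PublishingAdapter.__subclasses__():
        if cls.__name__ == wanted:
            return cls
    return None  # no adapter found: the channel is simply not supported


creds = UserCredentials("1234", "token", "secret", "key", "consumer-secret")
adapter_cls = find_adapter("facebook", "text")
if adapter_cls is not None:
    print(adapter_cls(creds).build_payload("New summer offer"))
```

Two accounts on the same platform would reuse the same adapter class with different `UserCredentials` instances, which matches the goal of avoiding adapter duplication.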