Galway, Ireland
October 30, 2015
Book of Abstracts
Insight Student Conference 2015
INSIGHTSC 2015, organised by
Erik Aumayr, Narumol Prangnawarat,
Zia Ush Shamszaman, Alokkumar Jha,
Md. Rezaul Karim, Jaynal Abedin,
Thu-Le Pham, Siamak Barzegar
The Insight Centre for Data Analytics
Credits:
Cover design: Srinivasan Arumugam, Md. Rezaul Karim
Book editor: Erik Aumayr, Md. Rezaul Karim, Narumol Prangnawarat,
Jaynal Abedin
Published by:
The Insight Centre for Data Analytics
https://www.insight-centre.org/content/insight-student-conference-2015
LIST OF ABSTRACTS
Linked Data and Semantic Web
1 A Linked Data Platform as Service for Finite Element Biosimulations
Joao Bosco Jares, Muntazir Mehdi and Ratnesh Sahay
2 Adaptivity in RDF Stream Processing (RSP)
Zia Ush Shamszaman, Muhammad Intizar Ali and Alessandra Mileo
3 An elastic and scalable spatiotemporal query processing for linked sensor data
Hoan Nguyen Mau Quoc
4 An Infrastructure to Integrate Open Public Health Data and Predicting Health Status
Jaynal Abedin, Ratnesh Sahay and Dietrich Rebholz-Schuhmann
5 Biological Link Extractor From Literature In Systems Biology
Arindam Halder
6 Complex Reasoning over Big Data Streams with Answer Set Programming
Thu-Le Pham
7 Data Agnostic Management Systems for The Internet of Things
Zeeshan Jan, Aqeel Kazmi and Martin Serrano
8 Deployment and Configuration of Multi-Query Data Test Services in Federated Cloud Environments
Salma Abdulaziz
9 Diminishing Business Challenges by Improving Open Data Business Model
Fatemeh Ahmadi Zeleti
10 DInfra - Distributional Infrastructure
Siamak Barzegar, Juliano Efson Sales, Andre Freitas and Brian Davis
11 DISCOV3R: Discovering Life-Sciences Datasets in LS-LOD Cloud
Muntazir Mehdi, Ratnesh Sahay and Dietrich Rebholz-Schuhmann
12 Discovering Hidden Structures for Quality Assessment
Emir Muñoz
13 Effective Data Visualisation to Promote Home Based Cardiac Rehabilitation
Liam Sexton and David Monaghan
14 Enabling a better Collaboration and Communication through Personalizations
Anne Helmreich
15 EnRG: Entity Relatedness Graph
Nitish Aggarwal
16 Extracting Semantic Knowledge from Unstructured Text using Embedded Controlled Languages
Hazem Abdelaal
17 Heuristic Based Adaptive Query Optimization Approach for RDF Stream Processing
Yashpal Singh, Ali Intizar and Alessandra Mileo
18 Instance Search with Semantic Analysis and Faceted Navigation
Zhenxing Zhang, Cathal Gurrin and Alan Smeaton
19 Knowledge Base Segmentation in Entity Linking with Multiple Knowledge Bases
Bianca Pereira
20 Linked data approach for CNV annotation in cancer
Alokkumar Jha, Yasar Khan, Ratnesh Sahay and Dietrich Rebholz-Schuhmann
21 Linked Data Profiling
Andrejs Abele
22 Mobile RDF Store
Le Tuan Anh and Danh Le Phuoc
23 Optimizing Access to Twitter Pull-Based APIs
Soheila Dehghanzadeh
24 Overcoming Limitations to control personal data and ownership in DOSNs
Safina Showkat Ara
25 Reference Implementation and Performance Evaluation for CQELS RDF Stream Processing engine
Chan Le Van
26 Temporal Graph-based Approach for Document Summarisation
Narumol Prangnawarat, Ioana Hulpus and Conor Hayes
27 Using semantic information for ontology translation
Mihael Arcan
28 Zinc Phthalocyanine and its Substituted Derivatives as Sensitive Layers for Textile - Based Sensor
Eva Marešová, Martin Vrňata, Přemysl Fitl, Jiří Bulíř, Ján Lančok, Jan Vlček, David Tomeček, Michal
Novotný, Larisa Florea, Shirley Coyle and Dermot Diamond
Media Analytics
29 A Framework for Extraction of Exceptional Events
Yuriy Gurin, Terrence Szymanski and Mark Keane
30 An Approach to Measure the Impact of Academic Entities Using a Heterogeneous Graph
Mohan Timilsina
31 Brain computer interfaces for digital media interaction
Zhengwei Wang
32 Bridging Social Media and e-Participation
Lukasz Porwol
33 Content-based Search Engine for Lifelogging with Collaborative Common Knowledge Base
Tengqi Ye
34 Deep community analysis: Using statistical analysis to identify the most influential users in Social media
Himasagar Tamatam and Conor Hayes
35 Deep Image Representations for Instance Search
Eva Mohedano
36 Design considerations of a lifelog annotation system with three-level ontology
Aaron Duane
37 Development of a Closed-loop Neurocognitive Engineering Platform
Damien Kearney, Tomas Ward, Mahnaz Arvaneh and Ian Robertson
38 Exploring Twitter Data with Computational Homotopy
Pablo Torres-Tramon and Graham Ellis
39 Extraction of Customer-To-Customer Suggestions from Reviews
Sapna Negi
40 Human Action Recognition Framework using Graph Representation
Iveel Jargalsaikhan
41 Measuring the Semantic Similarity in Interest Graphs for Social Recommendations
Guangyuan Piao
42 Observing the Relationships Between MEPs on Twitter
Mark Belford, James Cross and Derek Greene
43 Real Time Crowded Scene Understanding
Mark Marsden
44 Twitter for Sentiment Analysis and Opinion Mining
Peiman Barnaghi, Parsa Ghaffari and John Breslin
45 Using the NoSQL Model to support DWARF Cubes in XML Data Mining
Michael Scriney and Mark Roantree
46 What are Words Worth? Exploring Semantic Spaces of Political Discourse
Igor Brigadir
Optimisation & Decision Analytics
47 A comparison of the SIR and the network model, with Kalman Filter
Weipeng Huang
48 An Efficient Dispatch and Decision-making Model for Taxi-booking Service
Cheng Qiao
49 Countermeasures to Mitigate Bandwidth Level DDoS attacks in Data Centers
Samar Raza Talpur and M-Tahar Kechadi
50 Different solutions of the Variable Cost and Size Bin Packing Problem with Stochastic Items (VCSBPSI)
Andrea Visentin
51 Improving Catalogue Navigation using Critique Graphs
Begum Genc and Barry O’Sullivan
52 Learning User Preferences in Matching for Ridesharing
Mojtaba Montazery and Nic Wilson
53 On Energy- and Cooling-Aware Data Centre Workload Management
Danuta Sorina Chisca, Deepak Mehta, Ignacio Castiñeiras and Barry O’Sullivan
54 On Temporal Bin Packing
Milan de Cauwer
55 Optimal Bayes decision rules in cluster analysis using greedy techniques
Riccardo Rastelli and Nial Friel
56 Personalized Route Planning
Daniel A. Desmond
57 ReACTR: Realtime Algorithm Configuration through Tournament Rankings
Tadhg Fitzgerald
58 Solving a Hard Cutting Stock Problem by Machine Learning and Optimisation
Adejuyigbe O. Fajemisin
59 Statistical Regimes and Runtime Prediction – Abstract
Barry Hurley
60 Towards Fast Algorithms for the Preference Consistency Problem
Anne-Marie George, Nic Wilson and Barry O’Sullivan
Personal Sensing
61 A Data Driven Approach to Determining the Influence of Fatigue on Turning Characteristics and Associated Injury Risk in Chronic Ankle Instability
Alexandria Remus, Eamonn Delahunt and Brian Caulfield
62 An Evaluation of the effects of the MedEx programme on physical, clinical and psychosocial outcomes and an examination of the determinants of adherence to MedEx.
Fiona Skelly and Emer O Leary
63 Association between Objectively Measured Physical Activity and Vascular Endothelial Function in Adolescent Males
Sinead Sheridan and Niall Moyna
64 Change of direction biomechanics continue to improve from six to nine months post Anterior Cruciate Ligament reconstruction.
Shane Gore, Andrew Franklyn-Miller and Kieran Moran
65 Development of an autonomous phosphate sensor.
Gillian Duffy, Kevin Murphy, Adrian Nightingale, Nigel Kent, Matthew Mowlem, Dermot Diamond and
Fiona Regan
66 Do mobile phone Apps apply behaviour change strategies for clinical populations: A Literature Review Strategy
Orlaith Duff, Deirdre Walsh and Catherine Woods
67 GUI based User Behaviour Modelling for Identification
Zaher Hinbarji, Cathal Gurrin and Rami Albatal
68 Investigating normal day-to-day variations in postural control in a healthy young population using Wii Balance Boards
William Johnston, Ciaran Purcell, Ciara Duffy, Tara Casey, David Singleton, Barry Greene, Denise McGrath and Brian Caulfield
69 MedEx Move On: Community-based Exercise Rehabilitation for Cancer Survivors
Mairead Cooney, Emer O’Leary, Brona Furlong, Catherine Woods and Noel McCaffrey
70 Multi-modal Continuous Human Affect Recognition
Haolin Wei, David Monaghan and Noel E. O Connor
71 Non-Linear Analyses of Surface Electromyography in Parkinson’s Disease
Matthew Flood and Madeleine Lowery
72 Ocular Glucose Biosensing Using Boronic Acid Fluorophores
Danielle Bruen, Larisa Florea and Dermot Diamond
73 Quantifying Athletic Screening Tools Using Inertial Sensors And The Microsoft Kinect
Darragh Whelan, Martin O’Reilly, Eamonn Delahunt and Brian Caulfield
74 Self-Propelled Electrotactic Ionic liquid Droplets
Wayne Francis, Klaudia Wagner, Stephen Beirne, David Officer, Gordon Wallace, Larisa Florea and Dermot
Diamond
75 Sodium sensing in sweat using crown ether functionalised polymeric hydrogels
Deirdre Winrow, Larisa Florea and Dermot Diamond
76 Stimuli-responsive hydrogels based on acrylic acid and acrylamide
Aishling Dunne, Siobhán Mac Ardle, Larisa Florea and Dermot Diamond
77 Tackling Neurodegenerative Diseases
Zhemin Zhu
78 Textile Strain Sensors for Clinical Applications in Spinal Flexion
Jennifer Deignan, Syamak Farajikhah, Ali Jeirani, Javad Foroughi, Shirley Coyle, Peter Innis, Rita Paradiso, Gordon Wallace and Dermot Diamond
79 The Biomechanical Determinants of Cutting Performance in Elite Female Field Sport Athletes: Study Justification
Neil Welch, Kieran Moran and Andrew Franklyn-Miller
80 The Development of a Wearable Potentiometric Sensor for Real-time Monitoring of Sodium Levels in Sweat
Thomas Glennon, Conor O’Quigley, Giusy Matzeu, Eoghan Mc Namara, Florin Stroiescu, Kevin Fraser,
Margaret McCaul, Stephen Beirne, Jens Ducrée, Gordon Wallace, Paddy White and Dermot Diamond
81 The use of an adaptive coaching system to improve patient outcomes during a free living step training exercise programme in type 2 diabetics
Hugh Byrne
82 The Use of Inertial Measurement Units to Evaluate Exercise Performance
Martin O’Reilly, Darragh Whelan, Tomas Ward and Brian Caulfield
83 Tuning the Stimuli-Responsive Properties of Poly(Ionic Liquid)s
Alexandru Tudor, Larisa Florea and Dermot Diamond
84 Using Intensity of Periodicity to detect Behaviour Change
Feiyan Hu, Alan Smeaton and Eamonn Newman
85 Validation of a Wearable Optical Heart Rate Sensor during Rest and Exercise
Alexandria Remus and Brian Caulfield
86 Validity and Reliability of the FitBit Charge HR™ to Monitor Heart Rate at Rest and During Exercise
Clare McDermott, Niall Moyna and Kieran Moran
87 Wearables for Diabetes Prevention and Management
José Juan Domínguez Veiga
Recommender Systems
88 A Scalable and Secure Realtime Healthcare Analytics Framework with Apache Spark
Md. Rezaul Karim, Ratnesh Sahay and Dietrich Rebholz-Schuhmann
89 Actionable Recommendations using Resource Based Features
Owen Corrigan
90 Career Development: Recommending Jobs, Career Paths and Skills to Users
Xingsheng Guo, Houssem Jerbi and Michael O’Mahony
91 Exploitation-Exploration Aware Diversification for Recommendation Systems
Andrea Barraza-Urbina and Conor Hayes
92 Not your Average Fortuneteller; A Study of Time-aware Recommender Systems for Linear TV
Humberto Corona
93 Opinionated Explanations for Recommendations
Khalil Muhammad, Aonghus Lawlor, Rachael Rafter and Barry Smyth
94 Recommendation Framework for Feature-rich Sequences
Gunjan Kumar
95 Recommending from Experience
Francisco J Peña
96 Tracking and Recommending News
Doychin Doychev, Aonghus Lawlor and Barry Smyth
Machine Learning & Statistics
97 A Distributed Approach for Clustering Large Spatial Datasets
Malika Bendechache and Tahar Kechadi
98 AcademiScope: An Application for Building and Visualizing an Academic Network
John Lonican and Conor Hayes
99 Adaptive MCMC for Changepoint Models on Large Datasets
Alan Benson and Nial Friel
100 Application of Data Mining Techniques to Service Modelling and Evaluation at Caredoc
Duncan Wallace and Tahar Kechadi
101 Automatic Dynamic Product Classification
Guillermo Vinue and Andrew Parnell
102 Categorising Online Q&A Communities Based on User Behaviour
Erik Aumayr and Conor Hayes
103 Deriving the String Student-t Process for Scalable Student-t Process Regression
Gernot Roetzer and Simon Wilson
104 Exploring Variable Interactions with Restricted Boltzmann Machines
Jim O’Donoghue and Mark Roantree
105 Identification of Regional Interaction Pattern in Regulatory Networks
Laleh Kazemzadeh
106 Indicators of Good Student Performance in Moodle Activity Data
Ewa Mlynarska
107 Loss Functions and Optimization Strategies for Large-scale Sequence Learning
Severin Gsponer, Georgiana Ifrim and Barry Smyth
108 Model Selection for Ranking Data
Lucy Small
109 Model-based Clustering with Sparse Covariance Matrices
Michael Fop and Thomas Brendan Murphy
110 Network Forensics Readiness and Security Awareness Framework
Aadil Mahrouqi and Tahar Kechadi
111 Predicting Peer Groups Effects On University Exam Results
Philip Scanlon and Alan Smeaton
112 Predicting Topographical and Sociological Information Patterns from Building Access Logs
Philip Scanlon and Alan Smeaton
113 Prenatal alcohol exposure and cord-blood DNA methylation: identifying latent structure in high dimensional data
Cathal Mullin
114 Radiographic Knee Osteoarthritis Classification using Convolutional Neural Network
Joseph Antony, Kevin McGuinness, Noel O’Connor and Kieran Moran
115 Reformulations of the Map Equation for Community Finding and Blockmodelling
Neil Hurley and Erika Duriakova
116 Representative Itemset Mining
Hong Huang and Barry O’Sullivan
117 Revealing the Hidden Patterns: Trajectory Mining from Mobility Data
Ali Azzam Naeem
118 Sensor based sentiment analysis
Srinivasan Arumugam, Jyothirmoy Patgiri and Navdeep Sharma
119 Social Network Analysis of Pride and Prejudice
Siobhan Grayson, Derek Greene, Gerardine Meaney and Brian Mac Namee
120 Addressing Cold-Start for Streaming Twitter Hashtag Recommendation to News
Bichen Shi
121 Topy the Story Tracker: Social Indexing for Real-time Story Tracking
Gevorg Poghosyan, Georgiana Ifrim and Neil Hurley
Acknowledgment: This publication has emanated from research supported in part by the research grant from Science Foundation Ireland
(SFI) under Grant Number SFI/12/RC/2289 and EU project SIFEM (contract Number 600933).
A Linked Data Platform as Service for Finite Element Biosimulations
Joao Bosco Jares, Muntazir Mehdi and Ratnesh Sahay
The Insight Centre for Data Analytics, National University of Ireland, Galway (NUIG)
joao.jares@insight-centre.org
Abstract
Biosimulation studies have recently been introduced as models for understanding the causes that give rise to impairment in human organs. The Finite Element Method (FEM) provides a mathematical tool for simulating dynamic biological systems, with applications across human-organ research, from the ear to the neurovascular system. However, without a proper data infrastructure, the steps involved in executing and comparatively evaluating finite element simulations can be very time-consuming and must be performed in isolated environments. Considering these facts, we propose a service-oriented Linked Data platform to improve the automation, integration, analysis and visualization of biosimulation models of inner-ear (cochlea) mechanics.
1. Introduction and Motivation
Finite Element (FE) models are numerical approaches for finding approximate solutions to differential equations. Constructing an FE model is a highly complex task comprising multiple steps, such as defining a discretized geometrical model (a mesh) and a physical-mathematical model, selecting the method type, visualization, and interpreting the model's results. Moreover, building a consistent FE model can be very time-consuming, since it depends on fine-tuning many different parameters; the complexity lies not only in building and validating FE models but also in reproducing and reusing third-party FE models [1]. Existing work [2] describes the creation of an infrastructure and platform to support more automated interpretation of FE simulations in bio-mechanics using Semantic Web standards and tools. In addition, since most related data are represented numerically, this work explores mechanisms to bridge data from the numerical to the ontological (conceptual) level, facilitating and automating the interpretation of simulation results. To evaluate the proposed approach, the service is exposed via a single interface endpoint for consumption by third-party agents. The main aim of the service is to improve and integrate the process of automating the individual tasks of performing a biosimulation under a realistic cochlear-mechanics FE model.
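To make the FE idea concrete, the following is a minimal, self-contained sketch (not part of the SIFEM platform) of a one-dimensional finite element solve for -u'' = f on (0, 1) with zero boundary values, using linear elements on a uniform mesh and a tridiagonal (Thomas) solve; the mesh size and load are illustrative choices only.

```python
# Minimal 1-D FE sketch: solve -u'' = f on (0, 1), u(0) = u(1) = 0,
# with linear "hat" elements on a uniform mesh. Illustrative, not SIFEM.

def solve_poisson_1d(f, n):
    """Return FE nodal values at the n interior nodes x_i = (i + 1) * h."""
    h = 1.0 / (n + 1)
    # Stiffness matrix of linear hat elements is tridiagonal:
    sub = [-1.0 / h] * n    # sub-diagonal entries
    diag = [2.0 / h] * n    # main diagonal entries
    sup = [-1.0 / h] * n    # super-diagonal entries
    # Load vector: each hat function integrates to h, so for slowly varying
    # f we approximate the load integral by h * f(x_i).
    rhs = [h * f((i + 1) * h) for i in range(n)]
    # Thomas algorithm: forward elimination ...
    for i in range(1, n):
        m = sub[i] / diag[i - 1]
        diag[i] -= m * sup[i - 1]
        rhs[i] -= m * rhs[i - 1]
    # ... and back substitution.
    u = [0.0] * n
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (rhs[i] - sup[i] * u[i + 1]) / diag[i]
    return u

u = solve_poisson_1d(lambda x: 1.0, n=9)
# For f = 1 the exact solution is u(x) = x(1 - x)/2, so the nodal value
# at x = 0.5 should be very close to 0.125.
print(u[4])
```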
2. Proposed Framework
The Linked Data platform, exposed as a service, is called the SIFEM system [2]; the high-level components of the SIFEM system (and its conceptual model) are shown in Figure 1. The simulation service starts with the specification of the simulation input parameters (such as an inner-ear geometrical mesh). After specifying the simulation inputs, the user starts the simulation using the service interface, which invokes the SIFEM Solver Service to coordinate the instantiation and execution of multiple solver instances in an asynchronous fashion.
Figure 1. Components of SIFEM Services & conceptual model
The SIFEM Solver Service reads the solver input data and transforms it into a solver-specific input format. On receiving results from the solvers, the RDFization component RDFizes the input and output of the simulation experiment and stores them in the RDF triple store. The RDFization of the input and output is performed using the SIFEM conceptual model. The Data Analyzer component extracts a set of data-analysis features (such as extrema points, average point, and slope) from the numerical simulation data. The data-analysis results are also RDFized using the SIFEM conceptual model and likewise stored in the triple store. The simulation input, output and analysis data are RDFized and stored so that they can be reused instead of performing a completely new simulation. The cochlea, or inner ear, is a complex bio-mechanical device, and a complete understanding of its behavior is still an open research challenge. The creation of a complete cochlea model depends on the integration of heterogeneous models at different scales (e.g., basilar membrane, organ of Corti and outer hair cells) and theoretical domains (e.g., mechanical, geometrical, and electrical). To the best of our knowledge, the proposed Linked Data platform is the first unified infrastructure that brings together numerical parameters, models, terminologies, storage, querying, visualization and analysis to conduct a finite element biosimulation.
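As an illustration of the RDFization step described in this section, the following sketch serializes a set of extracted data-analysis features as N-Triples. The namespace, property names and datatype handling are simplified assumptions for illustration, not the actual SIFEM conceptual model.

```python
# Hedged sketch of RDFizing simulation analysis results as N-Triples.
# SIFEM namespace and property names below are hypothetical placeholders.

SIFEM = "http://example.org/sifem#"   # hypothetical namespace
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"

def rdfize_result(experiment_id, features):
    """Serialize extracted features (name -> numeric value) as N-Triples."""
    subject = f"<{SIFEM}experiment/{experiment_id}>"
    triples = [f"{subject} <{SIFEM}hasType> <{SIFEM}SimulationExperiment> ."]
    for name, value in features.items():
        triples.append(f'{subject} <{SIFEM}{name}> "{value}"^^<{XSD_DOUBLE}> .')
    return triples

triples = rdfize_result("cochlea-001", {"extremaPoint": 3.2, "slope": 0.45})
for t in triples:
    print(t)
```

Once in this form, the analysis results can be loaded into the triple store alongside the simulation inputs and queried or reused later.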
References
[1] Yasar Khan, Muntazir Mehdi, Alokkumar Jha, Saleem Razaz, Andre Freitas, Marggie Jones, and Ratnesh Sahay. Extending Inner-Ear Anatomical Concepts in the Foundational Model of Anatomy (FMA) Ontology. In Proceedings of BIBE 2015.
[2] Muntazir Mehdi, Yasar Khan, Andre Freitas, Joao Jares, Stefan Decker, and Ratnesh Sahay. A Linked Data Platform for Finite Element Bio-Simulations. In Proceedings of SEMANTiCS 2015.
INSIGHTSC [1]
Adaptivity in RDF Stream Processing (RSP) ∗
Zia Ush Shamszaman, Muhammad Intizar Ali, Alessandra Mileo
Insight Centre for Data Analytics, National University of Ireland Galway
{zia.shamszaman, ali.intizar, alessandra.mileo}@insight-centre.org
1. Introduction and Motivation
Existing directions in RSP research focus on reaching a consensus on how to process RDF streams, rather than on the ability to adapt and react to changing application requirements and properties of the underlying data streams. It is not trivial to find a configuration of such features that works independently of the data and the application domain. Therefore, we believe it is important to design flexible RSP solutions that can adapt to application requirements and let the application discover and select the underlying RSP engine (or a configuration of its features) on the fly through dynamic adaptation. Over the last few years several stream processing systems have been proposed for efficient processing of RDF streams, to name a few: CQELS, C-SPARQL, EP-SPARQL, and SPARQLStream [2, 1]. Multiple aspects can affect the performance and correctness of the results produced by the query processors, including the operational semantics of linked streams, the query execution method, and the target domain. However, existing approaches to stream query processing lack the adaptability to react to changing application requirements and properties of the underlying data streams. The goal of this research is to design a more flexible and adaptive stream query processing approach, one that adapts according to the requirements of the applications and the characteristics of the data streams.
2. Hypothesis
The adaptive approach will improve the efficiency and correctness of stream processing in general by serving a broader category of application requirements. In addition, it will provide better results in changing environments, i.e. when application requirements and properties of data streams change at run-time.
3. Proposed Approach
We believe there is a wide range of features which might limit the applicability of RSP solutions. We classify these features into two categories: i) design-time features, which include aspects such as the input data model, the language for defining processing rules, operational semantics, and the supported streaming operators; and ii) run-time features, which include aspects such as execution time, processing techniques, quality of service (QoS), privacy, the target domain of applications, and more.
Existing RSP engines are designed to have two types of
∗This research has been partially supported by Science Foundation Ireland (SFI) under grant No. SFI/12/RC/2289
Features                        CQELS   C-SPARQL
Input
    Periodic                      N        Y
    Data Driven                   Y        N
Output
    Istream                       Y        N
    Rstream                       N        Y
    Dstream                       N        N
    Empty Relation Notification   N        Y

Table 1: Narrowed-down features of RSP engines.
input data models for query execution: (i) data-driven, i.e. the query is executed whenever data arrives, or (ii) periodic, i.e. the query is executed periodically. The data-driven approach is ideal for continuous monitoring, while the time-driven approach is good for periodic monitoring. C-SPARQL follows a time-driven strategy, where results may become stale if the re-execution frequency is lower than the frequency of the updates [1]; it is thus not suitable for applications where delay can be crucial (e.g. burglary-attempt notification in a home surveillance system). The CQELS engine follows a data-driven approach, which is ideal for time efficiency but can be resource-expensive for a periodic notification system (e.g. periodic surveillance of children's activities in a home monitoring system).
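The trade-off between the two input models can be sketched with a toy simulation in plain Python (not either engine's actual implementation): a data-driven engine re-evaluates its query on every arriving item, while a periodic engine re-evaluates once per fixed number of arrivals. The window size, query and period below are arbitrary assumptions.

```python
from collections import deque

# Toy contrast of the two input models: data-driven (CQELS-style) fires on
# every arrival; periodic (C-SPARQL-style) fires on a fixed schedule.
# The window semantics here are a simplification of both engines.

class DataDrivenEngine:
    def __init__(self, query, window_size=10):
        self.query = query
        self.window = deque(maxlen=window_size)
        self.results = []

    def push(self, item):
        self.window.append(item)
        self.results.append(self.query(self.window))  # re-evaluate immediately

class PeriodicEngine:
    def __init__(self, query, period, window_size=10):
        self.query = query
        self.period = period
        self.window = deque(maxlen=window_size)
        self.results = []
        self.arrivals = 0

    def push(self, item):
        self.window.append(item)
        self.arrivals += 1
        if self.arrivals % self.period == 0:           # re-evaluate on schedule
            self.results.append(self.query(self.window))

average = lambda w: sum(w) / len(w)
dd = DataDrivenEngine(average)
pe = PeriodicEngine(average, period=3)
for reading in [20, 22, 21, 25, 24, 23]:              # a toy temperature stream
    dd.push(reading)
    pe.push(reading)

print(len(dd.results))  # one (fresh) result per arrival: 6
print(len(pe.results))  # one result per period: 2
```

The data-driven engine never serves a stale result but does six evaluations; the periodic engine does two evaluations but would miss anything between scheduled runs, which is the trade-off the adaptive approach aims to manage.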
Work in [1] showed that C-SPARQL suffers from duplicate results for simple queries and misses certain outputs in complex queries. This is due to its implementation of the Rstream output operator, in which old triples do not get removed from the time window. However, another evaluation in [2] has shown that C-SPARQL provides the most correct results among the RSP engines. Diversity in the output results produced by various RSP engines is a known phenomenon. Input data models and output streaming operators are merely two examples that showcase how design-time features may affect the correctness and performance of an RSP engine's query execution.
Hence, we intend to design an engine that aligns with RSP-QL, the query language of the W3C RSP Community Group, with new query semantics and on-demand query generation capabilities. In the query execution part, based on the requirements, we will select different features from Table 1 to switch on the best configuration of our engine.
References
[1] D. Le-Phuoc, M. Dao-Tran, M.-D. Pham, P. Boncz, T. Eiter, and M. Fink. Linked Stream Data Processing Engines: Facts and Figures. The Semantic Web – ISWC 2012, pages 300–312, 2012.
[2] Y. Zhang, M.-D. Pham, Ó. Corcho, and J.-P. Calbimonte. SRBench: A Streaming RDF/SPARQL Benchmark. In ISWC (1), pages 641–657, 2012.
An elastic and scalable spatiotemporal query processing for linked sensor data
Hoan Nguyen Mau Quoc
Insight Centre for Data Analytics
E-mail: hoan.quoc@insight-centre.org
Abstract
Recently, many approaches have been proposed to manage sensor data using Semantic Web technologies for effective heterogeneous data integration. However, our research survey revealed that these solutions primarily focus on semantic relationships and pay less attention to spatiotemporal correlations. In this paper, we propose a spatiotemporal query engine for sensor data based on the Linked Data model. The ultimate goal of our approach is to provide an elastic and scalable system which allows fast searching and analysis of the spatial, temporal and semantic relationships in sensor data.
1. Introduction
The Internet of Things (IoT) is the network of physical objects embedded with sensors that make real-time observations about the world as it happens. Sensor observation data is always associated with a spatiotemporal context, i.e., it is produced in specific locations at specific times. Therefore, all sensor data items can be represented in three dimensions: semantic, spatial and temporal. Consider the following example: "What is the average temperature over the last 30 minutes in Dublin city?" This simple example poses an aggregate query on the temperature readings of all weather stations in Dublin city. Unfortunately, supporting such multidimensional analytical queries on sensor data is still challenging in terms of complexity, performance and scalability. In particular, these queries imply heavy aggregation over large numbers of data points, along with computation-intensive spatial and temporal filtering conditions. Moreover, the high update frequency and large data volume of our targeted systems (tens of thousands of updates per second on billions of records already in storage) add to the burden of answering a query within seconds or milliseconds. On top of that, by their nature, such systems need to scale to millions of sensor sources and years of data.
Motivated by these challenges, we propose an elastic spatiotemporal query engine which is able to index, filter and aggregate a high-throughput stream of sensor data together with the large volume of historical data stored in the engine. The engine is backed by distributed database management systems, i.e., OpenTSDB for temporal data and ElasticSearch for spatial data, so that it can store billions of data points and ingest millions of records per second while still being able to query live data streaming from sensor sources.
2. System Architecture
To systematically address the shortcomings identified in the previous section, we exploit the multidimensional nature of Linked Sensor Data to parallelise write and query operations across distributed computing nodes. This decision is also inspired by the studies in [1, 2], which show that processing over big RDF graphs can be parallelised efficiently by partitioning the graph into smaller subgraphs stored on multiple processing nodes. The following is an overview of the architecture, together with the indexing strategies used to deal with the temporal and spatial aspects of Linked Sensor Data.
Figure 1: System architecture (an RDF parser and triple analyzer apply temporal, text and spatial recognition rules to route incoming triples to a temporal entity indexer backed by OpenTSDB, a spatial/text entity indexer backed by ElasticSearch, and the triple storage, all serving a query processing module)
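The routing idea behind Figure 1 can be sketched as follows (a simplification, not the actual implementation): incoming sensor triples are classified by predicate and dispatched to a temporal index (standing in for OpenTSDB) or a spatial index (standing in for ElasticSearch), in addition to the triple storage. The predicate URIs are hypothetical.

```python
# Simplified sketch of the triple router in Figure 1. The predicate URIs
# and in-memory lists are illustrative stand-ins for the real rule set,
# the OpenTSDB (temporal) and ElasticSearch (spatial) back-ends.

TEMPORAL_PREDICATES = {"http://example.org/sensor#observedAt"}
SPATIAL_PREDICATES = {"http://example.org/sensor#hasLocation"}

triple_store = []     # stand-in for the RDF triple storage
temporal_index = []   # stand-in for OpenTSDB records
spatial_index = []    # stand-in for ElasticSearch records

def route(subject, predicate, obj):
    """Store every triple, and also index it by its spatiotemporal dimension."""
    triple_store.append((subject, predicate, obj))
    if predicate in TEMPORAL_PREDICATES:
        temporal_index.append((subject, obj))    # (sensor, timestamp)
    elif predicate in SPATIAL_PREDICATES:
        spatial_index.append((subject, obj))     # (sensor, geo-point)

route("ex:sensor1", "http://example.org/sensor#observedAt", "2015-10-30T10:00:00Z")
route("ex:sensor1", "http://example.org/sensor#hasLocation", "53.27,-9.05")
route("ex:sensor1", "http://example.org/sensor#hasValue", "21.5")
print(len(triple_store), len(temporal_index), len(spatial_index))  # 3 1 1
```

Splitting the temporal and spatial dimensions into separate indexes is what lets each back-end be scaled and queried independently.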
3 Conclusions and Future Work
The need for efficient querying over massive amounts of sensor data lies at the heart of most sensor data analytics platforms. In this paper, we presented our recent effort on leveraging Linked Data and NoSQL technologies to effectively manage sensor data. Our approach not only provides complex spatiotemporal query functions to users but also demonstrates the ability to handle billions of sensor data points. Our experimental results show that this approach is both fast and scalable.
For future work, we expect to adapt a distributed triple store to our system. Furthermore, we are implementing query optimisation algorithms to speed up query performance. While our system still has its limitations, it is a step towards providing a high-performance spatiotemporal query engine for the IoT world.
4 Acknowledgments
This publication has emanated from research supported
in part by a research grant from Science Foundation Ireland
(SFI) under Grant Number SFI/12/RC/2289, Irish Research
Council under Grant No. GOIPD/2013/104 and by Euro-
pean Union under Grant No. FP7-ICT-608662 (VITAL).
References
[1] J. Huang, D. J. Abadi, and K. Ren. Scalable SPARQL Querying of Large RDF Graphs. Proceedings of the VLDB Endowment, 4(11):1123–1134, 2011.
[2] K. Lee and L. Liu. Scaling Queries over Big RDF Graphs with Semantic Hash Partitioning. Proceedings of the VLDB Endowment, 2013.
An Infrastructure to Integrate Open Public Health Data and Predicting Health
Status
Jaynal Abedin, Ratnesh Sahay, Dietrich Rebholz-Schuhmann
Insight Centre for Data Analytics
E-mail: jaynal.abedin@insight-centre.org
Abstract
In recent years many datasets are being made publicly
available, sometimes as donor requirement or sometimes
due legislative initiatives. The sources are heterogeneous
and also available in varieties of format. The datasets in-
cludes, public health survey data, laboratory diagnostic re-
sult, and even genotype result also available but little has
been done to integrate these heterogeneous data to ensure
interoperability and also very little research has been done
using integrated data to predict health status of an individ-
ual. Through this work we are proposing an infrastructure
where we will integrate heterogeneous data sources through
semantic web technology and produce linked data and then
will develop predictive model to predict health status of
an individual on top of it. Also there will be intermediate
layer to assess data quality before and after integrating the
sources.
1. Motivation
Over the years, healthcare providers have used standalone healthcare information systems to provide healthcare services, and these systems are not compatible with an interoperability infrastructure. Moreover, multidisciplinary healthcare communities and decentralized healthcare information systems contribute largely to these non-interoperability issues. Several initiatives have been taken over the years to overcome them.
Although there have been several initiatives to integrate data sources and ensure interoperability, there is a lack of data-quality assessment before and after integration. There is also a scarcity of literature on aligning multiple datasets in a uniform manner; for example, a dataset might not contain all the indicators contained in another dataset. There is a need to develop algorithms able to impute such information so that datasets are uniform across sources.
2. Problem Statement
There is a lack of infrastructure to semantically integrate open public health data from heterogeneous sources, a lack of algorithms to create uniform datasets across sources, and a lack of work utilising the integrated data sources to develop predictive models of an individual's health status.
3. Related Work
Bischof et al. [1] proposed an infrastructure in the smart city
domain to semantically integrate heterogeneous data sources and
published the integrated data as linked open data. Our approach
is somewhat similar, but introduces different layers with
distinct functionality.
4. Research Objective
To develop an infrastructure that semantically integrates
heterogeneous public health data sources, aligns indicators
across them, and supports predictive models of an individual's
health status.
5. Proposed Infrastructure
The proposed infrastructure consists of nine layers, each with
a specific task:
Input Layer (IL): Identify publicly available health data
sources, including survey data, laboratory diagnostic data,
genomic data and metadata.
Pre-Processing Layer (PPL): Assess data quality against the
metadata of the sources.
Semantic Annotation Layer (SAL): Convert the data to linked
data (RDF) and semantically annotate it with an appropriate
schema and ontology.
Inter-Linking Layer (ILL): Semantically link all of the data
sources.
Indicator Alignment Layer (IAL): Align indicators across
datasets and predict any indicator missing from a dataset
through statistical modelling (non-parametric or
semi-parametric predictive models).
Assessing Quality of Indicator Alignment (AQIA): Assess the
quality of the indicator alignment algorithm developed in the
IAL.
Integrated Data Layer (IDL): Store the semantically integrated
dataset so that it can be queried using the SPARQL query
language.
Predictive Modelling Layer (PML): Develop predictive models
over the integrated dataset for specific diseases of interest.
Output Layer (OL): Store the modelling results as linked data,
e.g. RDF.
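The layer sequence above can be sketched as a simple pipeline of functions. This is only an illustration: the record fields, the metadata check and the triple shape are assumptions, not the actual implementation.

```python
# Hypothetical sketch of the layered pipeline: each layer is a function
# that transforms the output of the previous one. Only the first three
# layers (IL, PPL, SAL) are stubbed here.

def input_layer(sources):
    """IL: collect records from publicly available health data sources."""
    return [rec for src in sources for rec in src]

def preprocessing_layer(records, metadata):
    """PPL: keep only records that pass a quality check against metadata."""
    return [r for r in records if all(k in r for k in metadata["required"])]

def semantic_annotation_layer(records):
    """SAL: convert records to RDF-like (subject, predicate, object) triples."""
    return [(r["id"], key, value) for r in records
            for key, value in r.items() if key != "id"]

def run_pipeline(sources, metadata):
    records = input_layer(sources)
    records = preprocessing_layer(records, metadata)
    return semantic_annotation_layer(records)

triples = run_pipeline(
    sources=[[{"id": "p1", "age": 34, "glucose": 5.6}]],
    metadata={"required": ["id", "age", "glucose"]},
)
print(triples)  # [('p1', 'age', 34), ('p1', 'glucose', 5.6)]
```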
6. Future Work
Once the infrastructure has been developed, we can build a web
application on top of it where users provide their own data,
which is converted to linked data. Users can then see their
health status based on their own data combined with other
publicly available datasets.
References
[1] S. Bischof, C. Martin, A. Polleres, and P. Schneider. Collect-
ing, integrating, enriching and republishing open city data as
linked data.
INSIGHTSC [4]
Biological Link Extractor From Literature In Systems Biology
Arindam Halder, Frank Barry
REMEDI/INSIGHT Centre for Data Analytics
Arindam.halder@insight-centre.org
Abstract
Systems biology has been gaining prominence due to the
ever-increasing sources of biological data. Despite
advances in language processing and data integration, we
have failed to fully utilize the data being generated, and
one of the most under-utilized resources is the published
literature. To exploit this gold mine, it is paramount
that we integrate the results from across publications to
build a better understanding of the underlying biological
systems.
1. Introduction
The successful completion of the Human Genome Project
ushered in a period in which data generation became easier
and faster over time. This led to a glut of data, adding
to the already huge corpus on proteins, genes, pathways
and diseases. The response to this information explosion
was to create large databases documenting the data, but
the published literature was never mined to a comparable
extent. Frameworks for combining data from various sources
are fairly advanced and can give an overview of the
complexity, functionality and completeness of a biological
network, but the integration of the scientific literature
is yet to be accomplished in a satisfactory manner [1].
2. Motivations And Problem Statement
At REMEDI it was discovered that applying the protein
SPARCL1 along with MSCs to an infarcted heart increases
the efficiency of the repair mechanism. Increased SPARCL1
levels enhanced the efficiency of tissue repair
significantly, but no information exists about the
mechanism of action.
Our work presents a framework to automate knowledge
discovery from published literature, leading to better
discovery of hidden links between biological entities. The
framework works on full papers uploaded by the user, or on
abstracts retrieved from PubMed based on a user-formulated
query. By processing the unstructured text, each paper is
converted to a sub-graph based on all the genes/proteins
co-occurring with a list of pre-defined interaction verbs,
with each co-occurring gene/protein pair forming an edge
in the graph.
3. Proposed Solution
This section elaborates the methods applied to create the
sub-graphs and, by merging them, eventually a network. The
text-processing pipeline, and how text is annotated to
derive meaningful relationships and generate the
sub-graphs, is illustrated in Figure 1.
Figure 1: The Architecture
The text is first filtered using a dictionary of 3,548
interaction verbs extracted from the GENIA corpus [2],
which act as triggers. Simultaneously, a POS tagger is
used to check whether nouns co-occur with words in the
dictionary. This initial filtering on the co-occurrence of
nouns and verbs removes non-essential sentences, speeding
up tagging by the GENIA tagger [2]. After GENIA tagging,
sentences that contain nouns but no gene/protein tags are
checked against a dictionary of protein and gene names
from UniProt and HGNC, to weed out entities missed by the
tagger (false negatives). The tool was evaluated against
existing solutions, namely ABNER and BANNER, with
precision measured on the GENIA corpus of 2,000 abstracts.
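A minimal sketch of the co-occurrence step described above: a sentence contributes an edge between every pair of gene/protein mentions that co-occur with an interaction verb. The tiny dictionaries below are placeholders for the GENIA-derived verb list and the UniProt/HGNC lexicon, and the real pipeline additionally uses POS and GENIA tagging.

```python
from itertools import combinations

# Toy stand-ins for the 3,548-verb trigger dictionary and the gene lexicon.
INTERACTION_VERBS = {"activates", "inhibits", "binds", "phosphorylates"}
GENE_LEXICON = {"SPARCL1", "TP53", "AKT1"}

def sentence_edges(sentence):
    """Return gene/protein pairs forming an edge if a trigger verb is present."""
    tokens = sentence.rstrip(".").split()
    if not INTERACTION_VERBS.intersection(t.lower() for t in tokens):
        return []                       # no trigger verb: sentence filtered out
    genes = [t for t in tokens if t in GENE_LEXICON]
    return list(combinations(sorted(set(genes)), 2))

def build_subgraph(sentences):
    """Merge per-sentence edges into one sub-graph (edge set) for a paper."""
    edges = set()
    for s in sentences:
        edges.update(sentence_edges(s))
    return edges

g = build_subgraph(["SPARCL1 activates AKT1 in cardiac tissue.",
                    "TP53 was measured at baseline."])
print(g)  # {('AKT1', 'SPARCL1')}
```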
4. Future Work
The mechanism of action predicted by the pipeline is
currently being tested in the biological lab. The tool
will be further evaluated on other major biological
corpora such as AIMed and BioInfer.
5. References
[1] Chen Li, Maria Liakata, and Dietrich Rebholz-Schuhmann
Biological network extraction from scientific literature: state
of the art and challenges.
[2] J.-D. Kim, T. Ohta, Y. Tateisi, and J. Tsujii GENIA
corpus—a semantically annotated corpus for bio-textmining.
Bioinformatics (2003) 19 (Suppl 1): i180-i182 doi:
10.1093/bioinformatics/btg1023.
Complex Reasoning over Big Data Streams with Answer Set Programming∗
Thu-Le Pham, Alessandra Mileo
Insight Centre for Data Analytics, NUIG
E-mail: {thule.pham, alessandra.mileo}@insight-centre.org
Abstract
This paper addresses the problem of performing complex
non-monotonic reasoning over dynamic data. We propose a
data-driven approach to examine how the stable model semantics
of Answer Set Programming can be used for scalable processing
of big data streams.
1. Introduction
One of the significant challenges in the reasoning community
today is the ability to continuously produce timely new results
from data streams. State-of-the-art reasoners traditionally
focus on performing complex reasoning tasks over static data,
where there are no hard constraints on response time. Moreover,
the expressivity of a reasoner is known to be inversely related
to its performance. Building scalable complex reasoning systems
over streaming data therefore becomes a difficult task.
Answer Set Programming (ASP), with its stable model semantics,
is well known as a powerful, highly expressive declarative
programming language for representing rich knowledge
structures, with the ability to manage defaults, common sense,
preferences, recursion and non-determinism. However, the high
expressivity of ASP comes at the expense of efficiency, which
makes ASP-based reasoning systems over streams harder to scale.
How can we perform ASP-based reasoning over big data streams
while maintaining scalability?
2. Related Work
The authors of [4] focus on distributed methods for
non-monotonic rule-based reasoning, aiming at better
scalability through the MapReduce framework. Their current work
performs parallel reasoning under the well-founded semantics,
which is good for highly parallelizable problems but not for
computationally complex ones. The authors of [3] present the
StreamRule framework, which combines a stream processing engine
(CQELS) and a non-monotonic reasoner (Clingo) in a
one-directional processing pipeline. The current work in [2]
focuses on enabling adaptivity for StreamRule by studying
correlations between streaming rate and window size. These
works show that an optimal window size can be found for a given
streaming rate, reducing the processing time of the reasoning
layer in StreamRule. However, this conclusion holds if and only
if there are no dependencies within the input data of the
reasoning component.
∗This research has been partially supported by SFI under grant No.
SFI/12/RC/2289 and EU FP7 CityPulse Project under grant No.603095.
3. Problem Statement & Proposed Solution
The basic idea for tackling the research question in Sec-
tion 1 is to divide input data into chunks, each chunk fed to
an ASP reasoner to perform complex reasoning tasks. In [2]
we argue that if the processing time for the whole input data
in one computation is monotonically increasing, we can re-
duce that time by reasoning (even) sequentially on indepen-
dent subsets of input data. In this paper, we follow this
hypothesis. However, reasoning sequentially on chunks of
data has to guarantee that the final results must be the same
as if we process the whole input data in one computation.
Therefore, the problem is refined as follows:
Given an ASP logic program P (a set of rules) and input data I,
find subsets I_1, ..., I_n of I such that:

    I = ⋃_{i=1..n} I_i    and    ANS(P, I) = ⋃_{i=1..n} ANS(P, I_i)

where ANS(P, I_i) are the answer sets (results) of reasoning on
P and I_i.
In order to find I_1, ..., I_n satisfying the two equations
above, we take into account structural information of the logic
program P to study the dependencies among the elements of I. We
need to extend the concept of the dependency graph of P [1],
because that definition considers only the relationship between
a positive IDB (intensional database) predicate in the body and
a predicate in the head of a rule in P. We intend to extend the
dependency graph by considering: i) the (transitive)
correlation between two predicates in the body of a rule, and
ii) not only positive literals but also negative ones.
We believe that this extended dependency graph can help reduce
the reasoning time of a single system while maintaining the
correctness of the results by splitting the input data.
Moreover, it can help enable parallel ASP reasoning.
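One way to realise this splitting can be sketched as follows, under the simplifying assumption that the (extended) dependency graph is already available as pairs of dependent predicates: facts whose predicates are connected must stay in the same chunk, so chunks are the connected components. This is an illustration, not the authors' implementation.

```python
# Union-find over predicate names; facts are grouped by the component
# their predicate belongs to, yielding independent chunks for the reasoner.

def find(parent, x):
    while parent.setdefault(x, x) != x:
        parent[x] = parent[parent[x]]   # path halving
        x = parent[x]
    return x

def partition_facts(dependent_pairs, facts):
    """dependent_pairs: predicate pairs linked by some rule of P.
    facts: (predicate, argument) tuples. Returns independent chunks."""
    parent = {}
    for a, b in dependent_pairs:
        parent[find(parent, a)] = find(parent, b)
    chunks = {}
    for pred, arg in facts:
        chunks.setdefault(find(parent, pred), []).append((pred, arg))
    return list(chunks.values())

chunks = partition_facts(
    dependent_pairs=[("edge", "path")],
    facts=[("edge", "a"), ("path", "b"), ("color", "c")],
)
print(len(chunks))  # 2: edge/path facts stay together, color facts apart
```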
References
[1] F. Calimeri, S. Perri, and F. Ricca. Experimenting with paral-
lelism for the instantiation of asp programs. Journal of Algo-
rithms, 63(1):34–54, 2008.
[2] S. Germano, T.-L. Pham, and A. Mileo. Web stream reasoning
in practice: on the expressivity vs. scalability tradeoff. In Web
Reasoning and Rule Systems, pages 105–112. Springer, 2015.
[3] A. Mileo, A. Abdelrahman, S. Policarpio, and M. Hauswirth.
Streamrule: a nonmonotonic stream reasoning system for the
semantic web. In Web Reasoning and Rule Systems, pages
247–252. Springer, 2013.
[4] I. Tachmazidis, G. Antoniou, and W. Faber. Efficient com-
putation of the well-founded semantics over big data. arXiv
preprint arXiv:1405.2590, 2014.
Data Agnostic Management Systems for The Internet of Things
Zeeshan Jan, Aqeel Kazmi, Martin Serrano
Insight Centre for Data Analytics, NUIG
{first}.{last}@insight-centre.org
Abstract
In the Internet of Things (IoT) area, interoperability is
seen as the way to align different levels of data that
also use different representation models. Data Management
Systems (DMS) in IoT become the main bottleneck when a
high level of data interoperability is required. IoT
systems that store heterogeneous data sets, called silos,
are unable to communicate with each other at the data
level. This work aims to bring interoperability to one
common layer by using semantics, storing the heterogeneous
data generated by different IoT systems in a common data
format by means of ontologies. A hybrid DMS platform has
been prototyped that handles the RDF and JSON-LD formats,
providing interoperability for IoT data. The DMS uses
internal and external representations of the collected IoT
data, and this is how it targets data integration and
application interoperability.
1. Introduction
Managing data generated by various IoT platforms is a big
challenge when it must be stored, managed and made
available to other components of a system. Traditionally,
each IoT system uses its own taxonomy to store the data
generated by its ICOs. Storing such heterogeneous data in
one place requires a common taxonomy. As technology has
evolved, developers have shifted their interest from RDF
to JSON-LD, since the JSON-LD representation is closer to
their development tools and spares them from learning RDF.
Our solution is to define an ontology, store the JSON-LD
data chunks generated by different IoT systems according
to this ontology, and then make the data available to
other components of the system in three formats: JSON-LD,
JSON and RDF.
2. Background
The Internet of Things has been a hot research topic in
recent years. Numerous IoT platforms produce and manage
data generated by ICOs (i.e. sensors) in specific domains
such as healthcare and smart traffic. There is a
noticeable emergence of IoT tools that let the public
connect their ICOs and push their data into cloud systems
[1]; recent examples are Xively [2], Ubidots [3] and
OpenIoT [4], which allow applications from different
domains to exploit the data. The data silos of these IoT
platforms hold great potential for further research and
development if they could communicate with each other, and
connecting them would open up another space of
applications to address the challenges IoT platforms face.
An architecture is needed that can manage and integrate
this diversity of data formats and, moreover, provide a
management mechanism to handle the information uniformly.
3. Proposed Solution
The VITAL project [5] defines a System of Systems
paradigm, which manages the data from different IoT
systems (i.e. data silos) and makes it available for
exploitation. VITAL schemas define a taxonomy covering the
various domains of the IoT platforms connected to VITAL;
data from those platforms is wrapped into the VITAL
ontology and stored. In the VITAL DMS, PPIs (Platform
Provider Interfaces) are responsible for wrapping the data
generated by IoT platforms and pushing it into the VITAL
system. VUAIs (Virtualized Unified Access Interfaces) are
the components responsible for exploiting the enriched
VITAL datastore in three available data formats: JSON-LD,
JSON and RDF. The DMS acts as a data-model-agnostic system
whose main objective is to maintain models of the data
being pushed into it; data modelling is done at the PPI
level, where the ICO's data is wrapped into the VITAL
ontology and then pushed into the DMS.
Fig 1: DMS communicates with PPI and VUAIs
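The PPI wrapping step can be illustrated with a minimal JSON-LD chunk. The context IRIs and property names below are placeholders for this sketch, not actual VITAL ontology terms.

```python
import json

# Assumed @context mapping short property names to ontology IRIs;
# illustrative only, not the real VITAL vocabulary.
VITAL_CONTEXT = {
    "value": "http://example.org/vital#hasValue",
    "sensor": "http://example.org/vital#producedBy",
}

def wrap_reading(sensor_id, value):
    """Wrap a raw ICO reading into a JSON-LD chunk (the PPI's responsibility)."""
    return {"@context": VITAL_CONTEXT,
            "@id": "urn:ico:%s:reading" % sensor_id,
            "sensor": sensor_id,
            "value": value}

chunk = wrap_reading("traffic-cam-42", 17)
print(json.dumps(chunk, indent=2))  # the JSON-LD document pushed to the DMS
```

Because the chunk carries its own `@context`, the same stored data can be served back as JSON-LD directly, as plain JSON by dropping the context, or converted to RDF triples.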
4. References
[1] J. Soldatos, M. Serrano and M. Hauswirth. "Convergence of
Utility Computing with the Internet of Things", International
Workshop on Extending Seamlessly to the Internet of Things
(esIoT), collocated at the IMIS-2012 International Conference,
4th-6th July 2012, Palermo, Italy.
[2] Xively – [Online]. Available: http://www.xively.com
[3] Ubidots – [Online]. Available: http://www.ubidots.com
[4] OpenIoT – [Online]. Available: http://www.openiot.eu
[5] VITAL Project – [Online]. Available: http://www.vital.eu
[6] H. Chen, F. Perich, T. Finin, and A. Joshi, "SOUPA:
Standard ontology for ubiquitous and pervasive applications,"
in Mobile and Ubiquitous Systems: Networking and Services,
International Conference on, 2004.
Dynamic Deployment of Multi-Query Internet of Things Data Services in
Federated Cloud Environments
Salma Abdulaziz, Martin Serrano
Insight Centre for Data Analytics, NUIG
{salma.abdulaziz, martin.serrano}@insight-centre.org
Abstract
The fast proliferation of the Internet of Things platforms
and the dynamic deployment of their applications in the
cloud raise many research concerns regarding their
integration in a semantic interoperable way. The
OpenIoT platform solved the interoperability problem
using the concept of data virtualization and the SSN
ontology. However, Integrating different OpenIoT
platforms in the cloud is not addressed yet. This study
focuses on extending the OpenIoT functionalities to
enable IoT data services federation in the cloud. This
federation will be the key enabler of smart cities.
1. Motivation
The Interoperability of different IoT testbeds problem
rises from the fact that these testbeds are in different
geographical locations and are administratively
dispersed. Moreover, they use different kinds of sensors
with different technologies.
The OpenIoT platform made it possible to collect data
from any IoT testbed regardless of the used technology
and regardless of its location using the concept of data
virtualization, linked data technologies and SSN
ontology. This enables the semantic interoperability of
IoT services in the cloud. However, integrating
different OpenIoT platforms in a way that it becomes
possible for one platform to collect data from another
platforms is still challenging. This is a typical problem
in smart cities where information systems measuring
different aspects about the city functions need to be
integrated with other information systems to provide
smart cities with new services based on this interaction,
even if the systems are located at different geographical
locations. This will enable new data exchange
applications to be built on the top of this interaction(s).
2. Problem Statement
However, the integration of different geographically
and administratively dispersed IoT platforms is
becoming crucial for enabling smart cities, there is not
an easy and dynamic way to enable this federation of
platforms in the cloud due to the usage of different
technologies and the need of accessing these platforms
from the same place at the same time with almost zero-
programming effort
3. Related Work
The OpenIoT platform proposed in [1] is the latest
technology addressing the problem of gathering data
from heterogeneous IoT testbeds. It managed to
virtualize the sensor data to make it independent of the
used sensor technology. However, interaction between
different OpenIoT platforms to access more than one
IoT testbed at the same time in not addressed yet.
4. Research Question
How to integrate different OpenIoT platforms
connected to different IoT testbeds so that every
platform can access the sensor data of other platforms
in the cloud without the need of any programming
effort or configurations from the user perspective?
5. Proposed Solution
Different OpenIoT instances will be implemented in the
cloud. Each OpenIoT instance will be connected to a
group of sensors measuring a certain environmental
condition. Two approaches will be investigated. The
first will be extending the current OpenIoT platform by
implementing a user interface that enables data
exchange between different federated platforms. The
other approach is to make each platform to
automatically push its sensor data to other platforms
available in the cloud so that each platform will have
the ability to access the data of other platforms without
the need to request it every time. These two approaches
will be compared in terms of data availability,
performance and scalability. Based on this comparison,
one of them will be implemented to extend the OpenIoT
platform functionalities together with a third party
entity called the federator. The federator is to monitor
all the OpenIoT instances in the cloud and keep records
about each of them. All the platforms will be connected
to this federator for consultancy about other platforms.
The configuration and implementation of sensor Multi-
Queries will be addressed as well so that it becomes
possible for one platform to be accessible by more than
one other platform at the same time.
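The federator's registry role might be sketched as follows. The class, method names and endpoint URLs are illustrative assumptions, not part of the OpenIoT codebase.

```python
# Minimal sketch of the proposed federator: a registry that monitors
# OpenIoT instances and answers "which other platforms expose this
# kind of sensor data?".

class Federator:
    def __init__(self):
        self.instances = {}            # platform id -> metadata record

    def register(self, platform_id, endpoint, sensors):
        """Keep a record about each OpenIoT instance in the cloud."""
        self.instances[platform_id] = {"endpoint": endpoint,
                                       "sensors": sensors}

    def peers_for(self, platform_id, sensor_type):
        """Other platforms in the cloud exposing the requested sensor type."""
        return [pid for pid, meta in self.instances.items()
                if pid != platform_id and sensor_type in meta["sensors"]]

fed = Federator()
fed.register("galway", "http://galway.example.org/sparql", {"air", "noise"})
fed.register("dublin", "http://dublin.example.org/sparql", {"air"})
print(fed.peers_for("galway", "air"))  # ['dublin']
```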
6. Evaluation
The evaluation will be based on the user experience
using this extended functionality of the OpenIoT
platform to gather data from other instances with zero-
programming effort where the time taken to share the
sensor data, stability and database usage will be
measured for evaluation
7. References
[1] J. Soldatos, N. Kefalakis, M. Hauswirth, M. Serrano, J.
Calbimonte, M. Riahi, K. Aberer, P. Jayaraman, A. Zaslavsky,
I. P. Zarko, L. Skorin-Kapov, and R. Herzog, "OpenIoT: Open
Source Internet-of-Things in the Cloud," in Interoperability
and Open-Source Solutions for the Internet of Things, 2015,
vol. 9001, pp. 13–25.
Diminishing Business Challenges by Improving Open Data Business Model
Fatemeh Ahmadi-Zeleti
Insight, NUIG
fatemeh.ahmadizeleti@insight-centre.org
Abstract
Growing list of business models is observed however,
on closer examination, they are not clearly delineated
and lack clear value orientation. Therefore,
understanding of value creation and exploitation
mechanisms in existing open data businesses is difficult
and challenging. This gap is in the center of this
research work focused on development of open data
business model (ODBM) to diminish difficulties and
challenges open data businesses face and to unlock the
real value of open data.
1. Introduction
Large numbers of businesses are seeking to tap into the
potential of open data. As new entrants flood the
marketplace, businesses seek to position themselves
uniquely through specialization in order to create and
capture value for their customers. Business models are
conceptual instruments for describing how value is created
and revenue generated; however, very few scholarly studies
are available on business models for harnessing the
potential value of open data.
2. Related Work
Several business models (15 identified) [1] and business
model frameworks have been proposed in the literature. The
well-known Osterwalder and Pigneur business model canvas
(nine building blocks), the Shafer, Smith and Linder
framework (four building blocks) and the Hamel business
model (four building blocks) are the three most cited
frameworks.
In my study, I adopt the notion of business model provided
by Osterwalder, which considers a business model as a
conceptual tool containing a set of inter-related elements
that allow a company to make money.
3. Early Results
Grounded in the extant business model literature, a
6-Values (6-Vs) business model conceptual framework
(Figure 1) was developed that captures the key components
of a business model and their interrelations [1].
Figure 1: The 6-Vs Conceptual Framework
Based on the 6-Vs framework, the 15 business models are
elaborated, characterized, and analyzed in the open data
context. This analysis yields five open data business
model patterns and four business disciplines (Figure 2)
[1].
Figure 2: Patterns and Value Disciplines
4. Research Questions and Hypotheses
The challenges and difficulties associated with open data
business models lead to the following research questions:
1. What revenue patterns can be observed? 2. What are the
core capabilities? 3. What are the business success
factors?
The key hypotheses related to these research questions
are: H1. A higher utilization rate of open data products
and services determines the pricing method. H2. Increasing
transparency increases the value of open data products and
services. H3. Multiple open data revenue streams
positively affect the profitability of the business. H4.
Added value to open data products and services leads to
higher profit.
5. Current and Future Work
A survey was carefully designed, tested, and distributed
to 250 globally located companies (drawn from over 1,500
examples in the Open Data Impact Map database). To better
understand the unknown aspects, interviews will be
conducted. The data are analyzed with R and Tableau in
order to empirically test the 6-Vs model and to develop
the ODBM.
6. References
[1] F. A. Zeleti, A. Ojo, and E. Curry, "Business Models
for the Open Data Industry: Characterization and Analysis
of Emerging Models," in 15th Annual International
Conference on Digital Government Research, 2014.
DInfra - Distributional Infrastructure
Siamak Barzegar, Juliano Efson Sales, Brian Davis
Insight Centre for Data Analytics
E-mail: author.name@insight-centre.org
Andre Freitas
Department of Computer Science and Mathematics, University of Passau, Germany
Firstname.Lastname@uni-passau.de
Abstract
DInfra (Distributional Infrastructure) is an infrastructure
for computing multilingual semantic relatedness and
correlation for twelve natural languages using five
distributional semantic models (DSMs). The software provides
researchers and developers with an easy-to-use platform for
processing large-scale corpora and conducting experiments with
distributional semantics. The infrastructure integrates
several multilingual DSMs, so end users can obtain results
without worrying about the complexities involved in building
DSMs. The DInfra web service gives users easy access to a wide
range of comparisons of DSMs with different model parameters.
In addition, users can configure and access DSM parameters
through an easy-to-use API.
1. Introduction
Distributional semantics is built upon the assumption
that the context surrounding a given word in a text provides
important information about its meaning [2], [3]. Distribu-
tional semantics focuses on the construction of a semantic
representation of a word based on the statistical distribution
of word co-occurrence in unstructured data.
2. The Distributional Infrastructure
DInfra implements Explicit Semantic Analysis (ESA), Latent
Semantic Analysis (LSA) and Random Indexing (RI), based on
EasyESA [Carvalho et al. 2014] and S-Space [Jurgens et al.
2010], together with the GloVe and word2vec models. ESA, LSA
and RI are based on vector space models, whereas GloVe and
word2vec are based on deep learning models (deep learning is a
branch of machine learning based on a set of algorithms that
attempt to model high-level abstractions in data using model
architectures composed of multiple non-linear transformations
[Li Deng and Dong Yu, 2014]). The service runs as a JSON
(JavaScript Object Notation) web service, which allows users
to submit queries for similar terms in a multilingual fashion,
based on a semantic relatedness measure that uses Spearman's
correlation to test relatedness scores. We consider Wikipedia
corpora for the years 2006, 2008 and 2014, and the ukWaC
corpus [1], from which to build the vectors. The DInfra web
service allows the user to obtain semantic similarity using
Spearman correlation for 12 natural languages. Our service can
be tested online. It includes two components: 1) Semantic
Relatedness, which calculates word similarity; 2) Correlation,
which calculates Spearman's rank correlation.
Figure 1: Screenshot of DInfra web service
3. Multilingual Analysis in DInfra
The evaluation will consist of two scenarios:
Scenario One: Similarity datasets³ translated by expert
translators from a well-known localisation company⁴ were used
to evaluate the distributional semantics infrastructure, using
Wikipedia as the corpus for 12 different languages and
Spearman's rank correlation as the measure.
Scenario Two: Automatic machine translation was used to
translate the word pairs from the different languages into
English, and the translations were evaluated on the English
corpus.
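Both scenarios score models with Spearman's rank correlation, which can be computed from rank differences when there are no ties. A small illustrative implementation, with made-up scores standing in for DSM outputs and human similarity judgements:

```python
def ranks(values):
    """1-based ranks of a list of scores (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(xs, ys):
    """Spearman's rank correlation via the rank-difference formula."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

model_scores = [0.9, 0.4, 0.7, 0.1]    # e.g. DSM relatedness per word pair
gold_scores = [9.1, 3.0, 8.2, 1.5]     # e.g. human judgements (WS353-style)
print(spearman(model_scores, gold_scores))  # 1.0: identical rankings
```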
References
[1] M. Baroni, S. Bernardini, A. Ferraresi, and E. Zanchetta. The
wacky wide web: a collection of very large linguistically pro-
cessed web-crawled corpora. Language resources and evalu-
ation, 43(3):209–226, 2009.
[2] Z. S. Harris. Distributional structure. Word, 1954.
[3] P. D. Turney, P. Pantel, et al. From frequency to meaning:
Vector space models of semantics. Journal of artificial intel-
ligence research, 37(1):141–188, 2010.
³ WordSim353 (WS353), the Rubenstein & Goodenough (RG) (1965)
and Miller & Charles (MC) (1991) datasets.
⁴ Lionbridge Technologies, Finland, Natural Language Solutions
Team.
Acknowledgment: This publication has emanated from research supported in part by the research grant from Science Foundation
Ireland (SFI) under Grant Number SFI/12/RC/2289 and EU project SIFEM (contract Number 600933).
DISCOV3R: Discovering Life-Sciences Datasets in LS-LOD Cloud
Muntazir Mehdi, Ratnesh Sahay and Dietrich Rebholz-Schuhmann
The Insight Centre for Data Analytics, National University of Ireland Galway
{muntazir.mehdi, ratnesh.sahay, rebholz}@insight-centre.org
Abstract
A significant portion of the Linked Open Data (LOD) cloud
consists of Life Sciences datasets, known as the Life
Sciences Linked Open Data (LS-LOD) Cloud, which contains
billions of clinical facts that interlink to form a "Web
of Clinical Data". However, tools that help new publishers
find relevant datasets to which their data could be linked
are missing, particularly in specialist domain-specific
settings. Based on a set of domain-specific keywords
extracted from a local dataset, we propose methods to
automatically identify relevant datasets from the LS-LOD
Cloud.
1. Introduction
The LOD Cloud is composed of datasets published by
different publishers from academia, government
organizations, online communities and companies alike.
Most of the datasets within the LOD Cloud are accessible
via at least one SPARQL endpoint; the Datahub¹ and
Mannheim Linked Data² catalogues list such SPARQL
endpoints available on the Web. The LOD Cloud comprises
500 million links across datasets³, following the fourth
Linked Data principle: "links to related data".
However, creating links with external LOD datasets is a
challenging task for publishers. To address this
challenge, a number of linking frameworks, such as Silk
[2] and LIMES [3], have been proposed. Given that there
are now hundreds of remote datasets, many of them black
boxes that do not describe their content [1], finding
relevant datasets is a challenging task for a publisher.
In some cases, the content of remote datasets is described
using VoID⁴, SPARQL 1.1 Service Descriptions, or a
specialized vocabulary (or ontology), which may help, but
these are not available for many endpoints [1].
The most general option is to treat the SPARQL endpoints
of the datasets as black boxes whose content is opaque,
and to query them directly to determine their relevance.
In our work, we explore this option, assuming that a
high-quality, representative set of domain-specific
keywords is available as input; this set of keywords may
be extracted from any local source in any format, such as
a taxonomy, a relational schema or a term dictionary.
Based on this set of domain-specific keywords, we propose
to probe SPARQL endpoints directly with queries to
determine their relevance.
¹ http://datahub.io/group/lodcloud
² http://linkeddatacatalog.dws.informatik.uni-mannheim.de/dataset
³ http://lod-cloud.net/state/
⁴ http://www.w3.org/TR/void/
2. DISCOV3R
A generic workflow of the DISCOV3R framework is given in
Figure 1. For our use case, we use a set of clinical
terminologies (CTerms) that define the human ear. The
terminologies, in their original form, are used to prepare
query terms (QTerms), which are later used by three
different approaches to discover relevant datasets from
the LS-LOD Cloud. The CTerms are first filtered using
string manipulation techniques, and stopwords and general
terms are eliminated using language processing tools.
Finally, a set of QTerms is created by generating n-grams
of the CTerms with word order preserved. The QTerms are
then forwarded to three discovery techniques:
Figure 1. DISCOV3R Workflow
DMatch: The Direct Matching (DMatch) approach uses the original QTerms without any modification, searching for each term with a SPARQL query that performs direct literal matching.
μMatch: The Multi Matching (μMatch) approach
expands the DMatch approach by creating multiple case
and language-tag variants for each QTerm so as to
generate more hits.
DetMatch: The Detect Matching (DetMatch)
approach expands the μMatch approach by removing
duplicate literal matching results and reducing the
overall query execution time.
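To make the pipeline concrete, here is a minimal sketch (in Python; the sample term, the query shape, and the variant set are illustrative assumptions, not the authors' implementation) of order-preserving n-gram QTerm generation, a DMatch-style literal-matching query, and μMatch-style case variants:

```python
def qterms(cterm, max_n=3):
    """Generate order-preserving n-grams (QTerms) from a clinical term."""
    words = cterm.split()
    grams = []
    for n in range(1, min(max_n, len(words)) + 1):
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams

def dmatch_query(qterm):
    """DMatch: ask whether any literal in an endpoint matches the term exactly."""
    return 'ASK WHERE { ?s ?p ?o . FILTER(str(?o) = "%s") }' % qterm

def mumatch_variants(qterm):
    """muMatch: case variants of a QTerm (language tags would be varied in the query)."""
    return {qterm, qterm.lower(), qterm.upper(), qterm.title()}

terms = qterms("tympanic membrane")
```

Each generated query would then be sent to every candidate endpoint, and an affirmative answer marks the dataset as relevant for that term.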
Results: In our experiments, we consider a total of 222 CTerms and 35 SPARQL endpoints. Of the 222 CTerms, 124 were found across 18 datasets with DMatch, while 203 were found across 23 datasets with both μMatch and DetMatch. The average query execution time was 0.16 s for DMatch, and 1.01 s and 0.17 s for μMatch and DetMatch respectively.
References
1. Buil-Aranda, Carlos, et al. "SPARQL web-querying infrastructure: Ready for action?" The Semantic Web – ISWC 2013. Springer Berlin Heidelberg, 2013. 277–293.
2. Volz, Julius, et al. "Silk – A Link Discovery Framework for the Web of Data." LDOW 538 (2009).
3. Ngomo, Axel-Cyrille Ngonga, and Sören Auer. "LIMES – A time-efficient approach for large-scale link discovery on the web of data." IJCAI'11.
INSIGHTSC [11]
Discovering Hidden Structures for Quality Assessment
Emir Muñoz
Fujitsu Ireland Ltd. and Insight Centre for Data Analytics
E-mail: emir.munoz@insight-centre.org
Abstract
Despite all the structured data available on the Web, users' ability to exploit that data is limited. To do so, users need an understanding of (1) the underlying data model, (2) the quality of the data, and (3) possible use cases: components that cannot always be found, or that are not made explicit for the users. Our work focuses on the discovery of implicit structures present in the Web of Data to aid users' understanding of these components.
1. Introduction
Initiatives like Schema.org¹ have helped the growth of structured data on the Web by allowing users to represent entities such as people, places, and things in general within webpages. However, in many cases structured data is published without a clear structure or spine that would allow users to understand the data modelling, assess its quality, and identify use cases. When users try to query such data (which can be very expensive), they usually face problems caused by the unclear structure of the large data at hand. First, they need at least a rough idea of what the data looks like in order to perform any task. Constraints are a fundamental part of design in data modelling and allow structure to be expressed. Here, we present ongoing work that focuses on mining constraints from the Web of Data, allowing us to build ‘core spines’ for datasets. Hence, users will no longer see a piece of data as a black box, and will gain insights on how to query and unlock the potential of the data.
2. Constraints
A constraint defines certain properties that data in a
dataset must comply with. For example, a RANGECON-
STRAINT indicates that a given property can have only val-
ues of a given type; a MIN/MAXCONSTRAINT indicates
that a given property occurs at least or at most a number of
times. More constraints, and validation algorithms based on automata and on regular expressions, can be found in [1] and [2], respectively.
Constraints in data management are useful for different perennial tasks, such as indexing, query optimization, and views. Hence, discovering the constraints satisfied in a piece of data is a relevant and non-trivial challenge. Although many inconsistencies can be found in the Web of Data, a strict enforcement of constraints would lead to data loss; allowing some exceptions can prevent systems from losing data [4]. Thus, the idea is to consider constraints with soft bounds that can be violated by individual entities, but should be respected on average.
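A toy sketch of what mining such a soft bound could look like (Python; the trimming heuristic and the tolerance parameter are our own illustrative assumptions, not the method proposed in this abstract):

```python
def soft_bounds(prop_counts, tolerance=0.2):
    """Derive soft MIN/MAX cardinality bounds for a property.

    prop_counts: occurrence counts of the property, one per entity.
    Returns (min, max) bounds that the bulk of entities respect,
    ignoring up to a `tolerance` fraction of outlier entities."""
    counts = sorted(prop_counts)
    k = int(len(counts) * tolerance / 2)  # outliers trimmed on each side
    trimmed = counts[k:len(counts) - k] if k else counts
    return trimmed[0], trimmed[-1]

# Nine entities with one ISBN each plus one noisy entity with seven:
# the soft bound (1, 1) tolerates the outlier instead of widening to (1, 7).
bounds = soft_bounds([1, 1, 1, 1, 1, 1, 1, 1, 1, 7])
```

The point of the soft bound is exactly this tolerance: a hard constraint derived from the same data would be forced to accept the noisy entity's cardinality as legitimate.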
¹ http://schema.org
3. Structure Mining
Discovery of constraints is not a new topic in the database community. Grahne [3] mines approximate keys in XML data using a rule-mining approach. Recently, the discovery of keys² in RDF has gained traction for its utility in data linkage [6]. For example, by means of keys we can determine whether a student is enrolled in both the library and the rugby team. However, the state of the art for the Web of Data has only focused on mining keys over a particular and limited RDF view, known as the Concise Bounded Description (CBD). The CBD does not consider RDF blank nodes, which add unexpected complexity to the problem: for example, to determine whether two blank nodes are value-equal we must determine whether an isomorphism exists between both graphs, which turns out to be an NP (GI-complete) problem [5]. We aim to characterise this problem as a frequent itemset mining problem. We believe that frequencies in the data will uncover the structure the data has and aid users' understanding.
4. Ongoing Work
In this work, we hypothesize that datasets in the Web of Data have hidden structures, and that those structures help users to assess the quality of the data. Preliminary results over DBpedia³ have shown that syntax patterns can be found in literal predicates. For instance, the ISBN code follows the ALPHANUM-NUM-NUM-NUM pattern. Our current plans focus on defining and discovering soft constraints, and on processing larger datasets, which will require scalable methods.
References
[1] P. M. Fischer, G. Lausen, A. Schätzle, and M. Schmidt.
RDF constraint checking. In Proc. of the Workshops of the
EDBT/ICDT, pages 205–212, 2015.
[2] J. E. L. Gayo, E. Prud’hommeaux, I. Boneva, S. Staworko,
H. R. Solbrig, and S. Hym. Towards an RDF validation lan-
guage based on regular expression derivatives. In Proc. of the
Workshops of the EDBT/ICDT, pages 197–204, 2015.
[3] G. Grahne and J. Zhu. Discovering approximate keys in XML
data. In Proc. of the 2002 ACM CIKM, pages 453–460, 2002.
[4] S. Hartmann. Soft constraints and heuristic constraint cor-
rection in entity-relationship modelling. In Semantics in
Databases, 2nd International Workshop, pages 82–99, 2001.
[5] A. Hogan. Skolemising blank nodes while preserving isomor-
phism. In Proc. of the 24th WWW, pages 430–440, 2015.
[6] T. Soru, E. Marx, and A. N. Ngomo. ROCKER: A refinement
operator for key discovery. In Proc. of the 24th WWW, pages
1025–1033, 2015.
² Keys are a particular type that uniquely identifies elements in a collection, e.g., StudentID in a university context.
³ http://dbpedia.org/
INSIGHTSC [12]
Effective Data Visualisation to Promote Home Based Cardiac Rehabilitation
Liam Sexton, David S. Monaghan
Insight Centre for Data Analytics
liam.sexton@insight-centre.org, david.monaghan@dcu.ie
Abstract
Data visualisation can play a crucial role in improving ad-
herence rates in home Cardiac Rehabilitation (CR) pro-
grammes. The research presented here explores the use
of open-source visualisation libraries in a prototype home-
based CR system.
1. Motivation
Cardiovascular disease is a worldwide burden to patients and health-care agencies alike. Traditionally, individuals who have suffered a cardiac event attend centre-based CR to aid recovery and prevent further cardiac illness; however, participation in these programmes is sub-optimal.
2. Problem Statement
Home-based CR has been introduced in an attempt to widen access and promote uptake, but moving care out of a dedicated facility has drawbacks: will the patient adhere to the programme, and how does a clinician know? This research aims to use effective data visualisations to promote adherence in home CR programmes.
3. Related Work
In the literature, the most widely cited reasons for patients not attending CR programmes are the distance to the facility and reluctance to take part in group classes [2]. Studies have shown that home-based CR is as effective as centre-based CR [1]. Evidence suggests that CR participation can be improved by 18% to 30% using patient-targeted strategies, such as motivational communications, phone calls, and home visits [1]. Thus, there is a need for alternative evidence-based approaches to traditional CR that provide affordable access to effective clinical interventions.
4. Hypothesis
This work attempts to demonstrate that through effective
use of data visualisations in a CR programme, patient adher-
ence can be improved. From a clinician’s perspective, the
use of data visualisations can present a favourable method
of viewing the progress of individuals and groups.
5. Proposed Solution
A number of different data visualisations were developed in this research. Fig. 1 shows a prototype patient monitoring dashboard developed using JavaScript and D3.js. Weekly goals are set based on WHO-recommended targets, and patients are encouraged to meet or exceed previous daily totals through motivational graphics. Two normal-distribution graphs show a patient's progress against that of a chosen peer group, incorporating a social-comparison element. A clinician can become aware of an individual's non-adherence through the dashboard. However, to get a sense of overall programme participation, additional information must be communicated. Fig. 2 shows a heatmap created to show mass patient (y-axis) participation (cell colour) over time (x-axis).
Figure 1: Prototype patient monitoring dashboard
[Heatmap: Participant ID (1–33) on the y-axis, months from Feb 2013 to Feb 2015 on the x-axis; cell colour encodes the number of sessions (0–10).]
Figure 2: Heatmap showing group adherence over time
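The aggregation behind such a heatmap can be sketched as follows (Python; the session-log format is a hypothetical simplification, not the MedEX data model):

```python
from collections import defaultdict

def adherence_matrix(sessions):
    """Aggregate exercise sessions into a participant-by-month count matrix.

    sessions: iterable of (participant_id, 'YYYY-MM') pairs, one per session.
    Returns {participant_id: {month: session_count}}, ready for heatmap plotting."""
    matrix = defaultdict(lambda: defaultdict(int))
    for pid, month in sessions:
        matrix[pid][month] += 1
    return {pid: dict(months) for pid, months in matrix.items()}

log = [(1, "2013-02"), (1, "2013-02"), (2, "2013-04")]
m = adherence_matrix(log)
```

Each cell of the heatmap then simply colours the count for one (participant, month) pair, with missing pairs rendered as zero sessions.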
6. Evaluation & Future Work
The heatmap visualisation has been evaluated using pa-
tient data from the MedEX CR programme. Relationships
and patterns were found that prompted further research
questions into quantifying the statistical likelihood of pa-
tient adherence. The patient dashboard is still in develop-
ment and will be integrated into a larger home based CR
system in the future.
References
[1] H. M. Dalal, A. Zawada, K. Jolly, T. Moxham, and R. S.
Taylor. Home based versus centre based cardiac rehabilita-
tion: Cochrane systematic review and meta-analysis. Bmj,
340:b5631, 2010.
[2] P. Davies, F. Taylor, A. Beswick, F. Wise, T. Moxham,
K. Rees, and S. Ebrahim. Promoting patient uptake and adher-
ence in cardiac rehabilitation. Cochrane Database Syst Rev,
7(7), 2010.
INSIGHTSC [13]
Enabling Better Collaboration and Communication through Personalization
Anne Helmreich, Martin Serrano
Insight Centre for Data Analytics
{anne.helmreich, martin.serrano}@insight-centre.org
Joerg Haehner
University of Augsburg
Abstract
Many people work in the same company but do not actually work together in terms of collaboration or information sharing. To address this problem we developed C&C, an approach to support collaboration and communication. The implemented prototype ExpertO (Experts open collaboration) assists staff members in finding collaboration partners through personalization.
1. Motivation
Be it computer science, medicine or any other field: we are living in a fast-changing world. In every field, researchers are searching for new and better solutions to improve the world. These fields can no longer be treated separately, as the relatively new fusion of medicine and computer science demonstrates. There are two options to deal with this trend: 1) inventing everything from scratch, or 2) working with someone experienced.
2. Problem Statement
Nowadays it is quite difficult to find an expert on a specific topic, because of the information overflow on the internet and the lack of proper information sharing even inside companies [2]. A survey pointed out a lack of collaboration and information flow between researchers and experts working in the Insight Centre as well. Although collaboration could save money, time and nerves, it seems to be difficult to bring people together. In fact, employees do not know who else is researching in the same field or who has the desired expertise to discuss with.
3. Proposed Solution
As many people are dissatisfied with the current state of information flow inside their company, it was necessary to tackle this issue. Our approach is to develop a system (C&C) that provides an expert-finding system and an information-sharing platform. For each staff member, a profile with some basic data and their current research topics is created automatically. Based on these profiles, and through recommendation and personalization techniques, suggestions of suitable individuals are created for each member. In addition, a forum, a messaging system and an event system will help staff members to improve knowledge sharing inside the company. IoT sensors could collect data about staff members' locations to enhance fellow recommendations by including context information in the recommendation process.
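A minimal sketch of the profile-based suggestion step (Python; the Jaccard overlap and the toy profiles are our own illustrative assumptions, not ExpertO's actual recommendation algorithm):

```python
def jaccard(a, b):
    """Overlap between two sets of research topics."""
    return len(a & b) / len(a | b) if a | b else 0.0

def suggest(member, profiles, k=2):
    """Rank other staff members by topic overlap with `member`."""
    others = [(name, jaccard(profiles[member], topics))
              for name, topics in profiles.items() if name != member]
    return [name for name, _ in
            sorted(others, key=lambda x: x[1], reverse=True)[:k]]

profiles = {
    "anne":   {"iot", "recommendation", "personalization"},
    "martin": {"iot", "smart-cities"},
    "joerg":  {"organic-computing", "iot", "personalization"},
    "zoe":    {"nlp"},
}
top = suggest("anne", profiles)
```

Context information from IoT sensors (e.g. co-location) could then be folded in as an additional weighting term on top of the topic overlap.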
4. Evaluation
To evaluate this approach, ExpertO was developed as a prototype, with personalized suggestions as its main function. Figure 1 shows an overview of the approach. The prototype is a web application available to both staff members and visitors (after registration). The platform welcomes the user with a set of suggested profiles that could be of interest. Furthermore, it offers the possibility to add or delete research topics, interests and expertise, as well as social activities, on the user's own profile. ExpertO provides a topic search for specific quests. If the user is interested in a profile, more details about the suggested profile are provided, including the seating location (currently only NUIG). Google Analytics will measure the usage of the platform.
5. Future steps
As ExpertO is a prototype, there are plenty of ideas waiting. The next steps are the implementation of a messaging system for communicating through the platform, a forum to enable information sharing, and an event calendar with the possibility to create and join events as well as to get event recommendations. The final goal is integration into the VITAL [1] platform. The ExpertO platform can remain in use while it is extended bit by bit with new features and improved recommendation algorithms.
Figure 1: ExpertO Platform
References
[1] T. V. Consortium. VITAL, 2015.
[2] J. Hohu and A. Saksena. Expert collaboration: Dynamic access to distributed expertise will shape successful enterprises, 2011.
INSIGHTSC [14]
EnRG: Entity Relatedness Graph
Nitish Aggarwal
Insight Centre for Data Analytics
National University of Ireland, Galway
nitish.aggarwal@insight-centre.org
Abstract
Wikipedia provides an enormous amount of background knowledge for reasoning about the semantic relatedness between two entities. In order to find all the entities related to a given entity, an entity relatedness graph (EnRG) is constructed, where the nodes represent Wikipedia entities and the relatedness scores are reflected by the edges. Wikipedia contains more than 4.1 million entities, which requires efficient computation of the relatedness scores between the corresponding 17 trillion entity pairs. We compute the relatedness between two entities by quantifying the distance between the corresponding high-dimensional vectors, built over Wikipedia concepts. We evaluate the approach on a benchmark that contains relative entity relatedness scores for 420 entity pairs. Our approach improves accuracy by 15% over state-of-the-art methods for computing entity relatedness.
1. Introduction
The significance of measuring relatedness between entities has been shown in various tasks in information retrieval (IR), natural language processing (NLP), text analysis and other related fields. In this paper, we present an Entity Relatedness Graph (EnRG)¹, which can be used to obtain a ranked list of related entities, as required by tasks such as query suggestion in web search, query expansion, and recommendation systems. Related entities can be obtained from knowledge bases such as DBpedia or Freebase by retrieving the directly connected entities. However, most popular entities have more than 1,000 directly connected entities, and these knowledge bases mainly cover specific types of relations. For instance, “Steve Jobs” and “Steve Wozniak” are not directly connected in the DBpedia graph. Therefore, we need to find related entities beyond the relations defined in knowledge-base graphs.
2. Proposed Solution
We developed an approach called Wikipedia-based Dis-
tributional Semantics for Entity Relatedness (DiSER),
which builds the semantic profile of an entity by using the
high dimensional concept space derived from Wikipedia.
DiSER generates a high dimensional vector by taking ev-
ery Wikipedia concept as dimension, and the associativity
weight of an entity with the concept as the magnitude of the
corresponding dimension. To measure the semantic relat-
edness between two entities, we simply calculate the cosine
1http://monnet01.sindice.net:8080/enrg/
score between their corresponding DiSER vectors.
DiSER considers only human annotated entities in
Wikipedia, thus keeping all the canonical entities that ap-
pear with hyperlinks in Wikipedia articles. The tf-idf weight
of an entity with every Wikipedia article is calculated and
used to build the corresponding semantic profile, which is
represented by the retrieved Wikipedia concepts sorted by
their tf-idf scores. For instance, for an entity e, DiSER builds a semantic vector v = Σ_{i=0}^{N} aᵢ · cᵢ, where cᵢ is the i-th concept in the Wikipedia concept space and aᵢ is the tf-idf weight of the entity e with the concept cᵢ. Here, N represents the total number of Wikipedia concepts.
With the entity relatedness scores obtained by DiSER, we constructed the EnRG (Entity Relatedness Graph) by calculating the DiSER scores between 16.83 trillion entity pairs (4.1 million × 4.1 million).
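The relatedness computation itself reduces to cosine similarity over sparse tf-idf vectors, which can be sketched as follows (Python; the toy concept weights are illustrative, not actual DiSER profiles):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors given as {concept: tfidf_weight}."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical fragments of two entity profiles over the Wikipedia concept space.
jobs = {"Apple Inc.": 0.9, "NeXT": 0.7, "Pixar": 0.5}
wozniak = {"Apple Inc.": 0.9, "Apple II": 0.8}
score = cosine(jobs, wozniak)
```

Representing the profiles sparsely is what makes the trillion-pair computation feasible: only concepts shared by both entities contribute to the dot product.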
3. Evaluation
We computed semantic relatedness scores for all entity pairs provided in a gold standard of 420 entity pairs. These scores were obtained using our proposed method DiSER and other state-of-the-art methods: ESA [1], WLM [3] and KORE [2]. We calculated the Spearman rank correlation between the gold-standard dataset and the results obtained from the different methods. Table 1 shows the results: DiSER outperforms all of the state-of-the-art methods.
Entity Relatedness Measure    Spearman Rank Correlation with human
DiSER                         0.781
WLM                           0.610
KORE                          0.673
Table 1: Spearman rank correlation of relatedness measures with the gold standard
References
[1] E. Gabrilovich and S. Markovitch. Computing semantic relat-
edness using wikipedia-based explicit semantic analysis. In
Proceedings of the 20th IJCAI, 2007.
[2] J. Hoffart, S. Seufert, D. B. Nguyen, M. Theobald, and
G. Weikum. Kore: Keyphrase overlap relatedness for entity
disambiguation. In Proceedings of the 21st CIKM, 2012.
[3] I. Witten and D. Milne. An effective, low-cost measure of
semantic relatedness obtained from wikipedia links. In Pro-
ceeding of AAAI Workshop on Wikipedia and Artificial Intel-
ligence: an Evolving Synergy, 2008.
INSIGHTSC [15]
Extracting Semantic Knowledge from Unstructured Text using Embedded
Controlled Languages
Hazem Safwat
Insight Centre for Data Analytics, NUIG
hazem.abdelaal@insight-centre.org
Abstract
Knowledge extraction from unstructured text is highly desirable but extremely challenging, due to the inherent ambiguity of natural language; most existing efforts are therefore directed towards extracting data from structured or semi-structured text, such as Wikipedia info-boxes. In this article, we present an architecture based on embedded controlled languages that can extract formal semantic knowledge from an unstructured text corpus, with the potential to support multilinguality as well.
1. Motivation
Recent years have seen an increasing deluge of heterogeneous unstructured text on the Web. While semantic technologies and text-mining systems can play an important role in mapping and linking text to formal knowledge for efficient retrieval and management, unambiguous processing of natural language, and in particular the ability to capture complex knowledge statements, is far from solved. In order to push the Semantic Web forward, more effort should be directed at extracting formal semantic knowledge from unstructured text, given the amount of knowledge it can deliver.
2. Problem Statement
Information Extraction systems can play an important role in extracting knowledge from text by identifying references to named entities and recognizing some relationships between these entities. However, in order to extract more complex insights from text, different NLP approaches are needed.
3. Related Work
The authors of [1] addressed a related, but not identical, problem: extracting entities from events in a military dataset. The purpose of their approach is to build a system that can support knowledge sharing and decision making across different groups from different nations without conversion to a technical format.
4. Research Question
How can we build a robust system that extracts semantic knowledge from unstructured text, while keeping the semantics unchanged and supporting multilinguality as well?
5. Hypothesis
While Controlled Natural Languages (CNLs) are not a perfect catch-all solution for managing and processing unstructured text, we argue that rewriting natural language into a CNL could offer an attractive solution in certain domains where there is a pre-existing need for human-oriented CNLs, i.e. aviation, legal and public policy. We propose an extension to the work in [2], where the author defines a process to embed CNLs in the grammar of the host languages.
6. Proposed Solution
To solve this problem we built an architecture divided into two phases: a pre-processing phase responsible for preparing the unstructured corpus for parsing, and a post-processing phase responsible for extracting the CNL subtrees and converting them into a structured form, as shown in Figure 1.
Figure 1: The proposed system architecture.
7. Evaluation
The evaluation plan is to implement a beta version of the system in which all the modules in the architecture are implemented manually by an expert. This version will be tested on a small corpus to get an idea of the system's performance, and of whether extra modules need to be added to the architecture. The test set will also be used to compare the performance of the system after automating each module against the performance of our reference test set.
8. References
[1] D. Mott, D. Braines, S. Poteet, A. Kao, and P. Xue.
Controlled Natural Language to facilitate information
extraction. ACITA, 2012.
[2] A. Ranta. Embedded Controlled Languages. In 4th
International Workshop, CNL 2014, pages 1–7. Galway.
INSIGHTSC [16]
Heuristic Based Adaptive Query Optimization Approach for RSP ∗
Yashpal Singh, Muhammad Intizar Ali, Alessandra Mileo
Insight Centre for Data Analytics
{yashpal.singh, ali.intizar, alessandra.mileo}@insight-centre.org
1. Motivation and Problem Statement
RDF streams are massive unbounded sequences of linked data, generated continuously at a random rate, which raises the issue of optimizing continuous query processing. The problem of supporting rapid adaptation to run-time conditions during adaptive query processing [3] is of increasing importance in today's data-processing environments. In applications such as network monitoring, telecommunications data management, manufacturing, sensor networks, and others, data takes the form of continuous data streams rather than finite stored data sets, and clients require long-running continuous queries as opposed to one-time queries. There are two strategies for evaluating sliding-window operators: re-evaluation and incremental evaluation [4]. The order in which multiple joins are performed, as given by the query plan, can determine the performance of the system, because the selectivity of one join can differ significantly from the selectivity of another. However, existing approaches lack a cost-bound monitoring component that can exploit run-time parameters and stored heuristics to generate a highly optimized global plan. The goal of our research is to design an adaptive and cost-effective monitoring component for RSP which enables an existing stream query processing engine to adopt a highly optimized query plan based on the system environment. The monitoring component not only augments the probability of generating an optimized plan, but also reduces the run-time delay caused by optimization. This strategy greatly benefits from the techniques and approaches provided by the DSMS and RSP research communities.
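The effect of join selectivity on plan quality can be illustrated with a greedy ordering sketch (Python; the selectivity estimates are hypothetical run-time statistics, not the engine's actual cost model):

```python
def order_joins(joins):
    """Greedily order joins by estimated selectivity (most selective first),
    so that intermediate results shrink as early as possible.

    joins: list of (join_name, estimated_selectivity), selectivity in (0, 1]."""
    return [name for name, _ in sorted(joins, key=lambda j: j[1])]

# Hypothetical statistics a monitoring component might maintain at run time.
stats = [("stream1 JOIN static", 0.40),
         ("stream1 JOIN stream2", 0.05),
         ("stream2 JOIN static", 0.70)]
plan = order_joins(stats)
```

An adaptive monitor would re-estimate these selectivities over the query life cycle and re-run the ordering whenever the observed statistics drift from the plan's assumptions.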
2. State-of-the-art
Considering state-of-the-art solutions, RSP engines are categorized into two classes based on their architecture [5]: black-box architectures, which use existing systems as sub-components of the processing model [2], [1], and white-box architectures, where a query engine natively supports optimization strategies [5]. To implement our approach, we are currently working with one white-box engine, and we will try to make our approach independent of an engine's architectural design.
3. Proposed Approach and Discussion
In this research, we investigate the different query execution models of the existing RSP engines based on their architectural design. Our main goal is to implement a cost-effective monitoring component that helps the engine at run time to decide on the best globally optimized plan for the current system environment. We consider merging the logical and physical plans incorporated by existing query optimizers, to rival their performance and flexibility. All our techniques extend an existing platform towards parallel and distributed RDF stream processing.
We have maintained an initial set of cost-bound parameters: referential integrity, window size of stored mappings, success rate of generated plans, and join patterns between streams over the query life cycle. All these parameters count as heuristics for our new monitoring component and help the query execution component to decide on the globally optimized plan. As a next step, we plan to build the query execution monitoring component independent of the existing RSP engines' architectures while considering some cost-bound parameters. Recent initiatives such as AROA (Adaptive Run-time Overhead Adjustment) [6] in the DSMS community have contributed significantly to efficient stream query processing. We intend to contribute to RSP by designing an efficient stream query processing model that can support different existing RSP engines in a standard way.
∗ This research has been partially supported by Science Foundation Ireland (SFI) under grant No. SFI/12/RC/2289 for RSP (RDF Stream Processing).
References
[1] D. Anicic, P. Fodor, S. Rudolph, and N. Stojanovic. Ep-
sparql: a unified language for event processing and stream
reasoning. In Proceedings of the 20th international confer-
ence on World wide web, pages 635–644. ACM, 2011.
[2] D. F. Barbieri, D. Braga, S. Ceri, E. Della Valle, and
M. Grossniklaus. C-sparql: Sparql for continuous querying.
In Proceedings of the 18th international conference on World
wide web, pages 1061–1062. ACM, 2009.
[3] A. Deshpande, Z. Ives, and V. Raman. Adaptive query pro-
cessing. Foundations and Trends in Databases, 1(1):1–140,
2007.
[4] T. M. Ghanem, M. Hammad, M. F. Mokbel, W. G. Aref, A. K.
Elmagarmid, et al. Incremental evaluation of sliding-window
queries over data streams. Knowledge and Data Engineering,
IEEE Transactions on, 19(1):57–72, 2007.
[5] D. Le-Phuoc, J. X. Parreira, M. Hausenblas, and
M. Hauswirth. Continuous query optimization and
evaluation over unified linked stream data and linked open
data. Technical report, Citeseer, 2010.
[6] H.-H. Lee, H.-K. Park, J.-C. Park, W.-S. Lee, and K.-H. Joo.
Adaptive run-time overhead adjustments for optimizing mul-
tiple continuous query processing. International Journal of
Software Engineering and Its Applications, 8(11):183–196,
2014.
INSIGHTSC [17]
Instance Search with Semantic Analysis and Faceted Navigation
Zhenxing Zhang, Cathal Gurrin, Alan F. Smeaton
Insight Centre for Data Analytics
E-mail: {zhenxing.zhang, cathal.gurrin, alan.smeaton}@insight-centre.org
Abstract
In this paper we present an interactive instance search ap-
proach to finding video clips which contain specific objects
from a large video collection. We implement a complete in-
teractive search system with open vocabulary querying and
faceted navigation to enable users to search video archives
based on automatically extracted semantic categories, ob-
ject labels, and object attributes from visual content.
1. Introduction
Recent work has described methods to build content-based video retrieval systems from automatically extracted semantic information using various approaches, such as collaborative browsing or concept filtering. However, an analysis of these methods reveals that subjective and imprecise user queries and interactions pose challenges to making full use of advanced visual retrieval capabilities. This paper presents an interactive instance search approach with open-vocabulary querying and faceted navigation, enabling users to browse large video collections quickly and intuitively based on semantic categories, object labels, and object attributes automatically extracted from video content. We implemented a complete interactive search system and designed comprehensive experiments on large public data collections to evaluate the efficiency and effectiveness of the proposed interactive video search tool.
2. Visual Content Analysis
Recent developments in image classification and object recognition using Deep Convolutional Networks (DCNs) outperform other state-of-the-art approaches on various evaluation data sets. The learning framework Caffe [2], along with learned models, is available as open source, which encourages researchers to use and contribute to the framework. In our work, we employ DCNs to extract meaningful information, such as semantic concepts, object labels, and object attributes, to describe the visual content of video shots. More specifically, we chose two pre-trained models to cover the wide range of possible topics in the BBC programming videos, capturing the desired object information as well as environmental context. We used the Caffe library [2] running on a machine equipped with a GeForce GTX 970 graphics card and 16 GB of RAM for the heavy visual processing tasks.
3. Interactive Search
In this work, we incorporated an open-vocabulary query
stage to allow unrestricted query search and faceted naviga-
tion to help users access large archives.
Open Vocabulary Querying The open-vocabulary querying stage allows us to measure the relatedness between input query words and the list of semantic labels used by the system. This ensures that even if a query word is not one of the pre-trained semantic labels, the system can still produce a ranked list of the closest-matching multimedia documents for users.
Faceted Navigation During the results presentation stage, we implemented faceted navigation to support users in exploring the content. This reflects the fact that users may seek information in a number of different ways, depending on their own understanding of the query topic.
Figure 1: Screenshot of the interactive browsing interface
For our implementation, we built a network of related concepts from WordNet [1], a freely available lexical database, to measure the semantic relatedness or similarity between different words. Figure 1 shows an example of using our faceted navigation to find the video clips of interest via two different routes. In this text-query example, users can drill down through concepts and attributes more naturally.
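Grouping result shots under shared parent concepts is one simple way to derive such drill-down facets. A minimal sketch of that grouping step (the shot identifiers and the parent mapping are hypothetical, not taken from the system):

```python
def build_facets(shot_labels, parent_of):
    """Group result shots under the parent concept of their label,
    giving the user facet headings to drill down through."""
    facets = {}
    for shot, label in shot_labels:
        # Labels with no known parent become their own facet.
        facets.setdefault(parent_of.get(label, label), []).append(shot)
    return facets

# Illustrative parent mapping and result list.
parent_of = {"poodle": "dog", "beagle": "dog", "tabby": "cat"}
results = [("shot1", "poodle"), ("shot2", "beagle"), ("shot3", "tabby")]
print(build_facets(results, parent_of))
# {'dog': ['shot1', 'shot2'], 'cat': ['shot3']}
```

Each facet heading then corresponds to one route a user might take through the result set, matching the two-route example of Figure 1.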
4. Experiment
The experimental data set consists of about 100 hours of video, with content drawn from various BBC TV programmes. The performance of the system is measured with an average score (0–100) that takes both search accuracy and search efficiency into account. The experimental results demonstrate that our approach can help users find the video clips of interest quickly. More comprehensive evaluation experiments will be carried out by the end of the year to further assess the performance.
References
[1] I. Feinerer and K. Hornik. wordnet: WordNet Interface, 2015. R package version 0.1-10.
[2] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
INSIGHTSC [18]
  • 4. LIST OF ABSTRACTS Linked Data and Semantic Web 1 A Linked Data Platform as Service for Finite Element Biosimulations Joao Bosco Jares, Muntazir Mehdi andRatnesh Sahay 2 Adaptivity in RDF Stream Processing (RSP) Zia Ush Shamszaman, Muhammad Intizar Aliand Alessandra Mileo 3 An elastic and scalable spatiotemporal query processing for linked sensor data Hoan Nguyen Mau Quoc 4 An Infrastructure to Integrate Open Public Health Data and Predicting Health Status Jaynal Abedin, Ratnesh Sahay and Dietrich Rebholz-Schuhmann 5 Biological Link Extractor From Literature In Systems Biology Arindam Halder 6 Complex Reasoning over Big Data Streams with Answer Set Programming Thu-Le Pham 7 Data Agnostic Management Systems for The Internet of Things Zeeshan Jan, Aqeel Kazmi and Martin Serrano 8 Deployment and Configuration of Multi-Query Data Test Services in Federated Cloud Environments Salma Abdulaziz 9 Diminishing Business Challenges by Improving Open Data Business Model Fatemeh Ahmadi Zeleti 10 DInfra - Distributional Infrastructure Siamak Barzegar, Juliano Efson Sales, Andre Freitas and Brian Davis 11 DISCOV3R: Discovering Life-Sciences Datasets in LS-LOD Cloud Muntazir Mehdi, Ratnesh Sahay and Dietrich Rebholz-Schuhmann 12 Discovering Hidden Structures for Quality Assessment Emir Muñoz 13 Effective Data Visualisation to Promote Home Based Cardiac Rehabilitation Liam Sexton and David Monaghan 14 Enabling a better Collaboration and Communication through Personalizations Anne Helmreich 15 EnRG: Entity Relatedness Graph Nitish Aggarwal 16 Extracting Semantic Knowledge from Unstructured Text using Embedded Controlled Languages Hazem Abdelaal 17 Heuristic Based Adaptive Query Optimization Approach for RDF Stream Processing Yashpal Singh, Ali Intizar and Alessandra Mileo 18 Instance Search with Semantic Analysis and Faceted Navigation Zhenxing Zhang, Cathal Gurrin and Alan Smeaton 19 Knowledge Base Segmentation in Entity Linking with Multiple Knowledge Bases Bianca Pereira 20 
Linked data approach for CNV annotation in cancer Alokkumar Jha, Yasar Khan, Ratnesh Sahay and Dietrich Rebholz-Schuhmann 21 Linked Data Profiling Andrejs Abele
  • 5. 22 Mobile RDF Store Le Tuan Anh and Danh Le Phuoc 23 Optimizing Access to Twitter Pull-Based APIs Soheila Dehghanzadeh 24 Overcoming Limitations to control personal data and ownership in DOSNs Safina Showkat Ara 25 Reference Implementation and Performance Evaluation for CQELS RDF Stream Processing engine Chan Le Van 26 Temporal Graph-based Approach for Document Summarisation Narumol Prangnawarat, Ioana Hulpus and Conor Hayes 27 Using semantic information for ontology translation Mihael Arcan 28 Zinc Phthalocyanine and its Substituted Derivatives as Sensitive Layers for Textile - Based Sensor Eva Marešová, Martin Vrňata, Přemysl Fitl, Jiřı́ Bulı́ř, Ján Lančok, Jan Vlček, David Tomeček,Michal Novotný, Larisa Florea, Shirley Coyleand Dermot Diamond Media Analytics 29 A Framework for Extraction of Exceptional Events Yuriy Gurin, Terrence Szymanski andMark Keane 30 An Approach to Measure the Impact of Academic Entities Using a Heterogeneous Graph Mohan Timilsina 31 Brain computer interfaces for digital media interaction Zhengwei Wang 32 Bridging Social Media and e-Participation Lukasz Porwol 33 Content-based Search Engine for Lifelogging with Collaborative Common Knowledge Base Tengqi Ye 34 Deep community analysis: Using statistical analysis to identify the most influential users in Social media Himasagar Tamatam and Conor Hayes 35 Deep Image Representations for Instance Search Eva Mohedano 36 Design considerations of a lifelog annotation system with three-level ontology Aaron Duane 37 Development of a Closed-loop Neurocognitive Engineering Platform Damien Kearney, Tomas Ward, Mahnaz Arvaneh and Ian Robertson 38 Exploring Twitter Data with Computational Homotopy Pablo Torres-Tramon and Graham Ellis 39 Extraction of Customer-To-Customer Suggestions from Reviews Sapna Negi 40 Human Action Recognition Framework using Graph Representation Iveel Jargalsaikhan 41 Measuring the Semantic Similarity in Interest Graphs for Social Recommendations Guangyuan Piao 42 Observing 
the Relationships Between MEPs on Twitter Mark Belford, James Cross and Derek Greene 43 Real Time Crowded Scene Understanding Mark Marsden 44 Twitter for Sentiment Analysis and Opinion Mining Peiman Barnaghi, Parsa Ghaffari and John Breslin 45 Using the NoSQL Model to support DWARF Cubes in XML Data Mining Michael Scriney and Mark Roantree 46 What are Words Worth? Exploring Semantic Spaces of Political Discourse Igor Brigadir
  • 6. Optimisation & Decision Analytics 47 A comparison of the SIR and the network model, with Kalman Filter Weipeng Huang 48 An Efficient Dispatch and Decision-making Model for Taxi-booking Service Cheng Qiao 49 Countermeasures to Mitigate Bandwidth Level DDoS attacks in Data Centers Samar Raza Talpur and Prof. M-Tahar Kechadi 50 Different solutions of Variable Cost and Size Bin Packaging Problem with Stochastic Item (VCSBPSI) Andrea Visentin 51 Improving Catalogue Navigation using Critique Graphs Begum Genc and Barry O’Sullivan 52 Learning User Preferences in Matching for Ridesharing Mojtaba Montazery and Nic Wilson 53 On Energy- and Cooling-Aware Data Centre Workload Management Danuta Sorina Chisca, Deepak Mehta,Ignacio Castiñeiras and Barry O’Sullivan 54 On Temporal Bin Packing Milan de Cauwer 55 Optimal Bayes decision rules in cluster analysis using greedy techniques Riccardo Rastelli and Nial Friel 56 Personalized Route Planning Daniel A. Desmond 57 ReACTR: Realtime Algorithm Configuration through Tournament Rankings Tadhg Fitzgerald 58 Solving a Hard Cutting Stock Problem by Machine Learning and Optimisation Adejuyigbe O. Fajemisin 59 Statistical Regimes and Runtime Prediction – Abstract Barry Hurley 60 Towards Fast Algorithms for the Preference Consistency Problem Anne-Marie George, Nic Wilson and Barry O’Sullivan Personal Sensing 61 A Data Driven Approach to Determining the Influence of Fatigue on Turning Characteristics and Associated Injury Risk in Chronic Ankle Instability Alexandria Remus, Eamonn Delahunt and Brian Caulfield 62 An Evaluation of the effects of the MedEx programme on physical, clinical and psychosocial outcomes and an examination of the determinants of adherence to MedEx. 
Fiona Skelly and Emer O Leary
63 Association between Objectively Measured Physical Activity and Vascular Endothelial Function in Adolescent Males
Sinead Sheridan and Niall Moyna
64 Change of direction biomechanics continue to improve from six to nine months post Anterior Cruciate Ligament reconstruction.
Shane Gore, Andrew Franklyn-Miller and Kieran Moran
65 Development of an autonomous phosphate sensor.
Gillian Duffy, Kevin Murphy, Adrian Nightingale, Nigel Kent, Matthew Mowlem, Dermot Diamond and Fiona Regan
66 Do mobile phone Apps apply behaviour change strategies for clinical populations: A Literature Review Strategy
Orlaith Duff, Deirdre Walsh and Catherine Woods
67 GUI based User Behaviour Modelling for Identification
Zaher Hinbarji, Cathal Gurrin and Rami Albatal
68 Investigating normal day-to-day variations in postural control in a healthy young population using Wii Balance Boards
William Johnston, Ciaran Purcell, Ciara Duffy, Tara Casey, David Singleton, Barry Greene, Denise McGrath and Brian Caulfield
69 MedEx Move On: Community-based Exercise Rehabilitation for Cancer Survivors
Mairead Cooney, Emer O’Leary, Brona Furlong, Catherine Woods and Noel McCaffrey
70 Multi-modal Continuous Human Affect Recognition
Haolin Wei, David Monaghan and Noel E. O Connor
71 Non-Linear Analyses of Surface Electromyography in Parkinson’s Disease
Matthew Flood and Madeleine Lowery
72 Ocular Glucose Biosensing Using Boronic Acid Fluorophores
Danielle Bruen, Larisa Florea and Dermot Diamond
73 Quantifying Athletic Screening Tools Using Inertial Sensors And The Microsoft Kinect
Darragh Whelan, Martin O’Reilly, Eamonn Delahunt and Brian Caulfield
74 Self-Propelled Electrotactic Ionic liquid Droplets
Wayne Francis, Klaudia Wagner, Stephen Beirne, David Officer, Gordon Wallace, Larisa Florea and Dermot Diamond
75 Sodium sensing in sweat using crown ether functionalised polymeric hydrogels
Deirdre Winrow, Larisa Florea and Dermot Diamond
76 Stimuli-responsive hydrogels based on acrylic acid and acrylamide
Aishling Dunne, Siobhán Mac Ardle, Larisa Florea and Dermot Diamond
77 Tackling Neurodegenerative Diseases
Zhemin Zhu
78 Textile Strain Sensors for Clinical Applications in Spinal Flexion
Jennifer Deignan, Syamak Farajikhah, Ali Jeirani, Javad Foroughi, Shirley Coyle, Peter Innis, Rita Paradiso, Gordon Wallace and Dermot Diamond
79 THE BIOMECHANICAL DETERMINANTS OF CUTTING PERFORMANCE IN ELITE FEMALE FIELD SPORT ATHLETES: STUDY JUSTIFICATION
Neil Welch, Kieran Moran and Andrew Franklyn-Miller
80 The Development of a Wearable Potentiometric Sensor for Real-time Monitoring of Sodium Levels in Sweat
Thomas Glennon, Conor O’Quigley, Giusy Matzeu, Eoghan Mc Namara, Florin Stroiescu, Kevin Fraser, Margaret McCaul, Stephen Beirne, Jens Ducrée, Gordon Wallace, Paddy White and Dermot Diamond
81 The use of an adaptive coaching system to improve patient outcomes
during a free living step training exercise programme in type 2 diabetics
Hugh Byrne
82 The Use of Inertial Measurement Units to Evaluate Exercise Performance
Martin O’Reilly, Darragh Whelan, Tomas Ward and Brian Caulfield
83 Tuning the Stimuli-Responsive Properties of Poly(Ionic Liquid)s
Alexandru Tudor, Larisa Florea and Dermot Diamond
84 Using Intensity of Periodicity to detect Behaviour Change
Feiyan Hu, Alan Smeaton and Eamonn Newman
85 Validation of a Wearable Optical Heart Rate Sensor during Rest and Exercise
Alexandria Remus and Brian Caulfield
86 Validity and Reliability of the FitBit Charge HR™ to Monitor Heart Rate at Rest and During Exercise
Clare McDermott, Niall Moyna and Kieran Moran
87 Wearables for Diabetes Prevention and Management
José Juan Domínguez Veiga
Recommender Systems
88 A Scalable and Secure Realtime Healthcare Analytics Framework with Apache Spark
Md. Rezaul Karim, Ratnesh Sahay and Dietrich Rebholz-Schuhmann
89 Actionable Recommendations using Resource Based Features
Owen Corrigan
90 Career Development: Recommending Jobs, Career Paths and Skills to Users
Xingsheng Guo, Houssem Jerbi and Michael O’Mahony
91 Exploitation-Exploration Aware Diversification for Recommendation Systems
Andrea Barraza-Urbina and Conor Hayes
92 Not your Average Fortuneteller; A Study of Time-aware Recommender Systems for Linear TV
Humberto Corona
93 Opinionated Explanations for Recommendations
Khalil Muhammad, Aonghus Lawlor, Rachael Rafter and Barry Smyth
94 Recommendation Framework for Feature-rich Sequences
Gunjan Kumar
95 Recommending from Experience
Francisco J Peña
96 Tracking and Recommending News
Doychin Doychev, Aonghus Lawlor and Barry Smyth
Machine Learning & Statistics
97 A Distributed Approach for Clustering Large Spacial Datasets
Malika Bendechache and Tahar Kechadi
98 AcademiScope: An Application for Building and Visualizing an Academic Network
John Lonican and Conor Hayes
99 Adaptive MCMC for Changepoint Models on Large Datasets
Alan Benson and Nial Friel
100 Application of Data Mining Techniques to Service Modelling and Evaluation at Caredoc
Duncan Wallace and Tahar Kechadi
101 Automatic Dynamic Product Classification
Guillermo Vinue and Andrew Parnell
102 Categorising Online Q&A Communities Based on User Behaviour
Erik Aumayr and Conor Hayes
103 Deriving the String Student-t Process for Scalable Student-t Process Regression
Gernot Roetzer and Simon Wilson
104 Exploring Variable Interactions with Restricted Boltzmann Machines
Jim O’ Donoghue and Mark Roantree
105 Identification of Regional Interaction Pattern in Regulatory Networks
Laleh Kazemzadeh
106 Indicators of Good Student Performance in Moodle Activity Data
Ewa Mlynarska
107 Loss Functions and Optimization Strategies for Large-scale Sequence Learning
Severin Gsponer, Georgiana Ifrim and Barry Smyth
108 Model Selection for Ranking Data
Lucy Small
109 Model-based Clustering with Sparse Covariance Matrices
Michael Fop and Thomas Brendan Murphy
110 Network Forensics Readiness and Security Awareness Framework
Aadil Mahrouqi and Tahar Kechadi
111 Predicting Peer Groups Effects On University Exam Results
Philip Scanlon and Prof Alan Smeaton
112 Predicting Topographical and Sociological
Information Patterns from Building Access Logs
Philip Scanlon and Alan Smeaton
113 Prenatal alcohol exposure and cord-blood DNA methylation: identifying latent structure in high dimensional data
Cathal Mullin
114 Radiographic Knee Osteoarthritis Classification using Convolutional Neural Network
Joseph Antony, Kevin McGuinness, Noel O’Connor and Kieran Moran
115 Reformulations of the Map Equation for Community Finding and Blockmodelling
Neil Hurley and Erika Duriakova
116 Representative Itemset Mining
Hong Huang and Barry O’Sullivan
117 Revealing the Hidden Patterns: Trajectory Mining from Mobility Data
Ali Azzam Naeem
118 Sensor based sentiment analysis
Srinivasan Arumugam, Jyothirmoy Patgiri and Navdeep Sharma
119 Social Network Analysis of Pride and Prejudice
Siobhan Grayson, Derek Greene, Gerardine Meaney and Brian Mac Namee
120 Addressing Cold-Start for Streaming Twitter Hashtag Recommendation to News
Bichen Shi
121 Topy the Story Tracker: Social Indexing for Real-time Story Tracking
Gevorg Poghosyan, Georgiana Ifrim and Neil Hurley
Acknowledgment: This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 and the EU project SIFEM (contract Number 600933).

A Linked Data Platform as Service for Finite Element Biosimulations
Joao Bosco Jares, Muntazir Mehdi and Ratnesh Sahay
The Insight Centre for Data Analytics, National University of Ireland, Galway (NUIG)
joao.jares@insight-centre.org

Abstract
Biosimulation studies have recently been introduced as models for understanding the causes that give rise to impairment in human organs. The Finite Element Method (FEM) provides a mathematical tool to simulate dynamic biological systems, with applications across many areas of human organ research, from the ear to the neurovascular system. However, without a proper data infrastructure, the steps involved in the execution and comparative evaluation of Finite Element simulations can be very time-consuming, and must be performed in an isolated environment. Considering these facts, we propose a service-oriented Linked Data platform to improve the automation, integration, analysis and visualization of a biosimulation model for inner-ear (cochlea) mechanics.

1. Introduction and Motivation
Finite Element (FE) models are numerical approaches for finding approximate solutions to differential equations. Construction of FE models is a highly complex task comprising multiple steps, such as the definition of a discretized geometrical model (a mesh) and a physical-mathematical model, method type selection, visualization, and the interpretation of the model's results. Moreover, building a consistent FE model can be very time-consuming, since it depends on the fine-tuning of many parameters; the complexity lies not only in the difficulty of building and validating FE models but also in reproducing and reusing third-party FE models [1].
The work proposed in [2] describes the creation of an infrastructure and platform to support a more automated interpretation of FE simulations in bio-mechanics using Semantic Web standards and tools. In addition, since most of the related data are represented numerically, this work explores mechanisms to bridge data from the numerical to the ontology (conceptual) level, facilitating and automating the interpretation of simulation results. In order to evaluate our proposed approach, the service is exposed via a single interface endpoint for consumption by third-party agents. The main aim of the service is to improve and integrate the process of automating the individual tasks of performing a biosimulation under a realistic cochlear-mechanics FE model.

2. Proposed Framework
The Linked Data platform exposed as a service is called the SIFEM [2] system; its high-level components (and the conceptual model) are shown in Figure 1. The simulation service starts with the specification of the simulation input parameters (such as the inner-ear geometrical mesh). After the inputs are specified, the user starts the simulation via the service interface, which invokes the SIFEM Solver Service to coordinate the execution and instantiation of multiple solver instances in an asynchronous fashion.

Figure 1. Components of SIFEM Services & conceptual model

The SIFEM Solver Service reads the solver input data and transforms it into a solver-specific input format. On receiving results from the solvers, the RDFization component RDFizes the input and output of the simulation experiment and stores them in the RDF triple store. The RDFization of the input and output is performed using the SIFEM conceptual model. The Data Analyzer component extracts a set of data analysis features (such as extrema points, average point, and slope) from the numerical simulation data.
The data analysis results are also RDFized using the SIFEM conceptual model and stored in the triple store. The simulation input-output and analysis data are RDFized and stored such that they can be re-used instead of performing a completely new simulation. The cochlea, or inner ear, represents a bio-mechanical device, and the complete understanding of its behaviour is still an open research challenge. The creation of a complete cochlea model depends on the integration of heterogeneous models at different scales (e.g., basilar membrane, organ of Corti and outer hair cells) and theoretical domains (e.g., mechanical, geometrical, and electrical). To the best of our knowledge, the proposed Linked Data platform is the first unified infrastructure that brings together numerical parameters, models, terminologies, storage, querying, visualization and analysis to conduct a finite element biosimulation.

References
[1] Yasar Khan, Muntazir Mehdi, Alokkumar Jha, Saleem Razaz, Andre Freitas, Marggie Jones, and Ratnesh Sahay. Extending Inner-Ear Anatomical Concepts in the Foundational Model of Anatomy (FMA) Ontology. In proceedings of BIBE 2015.
[2] Muntazir Mehdi, Yasar Khan, Andre Freitas, Joao Jares, Stefan Decker, and Ratnesh Sahay. A Linked Data Platform for Finite Element Bio-Simulations. In proceedings of SEMANTiCS 2015.

INSIGHTSC [1]
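The feature extraction performed by the Data Analyzer lends itself to a compact illustration. The sketch below is ours, not the SIFEM implementation: it extracts the three features named in the text (extrema points, average, slope) from numerical simulation samples, and the function name and data layout are assumptions.

```python
# Illustrative sketch of Data Analyzer features (extrema, average,
# slope); names and data layout are assumptions, not the SIFEM API.

def analyse(series):
    """Extract basic features from a list of (x, y) simulation samples."""
    xs = [x for x, _ in series]
    ys = [y for _, y in series]
    n = len(series)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # overall least-squares slope of the sampled curve
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in series)
             / sum((x - mean_x) ** 2 for x in xs))
    # local extrema: interior points above or below both neighbours
    extrema = [series[i] for i in range(1, n - 1)
               if (ys[i] - ys[i - 1]) * (ys[i] - ys[i + 1]) > 0]
    return {"average": mean_y, "slope": slope, "extrema": extrema}

features = analyse([(0, 0.0), (1, 2.0), (2, 1.0), (3, 3.0)])
```

Once RDFized with the conceptual model, such features could be stored alongside the raw simulation output for later reuse, as the text describes.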
Adaptivity in RDF Stream Processing (RSP)∗
Zia Ush Shamszaman, Muhammad Intizar Ali, Alessandra Mileo
Insight Centre for Data Analytics, National University of Ireland Galway
{zia.shamszaman, ali.intizar, alessandra.mileo}@insight-centre.org

1. Introduction and Motivation
Existing directions in RSP research focus on reaching a consensus on how to process RDF streams, rather than on the ability to adapt and react to changing application requirements and properties of the underlying data streams. It is not trivial to find a configuration of such features that works independently of the data and the application domains. Therefore, we believe it is important to design flexible RSP solutions that can adapt to the application requirements and let the application discover and select the underlying RSP engine (or a configuration of its features) on the fly through dynamic adaptation. Over the last few years, several stream processing systems have been proposed for efficient processing of RDF streams; to name a few: CQELS, C-SPARQL, EP-SPARQL, and SPARQL-stream [2, 1]. Multiple aspects can affect the performance and correctness of the results produced by the query processors, including the operational semantics of linked streams, the query execution method, and the target domain. However, existing approaches to stream query processing lack the adaptability to react to changing application requirements and properties of the underlying data streams. The goal of this research is to design a more flexible and adaptive stream query processing approach that adapts to the requirements of the applications and to the characteristics of the data streams.

2. Hypothesis
The adaptive approach will improve the efficiency and correctness of stream processing in general by serving a broader category of application requirements. In addition, it will provide better results in changing environments, i.e.,
when application requirements and properties of data streams change at run-time.

3. Proposed Approach
We believe there is a wide range of features which might limit the applicability of RSP solutions. We classify these features into two categories: i) design-time features, which include aspects such as the input data model, the language to define processing rules, operational semantics, and supported streaming operators; and ii) run-time features, which include aspects such as execution time, processing techniques, quality of service (QoS), privacy, target domain of applications, and more.

∗This research has been partially supported by Science Foundation Ireland (SFI) under grant No. SFI/12/RC/2289.

    Feature                        CQELS   C-SPARQL
    Input:  Periodic                 N        Y
            Data Driven              Y        N
    Output: Istream                  Y        N
            Rstream                  N        Y
            Dstream                  N        N
    Empty Relation Notification      N        Y

Table 1: Narrowed-down features of RSP engines.

Existing RSP engines are designed around one of two input data models for query execution: (i) data driven, i.e., the query is executed whenever data arrives, or (ii) periodic, i.e., the query is executed periodically. The data-driven approach is ideal for continuous monitoring, while the time-driven approach is good for periodic monitoring. C-SPARQL follows a time-driven strategy, where results may become stale if the re-execution frequency is lower than the frequency of the updates [1], and it is thus not suitable for applications where delay can be crucial (e.g., burglary-attempt notification in a home surveillance system). The CQELS engine follows a data-driven approach, which is ideal for time efficiency but can be resource-expensive for a periodic notification system (e.g., periodic surveillance of children's activities in a home monitoring system). The work in [1] showed that C-SPARQL suffers from duplicate results for simple queries and misses certain outputs in complex queries. This is due to its implementation of the R-Stream output operator, in which old triples do not get removed from the time window.
However, another evaluation in [2] has shown that C-SPARQL provides the most correct results among the RSP engines. Diversity in the output results produced by various RSP engines is a known phenomenon. Input data models and output streaming operators are just two examples showcasing how design-time features may affect the correctness and performance of an RSP engine during query execution. Hence, we intend to design an engine that aligns with the RSP-QL query language of the W3C RSP community, with new query semantics and on-demand query generation capabilities. At query execution time, based on the application requirements, we select among the features in Table 1 to switch on the best configuration of our engine.

References
[1] D. Le-Phuoc, M. Dao-Tran, M. Pham, P. Boncz, T. Eiter, and M. Fink. Linked stream data processing engines: Facts and figures. The Semantic Web – ISWC 2012, pages 300–312, 2012.
[2] Y. Zhang, M.-D. Pham, Ó. Corcho, and J.-P. Calbimonte. SRBench: A streaming RDF/SPARQL benchmark. In ISWC (1), pages 641–657, 2012.
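To illustrate the kind of dynamic engine selection argued for above, the sketch below (our own reading, not the authors' implementation) matches a set of required features against the feature matrix of Table 1; the feature keys are invented for the example, while the Boolean values follow the table and the surrounding prose.

```python
# Hypothetical adaptive selection over the Table 1 feature matrix.
# Feature names are illustrative; values follow the table and prose.

ENGINE_FEATURES = {
    "CQELS":    {"periodic": False, "data_driven": True,
                 "istream": True, "rstream": False, "dstream": False,
                 "empty_relation_notification": False},
    "C-SPARQL": {"periodic": True, "data_driven": False,
                 "istream": False, "rstream": True, "dstream": False,
                 "empty_relation_notification": True},
}

def select_engine(required):
    """Return the engines that support every required feature."""
    return [name for name, feats in ENGINE_FEATURES.items()
            if all(feats.get(f, False) for f in required)]

# Continuous monitoring (e.g. burglary alerts) needs data-driven input:
continuous = select_engine({"data_driven"})
# Periodic child-activity reports favour periodic, R-stream output:
periodic = select_engine({"periodic", "rstream"})
```

In an adaptive RSP layer, such a lookup would be re-evaluated at run-time whenever the application requirements or stream properties change.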
An elastic and scalable spatiotemporal query processing for linked sensor data
Hoan Nguyen Mau Quoc
Insight Centre for Data Analytics
E-mail: hoan.quoc@insight-centre.org

Abstract
Recently, many approaches have been proposed to manage sensor data using Semantic Web technologies for effective heterogeneous data integration. However, our research survey revealed that these solutions primarily focus on semantic relationships and pay less attention to spatiotemporal correlations. In this paper, we propose a spatiotemporal query engine for sensor data based on the Linked Data model. The ultimate goal of our approach is to provide an elastic and scalable system that allows fast searching and analysis over the spatial, temporal and semantic relationships in sensor data.

1. Introduction
The Internet of Things (IoT) is the network of physical objects embedded with sensors that make real-time observations about the world as it happens. Sensor observation data is always associated with spatiotemporal contexts, i.e., it is produced in specific locations at specific times. Therefore, all sensor data items can be represented in three dimensions: semantic, spatial and temporal. Consider the following example: "What is the average temperature over the last 30 minutes in Dublin city?" This simple example poses an aggregate query on the weather temperature readings of all weather stations in Dublin city. Unfortunately, supporting such multidimensional analytical queries on sensor data is still challenging in terms of complexity, performance and scalability. In particular, these queries imply heavy aggregation over large numbers of data points, along with computation-intensive spatial and temporal filtering conditions.
Moreover, the high update frequency and large data volumes of our targeted systems (ten thousand updates per second on billions of records already in storage) add to the burden of answering a query within seconds or milliseconds. On top of that, by their nature, such systems need to scale to millions of sensor sources and years of data. Motivated by these challenges, we propose an elastic spatiotemporal query engine which is able to index, filter and aggregate a high throughput of sensor data together with a large volume of historical data stored in the engine. The engine is backed by distributed database management systems, i.e., OpenTSDB for temporal data and ElasticSearch for spatial data, so that it can store billions of data points and ingest millions of records per second while remaining able to query live data streaming from sensor sources.

2. System Architecture
To systematically address the shortcomings noted in the previous section, we exploit the multidimensional nature of Linked Sensor Data to parallelise the write and query operations on distributed computing nodes. This decision is also inspired by the studies [1, 2] showing that processing on big RDF graphs can be parallelised efficiently by partitioning the graph into smaller subgraphs stored on multiple processing nodes. The following is an overview of the architecture, along with the indexing strategies used to deal with the temporal and spatial aspects of Linked Sensor Data.
Figure 1: System architecture

3. Conclusions and Future Work
The need for efficient querying over massive amounts of sensor data lies at the heart of most sensor data analytics platforms. In this paper, we present our recent effort on leveraging Linked Data and NoSQL technologies to effectively manage sensor data. Our approach not only provides complex spatiotemporal query functions to users but also proves its ability to handle billions of sensor data points. Our experimental results show that this approach is both fast and scalable. For future work, we expect to adapt a distributed triple store to our system. Furthermore, we are implementing query optimisation algorithms to speed up query performance. Whilst our system still has its limitations, it is a step towards providing a high-performance spatiotemporal query engine for the IoT world.

4. Acknowledgments
This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289, the Irish Research Council under Grant No. GOIPD/2013/104, and the European Union under Grant No. FP7-ICT-608662 (VITAL).

References
[1] J. Huang, D. J. Abadi, and K. Ren. Scalable SPARQL querying of large RDF graphs. In Proceedings of the 33rd VLDB, pages 1123–1134, 2011.
[2] K. Lee and L. Liu. Scaling queries over big RDF graphs with semantic hash partitioning. In Proceedings of the VLDB Endowment, 2014.
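The architecture in Figure 1 includes a Triple Router that dispatches incoming triples to the temporal back end (OpenTSDB) or the spatial/text back end (ElasticSearch). The following is a minimal sketch of that routing idea only; the predicate names are placeholders, and in-memory lists stand in for the real stores.

```python
# Placeholder routing sketch; predicate names and in-memory "stores"
# are assumptions, not the system's actual vocabulary or back ends.

TEMPORAL_PREDICATES = {"observedAt", "hasTimestamp"}
SPATIAL_PREDICATES = {"hasLocation", "coveredBy"}

triple_store, temporal_store, spatial_store = [], [], []

def route(triple):
    _, predicate, _ = triple
    triple_store.append(triple)        # the full RDF triple is always kept
    if predicate in TEMPORAL_PREDICATES:
        temporal_store.append(triple)  # would become an OpenTSDB record
    elif predicate in SPATIAL_PREDICATES:
        spatial_store.append(triple)   # would become an ElasticSearch document

for t in [("sensor1", "observedAt", "2015-10-30T10:00:00Z"),
          ("sensor1", "hasLocation", "POINT(-9.05 53.27)"),
          ("sensor1", "hasValue", "21.5")]:
    route(t)
```

Splitting the indexes this way lets temporal range scans and spatial filters run on the store best suited to each, which is the parallelisation rationale given above.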
An Infrastructure to Integrate Open Public Health Data and Predicting Health Status
Jaynal Abedin, Ratnesh Sahay, Dietrich Rebholz-Schuhmann
Insight Centre for Data Analytics
E-mail: jaynal.abedin@insight-centre.org

Abstract
In recent years, many datasets have been made publicly available, sometimes as a donor requirement and sometimes due to legislative initiatives. The sources are heterogeneous and come in a variety of formats. The datasets include public health survey data, laboratory diagnostic results, and even genotype results, but little has been done to integrate these heterogeneous data to ensure interoperability, and very little research has used integrated data to predict the health status of an individual. Through this work we propose an infrastructure in which we integrate heterogeneous data sources through Semantic Web technology to produce Linked Data, and then develop a predictive model of an individual's health status on top of it. There will also be an intermediate layer to assess data quality before and after integrating the sources.

1. Motivation
Over the years, healthcare providers have used standalone healthcare information systems to deliver healthcare services; these systems are not compatible with an interoperability infrastructure. Moreover, multidisciplinary healthcare communities and decentralized healthcare information systems contribute largely to these non-interoperability issues. Several initiatives have been taken over the years to integrate data sources and ensure interoperability, but they lack data quality assessment before and after integration. There is also a scarcity of literature on aligning multiple datasets in a uniform manner; for example, a dataset might not contain all the indicators that are contained in another dataset.
There is a need to develop an algorithm that can impute this information so that datasets are uniform across sources.

2. Problem Statement
There is a lack of infrastructure to semantically integrate open public health data from heterogeneous sources, and a lack of algorithms to create uniform datasets across sources and to utilize the integrated data to develop a predictive model of an individual's health status.

3. Related Work
Bischof et al. [1] have proposed an infrastructure in the smart-city domain to semantically integrate heterogeneous data sources and publish the integrated data as Linked Open Data. Our approach is somewhat similar, but our infrastructure comprises different layers with various functionalities.

4. Research Objective
To develop an infrastructure that semantically integrates heterogeneous public health data sources, aligns indicators, and supports predictive models of the health status of individuals.

5. Proposed Infrastructure
The proposed infrastructure comprises nine layers, each with a specific function:

Input Layer (IL): Identify publicly available health data sources, including survey data, laboratory diagnostic data, genomic data and metadata.
Pre-Processing Layer (PPL): Assess data quality against the metadata of the sources.
Semantic Annotation Layer (SAL): Convert to Linked Data (RDF) and semantically annotate with appropriate schemas and ontologies.
Inter-Linking Layer (ILL): Semantically link all of the data sources.
Indicator Alignment Layer (IAL): Align indicators across datasets; predict any indicator missing from a dataset through statistical modelling (non-parametric or semi-parametric predictive models).
Assessing Quality of Indicator Alignment (AQIA): Assess the quality of the indicator alignment algorithm developed in the IAL.
Integrated Data Layer (IDL): Store the semantically integrated dataset such that it can be queried using the SPARQL query language.
Predictive Modelling Layer (PML): Develop predictive models using the integrated dataset for a specific disease of interest.
Output Layer (OL): Store the modelling results as Linked Data, e.g., RDF.

6. Future Work
Once the infrastructure has been developed, we can build a web application on top of it where users provide their own data, which is converted to Linked Data. Users can then see their health status estimated from their own data combined with other publicly available datasets.

References
[1] S. Bischof, C. Martin, A. Polleres, and P. Schneider. Collecting, integrating, enriching and republishing open city data as linked data.
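As a concrete but hypothetical illustration of the Indicator Alignment Layer described above, the sketch below makes indicator sets uniform across sources by imputing missing values; a simple cross-source mean stands in for the non-parametric or semi-parametric predictive model the abstract actually envisages, and all names are invented for the example.

```python
# Hypothetical IAL sketch: make indicator sets uniform across sources.
# A cross-source mean stands in for the statistical model in the text.

def align_indicators(sources):
    """sources: {source_name: {indicator: value}} -> aligned sources."""
    all_indicators = set().union(*(s.keys() for s in sources.values()))
    for indicators in sources.values():
        for missing in all_indicators - indicators.keys():
            observed = [s[missing] for s in sources.values() if missing in s]
            indicators[missing] = sum(observed) / len(observed)  # imputed
    return sources

aligned = align_indicators({
    "survey": {"bmi": 24.0, "smoking_rate": 0.2},
    "lab":    {"bmi": 26.0},   # smoking_rate missing in this source
})
```

After alignment, every source exposes the same indicator set, which is the precondition for the AQIA quality check and for uniform querying in the Integrated Data Layer.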
Biological Link Extractor From Literature In Systems Biology
Arindam Halder, Frank Barry
REMEDI/INSIGHT Centre for Data Analytics
Arindam.halder@insight-centre.org

Abstract
Systems biology has been gaining in prominence due to the ever-increasing sources of biological data. Despite advances in the fields of language processing and data integration, we have failed to fully utilize the data being generated, and one of the most under-utilized resources has been the published literature. To exploit the gold mine that is the published literature, it is paramount that we integrate results from across publications to build a better understanding of the underlying biological systems.

1. Introduction
The successful completion of the Human Genome Project ushered in a period in which data generation became easier and faster over time. This led to a glut of data, which added to the already huge corpus of data about proteins, genes, pathways and diseases. The solution to this information explosion was to create huge databases documenting the data, but the published literature was never satisfactorily mined in a similar manner. The frameworks for combining data from various sources are quite advanced, giving an overview of the complexity and functionality, along with the completeness, of a biological network, but the integration of scientific literature is yet to be accomplished in a satisfactory manner [1].

2. Motivations And Problem Statement
At REMEDI it was discovered that the protein SPARCL1, when applied along with MSCs to an infarcted heart, increases the efficiency of the repair mechanism. The increase in SPARCL1 levels enhanced the efficiency of tissue repair significantly, but no information exists about the mechanism of action. Our work presents a framework to automate the process of knowledge discovery from published literature, leading to better discovery of hidden links between biological entities.
The framework works on full papers uploaded by the user or on abstracts retrieved from PubMed based on a user-formulated query. By processing the unstructured text, each paper is converted to a sub-graph based on all the genes/proteins co-occurring with a list of pre-defined interaction verbs, with the co-occurring gene/protein pairs forming the edges of the graph.

3. Proposed Solution
This section elaborates the methods applied to create sub-graphs and, eventually, a network by merging the sub-graphs. The whole processing pipeline of the text, and how it is annotated to derive meaningful relationships and generate the sub-graphs, is illustrated in Figure 1.

Figure 1: The Architecture

The text is filtered using a dictionary of interaction verbs (3,548) extracted from the GENIA corpus [2], which act as triggers. Simultaneously, a POS tagger is used to find nouns co-occurring with the words in our dictionary. This initial filtering based on the co-occurrence of nouns and verbs removes non-essential sentences, enabling faster tagging by the GENIA tagger [2]. After tagging by the GENIA tagger, text containing nouns but no gene/protein tags is checked against a dictionary of protein and gene names from UniProt and HGNC for genes/proteins missed by the tagger, in order to weed out false negatives. The tool was evaluated against the existing solutions ABNER and BANNER, with precision measured on the GENIA corpus comprising 2,000 abstracts.

4. Future Work
The mechanism of action predicted by the pipeline is currently being tested in the biological lab. The tool will be further evaluated on other major biological corpora such as AIMED and BIOINFER.

5. References
[1] Chen Li, Maria Liakata, and Dietrich Rebholz-Schuhmann. Biological network extraction from scientific literature: state of the art and challenges.
[2] J.-D. Kim, T. Ohta, Y. Tateisi, and J.
Tsujii. GENIA corpus—a semantically annotated corpus for bio-textmining. Bioinformatics (2003) 19 (Suppl 1): i180–i182. doi: 10.1093/bioinformatics/btg1023.
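The filtering and edge-building steps described in the pipeline above can be condensed into a sketch: keep only sentences containing an interaction verb, then link the recognised gene/protein mentions that co-occur in them. The tiny verb set and entity dictionary below are placeholders for the 3,548 GENIA-derived verbs and the UniProt/HGNC lexicons, and the sentence splitting is deliberately naive.

```python
# Sketch of verb-triggered sentence filtering and co-occurrence edge
# building; the verb list and entity dictionary are tiny placeholders.
from itertools import combinations

INTERACTION_VERBS = {"activates", "inhibits", "binds"}
KNOWN_ENTITIES = {"SPARCL1", "TGFB1", "SMAD3"}

def extract_edges(text):
    edges = set()
    for sentence in text.split("."):
        tokens = set(sentence.split())
        if not INTERACTION_VERBS & tokens:
            continue                     # no trigger verb: skip sentence
        entities = sorted(KNOWN_ENTITIES & tokens)
        edges.update(combinations(entities, 2))  # co-occurrence edges
    return edges

edges = extract_edges(
    "SPARCL1 activates TGFB1 in cardiac tissue. "
    "SMAD3 was measured. SPARCL1 binds SMAD3")
```

Each paper's edge set is a sub-graph; merging the sub-graphs across papers yields the network from which hidden links are mined.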
Complex Reasoning over Big Data Streams with Answer Set Programming∗
Thu-Le Pham, Alessandra Mileo
Insight Centre for Data Analytics, NUIG
E-mail: {thule.pham, alessandra.mileo}@insight-centre.org

Abstract
This paper addresses the problem of performing complex non-monotonic reasoning over dynamic data. We propose a data-driven approach to examine how the stable model semantics of Answer Set Programming can be used for scalable processing of big data streams.

1. Introduction
One of the significant challenges arising in the reasoning community nowadays is the ability to continuously produce timely new results from data streams. State-of-the-art reasoners traditionally focus on performing complex reasoning tasks over static data, where there are no hard constraints on response time. Moreover, the expressivity of a reasoner is known to be inversely related to its performance. Therefore, building scalable complex reasoning systems over streaming data is a difficult task. Answer Set Programming (ASP), with its stable model semantics, is well known as a powerful, highly expressive declarative programming language for representing rich knowledge structures, with the ability to manage defaults, common sense, preferences, recursion, and non-determinism. However, the high expressivity of ASP comes at the expense of efficiency, which makes ASP-based reasoning systems over streams harder to scale. How can we perform ASP-based reasoning over big data streams while maintaining scalability?

2. Related Work
The authors in [4] focus on distributed methods for non-monotonic rule-based reasoning, trying to achieve better scalability using the MapReduce framework. Their current work performs parallel reasoning with well-founded semantics, which is good for highly parallelizable problems, not for computationally complex ones. The authors in [3] present the StreamRule framework, which combines a stream processing engine (CQELS) and a non-monotonic reasoner (Clingo) in a one-direction processing pipeline. The current work in [2] focuses on enabling adaptivity for StreamRule by studying correlations between streaming rate and window size. These works show that we can find an optimal window size for a given streaming rate to reduce the processing time of the reasoning layer in StreamRule. However, this conclusion holds if and only if there is no dependency between the input data items for the reasoning component.

∗This research has been partially supported by SFI under grant No. SFI/12/RC/2289 and the EU FP7 CityPulse Project under grant No. 603095.
Authors in [3] present the StreamRule framework, which combines a stream processing engine (CQELS) and a non-monotonic reasoner (Clingo) in a one-directional processing pipeline. The current work in [2] focuses on enabling adaptivity for StreamRule by studying correlations between streaming rate and window size. These works show that we can find an optimal window size for a given streaming rate to reduce the processing time of the reasoning layer in StreamRule. However, this conclusion holds if and only if there is no dependency between input data for the reasoning component.

∗This research has been partially supported by SFI under grant No. SFI/12/RC/2289 and the EU FP7 CityPulse Project under grant No. 603095.

3. Problem Statement & Proposed Solution
The basic idea for tackling the research question in Section 1 is to divide the input data into chunks, each of which is fed to an ASP reasoner to perform complex reasoning tasks. In [2] we argue that if the processing time for the whole input data in one computation is monotonically increasing, we can reduce that time by reasoning (even) sequentially on independent subsets of the input data. In this paper, we follow this hypothesis. However, reasoning sequentially on chunks of data has to guarantee that the final results are the same as if we had processed the whole input data in one computation. Therefore, the problem is refined as follows: Given an ASP logic program P (a set of rules) and input data I, find subsets I1, ..., In of I such that:

I = ⋃_{i=1..n} I_i   and   ANS(P, I) = ⋃_{i=1..n} ANS(P, I_i)

where ANS(P, I_i) are the answer sets (results) of reasoning on P and I_i. In order to find I1, ..., In satisfying the two equations above, we take into account structural information of the logic program P to study the dependencies among elements in I.
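As a toy illustration of this splitting idea (not the authors' implementation), facts can be grouped by the connected components of a predicate dependency graph, so that reasoning on each chunk is independent of the others. The encodings here are hypothetical simplifications: rules as (head-predicate, body-predicates) pairs, facts as (predicate, argument) tuples.

```python
from collections import defaultdict

def dependency_components(rules):
    """Group predicates into connected components of the dependency graph.

    rules: (head_predicate, [body_predicates]) pairs; body predicates may
    include negated ones, as in the extended graph discussed in the text.
    """
    adj = defaultdict(set)
    for head, body in rules:
        for b in body:  # undirected edge head <-> body predicate
            adj[head].add(b)
            adj[b].add(head)
    seen, components = set(), []
    for pred in list(adj):
        if pred in seen:
            continue
        stack, comp = [pred], set()
        while stack:  # depth-first search over the undirected graph
            p = stack.pop()
            if p in comp:
                continue
            comp.add(p)
            stack.extend(adj[p] - comp)
        seen |= comp
        components.append(comp)
    return components

def split_input(facts, components):
    """Partition facts into chunks I_1..I_n, one per dependency component."""
    chunks = [[] for _ in components]
    for fact in facts:
        for i, comp in enumerate(components):
            if fact[0] in comp:
                chunks[i].append(fact)
                break
    return chunks

# Two independent rule clusters: reasoning on each chunk cannot interfere
rules = [("alert", ["high_temp", "smoke"]), ("busy", ["traffic"])]
facts = [("high_temp", "room1"), ("smoke", "room1"), ("traffic", "street5")]
comps = dependency_components(rules)
chunks = split_input(facts, comps)
```

Each chunk can then be passed to a separate solver invocation; the union of the answer sets matches the single-computation result only when the components are truly independent, which is exactly the condition the extended dependency graph is meant to certify.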
We need to extend the concept of the dependency graph of P [1], because this definition considers only the relationship between a positive IDB (intensional database) predicate in the body and a predicate in the head of a rule in P. We intend to extend the dependency graph by considering: i) the (transitive) correlation between two predicates in the body of a rule, and ii) not only positive literals but also negative ones.
We believe that this extended dependency graph can help reduce the reasoning time of a single system, while maintaining the correctness of the results, by splitting the input data. Moreover, it can help to enable parallelism for performing ASP reasoning.

References
[1] F. Calimeri, S. Perri, and F. Ricca. Experimenting with parallelism for the instantiation of ASP programs. Journal of Algorithms, 63(1):34–54, 2008.
[2] S. Germano, T.-L. Pham, and A. Mileo. Web stream reasoning in practice: on the expressivity vs. scalability tradeoff. In Web Reasoning and Rule Systems, pages 105–112. Springer, 2015.
[3] A. Mileo, A. Abdelrahman, S. Policarpio, and M. Hauswirth. StreamRule: a nonmonotonic stream reasoning system for the semantic web. In Web Reasoning and Rule Systems, pages 247–252. Springer, 2013.
[4] I. Tachmazidis, G. Antoniou, and W. Faber. Efficient computation of the well-founded semantics over big data. arXiv preprint arXiv:1405.2590, 2014.

INSIGHTSC [6]
Data Agnostic Management Systems for The Internet of Things
Zeeshan Jan, Aqeel Kazmi, Martin Serrano
Insight Centre for Data Analytics, NUIG
{first}.{last}@insight-centre.org

Abstract
In the Internet of Things (IoT) area, interoperability is seen as the way to align different levels of data that also use different representation models. Data Management Systems (DMS) in IoT are becoming the main bottleneck when a high level of data interoperability is required. IoT systems that store heterogeneous data sets, called silos, are unable to communicate with each other at the data level. This work aims to bring interoperability to one common layer by using semantics, storing the heterogeneous data generated by different IoT systems in a common data format by means of ontologies. A hybrid DMS platform has been prototyped for handling the RDF and JSON-LD formats, providing interoperability for IoT data. The DMS uses internal and external representations of the collected IoT data, and this is how it targets data integration and application interoperability.

1. Introduction
Managing data generated by various IoT platforms is potentially a big challenge when it comes to storing and managing the data and making it available to other components of a system. Traditionally, each IoT system uses its own taxonomy to store the data generated by ICOs. To store such heterogeneous data in one place, a common taxonomy is needed. As technology has evolved, developers have shifted their interest from RDF to JSON-LD, since data representation in JSON-LD is closer to their development tools and they do not have to learn RDF. Our solution is to define an ontology, to store JSON-LD data chunks (based on our ontology) generated by different IoT systems, and then to make this data available to other components of the system in three data formats, i.e., JSON-LD, JSON and RDF.

2. Background
The Internet of Things has been a hot research topic for the past years. There are numerous IoT platforms producing and managing data generated by ICOs (i.e., sensors) in specific domains like health care and smart traffic. There is a noticeable emergence of IoT tools that allow the public to connect their ICOs and push their data into cloud systems [1]. Recent examples are Xively [2], Ubidots [3] and OpenIoT [4], which allow applications from different domains to exploit the data. The data silos of the IoT platforms producing data hold a big potential for further research and development if they could communicate with each other. This could also open up another space of applications to cope with the challenges faced by IoT platforms. There is a need for an architecture that could manage and integrate this diversity of data formats and, even more, provide the management mechanisms to handle the information indistinctly.

3. Proposed Solution
The VITAL project [5] defines the paradigm of a System of Systems, which manages the data from different IoT systems producing data (i.e., data silos) and makes it available for exploitation. VITAL schemas define a taxonomy covering the various domains of the IoT platforms connected to VITAL; the data from those platforms is wrapped into the VITAL ontology and stored. In the VITAL DMS, PPIs (Platform Provider Interfaces) are responsible for wrapping up the data generated by IoT platforms before it is pushed into the VITAL system. VUAIs (Virtualized Unified Access Interfaces) are the components responsible for exploiting the enriched VITAL datastore in three available data formats, i.e., JSON-LD, JSON and RDF. The DMS acts as a data-model-agnostic system whose main objective is to maintain models of the data being pushed into it; data modelling is done at the PPI level, where the ICO's data is wrapped into the VITAL ontology and then pushed into the DMS.
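To make the PPI wrapping step concrete, here is a hedged sketch of how a raw ICO reading might be wrapped into a JSON-LD chunk before being pushed into a DMS, and how a plain-JSON view could be derived from it. The context URL, the `vital:` property names and the `wrap_observation` helper are hypothetical illustrations, not the actual VITAL schema or DMS API.

```python
import json

def wrap_observation(sensor_id, value, unit, timestamp):
    """Wrap a raw ICO reading into a JSON-LD chunk (illustrative vocabulary)."""
    return {
        # hypothetical context document; a real PPI would use the VITAL ontology
        "@context": "http://example.org/contexts/measurement.jsonld",
        "@type": "vital:Observation",
        "vital:sensorId": sensor_id,
        "vital:value": {"@value": value, "vital:unit": unit},
        "vital:timestamp": timestamp,
    }

doc = wrap_observation("ico-42", 21.5, "celsius", "2015-10-30T09:00:00Z")
as_jsonld = json.dumps(doc)  # the JSON-LD view a VUAI could serve
# a naive plain-JSON view: strip the JSON-LD keywords at the top level
as_plain = {k: v for k, v in doc.items() if not k.startswith("@")}
```

Serving the same chunk as RDF would additionally require a JSON-LD-to-RDF conversion step, which is omitted here.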
Fig 1: DMS communicates with PPI and VUAIs

4. References
[1] J. Soldatos, M. Serrano and M. Hauswirth. "Convergence of Utility Computing with the Internet of Things", International Workshop on Extending Seamlessly to the Internet of Things (esIoT), collocated at the IMIS-2012 International Conference, 4th–6th July, 2012, Palermo, Italy.
[2] Xively – [Online]. Available: http://www.xively.com
[3] Ubidots – [Online]. Available: http://www.ubidots.com
[4] OpenIoT – [Online]. Available: http://www.openiot.eu
[5] VITAL Project – [Online]. Available: http://www.vital.eu
[6] H. Chen, F. Perich, T. Finin, and A. Joshi, "SOUPA: Standard ontology for ubiquitous and pervasive applications," in Mobile and Ubiquitous Systems: Networking and Services, International Conference on, 2004.

INSIGHTSC [7]
Dynamic Deployment of Multi-Query Internet of Things Data Services in Federated Cloud Environments
Salma Abdulaziz, Martin Serrano
Insight Centre for Data Analytics, NUIG
{salma.abdulaziz, martin.serrano}@insight-centre.org

Abstract
The fast proliferation of Internet of Things platforms and the dynamic deployment of their applications in the cloud raise many research concerns regarding their integration in a semantically interoperable way. The OpenIoT platform solved the interoperability problem using the concept of data virtualization and the SSN ontology. However, integrating different OpenIoT platforms in the cloud has not been addressed yet. This study focuses on extending the OpenIoT functionalities to enable the federation of IoT data services in the cloud. This federation will be a key enabler of smart cities.

1. Motivation
The interoperability problem of different IoT testbeds arises from the fact that these testbeds are in different geographical locations and are administratively dispersed. Moreover, they use different kinds of sensors with different technologies. The OpenIoT platform made it possible to collect data from any IoT testbed, regardless of the technology used and of its location, using the concept of data virtualization, linked data technologies and the SSN ontology. This enables the semantic interoperability of IoT services in the cloud. However, integrating different OpenIoT platforms in such a way that one platform can collect data from other platforms is still challenging. This is a typical problem in smart cities, where information systems measuring different aspects of the city's functions need to be integrated with other information systems to provide smart cities with new services based on this interaction, even if the systems are located in different geographical locations. This will enable new data exchange applications to be built on top of these interactions.

2. Problem Statement
Although the integration of different geographically and administratively dispersed IoT platforms is becoming crucial for enabling smart cities, there is no easy and dynamic way to enable this federation of platforms in the cloud, due to the usage of different technologies and the need to access these platforms from the same place at the same time with almost zero programming effort.

3. Related Work
The OpenIoT platform proposed in [1] is the latest technology addressing the problem of gathering data from heterogeneous IoT testbeds. It managed to virtualize the sensor data to make it independent of the sensor technology used. However, interaction between different OpenIoT platforms to access more than one IoT testbed at the same time is not addressed yet.

4. Research Question
How can different OpenIoT platforms connected to different IoT testbeds be integrated so that every platform can access the sensor data of other platforms in the cloud, without the need for any programming effort or configuration from the user's perspective?

5. Proposed Solution
Different OpenIoT instances will be implemented in the cloud. Each OpenIoT instance will be connected to a group of sensors measuring a certain environmental condition. Two approaches will be investigated. The first is to extend the current OpenIoT platform by implementing a user interface that enables data exchange between different federated platforms. The other approach is to make each platform automatically push its sensor data to the other platforms available in the cloud, so that each platform can access the data of the other platforms without needing to request it every time. These two approaches will be compared in terms of data availability, performance and scalability. Based on this comparison, one of them will be implemented to extend the OpenIoT platform functionalities, together with a third-party entity called the federator.
The federator monitors all the OpenIoT instances in the cloud and keeps records about each of them. All the platforms will be connected to this federator for consultancy about the other platforms. The configuration and implementation of sensor multi-queries will be addressed as well, so that it becomes possible for one platform to be accessible by more than one other platform at the same time.

6. Evaluation
The evaluation will be based on the user experience of using this extended functionality of the OpenIoT platform to gather data from other instances with zero programming effort, where the time taken to share the sensor data, stability and database usage will be measured.

7. References
[1] J. Soldatos, N. Kefalakis, M. Hauswirth, M. Serrano, J. Calbimonte, M. Riahi, K. Aberer, P. Jayaraman, A. Zaslavsky, I. P. Zarko, L. Skorin-Kapov, and R. Herzog, "OpenIoT: Open Source Internet-of-Things in the Cloud," in Interoperability and Open-Source Solutions for the Internet of Things, 2015, vol. 9001, pp. 13–25.
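The push-based variant of the proposed federation (the second approach in Section 5) could be sketched, purely as an illustration, with a registry that fans sensor records out to the other registered platforms. The class and method names below are hypothetical, not part of OpenIoT.

```python
class OpenIoTInstance:
    """Hypothetical stand-in for one OpenIoT deployment in the cloud."""

    def __init__(self):
        self.remote_data = []  # sensor records pushed in by other platforms

    def receive(self, source, record):
        self.remote_data.append((source, record))

class Federator:
    """Hypothetical registry: tracks instances and fans out pushed records."""

    def __init__(self):
        self.platforms = {}

    def register(self, name, platform):
        self.platforms[name] = platform

    def push(self, source, record):
        # fan a sensor record out from one platform to all the others,
        # so each platform holds remote data without requesting it
        for name, platform in self.platforms.items():
            if name != source:
                platform.receive(source, record)

fed = Federator()
a, b = OpenIoTInstance(), OpenIoTInstance()
fed.register("galway", a)
fed.register("dublin", b)
fed.push("galway", {"sensor": "temp-1", "value": 20.4})
```

The pull-based alternative would replace `push` with on-demand queries routed through the federator; comparing the two in terms of availability, performance and scalability is exactly the study proposed above.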
 INSIGHTSC [8]
Diminishing Business Challenges by Improving Open Data Business Model
Fatemeh Ahmadi-Zeleti
Insight, NUIG
fatemeh.ahmadizeleti@insight-centre.org

Abstract
A growing list of business models can be observed; however, on closer examination, they are not clearly delineated and lack a clear value orientation. Therefore, understanding the value creation and exploitation mechanisms in existing open data businesses is difficult and challenging. This gap is at the centre of this research work, which focuses on the development of an open data business model (ODBM) to diminish the difficulties and challenges that open data businesses face and to unlock the real value of open data.

1. Introduction
Large numbers of businesses are seeking to tap into the potential of open data. As new entrants flood the marketplace, businesses are seeking to uniquely position themselves through specialization to create and capture value for their customers. Business models are conceptual instruments for describing how value is created and revenue is generated; however, there are very few scholarly studies available on business models to harness the potential value of open data.

2. Related Work
Several business models (15 identified) [1] and business model frameworks have been proposed in the literature. The well-known Osterwalder and Pigneur business model canvas (nine building blocks), the Shafer, Smith and Linder framework (four building blocks) and the Hamel business model (four building blocks) are the three most cited business model frameworks. In my study, I adopt the notion of business model provided by Osterwalder, which considers a business model as a conceptual tool containing a set of inter-related elements that allows a company to make money.

3. Early Results
Grounded in the extant literature on business models, a 6-Values (6-Vs) business model conceptual framework (Figure 1) has been developed that captures the key components of a business model and their interrelations [1].
Figure 1: The 6-Vs Conceptual Framework

Based on the 6-Vs framework, the 15 business models are elaborated, characterized, and analyzed in an open data context. The result of this analysis is the emergence of five open data business model patterns and four business disciplines (Figure 2) [1].

Figure 2: Patterns and Value Disciplines

4. Research Questions and Hypotheses
The challenges and difficulties associated with open data business models lead to the following research questions: 1. What revenue patterns can be observed? 2. What are the core capabilities? 3. What are the business success factors? Key hypotheses related to these research questions are: H1. A higher utilization rate of open data products and services determines the pricing method. H2. Increasing transparency increases the value of open data products and services. H3. Multiple open data revenue streams positively affect the profitability of the business. H4. Added value to open data products and services leads to higher profit.

5. Current and Future Work
A survey was carefully designed, tested, and distributed to 250 globally located companies (from over 1,500 examples in the Open Data Impact Map database). To better understand the unknown aspects, interviews will be conducted. Using R and Tableau, the data are analyzed in order to empirically test the 6-Vs model and to develop the ODBM.

References
[1] F. A. Zeleti, A. Ojo, and E. Curry, "Business Models for the Open Data Industry: Characterization and Analysis of Emerging Models," in 15th Annual International Conference on Digital Government Research, 2014.

INSIGHTSC [9]
DInfra - Distributional Infrastructure
Siamak Barzegar, Juliano Efson Sales, Brian Davis
Insight Centre for Data Analytics
E-mail: author.name@insight-centre.org
Andre Freitas
Department of Computer Science and Mathematics, University of Passau, Germany
Firstname.Lastname@uni-passau.de

Abstract
DInfra (Distributional Infrastructure) is an infrastructure for computing multilingual semantic relatedness and correlation for twelve natural languages using five distributional semantic models (DSMs). It provides researchers and developers with an easy-to-use platform for processing large-scale corpora and conducting experiments with distributional semantics. The infrastructure integrates several multilingual DSMs, so end users can obtain results without worrying about the complexities involved in building DSMs. The DInfra webservice gives users easy access to a wide range of comparisons of DSMs with different model parameters. In addition, users can configure and access DSM parameters through an easy-to-use API.

1. Introduction
Distributional semantics is built upon the assumption that the context surrounding a given word in a text provides important information about its meaning [2], [3]. It focuses on the construction of a semantic representation of a word based on the statistical distribution of word co-occurrence in unstructured data.

2. The Distributional Infrastructure
DInfra is an implementation of Explicit Semantic Analysis (ESA), Latent Semantic Analysis (LSA) and Random Indexing (RI), based on EasyESA [Carvalho et al. 2014] and S-Space [Jurgens et al. 2010], as well as the GloVe and word2vec models. The ESA, LSA and RI models are based on vector space models, while GloVe and word2vec are based on deep learning models¹.
The service runs as a JSON² webservice, which allows users to submit queries for similar terms in a multilingual fashion based on a semantic relatedness measure, using Spearman's correlation to test relatedness scores. We consider Wikipedia corpora for the years 2006, 2008 and 2014, as well as the ukWaC corpus [1], from which the vectors are built. The DInfra webservice allows the user to obtain semantic similarity using Spearman's correlation for 12 natural languages. Our service can be tested online. It includes two components: 1) Semantic Relatedness, which calculates word similarity; and 2) Correlation, which calculates Spearman's rank correlation.

¹ Deep learning is a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using model architectures, with complex structures or otherwise, composed of multiple non-linear transformations [Li Deng and Dong Yu, 2014].
² JSON – JavaScript Object Notation

Figure 1: Screenshot of DInfra web service

3. Multilingual Analysis in DInfra
The evaluation consists of two scenarios:
Scenario One: Similarity datasets³ translated by expert translators from a well-known localisation company⁴ were used, and the distributional semantics infrastructure, using Wikipedia as a corpus for 12 different languages, was evaluated using Spearman's rank correlation.
Scenario Two: Automatic machine translation was used to translate the word pairs from the different languages into English, and they were evaluated on the English corpus.

References
[1] M. Baroni, S. Bernardini, A. Ferraresi, and E. Zanchetta. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3):209–226, 2009.
[2] Z. S. Harris. Distributional structure. Word, 1954.
[3] P. D. Turney, P. Pantel, et al. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37(1):141–188, 2010.
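Both DInfra components and both evaluation scenarios rest on Spearman's rank correlation between gold-standard human similarity scores and model relatedness scores. A minimal self-contained sketch of the measure (not DInfra's implementation; the toy scores below are invented):

```python
def rank(values):
    """1-based average ranks, with ties resolved as fractional ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the run of tied values
        avg = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx) ** 0.5
    vy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (vx * vy)

# toy gold human similarity scores vs. DSM relatedness scores
human = [9.0, 7.5, 3.0, 1.0]
model = [0.81, 0.70, 0.35, 0.02]
rho = spearman(human, model)
```

Because the measure depends only on ranks, it is insensitive to the very different scales that human judgements and DSM cosine scores live on, which is why it is the standard metric on datasets such as WS353, RG and MC.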
³ WordSim353 (WS353), Rubenstein & Goodenough (RG) (1965), and Miller & Charles (MC) (1991).
⁴ Lionbridge Technologies, Finland, Natural Language Solutions Team.

INSIGHTSC [10]
Acknowledgment: This publication has emanated from research supported in part by a research grant from Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 and the EU project SIFEM (contract number 600933).

DISCOV3R: Discovering Life-Sciences Datasets in the LS-LOD Cloud
Muntazir Mehdi, Ratnesh Sahay and Dietrich Rebholz-Schuhmann
The Insight Centre for Data Analytics, National University of Ireland Galway
{muntazir.mehdi, ratnesh.sahay, rebholz}@insight-centre.org

Abstract
A significant portion of the Linked Open Data (LOD) cloud consists of Life Sciences datasets, known as the Life Sciences Linked Open Data (LS-LOD) Cloud, which contains billions of clinical facts that interlink to form a "Web of Clinical Data". However, tools that help new publishers find relevant datasets to which they could potentially link are missing, particularly in specialist domain-specific settings. Based on a set of domain-specific keywords extracted from a local dataset, we propose methods to automatically identify relevant datasets from the LS-LOD Cloud.

1. Introduction
The LOD Cloud is composed of datasets published by different publishers from academia, government organizations, online communities and companies alike. Most of the datasets within the LOD Cloud are accessible via at least one SPARQL endpoint. The Datahub¹ and Mannheim Linked Data² catalogues list such SPARQL endpoints available on the Web. The LOD Cloud comprises 500 million links across datasets³, following the fourth Linked Data principle: "links to related data". However, creating links to external LOD datasets is a challenging task for publishers. Addressing this challenge, a number of linking frameworks, such as Silk [2] and LIMES [3], have been proposed. Given that there are now hundreds of remote datasets, many of which are black boxes that do not describe their content [1], it becomes a challenging task for a publisher to find relevant datasets.
In certain cases, the content of remote datasets is described using VoID⁴, SPARQL 1.1 Service Descriptions, or a specialized vocabulary (or ontology), which may help, but these are not available for many endpoints [1]. The most general option is to consider the SPARQL endpoints of datasets as black boxes whose content is opaque, and to query them directly to determine their relevance. In our work, we explore this option by assuming that a high-quality, representative set of domain-specific keywords is made available as input to the process; this set of keywords may be extracted from any local source in any format, such as a taxonomy, a relational schema or a term dictionary. Based on this set of domain-specific keywords, we propose to directly probe SPARQL endpoints with queries to determine their relevance.

¹ http://datahub.io/group/lodcloud
² http://linkeddatacatalog.dws.informatik.uni-mannheim.de/dataset
³ http://lod-cloud.net/state/
⁴ http://www.w3.org/TR/void/

2. DISCOV3R
A generic workflow of the DISCOV3R framework is given in Figure 1. For our use case, we use a set of clinical terminologies (CTerms) used to define the human ear. The terminologies, in their original form, are used to prepare query terms (QTerms), which are later used by three different approaches to discover relevant datasets from the LS-LOD Cloud. The CTerms are first filtered using string manipulation techniques. Using language processing tools, stopwords and general terms are eliminated from the CTerms. Finally, a set of QTerms is created by generating n-grams of the CTerms with order preserved. Once generated, the QTerms are forwarded to three different discovery techniques:

Figure 1. DISCOV3R Workflow

DMatch: The Direct Matching (DMatch) approach uses the original QTerms without any modification, searching for each term with a SPARQL query for direct literal matching.
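The QTerm preparation and endpoint-probing steps could be sketched as follows. This is an illustration only: the query shape, the `variants` expansion (used by the case/language-tag approach) and all term examples are hypothetical, not DISCOV3R's actual queries.

```python
def qterms(cterm, max_n=3):
    """Order-preserving n-grams of a clinical term, e.g. 'inner ear canal'."""
    tokens = cterm.split()
    grams = []
    for n in range(1, min(max_n, len(tokens)) + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams

def variants(qterm, langs=("en", "en-GB")):
    """Case and language-tag variants of a QTerm, to widen literal hits."""
    cases = {qterm, qterm.lower(), qterm.upper(), qterm.title()}
    out = []
    for c in sorted(cases):
        out.append('"%s"' % c)                         # plain literal
        out.extend('"%s"@%s' % (c, l) for l in langs)  # language-tagged
    return out

def dmatch_query(qterm, limit=10):
    """A direct literal-matching SPARQL probe (illustrative shape, no escaping)."""
    return 'SELECT DISTINCT ?s ?p WHERE { ?s ?p "%s" } LIMIT %d' % (qterm, limit)

terms = qterms("inner ear canal")
query = dmatch_query("inner ear")
```

Sending such a probe to each catalogued endpoint and counting non-empty results is the black-box relevance test described above; the variant expansion trades extra queries for more hits, which matches the reported recall/latency trade-off between the approaches.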
μMatch: The Multi Matching (μMatch) approach extends DMatch by creating multiple case and language-tag variants of each QTerm, so as to generate more hits.
DetMatch: The Detect Matching (DetMatch) approach extends μMatch by removing duplicate literal-matching results and reducing the overall query execution time.
Results: In our experiments, we consider a total of 222 CTerms and 35 SPARQL endpoints. Out of the 222 CTerms, 124 were found on 18 datasets for DMatch, while 203 were found on 23 datasets for both μMatch and DetMatch. The average execution time was 0.16 seconds for DMatch, and 1.01s and 0.17s for μMatch and DetMatch respectively.

References
[1] Buil-Aranda, Carlos, et al. "SPARQL web-querying infrastructure: Ready for action?" The Semantic Web – ISWC 2013. Springer Berlin Heidelberg, 2013. 277–293.
[2] Volz, Julius, et al. "Silk – A Link Discovery Framework for the Web of Data." LDOW 538 (2009).
[3] Ngomo, Axel-Cyrille Ngonga, and Sören Auer. "LIMES – a time-efficient approach for large-scale link discovery on the web of data." IJCAI'11.

INSIGHTSC [11]
Discovering Hidden Structures for Quality Assessment
Emir Muñoz
Fujitsu Ireland Ltd. and Insight Centre for Data Analytics
E-mail: emir.munoz@insight-centre.org

Abstract
Despite all the structured data available on the Web, users' capability to exploit that data is limited. To do so, users need an understanding of 1) the underlying data model, 2) the quality of the data, and 3) the possible use cases. These components cannot always be found, or are not made explicit to users. Our work focuses on the discovery of implicit structures present in the Web of Data to aid users' understanding of these components.

1. Introduction
Initiatives like Schema.org¹ have helped the growth of structured data on the Web by allowing users to represent entities such as people, places, and things in general within webpages. However, in many cases structured data is published without a clear structure or spine that would allow users to understand the data modelling, assess its quality, and identify use cases. When users try to query their data (which can be very expensive), they usually face problems caused by the unclear structure of the large data at hand. First, they need at least a minimal idea of what the data looks like in order to perform any task. Constraints are a fundamental part of design in data modelling and allow structure to be expressed. Here, we present ongoing work that focuses on mining constraints from the Web of Data, which allows us to build 'core spines' for datasets. Hence, users will no longer see a piece of data as a black box, and will gain insights on how to query it and unlock its potential.

2. Constraints
A constraint defines certain properties that data in a dataset must comply with. For example, a RANGECONSTRAINT indicates that a given property can have only values of a given type; a MIN/MAXCONSTRAINT indicates that a given property occurs at least or at most a given number of times.
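As a hedged sketch of how such MIN/MAXCONSTRAINTs might be mined from raw triples (a simplified model for illustration, not the paper's algorithm), one can count, per entity type, how often each property occurs on each typed subject:

```python
from collections import Counter, defaultdict

def mine_cardinality_constraints(triples):
    """Mine per-(type, property) MIN/MAX cardinality bounds from triples.

    triples: (subject, predicate, object) tuples; 'rdf:type' edges assign
    entity types. Returns {(type, property): (min_count, max_count)}
    computed over all subjects of that type.
    """
    types = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)
    counts = defaultdict(Counter)  # (type, prop) -> per-subject usage counts
    for s, p, o in triples:
        if p == "rdf:type":
            continue
        for t in types.get(s, ()):
            counts[(t, p)][s] += 1
    subjects_of = defaultdict(set)
    for s, ts in types.items():
        for t in ts:
            subjects_of[t].add(s)
    constraints = {}
    for (t, p), per_subject in counts.items():
        # subjects lacking the property contribute 0, giving the soft lower bound
        occ = [per_subject.get(s, 0) for s in subjects_of[t]]
        constraints[(t, p)] = (min(occ), max(occ))
    return constraints

triples = [
    ("b1", "rdf:type", "Book"), ("b1", "isbn", "0-306-40615-2"),
    ("b2", "rdf:type", "Book"), ("b2", "isbn", "1-56619-909-3"),
    ("b2", "author", "A"), ("b2", "author", "B"),
]
c = mine_cardinality_constraints(triples)
```

Here `(Book, isbn)` mines to exactly-one while `(Book, author)` mines to a looser bound; treating these bounds as soft, in the sense discussed below, tolerates individual violations while capturing the dominant shape of the data.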
More constraints and validation algorithms can be found in [1] and [2], using automaton-based and regular-expression-based approaches, respectively. Constraints in data management are useful for various perennial tasks, such as indexing, query optimization, and views. Hence, the discovery of constraints satisfied by a piece of data becomes a relevant and non-trivial challenge. Although many inconsistencies can be found in the Web of Data, a strict consideration of constraints would lead to data loss. Allowing some exceptions can prevent systems from losing data [4]. Thus, the idea is to consider constraints with soft bounds that can be violated by individual entities but should be respected on average.

¹ http://schema.org

3. Structure Mining
The discovery of constraints is not a new topic in the database community. Grahne [3] mines approximate keys in XML data using a rule mining approach. Recently, the discovery of keys² in RDF has gained attention for its utility in data linkage [6]. For example, by means of keys we can determine whether a student is enrolled in the library and in the rugby team. However, the state of the art for the Web of Data has only focused on mining keys over a particular and limited RDF view, known as the Concise Bounded Description (CBD). The CBD does not consider RDF blank nodes, which adds an unexpected complexity to the problem. For example, to determine whether two blank nodes are value-equal, we must determine whether there exists an isomorphism between both graphs, which turns out to be an NP (GI-complete) problem [5]. We aim to characterise this problem as a frequent itemset mining problem. We believe that frequencies in the data will uncover the structure the data has and enable users' understanding.
4. Ongoing Work
In this work, we hypothesize that datasets in the Web of Data have hidden structures, and that those structures help users to assess the quality of the data. Preliminary results over DBpedia³ have shown that syntax patterns can be found in literal predicates. For instance, the ISBN code follows the ALPHANUM-NUM-NUM-NUM pattern. Our current plans focus on defining and discovering soft constraints and on processing larger datasets, which will require scalable methods.

References
[1] P. M. Fischer, G. Lausen, A. Schätzle, and M. Schmidt. RDF constraint checking. In Proc. of the Workshops of the EDBT/ICDT, pages 205–212, 2015.
[2] J. E. L. Gayo, E. Prud'hommeaux, I. Boneva, S. Staworko, H. R. Solbrig, and S. Hym. Towards an RDF validation language based on regular expression derivatives. In Proc. of the Workshops of the EDBT/ICDT, pages 197–204, 2015.
[3] G. Grahne and J. Zhu. Discovering approximate keys in XML data. In Proc. of the 2002 ACM CIKM, pages 453–460, 2002.
[4] S. Hartmann. Soft constraints and heuristic constraint correction in entity-relationship modelling. In Semantics in Databases, 2nd International Workshop, pages 82–99, 2001.
[5] A. Hogan. Skolemising blank nodes while preserving isomorphism. In Proc. of the 24th WWW, pages 430–440, 2015.
[6] T. Soru, E. Marx, and A. N. Ngomo. ROCKER: A refinement operator for key discovery. In Proc. of the 24th WWW, pages 1025–1033, 2015.

² Keys are a particular type of constraint that uniquely identifies elements in a collection, e.g., StudentID in a university context.
³ http://dbpedia.org/

INSIGHTSC [12]
  • 22. Effective Data Visualisation to Promote Home Based Cardiac Rehabilitation Liam Sexton, David S. Monaghan Insight Centre for Data Analytics liam.sexton@insight-centre.org, david.monaghan@dcu.ie Abstract Data visualisation can play a crucial role in improving ad- herence rates in home Cardiac Rehabilitation (CR) pro- grammes. The research presented here explores the use of open-source visualisation libraries in a prototype home- based CR system. 1. Motivation Cardiovascular disease is a world-wide burden to pa- tients and health care agencies alike. Traditionally, individ- uals who have suffered a cardiac event attend centre-based CR to aid recovery and prevent further cardiac illness, how- ever participation in these programmes is sub-optimal. 2. Problem Statement Home-based CR has been introduced in an attempt to widen access and promote uptake, however moving care from a dedicated facility has drawbacks; will the patient adhere to the programme and how does a clinician know this? This research aims to use effective data visualisations to promote adherence in home CR programmes. 3. Related Work In the literature, the most widely cited reasons for pa- tients not attending CR programmes are facility distance and reluctance to partake in group classes [2]. Studies have shown that home-based CR is equally effective as centre-based CR [1]. Evidence suggests that CR partic- ipation can be improved by 18% to 30% using patient- targeted strategies, such as motivational communications, phonecalls, and home visits [1]. Thus, there is a need for al- ternative evidence-based approaches to traditional CR that provide affordable access to effective clinical interventions. 4. Hypothesis This work attempts to demonstrate that through effective use of data visualisations in a CR programme, patient adher- ence can be improved. From a clinician’s perspective, the use of data visualisations can present a favourable method of viewing the progress of individuals and groups. 5. 
Proposed Solution
A number of different data visualisations were developed in this research. Fig. 1 shows a prototype patient monitoring dashboard developed using JavaScript and D3.js. Weekly goals are set based on WHO recommended targets, and patients are encouraged to meet or exceed previous daily totals through motivational graphics. Two normal distribution graphs show patient progress against that of a chosen peer group, incorporating a social comparison element. A clinician can become aware of an individual's non-adherence through the dashboard. However, to get a sense of overall programme participation, additional information must be communicated. Fig. 2 shows a heatmap that has been created to show mass patient (y-axis) participation (cell colour) over time (x-axis).

Figure 1: Prototype patient monitoring dashboard
Figure 2: Heatmap showing group adherence over time (participant ID on the y-axis, months from Feb 2013 to Feb 2015 on the x-axis, cell colour encoding the number of sessions, 0–10)

6. Evaluation & Future Work
The heatmap visualisation has been evaluated using patient data from the MedEX CR programme. Relationships and patterns were found that prompted further research questions into quantifying the statistical likelihood of patient adherence. The patient dashboard is still in development and will be integrated into a larger home-based CR system in the future.

References
[1] H. M. Dalal, A. Zawada, K. Jolly, T. Moxham, and R. S. Taylor. Home based versus centre based cardiac rehabilitation: Cochrane systematic review and meta-analysis. BMJ, 340:b5631, 2010.
[2] P. Davies, F. Taylor, A. Beswick, F. Wise, T. Moxham, K. Rees, and S. Ebrahim.
Promoting patient uptake and adherence in cardiac rehabilitation. Cochrane Database Syst Rev, 7(7), 2010.
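A heatmap like Fig. 2 essentially renders a participant × month matrix of session counts. The following sketch (with an invented session log standing in for the MedEX data, which we do not have) shows how such a matrix could be built from raw session records before being handed to a plotting library:

```python
from collections import Counter

# Hypothetical session log: (participant_id, "YYYY-MM") pairs, standing in
# for real CR programme data. Each pair is one completed session.
sessions = [
    (1, "2013-02"), (1, "2013-02"), (1, "2013-03"),
    (2, "2013-02"), (2, "2013-04"),
    (3, "2013-03"),
]

def adherence_matrix(log):
    """Count sessions per (participant, month) cell of the heatmap."""
    counts = Counter(log)
    participants = sorted({p for p, _ in log})
    months = sorted({m for _, m in log})
    # Rows = participants (y-axis), columns = months (x-axis), as in Fig. 2;
    # the cell value is what the heatmap would encode as colour.
    return [[counts[(p, m)] for m in months] for p in participants]

print(adherence_matrix(sessions))
# [[2, 1, 0], [1, 0, 1], [0, 1, 0]]
```

A row of zeros in this matrix is exactly the non-adherence signal a clinician would spot in the rendered heatmap.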
Enabling a better Collaboration and Communication through Personalizations
Anne Helmreich, Martin Serrano
Insight Centre for Data Analytics
{anne.helmreich, martin.serrano}@insight-centre.org
Joerg Haehner
University of Augsburg

Abstract
Many people work in the same company but do not actually work together in terms of collaboration or information sharing. To address this problem we developed C&C, an approach to support collaboration and communication. The implemented prototype ExpertO (Experts open collaboration) assists staff members in finding collaboration partners through personalization.

1. Motivation
Be it computer science, medicine or any other field: we are living in a fast-changing world. In every field there are researchers searching for new and/or better solutions to improve the world. These fields can no longer be treated separately, as the relatively new fusion of medicine and computer science demonstrates. There are two options to deal with this trend: 1) invent everything from scratch, or 2) work with someone experienced.

2. Problem Statement
Nowadays it is quite difficult to find an expert on a specific topic because of the information overflow on the internet and the lack of proper information sharing, even inside companies [2]. A survey pointed out a lack of collaboration and information flow between researchers and experts working in the Insight Centre as well. Although collaboration could save money, time and nerves, it seems to be difficult to bring people together. In fact, employees do not know who else is researching in the same field or who has the expertise they would like to discuss.

3. Proposed Solution
As many people are unsatisfied with the current state of information flow inside their company, it was necessary to tackle this issue. Our approach is to develop a system (C&C) which provides an expert finding system and an information sharing platform.
For each staff member, a profile with some basic data and their current research topics is created automatically. Based on these profiles and on recommendation/personalization techniques, suggestions of suitable individuals are generated for each member. In addition, a forum, a messaging system and an event system will help staff members to improve knowledge sharing inside the company. IoT sensors could collect data about staff members' location to enhance fellow recommendations by including context information in the recommendation process.

4. Evaluation
To evaluate this approach, ExpertO was developed as a prototype, starting with personalized suggestions as its main function. Figure 1 shows an overview of the approach. The prototype comes as a web application which is available both for staff members and for visitors (after registration). The platform welcomes the user with a set of suggested profiles which could be of interest. Furthermore, it offers the possibility to add/delete research topics, interests, and expertise as well as social activities on one's own profile. ExpertO provides a topic search for specific quests. In case the user is interested in a profile, more details about the suggested profile are provided, including the seating location (currently only NUIG). Google Analytics will measure the usage of the platform.

5. Future steps
As ExpertO is a prototype, there are plenty of ideas waiting. The next steps are the implementation of a messaging system for communicating through the platform, a forum to enable information sharing, and an event calendar with the possibility to create and join events as well as to get event recommendations. The final goal is the integration into the VITAL [1] platform. The ExpertO platform can be in use while being extended bit by bit with new features and improved recommendation algorithms.

Figure 1: ExpertO Platform

References
[1] The VITAL Consortium. VITAL, 2015.
[2] J. Hohu and A. Saksena.
Expert collaboration: Dynamic access to distributed expertise will shape successful enterprises, 2011.
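In its simplest form, the profile-based fellow suggestion described in the ExpertO abstract could rank colleagues by overlap of research topics. The sketch below is our own hedged illustration (the profiles and the Jaccard measure are assumptions, not the platform's actual recommender):

```python
# Toy staff profiles: each member is a set of research topics (invented data).
profiles = {
    "alice": {"iot", "semantic web", "recommender systems"},
    "bob":   {"iot", "sensor networks"},
    "carol": {"machine translation"},
}

def jaccard(a, b):
    """Topic overlap between two profiles: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def suggest(member, profiles, k=2):
    """Rank the other members by topic overlap and keep non-zero matches."""
    others = [(jaccard(profiles[member], topics), name)
              for name, topics in profiles.items() if name != member]
    return [name for score, name in sorted(others, reverse=True)[:k]
            if score > 0]

print(suggest("alice", profiles))  # ['bob']
```

Context information from IoT sensors, as mentioned in the abstract, would then act as an extra signal re-weighting this ranking (e.g. boosting colleagues seated nearby).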
EnRG: Entity Relatedness Graph
Nitish Aggarwal
Insight Centre for Data Analytics
National University of Ireland, Galway
nitish.aggarwal@insight-centre.org

Abstract
Wikipedia provides an enormous amount of background knowledge for reasoning about the semantic relatedness between two entities. In order to find all the related entities for a given entity, an entity relatedness graph (EnRG) is constructed, where the nodes represent Wikipedia entities and the relatedness scores are reflected by the edges. Wikipedia contains more than 4.1 million entities, which requires efficient computation of the relatedness scores between the corresponding 17 trillion entity pairs. We compute the relatedness between two entities by quantifying the distance between the corresponding high-dimensional vectors, built over Wikipedia concepts. We evaluate the approach on a benchmark that contains relative entity relatedness scores for 420 entity pairs. Our approach improves accuracy by 15% over state-of-the-art methods for computing entity relatedness.

1. Introduction
The significance of measuring relatedness between entities has been shown in various tasks in information retrieval (IR), natural language processing (NLP), text analysis and other related fields. In this paper, we present an Entity Relatedness Graph (EnRG)¹, which can be used to obtain a ranked list of related entities, as required by tasks such as query suggestion in web search, query expansion, and recommendation systems. Related entities can be obtained from knowledge bases such as DBpedia or Freebase by retrieving the directly connected entities. However, most popular entities have more than 1,000 directly connected entities, and these knowledge bases mainly cover some specific types of relations. For instance, "Steve Jobs" and "Steve Wozniak" are not directly connected in the DBpedia graph.
Therefore, we need to find related entities beyond the relations defined in knowledge base graphs.

2. Proposed Solution
We developed an approach called Wikipedia-based Distributional Semantics for Entity Relatedness (DiSER), which builds the semantic profile of an entity by using the high-dimensional concept space derived from Wikipedia. DiSER generates a high-dimensional vector by taking every Wikipedia concept as a dimension, and the associativity weight of an entity with the concept as the magnitude of the corresponding dimension. To measure the semantic relatedness between two entities, we simply calculate the cosine score between their corresponding DiSER vectors.

DiSER considers only human-annotated entities in Wikipedia, thus keeping all the canonical entities that appear with hyperlinks in Wikipedia articles. The tf-idf weight of an entity with every Wikipedia article is calculated and used to build the corresponding semantic profile, which is represented by the retrieved Wikipedia concepts sorted by their tf-idf scores. For instance, for an entity e, DiSER builds a semantic vector v, where

v = ∑_{i=0}^{N} a_i · c_i

and c_i is the i-th concept in the Wikipedia concept space, and a_i is the tf-idf weight of the entity e with the concept c_i. Here, N represents the total number of Wikipedia concepts. With the entity relatedness scores obtained by DiSER, we constructed the EnRG (Entity Relatedness Graph). EnRG is constructed by calculating the DiSER scores between 16.83 trillion entity pairs (4.1 million × 4.1 million).

3. Evaluation
We computed semantic relatedness scores for all entity pairs provided in a gold standard of 420 entity pairs. These scores were obtained using our proposed method DiSER, and other state-of-the-art methods: ESA [1], WLM [3] and KORE [2]. We calculated the Spearman rank correlation between the gold standard dataset and the results obtained from the different methods. Table 1 shows the results.

¹http://monnet01.sindice.net:8080/enrg/
DiSER outperforms all of the state-of-the-art methods.

Table 1: Spearman rank correlation of relatedness measures with the gold standard
  Measure | Spearman rank correlation with human judgement
  DiSER   | 0.781
  WLM     | 0.610
  KORE    | 0.673

References
[1] E. Gabrilovich and S. Markovitch. Computing semantic relatedness using Wikipedia-based explicit semantic analysis. In Proceedings of the 20th IJCAI, 2007.
[2] J. Hoffart, S. Seufert, D. B. Nguyen, M. Theobald, and G. Weikum. KORE: Keyphrase overlap relatedness for entity disambiguation. In Proceedings of the 21st CIKM, 2012.
[3] I. Witten and D. Milne. An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In Proceedings of the AAAI Workshop on Wikipedia and Artificial Intelligence: an Evolving Synergy, 2008.
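The core DiSER computation, the cosine score between sparse tf-idf vectors over Wikipedia concepts, can be sketched as follows. The entities, concepts and weights below are invented for illustration and are not DiSER's actual values:

```python
import math

# Sparse semantic profiles: concept -> tf-idf weight (weights are made up).
steve_jobs = {"Apple Inc.": 0.9, "NeXT": 0.7, "Pixar": 0.5}
steve_wozniak = {"Apple Inc.": 0.8, "Apple II": 0.9}

def cosine(u, v):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * v.get(c, 0.0) for c, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Entities sharing heavily weighted concepts score high even when they are
# not directly linked in a knowledge base graph.
print(round(cosine(steve_jobs, steve_wozniak), 2))
```

Because the profiles are sparse, only the concepts shared by both entities contribute to the dot product, which is what makes the pairwise computation over trillions of pairs feasible in principle.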
Extracting Semantic Knowledge from Unstructured Text using Embedded Controlled Languages
Hazem Safwat
Insight Centre for Data Analytics, NUIG
hazem.abdelaal@insight-centre.org

Abstract
Knowledge extraction from unstructured text is highly desirable but extremely challenging, due to the inherent ambiguity of natural language; most efforts to date are directed towards extracting data from structured or semi-structured text such as Wikipedia info-boxes. In this article, we present an architecture based on embedded controlled languages that can extract formal semantic knowledge from an unstructured text corpus, with the potential to support multilinguality as well.

1. Motivation
Recent years have seen an increasing deluge of heterogeneous unstructured text on the Web. While semantic technologies and text mining systems can serve an important role in mapping and linking text to formal knowledge for efficient retrieval and management, unambiguous processing of natural language, in particular the ability to capture complex knowledge statements, is far from solved. In order to push the semantic web forward, more effort should be directed at extracting formal semantic knowledge from unstructured text, because of the amount of knowledge it can deliver.

2. Problem Statement
Information extraction systems can serve an important role in extracting knowledge from text by identifying references to named entities and recognizing some relationships between these entities. However, in order to extract more complex insights from the text, different NLP approaches are needed.

3. Related Work
The authors of [1] address a related but not identical approach for extracting entities from events in a military dataset. The purpose of that approach is to build a system that can support knowledge sharing and decision making across different groups from different nations, without conversion to a technical format.

4.
Research Question
How can we build a robust system that extracts semantic knowledge from unstructured text, while keeping the semantics unchanged and supporting multilinguality as well?

5. Hypothesis
While Controlled Natural Languages (CNLs) are not a perfect catch-all solution for managing and processing unstructured text, we argue that rewriting natural language into CNL could offer an attractive solution in certain domains where there is a pre-existing need for human-oriented CNLs, e.g. aviation, legal and public policy. We propose an extension to the work in [2], where the author defines a process to embed CNLs in the grammar of the host languages.

6. Proposed Solution
To solve this problem we built an architecture divided into two phases: the first phase is the pre-processing, which is responsible for preparing the unstructured corpus for parsing, and the second phase is the post-processing, which is responsible for extracting the CNL subtrees and converting them into a structured form, as shown in Figure 1.

Figure 1: The proposed system architecture.

7. Evaluation
The evaluation plan is to implement a beta version of the system, in which all the modules in the architecture are implemented manually by an expert. This version will be tested on a small corpus to get an idea of the performance of the system, and whether extra modules need to be added to the architecture. The test set will also be used to compare the performance of the system after automating each module against the performance of our reference test set.

8. References
[1] D. Mott, D. Braines, S. Poteet, A. Kao, and P. Xue. Controlled Natural Language to facilitate information extraction. ACITA, 2012.
[2] A. Ranta. Embedded Controlled Languages. In 4th International Workshop, CNL 2014, pages 1–7, Galway.
Heuristic Based Adaptive Query Optimization Approach for RSP∗
Yashpal Singh, Muhammad Intizar Ali, Alessandra Mileo
Insight Centre for Data Analytics
{yashpal.singh, ali.intizar, alessandra.mileo}@insight-centre.org

1. Motivation and Problem Statement
RDF streams are massive unbounded sequences of linked data that are continuously generated at a random rate, which raises the issue of optimizing continuous query processing. The problem of supporting rapid adaptation to run-time conditions during adaptive query processing [3] is of increasing importance in today's data processing environments. In applications such as network monitoring, telecommunications data management, manufacturing, sensor networks, and others, data takes the form of continuous data streams rather than finite stored data sets, and clients require long-running continuous queries as opposed to one-time queries. There are two strategies for evaluating sliding-window operators: re-evaluation and incremental evaluation [4]. The incremental order in which multiple joins are performed as per the query plan can determine the performance of the system, because the selectivity of one join can differ significantly from the selectivity of other joins. However, existing approaches lack a cost-bound monitoring component which can exploit run-time parameters and stored heuristics to generate a highly optimized global plan. The goal of our research is to design an adaptive and cost-effective monitoring component for RSP which enables existing stream query processing engines to adapt a highly optimized query plan based on the system environment. The monitoring component not only augments the probability of generating an optimized plan, but also reduces the run-time delay caused by optimization. This strategy greatly benefits from the available techniques and approaches provided by the DSMS and RSP research communities.

2.
State-of-the-art
Considering state-of-the-art solutions, RSP engines fall into two classes on the basis of their architecture [5]: blackbox architectures, which use existing systems as sub-components of the processing model [2], [1], and whitebox architectures, where the query engine natively supports optimization strategies [5]. To implement our approach, we are currently taking one engine with a whitebox architecture, and we will try to make our approach independent of the engine's architectural design.

3. Proposed Approach and Discussion
In this research, we investigate the different query execution models of the existing RSP engines based on their architectural design. Our main goal is to implement a cost-effective monitoring component which helps the engine at run-time to decide the best globally optimized plan for the current system environment. We consider the merging of logical and physical plans incorporated by existing query optimizers, to rival their performance and flexibility. All our techniques extend an existing platform to parallel and distributed RDF stream processing. We have maintained an initial set of cost-bound parameters: referential integrity, window size of stored mappings, success rate of the generated plan, and joining patterns between streams over the query life cycle. All these parameters count as heuristics for our new monitoring component and help the query execution component to decide the globally optimized plan. As a next step, we plan to build the query execution monitoring component independently of the existing RSP engines' architecture, while considering some cost-bound parameters.

∗This research has been partially supported by Science Foundation Ireland (SFI) under grant No. SFI/12/RC/2289 for RSP (RDF Stream Processing).
Recent initiatives such as AROA (Adaptive Run-time Overhead Adjustment) [6] in the DSMS community have made a significant contribution to efficient stream data query processing. We intend to contribute to RSP by designing an efficient stream query processing model that can support different existing RSP engines in a standard approach.

References
[1] D. Anicic, P. Fodor, S. Rudolph, and N. Stojanovic. EP-SPARQL: a unified language for event processing and stream reasoning. In Proceedings of the 20th International Conference on World Wide Web, pages 635–644. ACM, 2011.
[2] D. F. Barbieri, D. Braga, S. Ceri, E. Della Valle, and M. Grossniklaus. C-SPARQL: SPARQL for continuous querying. In Proceedings of the 18th International Conference on World Wide Web, pages 1061–1062. ACM, 2009.
[3] A. Deshpande, Z. Ives, and V. Raman. Adaptive query processing. Foundations and Trends in Databases, 1(1):1–140, 2007.
[4] T. M. Ghanem, M. Hammad, M. F. Mokbel, W. G. Aref, A. K. Elmagarmid, et al. Incremental evaluation of sliding-window queries over data streams. IEEE Transactions on Knowledge and Data Engineering, 19(1):57–72, 2007.
[5] D. Le-Phuoc, J. X. Parreira, M. Hausenblas, and M. Hauswirth. Continuous query optimization and evaluation over unified linked stream data and linked open data. Technical report, Citeseer, 2010.
[6] H.-H. Lee, H.-K. Park, J.-C. Park, W.-S. Lee, and K.-H. Joo. Adaptive run-time overhead adjustments for optimizing multiple continuous query processing. International Journal of Software Engineering and Its Applications, 8(11):183–196, 2014.
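The join-ordering intuition behind the abstract above, that the most selective join should run first because intermediate result sizes drive cost, can be shown with a toy sketch. The triple patterns and selectivity estimates are invented, and this is not any RSP engine's actual optimizer:

```python
# Toy join plan: (triple pattern, estimated selectivity). A lower
# selectivity means the join keeps fewer intermediate results.
joins = [
    ("?s ex:p1 ?o", 0.40),
    ("?s ex:p2 ?o", 0.05),
    ("?s ex:p3 ?o", 0.20),
]

def order_by_selectivity(joins):
    """Greedy heuristic: execute the most selective join pattern first,
    so intermediate results shrink as early as possible."""
    return [pattern for pattern, sel in sorted(joins, key=lambda j: j[1])]

print(order_by_selectivity(joins))
# ['?s ex:p2 ?o', '?s ex:p3 ?o', '?s ex:p1 ?o']
```

An adaptive monitoring component, as proposed above, would re-estimate these selectivities at run-time from the observed stream and re-order the plan when the estimates drift.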
Instance Search with Semantic Analysis and Faceted Navigation
Zhenxing Zhang, Cathal Gurrin, Alan F. Smeaton
Insight Centre for Data Analytics
{zhenxing.zhang, cathal.gurrin, alan.smeaton}@insight-centre.org

Abstract
In this paper we present an interactive instance search approach to finding video clips which contain specific objects in a large video collection. We implement a complete interactive search system with open-vocabulary querying and faceted navigation to enable users to search video archives based on semantic categories, object labels, and object attributes automatically extracted from visual content.

1. Introduction
Recent work has described methods to build content-based video retrieval systems from automatically extracted semantic information using various approaches, such as collaborative browsing or concept filtering. However, an analysis of these methods reveals that subjective and imprecise user abilities when querying and interacting with retrieval systems pose challenges to making use of advanced visual retrieval capabilities. This paper presents an interactive instance search approach with open-vocabulary querying and faceted navigation to enable users to browse large video collections quickly and intuitively, based on semantic categories, object labels, and object attributes automatically extracted from video content. We implemented a complete interactive search system and designed comprehensive experiments on large public data collections to evaluate the efficiency and effectiveness of the proposed interactive video search tool.

2. Visual Content Analysis
Recent developments in image classification and object recognition using Deep Convolutional Networks (DCNs) outperform other state-of-the-art approaches on various evaluation data sets. The learning framework, Caffe [2], along with learned models, is available as open source, which encourages researchers to use and contribute to the framework.
In our work, we employ DCNs to extract meaningful information, such as semantic concepts, object labels, and object attributes, to describe the visual content of video shots. More specifically, we chose two pre-trained models to cover the wide range of possible topics in the BBC programming videos, capturing the desired object information as well as the environmental context. We used the Caffe library [2] running on a machine equipped with a GeForce GTX 970 graphics card and 16 GB RAM for the heavy visual processing tasks.

3. Interactive Search
In this work, we incorporate an open-vocabulary query stage to allow unrestricted query search, and faceted navigation to help users access large archives.

Open Vocabulary Querying. The open-vocabulary querying stage allows us to measure the relatedness between input query words and the list of semantic labels used by the system, which ensures that even if a query word is not one of the pre-trained semantic labels, the system is able to produce a ranked list of the closest-matching multimedia documents for users.

Faceted Navigation. During the results presentation stage, we implemented faceted navigation to support users in content exploration. This essentially reflects the fact that users may seek information in a number of different ways, based on their own understanding of the query topics.

Figure 1: Screenshot of the interactive browsing interface

For our implementation, we build a network of related concepts from WordNet [1], a freely available lexical dictionary, to measure semantic relatedness or similarity between different words. Figure 1 demonstrates an example of using our faceted navigation to find video clips of interest via two different routes. In this text query example, users can drill down through concepts and attributes naturally.

4. Experiment
The experiment data set consists of about 100 hours of video, with content coming from various BBC TV programmes.
The performance of the system is measured with an average score (0–100) calculated by taking both search accuracy and search efficiency into account. The experimental results demonstrate that our approach can help users find video clips of interest quickly. More comprehensive evaluation experiments will be carried out by the end of this year to further assess the performance.

References
[1] I. Feinerer and K. Hornik. wordnet: WordNet Interface, 2015. R package version 0.1-10.
[2] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093, 2014.
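The idea of measuring word relatedness over a network of concepts, as the instance search system does with WordNet, can be sketched in a self-contained way. The toy graph below is a hand-built stand-in for WordNet (not the system's actual resource); relatedness is the classic inverse-path-length measure:

```python
from collections import deque

# Tiny hand-built concept graph standing in for WordNet's taxonomy.
graph = {
    "dog": {"canine"}, "canine": {"dog", "animal"},
    "cat": {"feline"}, "feline": {"cat", "animal"},
    "animal": {"canine", "feline"},
}

def path_relatedness(a, b, graph):
    """1 / (shortest-path length + 1): the classic path-similarity idea.
    BFS finds the shortest path between the two concepts."""
    if a == b:
        return 1.0
    seen, frontier = {a}, deque([(a, 0)])
    while frontier:
        node, d = frontier.popleft()
        for nxt in graph.get(node, ()):
            if nxt == b:
                return 1.0 / (d + 2)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return 0.0  # no path: unrelated

print(path_relatedness("dog", "cat", graph))  # 0.2 (path length 4)
```

With such a measure, an out-of-vocabulary query word can be mapped to the pre-trained semantic label whose relatedness score is highest, which is the role open-vocabulary querying plays in the system described above.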