The evolution of semantic
technology evaluation
in my own flesh
(The 15 tips for technology evaluation)
Raúl García-Castro
Ontology Engineering Group.
Universidad Politécnica de Madrid, Spain
rgarcia@fi.upm.es

Speaker: Raúl García-Castro

Talk at IMATI-CNR,
October 15th,
Genova, Italy
Index

•  Self-awareness
•  Crawling (Graduation Project)
•  Walking (Ph.D. Thesis)
•  Cruising (Postdoctoral Research)
•  Insight
Who am I?

•  Assistant Professor
-  Ontology Engineering Group
-  Computer Science School at Universidad Politécnica de Madrid (UPM)

•  Research lines
-  Evaluation and benchmarking of semantic technologies
•  Conformance and interoperability of ontology engineering tools
•  Evaluation infrastructures
-  Ontological engineering
•  Sensors, ALM, energy efficiency, context, software evaluation
-  Application integration
http://www.garcia-castro.com/
Semantic Web technologies
The Semantic Web is:
•  "An extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation" [Berners-Lee et al., 2001]
•  A common framework for data sharing and reusing across applications

•  Distinctive characteristics:
   -  Use of W3C standards
   -  Use ontologies as data models
   -  Inference of new information
   -  Open world assumption

•  High heterogeneity:
   -  Different functionalities (in general and in particular)
   -  Different KR formalisms (different expressivity, different reasoning capabilities)

[Figure: a component-based framework of Semantic Web technologies, grouping components such as ontology editors, browsers, matchers, reasoners, annotation tools, repositories and semantic web service components into the categories ontology development & management, ontology customization, ontology evolution, ontology alignment, ontology instance generation, querying and reasoning, semantic web services, and data management]

García-Castro, R.; Muñoz-García, O.; Gómez-Pérez, A.; Nixon L. "Towards a component-based framework for developing Semantic
Web applications". 3rd Asian Semantic Web Conference (ASWC 2008). 2-5 February, 2009. Bangkok, Thailand.

Ontology engineering tools
Allow the creation and management of ontologies:
•  Ontology editors
-  User oriented

•  Ontology language APIs
-  Programming oriented

Index

•  Self-awareness
•  Crawling (Graduation Project)
•  Walking (Ph.D. Thesis)
•  Cruising (Postdoctoral Research)
•  Insight

http://www.phdcomics.com/comics/archive.php?comicid=1012
Evaluation goal
GQM paradigm: any software measurement activity should be preceded by:
1. The identification of a software engineering goal ...
2. ... which leads to questions ...
3. ... which in turn lead to actual metrics.

Goal: to improve the performance and the scalability of the methods provided by the ontology management APIs of ontology development tools.

The questions address latency and scalability, and each leads to a metric:
•  What is the actual performance of the API methods? - Execution time of each method
•  Is the performance of the methods stable? - Variance of the execution times of each method
•  Are there any anomalies in the performance of the methods? - Percentage of execution times out of range in each method's sample
•  Do changes in a method's parameters affect its performance? - Execution time with parameter A compared with execution time with parameter B
•  Does tool load affect the performance of the methods? - Tool load versus execution time relationship

Metric: execution times of the methods of the API with different load factors
Evaluation data
•  Atomic operations of the ontology management API
•  Multiple benchmarks defined for each method according to changes in its parameters
•  Benchmarks parameterised according to the number of consecutive executions of the method

Example method signature: insertConcept(String ontology, String concept)

72 methods in total: insertConcept, insertRelation, insertClassAttribute, insertInstanceAttribute, insertConstant, insertReasoningElement, insertInstance, updateConcept, updateRelation, updateClassAttribute, updateInstanceAttribute, updateConstant, updateReasoningElement, updateInstance, ...

128 benchmarks in total, for example:
•  benchmark1_1_08(N): "Inserts N concepts in 1 ontology" (Ontology_1 receives Concept_1 ... Concept_N)
•  benchmark1_1_09(N): "Inserts 1 concept in N ontologies" (Concept_1 is inserted into Ontology_1 ... Ontology_N)
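To make the benchmark structure concrete, the following is a minimal sketch of how two of these parameterised benchmarks could be coded against a hypothetical OntologyApi interface; the interface, the timing code and the naming scheme are illustrative assumptions, not WebODE's actual ontology management API.

// Minimal sketch of two parameterised benchmarks; OntologyApi is a hypothetical
// stand-in for an ontology management API, not the real tool API.
interface OntologyApi {
    void createOntology(String ontology);
    void insertConcept(String ontology, String concept);
}

class BenchmarkSketch {

    // benchmark1_1_08(N): inserts N concepts in 1 ontology, timing each call in ms
    static long[] benchmark1_1_08(OntologyApi api, int n) {
        long[] times = new long[n];
        api.createOntology("Ontology_1");
        for (int i = 1; i <= n; i++) {
            long start = System.nanoTime();
            api.insertConcept("Ontology_1", "Concept_" + i);
            times[i - 1] = (System.nanoTime() - start) / 1_000_000;
        }
        return times;
    }

    // benchmark1_1_09(N): inserts 1 concept in each of N ontologies
    static long[] benchmark1_1_09(OntologyApi api, int n) {
        long[] times = new long[n];
        for (int i = 1; i <= n; i++) {
            api.createOntology("Ontology_" + i);
            long start = System.nanoTime();
            api.insertConcept("Ontology_" + i, "Concept_1");
            times[i - 1] = (System.nanoTime() - start) / 1_000_000;
        }
        return times;
    }
}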

Workload generator
•  Generates and inserts into the tool synthetic ontologies according to:
   -  The load factor (X), which defines the size of the ontology data
   -  An ontology structure that depends on the benchmarks

Benchmark          Operation                              Execution needs
benchmark1_1_08    Inserts N concepts in an ontology      1 ontology
benchmark1_1_09    Inserts a concept in N ontologies      N ontologies
benchmark1_3_20    Removes N concepts from an ontology    1 ontology with N concepts
benchmark1_3_21    Removes a concept from N ontologies    N ontologies with 1 concept

For executing all the benchmarks, the ontology structure includes the execution needs of all the benchmarks
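A minimal sketch of how the workload generator could prepare these execution needs, again over a hypothetical OntologyApi interface; in the real generator the amount of synthetic data is driven by the load factor X, and the naming scheme below is an assumption for illustration only.

// Sketch of workload preparation for the two removal benchmarks above.
class WorkloadSketch {

    interface OntologyApi {
        void createOntology(String ontology);
        void insertConcept(String ontology, String concept);
    }

    // benchmark1_3_20 needs 1 ontology populated with N concepts
    static void prepareSingleOntology(OntologyApi api, int n) {
        api.createOntology("Ontology_1");
        for (int i = 1; i <= n; i++) {
            api.insertConcept("Ontology_1", "Concept_" + i);
        }
    }

    // benchmark1_3_21 needs N ontologies, each with 1 concept
    static void prepareMultipleOntologies(OntologyApi api, int n) {
        for (int i = 1; i <= n; i++) {
            api.createOntology("Ontology_" + i);
            api.insertConcept("Ontology_" + i, "Concept_1");
        }
    }
}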

Evaluation infrastructure
[Figure: components of the evaluation infrastructure: Benchmark Suite Executor, Performance Benchmark Suite, Workload Generator, Ontology Development Tool (the part to be instantiated for each tool), Measurement Data Library and Statistical Analyser]

http://knowledgeweb.semanticweb.org/wpbs/
Statistical analyser
[Figure: the Statistical Analyser (BenchStats) processes the Measurement Data Library produced by the benchmark executions using statistical software. Example input: benchmark1_1_08, 400 measurements (2134 ms, 2300 ms, 2242 ms, 2809 ms, ...); benchmark1_1_09, 400 measurements (1399 ms, 2180 ms, ...); benchmark1_3_20, 400 measurements (2032 ms, 1459 ms, ...)]

Benchmark          Load   N    UQ   LQ   IQR  Median  % Outliers  Function
benchmark1_1_08    5000   400  60   60   0    60      1.25        y=62.0-0.009x
benchmark1_1_09    5000   400  912  901  11   911     1.75        y=910.25-0.003x
benchmark1_3_20    5000   400  160  150  10   150     1.25        y=155.25-0.003x
benchmark1_3_21    5000   400  160  150  10   151     0.25        y=154.96-0.001x
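A sketch of the per-benchmark statistics behind this table (median, quartiles, IQR and percentage of outliers). The quantile interpolation and the 1.5 x IQR outlier rule are assumptions about a box-plot-style analysis, not necessarily the exact procedure that was used.

import java.util.Arrays;

// Sketch: median, LQ, UQ, IQR and % outliers over a sample of execution times.
public class BenchStats {

    // Linear-interpolation quantile over a sorted sample (an assumption).
    static double quantile(double[] sorted, double q) {
        double pos = q * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    public static void main(String[] args) {
        // Made-up sample in ms; a real run would have 400 measurements per benchmark
        double[] times = {2134, 2300, 2242, 2809, 2190, 2258, 2220, 2301, 2215, 2240};
        Arrays.sort(times);

        double lq = quantile(times, 0.25);
        double uq = quantile(times, 0.75);
        double iqr = uq - lq;
        double median = quantile(times, 0.5);

        // Outliers: values outside [LQ - 1.5*IQR, UQ + 1.5*IQR] (box-plot rule, an assumption)
        double low = lq - 1.5 * iqr, high = uq + 1.5 * iqr;
        long outliers = Arrays.stream(times).filter(t -> t < low || t > high).count();
        double pctOutliers = 100.0 * outliers / times.length;

        System.out.printf("median=%.1f ms, LQ=%.1f, UQ=%.1f, IQR=%.1f, outliers=%.2f%%%n",
                median, lq, uq, iqr, pctOutliers);
    }
}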

Result analysis - Latency
Metric for the execution time: the median of the execution times of a method.
Finding: 8 methods with execution times > 800 ms (N=400, X=5000).

Metric for the variability of the execution time: the interquartile range of the execution times of a method.
Finding: 3 methods with IQR > 11 ms (N=400, X=5000).

Metric for anomalies in the execution times: the percentage of outliers in the execution times of a method.
Finding: 2 methods with % outliers > 5% (N=400, X=5000).

Effect of changes in method parameters: comparison of the medians of the execution times of the benchmarks that use the same method.
Finding: 5 methods with differences in execution times > 60 ms (N=400, X=5000).
Result analysis - Scalability
Effect of changes in WebODE’s load:
Slope of the function estimated by simple linear regression of the medians of the
execution times from a minimum load (X=500) to a maximum one (X=5000).

8 methods with
slope>0.1 ms.

N=400, X=500..5000
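A sketch of this scalability metric: the least-squares slope of a benchmark's median execution times against the load factor. The sample medians below are made up for illustration.

// Sketch: slope of median execution time vs. load factor via simple linear regression.
public class ScalabilitySlope {

    static double slope(double[] load, double[] median) {
        int n = load.length;
        double sx = 0, sy = 0, sxx = 0, sxy = 0;
        for (int i = 0; i < n; i++) {
            sx += load[i];
            sy += median[i];
            sxx += load[i] * load[i];
            sxy += load[i] * median[i];
        }
        return (n * sxy - sx * sy) / (n * sxx - sx * sx);
    }

    public static void main(String[] args) {
        double[] load   = {500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000};
        double[] median = {60, 60, 61, 61, 62, 62, 63, 63, 64, 65};   // made-up medians in ms
        System.out.printf("slope = %.4f ms per load unit%n", slope(load, median));
    }
}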

Limitations
•  Evaluating other tools is expensive
[Figure: the evaluation infrastructure (Benchmark Suite Executor, Workload Generator, Performance Benchmark Suite, Measurement Data Library, Statistical Analyser) has to be instantiated once per Ontology Development Tool evaluated]

•  Analysis of results was difficult
-  The evaluation was executed 10 times with different load factors:
500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, and 5000
-  128 benchmarks x 10 executions = 1280 files with results!!!!!

García-Castro R., Gómez-Pérez A "Guidelines for Benchmarking the Performance of Ontology Management APIs" 4th International
Semantic Web Conference (ISWC2005), LNCS 3729. November 2005. Galway, Ireland.

The 15 tips for technology evaluation

•  Know the technology
•  Support different types of technology
•  Automate the evaluation framework
•  Expect reproducibility
•  Beware of result analysis
•  Learn statistics
•  Plan for evaluation requirements
Index

•  Self-awareness
•  Crawling (Graduation Project)
•  Walking (Ph.D. Thesis)
•  Cruising (Postdoctoral Research)
•  Insight

KHAAAAN!

http://www.phdcomics.com/comics/archive.php?comicid=500
Interoperability in the Semantic Web
•  Interoperability is the ability of Semantic Web technologies to interchange ontologies and use them
   -  At the information level, not at the system level
   -  In terms of knowledge reuse, not information integration
•  In the real world it is not feasible to use a single system or a single formalism
•  Different behaviours in interchanges between different formalisms:

[Figure: example interchanges of an ontology in which classes B and C are subclasses of A and C is disjoint with B. Interchanges can be lossless or lossy depending on whether the same or a different formalism is used at both ends; in the lossy cases the disjointness is dropped or is represented through an ad-hoc myDisjoint property]
Evaluation goal
To evaluate and improve the interoperability of
Semantic Web technologies using RDF(S) and OWL as
interchange languages

Evaluation workflow - Manual
[Figure: the manual evaluation workflow. Import: an RDF(S)/OWL ontology Oi is imported into Tool X, producing Oi'. Export: an ontology Oi in Tool X is exported to RDF(S)/OWL, producing Oi'. In both cases the result is compared with the original, Oi = Oi' plus the information added minus the information lost (α - α' or β - β'). Interoperability: Oi is interchanged from Tool X to Tool Y, producing Oi'', with Oi = Oi'' + α - α' + β - β']
Evaluation workflow - Automatic

[Figure: the automatic evaluation workflow. Existing ontologies O1..On in the interchange language (RDF(S)/OWL) go through two steps. Step 1 (import + export by Tool X): O1 → O1' → O1'', with O1 = O1'' + α - α'. Step 2 (import + export by Tool Y): O1'' → O1''' → O1'''', with O1'' = O1'''' + β - β'. Interchange: O1 = O1'''' + α - α' + β - β']

Evaluation data - OWL Lite Import Test Suite
The test suite combines component combinations and RDF/XML syntax variants. For example, the following two serialisations describe the same class:

<rdf:Description rdf:about="#class1">
  <rdf:type rdf:resource="&rdfs;Class"/>
</rdf:Description>

<rdfs:Class rdf:about="#class1">
</rdfs:Class>
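Both serialisations above encode exactly the same triple, so a conformant importer should treat them identically. One quick way to check that kind of equivalence is to parse both variants and compare the resulting RDF graphs; the sketch below uses Apache Jena for illustration and is not part of the original test suite tooling (the base IRI is made up).

import java.io.StringReader;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;

// Sketch: the two RDF/XML variants parse to isomorphic models.
public class SyntaxVariants {

    static Model parse(String body) {
        String doc = "<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' "
                   + "xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#' "
                   + "xml:base='http://example.org/base'>" + body + "</rdf:RDF>";
        Model model = ModelFactory.createDefaultModel();
        model.read(new StringReader(doc), null);   // default serialisation language is RDF/XML
        return model;
    }

    public static void main(String[] args) {
        Model a = parse("<rdf:Description rdf:about='#class1'>"
                + "<rdf:type rdf:resource='http://www.w3.org/2000/01/rdf-schema#Class'/>"
                + "</rdf:Description>");
        Model b = parse("<rdfs:Class rdf:about='#class1'/>");
        System.out.println("Isomorphic models? " + a.isIsomorphicWith(b));   // expected: true
    }
}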

Component combinations include: subclass of class, subclass of restriction, value constraints, set operators, cardinality + object property, and cardinality + datatype property.

Group                                                                   No.
Class hierarchies                                                        17
Class equivalences                                                       12
Classes defined with set operators                                        2
Property hierarchies                                                      4
Properties with domain and range                                         10
Relations between properties                                              3
Global cardinality constraints and logical property characteristics       5
Single individuals                                                        3
Named individuals and properties                                          5
Anonymous individuals and properties                                      3
Individual identity                                                       3
Syntax and abbreviation                                                  15
TOTAL                                                                    82

David S., García-Castro, R.; Gómez-Pérez, A. "Defining a Benchmark Suite for Evaluating the Import of OWL Lite Ontologies". Second International
Workshop OWL: Experiences and Directions 2006 (OWL2006). November, 2006. Athens, Georgia, USA.

Evaluation criteria
•  Execution informs about the correct execution:
   -  OK: no execution problem
   -  FAIL: some execution problem
   -  Comparer Error (C.E.): comparer exception
   -  Not Executed (N.E.): second step not executed
•  Information added or lost in terms of triples: Oi = Oi' + α - α'
•  Interchange informs whether the ontology has been interchanged correctly (Oi = Oi'?), with no addition or loss of information:
   -  SAME if Execution is OK and Information added and Information lost are void
   -  DIFFERENT if Execution is OK but Information added or Information lost are not void
   -  NO if Execution is FAIL, N.E. or C.E.

Evaluation campaigns
RDF(S) Interoperability Benchmarking: 6 tools (3 ontology development tools and 3 ontology repositories), some of them Frame-based.

OWL Interoperability Benchmarking: 9 tools (5 ontology development tools, 3 ontology repositories and 1 ontology-based annotation tool), including SemTalk; some Frame-based, some OWL-based.
Evaluation infrastructure - IRIBA


http://knowledgeweb.semanticweb.org/iriba/

Evaluation infrastructure - IBSE
[Figure: the IBSE workflow. (1) Describe benchmarks: the OWL Lite Import Benchmark Suite is described by means of a benchmark ontology (benchmarkOntology), serialised as RDF/XML documents using the rdf, rdfs, owl and xsd namespaces. (2) Execute benchmarks over the tools, producing execution results described with a result ontology (resultOntology). (3) Generate reports (HTML, SVG)]

•  Automatically executes experiments between all the tools
•  Allows configuring different execution parameters
•  Uses ontologies to represent benchmarks and results
•  Depends on external ontology comparers (KAON2 OWL Tools and RDFutils)

http://knowledgeweb.semanticweb.org/benchmarking_interoperability/ibse/
García-Castro, R.; Gómez-Pérez, A., Prieto-González J. "IBSE: An OWL Interoperability Evaluation Infrastructure". Third International Workshop OWL:
Experiences and Directions 2007 (OWL2007). June, 2007. Innsbruck, Austria.

Evaluation results - Variability
•  High variability in evaluation results
Tool import/export outcomes: models and executes; does not model and executes; models and fails; does not model and fails; not executed.
Ontology comparison outcomes: same information; more information; less information; tool fails; comparer fails; not valid ontology.

•  Different perspectives for analysis:
   -  Results per tool / pair of tools
   -  Results per component
   -  Result evolution over time
   -  ...

[Figures: a matrix of component combinations (classes, class hierarchies, properties, instances, etc.) against tool combinations showing which interchanges work; a chart of result evolution over time (04-2005 to 01-2006); and origin/destination matrices with the percentage of successful interchanges between the tools (Jena, Protégé-OWL, SWI-Prolog, KAON2, GATE, SemTalk, WebODE and Protégé-Frames)]

Evaluation results - Interoperability
Clear picture of the interoperability between different tools
•  Low interoperability and few clusters of interoperable tools
•  Interoperability depends on:
-  Ontology translation (tool knowledge model)
-  Specification (development decisions)
-  Robustness (tool defects)
-  Tools participating in the interchange (each behaves differently)

•  Tools have improved
•  Involvement of tool developers is needed
-  Tool developers have been informed
-  Tool improvement is out of our scope

•  Results are expected to change
-  Continuous evaluation is needed
García-Castro, R.; Gómez-Pérez, A. "Interoperability results for Semantic Web technologies using OWL as the interchange language". Web
Semantics: Science, Services and Agents in the World Wide Web. ISSN: 1570-8268. Elsevier. Volume 8, number 4. pp. 278-291. November 2010.
García-Castro, R.; Gómez-Pérez, A. "RDF(S) Interoperability Results for Semantic Web Technologies". International Journal of Software Engineering
and Knowledge Engineering. ISSN: 0218-1940. Editor: Shi-Kuo Chang. Volume 19, number 8. pp. 1083-1108. December 2009.

Benchmarking interoperability
Method for benchmarking interoperability
•  Common for different Semantic Web technologies
•  Problem-focused instead of tool-focused
•  Manual vs automatic experiments:
-  It depends on the specific needs of the benchmarking
-  Automatic: cheaper, more flexible and extensible
-  Manual: higher quality of results

Resources for benchmarking interoperability
•  All the benchmark suites, software and results are publicly
available
•  Independent of:
   -  The interchange language
   -  The input ontologies

[Figure: benchmarking resources. The RDF(S) Interoperability Benchmarking relied on the RDF(S) Import, RDF(S) Export and RDF(S) Interoperability benchmark suites, executed manually or automatically (rdfsbs, IRIBA); the OWL Interoperability Benchmarking relied on the OWL Lite Import Benchmark Suite, executed automatically between pairs of tools with IBSE]

García-Castro, R. "Benchmarking Semantic Web technology". Studies on the Semantic Web vol. 3. AKA Verlag – IOS Press. ISBN:
978-3-89838-622-7. January 2010.

Limitations
•  Number of results to analyse increased exponentially
   -  2168 executions in the RDF(S) benchmarking activity and
   -  6642 executions in the OWL one
•  Hard to support and maintain different test data and tools
•  Every tool to be evaluated had to be deployed on the same computer

The 15 tips for technology evaluation

•  Know the technology
•  Support different test data
•  Support different types of technology
•  Use machine-processable descriptions of evaluation resources
•  Automate the evaluation framework
•  Expect reproducibility
•  Beware of result analysis
•  Learn statistics
•  Plan for evaluation requirements
•  Organize (or join) evaluation campaigns
Index

•  Self-awareness
•  Crawling (Graduation Project)
•  Walking (Ph.D. Thesis)
•  Cruising (Postdoctoral Research)
•  Insight

http://www.phdcomics.com/comics/archive.php?comicid=570
The SEALS Project (RI-238975)

http://www.seals-project.eu/

Project Coordinator: Asunción Gómez Pérez <asun@fi.upm.es>
EC contribution: 3.500.000 €
Duration: June 2009 - June 2012

Partners:
•  Universidad Politécnica de Madrid, Spain (Coordinator)
•  University of Sheffield, UK
•  University of Mannheim, Germany
•  Forschungszentrum Informatik, Germany
•  University of Zurich, Switzerland
•  University of Innsbruck, Austria
•  STI International, Austria
•  Institut National de Recherche en Informatique et en Automatique, France
•  Open University, UK
•  Oxford University, UK
Semantic technology evaluation @ SEALS

SEALS Platform
SEALS Evaluation Campaigns
SEALS Evaluation Services
SEALS Community

Wrigley S.; García-Castro R.; Nixon L. "Semantic Evaluation At Large Scale (SEALS)". 21st International World Wide Web Conference (WWW 2012).
European projects track. pp. 299-302. Lyon, France. 16-20 April 2012.

The SEALS entities

[Figure: the SEALS entities: Tools (in five areas: ontology engineering, storage and reasoning, ontology matching, semantic search, semantic web services), Evaluations, Test Data and Results (raw results and their interpretations)]
Structure of the SEALS entities
[Figure: structure of the SEALS entities. Each entity comprises Metadata, described with the SEALS ontologies and used for discovery and validation, and Data (Java binaries, shell scripts, bundles, BPEL workflows, ontologies), which together enable its exploitation]

http://www.seals-project.eu/ontologies/

García-Castro R.; Esteban-Gutiérrez M.; Kerrigan M.; Grimm S. "An Ontology Model to Support the Automatic Evaluation of Software". 22nd
International Conference on Software Engineering and Knowledge Engineering (SEKE 2010). pp. 129-134. Redwood City, USA. 1-3 July 2010.

SEALS logical architecture

[Figure: SEALS logical architecture. Evaluation organisers, technology providers and technology adopters interact through the SEALS Portal; software agents use the Runtime Evaluation Service and the SEALS Service Manager, which work over the SEALS Repositories (the Test Data, Tools, Results and Evaluation Descriptions repository services)]

García-Castro R.; Esteban-Gutiérrez M.; Gómez-Pérez A. "Towards an Infrastructure for the Evaluation of Semantic Technologies". eChallenges
e-2010 Conference (e-2010). pp. 1-8. Warsaw, Poland. 27-29 October 2010.

Challenges
•  Tool heterogeneity
   -  Hardware requirements
   -  Software requirements
•  Reproducibility
   -  Ensure the execution environment offers the same initial status

Virtualization as a technology enabler:
[Figure: a Processing Node manages Execution Nodes, each running a tool inside a virtual machine. Virtualization solutions: VMware Server 2.0.2, VMware vSphere 4, Amazon EC2 (in progress)]
Evaluation campaign methodology

SEALS Methodology for Evaluation Campaigns (Raúl García-Castro and Stuart N. Wrigley, September 2011)

•  SEALS-independent
•  Includes:
   -  Actors
   -  Process
   -  Recommendations
   -  Alternatives
   -  Terms of participation
   -  Use rights

Phases: INITIATION, INVOLVEMENT, DISSEMINATION, PREPARATION & EXECUTION, FINALIZATION

García Castro R.; Martin-Recuerda F.; Wrigley S. "SEALS. Deliverable 3.8 SEALS Methodology for Evaluation Campaigns v2". Technical Report.
SEALS project. July 2011.

Current SEALS evaluation services
Ontology engineering
  Evaluations: conformance, interoperability, scalability
  Test data: conformance and interoperability (RDF(S); OWL Lite, DL and Full; OWL 2 Expressive x3; OWL 2 Full); scalability (real-world, LUBM, real-world+, LUBM+)

Ontology reasoning
  Evaluations: DL reasoning (classification, class satisfiability, ontology satisfiability, entailment, non-entailment, instance retrieval); RDF reasoning (conformance)
  Test data: DL reasoning (Gardiner test suite, Wang et al. repository, versions of GALEN, ontologies from EU projects, instance retrieval test data); RDF reasoning (OWL 2 Full)

Ontology matching
  Evaluations: matching accuracy, matching accuracy multilingual, scalability (ontology size, # CPU)
  Test data: Benchmark, Anatomy, Conference, MultiFarm, Large Biomed (supported by SEALS)

Semantic search
  Evaluations: search accuracy and efficiency (automated); usability and satisfaction (user-in-the-loop)
  Test data: automated (EvoOnt, MusicBrainz from QALD-1); user-in-the-loop (Mooney, Mooney+)

Semantic web service
  Evaluations: SWS discovery
  Test data: OWLS-TC 4.0, SAWSDL-TC 3.0, WSMO-LITE-TC

New evaluation data – Conformance and interoperability
•  OWL DL test suite: keyword-driven approach
   -  Manual definition of tests in CSV/spreadsheets using a keyword library

[Figure: Test Suite Generator. A Test Suite Definition Script is expanded by a Preprocessor, using the Keyword Library, into an Expanded Test Suite Definition Script, which an Interpreter turns into the test suite (metadata plus ontology01.owl, ontology02.owl, ontology03.owl, ...)]

OWL2EG (http://knowledgeweb.semanticweb.org/benchmarking_interoperability/OWL2EG/)

•  OWL 2 test suite: automatically generate ontologies of increasing expressiveness
   -  Using ontologies in the Web
   -  Maximizing expressiveness

[Figure: ontology generation process. Online ontologies are obtained through ontology search and ontology module extraction, giving the initial ontologies; using the OWL API, the original test suite (with metadata) is extended into an expressive test suite (increase expressivity) and a full-expressive test suite (maximize expressivity)]

OWLDLGenerator (http://knowledgeweb.semanticweb.org/benchmarking_interoperability/OWLDLGenerator/)
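The second workflow above mentions the OWL API. Assuming the same library, the sketch below shows how an interpreter step could turn an expanded keyword-based definition such as Class(c1) Class(c2) SubClassOf(c1,c2) into an ontology file; the keyword names, IRIs and file name are made up for illustration and are not the actual keyword library.

import java.io.File;
import org.semanticweb.owlapi.apibinding.OWLManager;
import org.semanticweb.owlapi.model.*;

// Sketch: materialising a keyword-based test definition with the OWL API.
public class TestSuiteInterpreterSketch {

    public static void main(String[] args) throws Exception {
        OWLOntologyManager manager = OWLManager.createOWLOntologyManager();
        OWLDataFactory factory = manager.getOWLDataFactory();

        IRI base = IRI.create("http://example.org/conformance/ontology01#");
        OWLOntology ontology =
                manager.createOntology(IRI.create("http://example.org/conformance/ontology01"));

        OWLClass c1 = factory.getOWLClass(IRI.create(base + "c1"));
        OWLClass c2 = factory.getOWLClass(IRI.create(base + "c2"));

        manager.addAxiom(ontology, factory.getOWLDeclarationAxiom(c1));    // Class(c1)
        manager.addAxiom(ontology, factory.getOWLDeclarationAxiom(c2));    // Class(c2)
        manager.addAxiom(ontology, factory.getOWLSubClassOfAxiom(c1, c2)); // SubClassOf(c1,c2)

        manager.saveOntology(ontology, IRI.create(new File("ontology01.owl").toURI()));
    }
}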

García-Castro R.; Gómez-Pérez A. "A Keyword-driven Approach for Generating Ontology Language Conformance Test Data". Engineering
Applications of Artificial Intelligence. ISSN: 0952-1976. Elsevier. Editor: B. Grabot.
Grangel-González I.; García-Castro R. "Automatic Conformance Test Data Generation Using Existing Ontologies in the Web". Second International
Workshop on Evaluation of Semantic Technologies (IWEST 2012). 28 May 2012. Heraklion, Greece.

1st Evaluation Campaign
Campaign                Tools
Ontology engineering    Jena, Sesame, Protégé 4, Protégé OWL, NEON toolkit, OWL API
Reasoning               HermiT, jcel, FaCT++
Matching                AROMA, ASMOV, Aroma, Falcon-AO, Lily, RiMOM, Mapso, CODI, AgreeMaker, Gerome*, Ef2Match
Semantic search         K-Search, Ginseng, NLP-Reduce, PowerAqua, Jena Arq
Semantic web service    4 OWLS-MX variants

Providers included HP Labs, Aduna, University of Stanford, the NEON Foundation, University of Manchester, University of Oxford, Tec. Universitat Dresden, INRIA, INFOTECH Soft, Nantes University, Southeast University, Tsinghua University, FZI, University of Mannheim, Advances in Computing Lab, RWTH Aachen, Nanyang Tec. University, K-Now Ltd, University of Zurich, KMi (Open University), Talis and DFKI.

29 tools from 8 countries

Nixon L.; García-Castro R.; Wrigley S.; Yatskevich M.; Trojahn-dos-Santos C.; Cabral L. "The state of semantic technology today – overview of the
first SEALS evaluation campaigns". 7th International Conference on Semantic Systems (I-SEMANTICS2011). Graz, Austria. 7-9 September 2011.

2nd Evaluation Campaign
[Table: tools in the 2nd evaluation campaign, grouped by work package (WP 10-14), with their providers and countries. Tools included Jena, Sesame, Protégé 4, Protégé OWL, NeOn toolkit, OWL API, HermiT, jcel, FaCT++, WSReasoner, AgrMaker, Aroma, AUTOMSv2, CIDER, CODI, CSA, GOMMA, Hertuda, LDOA beta, Lily, LogMap, LogMapLt, MaasMtch, MapEVO, MapPSO, MapSSS, Optima, WeSeEMtch, YAM++, K-Search, Ginseng, NLP-Reduce, PowerAqua, Jena Arq v2.8.2, Jena Arq v2.9.0, rdfQuery v0.5.1, Semantic Crystal, Affective Graphs, WSMO-LITE-OU, SAWSDL-OU, OWLS-URJC and OWLS-M0]

41 tools from 13 countries
Evaluation services
[Figure: using the SEALS evaluation services. Registered tools, test data and evaluations can be reused or updated, or you can define your own ("My tool", "My test data", "My evaluation"); evaluations are then executed on the platform and the resulting "My results" can be exploited]
Quality model for semantic technologies

Tool/Measures                 Raw Results   Interpretations   Quality Measures   Quality sub-characteristics
Ontology engineering tools         7              20                 8                      6
Ontology matching tools            1               4                 4                      2
Reasoning systems                 11               0                16                      5
Semantic search tools             12               8                18                      7
Semantic web service tools         5               9                10                      2
Total                             34              41                55                     17

Radulovic, F., Garcia-Castro, R., Extending Software Quality Models - A Sample In The Domain of Semantic Technologies. 23rd International
Conference on Software Engineering and Knowledge Engineering (SEKE2011). Miami, USA. July, 2011

Semantic technology recommendation
[Figure: semantic technology recommendation. A user expresses quality requirements ("I need a robust ontology engineering tool and a semantic search tool with the highest precision"); the Semantic Technology Recommendation component of the SEALS Platform, using the semantic technology quality model and the Tools and Results repository services, produces a recommendation ("You should use Sesame v2.6.5 and Arq v2.9.0. The reason for this is... Alternatively, you can use ...")]

Radulovic F.; García-Castro R. "Semantic Technology Recommendation Based on the Analytic Network Process". 24th Int. Conference on Software
Engineering and Knowledge Engineering (SEKE 2012). Redwood City, CA, USA. 1-3 July 2012. 3rd Best Paper Award!

You can use the SEALS Platform

•  The SEALS Platform facilitates:
   -  Comparing tools under common settings
   -  Reproducibility of evaluations
   -  Reusing evaluation resources, completely or partially, or defining new ones
   -  Managing evaluation resources using platform services
   -  Computational resources for demanding evaluations

•  Don’t start your evaluation from scratch!

The 15 tips for technology evaluation

•  Know the technology
•  Support different test data
•  Facilitate test data definition
•  Support different types of technology
•  Define declarative evaluation workflows
•  Use machine-processable descriptions of evaluation resources
•  Automate the evaluation framework
•  Expect reproducibility
•  Beware of result analysis
•  Learn statistics
•  Plan for evaluation requirements
•  Use a quality model
•  Organize (or join) evaluation campaigns
•  Share evaluation resources
•  Exploit evaluation results
Index

•  Self-awareness
•  Crawling (Graduation Project)
•  Walking (Ph.D. Thesis)
•  Cruising (Postdoctoral Research)
•  Insight

insight (noun), [mass noun], Psychiatry: awareness by a mentally ill person that their mental experiences are not based in external reality.
Evolution towards maturity
Software Evaluation Technology Maturity Model (SET-MM)

Levels: Initial, Repeatable, Reusable, Integrated, Optimized.
Themes: formalization of the evaluation workflow, software support to the evaluation, applicability to multiple software types, usability of test data, exploitability of results, and representativeness of participants.

[Table: levels and themes of software evaluation technology maturity. Roughly, evaluations evolve from ad-hoc, informally defined, manual efforts by one team (Initial), through defined ad-hoc workflows and ad-hoc evaluation software (Repeatable), to reusable evaluation software covering multiple software products and multiple test data of the same type (Reusable), to evaluation infrastructures covering multiple types of software products with machine-processable, reusable resources and several participating teams and stakeholders (Integrated), and finally to federations of autonomous evaluation infrastructures that interchange evaluation resources, with data access and use policies, customized, optimized and curated resources, and community participation (Optimized). The author's own work maps onto this evolution: the manual RDF(S) benchmark suites, then rdfsbs/IRIBA and IBSE for automatic RDF(S) and OWL interoperability benchmarking]

García-Castro R. "SET-MM - A Software Evaluation Technology Maturity Model". 23rd International Conference on Software Engineering and Knowledge Engineering (SEKE2011). pp. 660-665. Miami Beach, USA. 7-9 July 2011.
The 15 tips for technology evaluation

•  Know the technology
•  Support different test data
•  Facilitate test data definition
•  Support different types of technology
•  Define declarative evaluation workflows
•  Use machine-processable descriptions of evaluation resources
•  Automate the evaluation framework
•  Expect reproducibility
•  Beware of result analysis
•  Learn statistics
•  Plan for evaluation requirements
•  Use a quality model
•  Organize (or join) evaluation campaigns
•  Share evaluation resources
•  Exploit evaluation results
Thank you for your
attention!

Speaker: Raúl García-Castro

Talk at IMATI-CNR,
October 15th,
Genova, Italy

Contenu connexe

Similaire à The evolution of semantic technology evaluation in my own flesh (The 15 tips for technology evaluation)

Mohan C R CV
Mohan C R CVMohan C R CV
Mohan C R CVMOHAN C R
 
Algorithm Visualizer
Algorithm VisualizerAlgorithm Visualizer
Algorithm VisualizerAnwar Jameel
 
Runtime Behavior of JavaScript Programs
Runtime Behavior of JavaScript ProgramsRuntime Behavior of JavaScript Programs
Runtime Behavior of JavaScript ProgramsIRJET Journal
 
OpenACC and Open Hackathons Monthly Highlights May 2023.pdf
OpenACC and Open Hackathons Monthly Highlights May  2023.pdfOpenACC and Open Hackathons Monthly Highlights May  2023.pdf
OpenACC and Open Hackathons Monthly Highlights May 2023.pdfOpenACC
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTrivadis
 
Bug Triage: An Automated Process
Bug Triage: An Automated ProcessBug Triage: An Automated Process
Bug Triage: An Automated ProcessIRJET Journal
 
Comprehending Ajax Web Applications by the DynaRIA Tool
Comprehending Ajax Web Applications by the DynaRIA ToolComprehending Ajax Web Applications by the DynaRIA Tool
Comprehending Ajax Web Applications by the DynaRIA ToolPorfirio Tramontana
 
Federico Toledo - Extra-functional testing.pdf
Federico Toledo - Extra-functional testing.pdfFederico Toledo - Extra-functional testing.pdf
Federico Toledo - Extra-functional testing.pdfQA or the Highway
 
Towards Configuration Technologies for IoT Gateways
Towards Configuration Technologies  for IoT GatewaysTowards Configuration Technologies  for IoT Gateways
Towards Configuration Technologies for IoT GatewaysAGILE IoT
 
Simulagora (Euroscipy2014 - Logilab)
Simulagora (Euroscipy2014 - Logilab)Simulagora (Euroscipy2014 - Logilab)
Simulagora (Euroscipy2014 - Logilab)Logilab
 
GEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptx
GEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptxGEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptx
GEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptxGeetha982072
 
Software Engineering with Objects (M363) Final Revision By Kuwait10
Software Engineering with Objects (M363) Final Revision By Kuwait10Software Engineering with Objects (M363) Final Revision By Kuwait10
Software Engineering with Objects (M363) Final Revision By Kuwait10Kuwait10
 
online movie ticket booking system
online movie ticket booking systemonline movie ticket booking system
online movie ticket booking systemSikandar Pandit
 
IT Confidence 2013 - Spago4Q presents a 3D model for Productivity Intelligence
IT Confidence 2013 - Spago4Q presents a 3D model for Productivity IntelligenceIT Confidence 2013 - Spago4Q presents a 3D model for Productivity Intelligence
IT Confidence 2013 - Spago4Q presents a 3D model for Productivity IntelligenceSpagoWorld
 
Using Neural Net Algorithms to Classify Human Activity, with Applications in ...
Using Neural Net Algorithms to Classify Human Activity, with Applications in ...Using Neural Net Algorithms to Classify Human Activity, with Applications in ...
Using Neural Net Algorithms to Classify Human Activity, with Applications in ...Rohan Karunaratne
 
Performance Evaluation of Open Source Data Mining Tools
Performance Evaluation of Open Source Data Mining ToolsPerformance Evaluation of Open Source Data Mining Tools
Performance Evaluation of Open Source Data Mining Toolsijsrd.com
 
Go Observability (in practice)
Go Observability (in practice)Go Observability (in practice)
Go Observability (in practice)Eran Levy
 

Similaire à The evolution of semantic technology evaluation in my own flesh (The 15 tips for technology evaluation) (20)

Mohan C R CV
Mohan C R CVMohan C R CV
Mohan C R CV
 
Algorithm Visualizer
Algorithm VisualizerAlgorithm Visualizer
Algorithm Visualizer
 
Runtime Behavior of JavaScript Programs
Runtime Behavior of JavaScript ProgramsRuntime Behavior of JavaScript Programs
Runtime Behavior of JavaScript Programs
 
OpenACC and Open Hackathons Monthly Highlights May 2023.pdf
OpenACC and Open Hackathons Monthly Highlights May  2023.pdfOpenACC and Open Hackathons Monthly Highlights May  2023.pdf
OpenACC and Open Hackathons Monthly Highlights May 2023.pdf
 
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - TrivadisTechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
TechEvent 2019: Artificial Intelligence in Dev & Ops; Martin Luckow - Trivadis
 
Bug Triage: An Automated Process
Bug Triage: An Automated ProcessBug Triage: An Automated Process
Bug Triage: An Automated Process
 
Comprehending Ajax Web Applications by the DynaRIA Tool
Comprehending Ajax Web Applications by the DynaRIA ToolComprehending Ajax Web Applications by the DynaRIA Tool
Comprehending Ajax Web Applications by the DynaRIA Tool
 
software engineering
software engineering software engineering
software engineering
 
Federico Toledo - Extra-functional testing.pdf
Federico Toledo - Extra-functional testing.pdfFederico Toledo - Extra-functional testing.pdf
Federico Toledo - Extra-functional testing.pdf
 
Towards Configuration Technologies for IoT Gateways
Towards Configuration Technologies  for IoT GatewaysTowards Configuration Technologies  for IoT Gateways
Towards Configuration Technologies for IoT Gateways
 
Simulagora (Euroscipy2014 - Logilab)
Simulagora (Euroscipy2014 - Logilab)Simulagora (Euroscipy2014 - Logilab)
Simulagora (Euroscipy2014 - Logilab)
 
GEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptx
GEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptxGEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptx
GEETHAhshansbbsbsbhshnsnsn_INTERNSHIP.pptx
 
Software Engineering with Objects (M363) Final Revision By Kuwait10
Software Engineering with Objects (M363) Final Revision By Kuwait10Software Engineering with Objects (M363) Final Revision By Kuwait10
Software Engineering with Objects (M363) Final Revision By Kuwait10
 
online movie ticket booking system
online movie ticket booking systemonline movie ticket booking system
online movie ticket booking system
 
CV_CEB_2017_en
CV_CEB_2017_enCV_CEB_2017_en
CV_CEB_2017_en
 
Bertazo et al - Application Lifecycle Management and process monitoring throu...
Bertazo et al - Application Lifecycle Management and process monitoring throu...Bertazo et al - Application Lifecycle Management and process monitoring throu...
Bertazo et al - Application Lifecycle Management and process monitoring throu...
 
IT Confidence 2013 - Spago4Q presents a 3D model for Productivity Intelligence
IT Confidence 2013 - Spago4Q presents a 3D model for Productivity IntelligenceIT Confidence 2013 - Spago4Q presents a 3D model for Productivity Intelligence
IT Confidence 2013 - Spago4Q presents a 3D model for Productivity Intelligence
 
Using Neural Net Algorithms to Classify Human Activity, with Applications in ...
Using Neural Net Algorithms to Classify Human Activity, with Applications in ...Using Neural Net Algorithms to Classify Human Activity, with Applications in ...
Using Neural Net Algorithms to Classify Human Activity, with Applications in ...
 
Performance Evaluation of Open Source Data Mining Tools
Performance Evaluation of Open Source Data Mining ToolsPerformance Evaluation of Open Source Data Mining Tools
Performance Evaluation of Open Source Data Mining Tools
 
Go Observability (in practice)
Go Observability (in practice)Go Observability (in practice)
Go Observability (in practice)
 

Dernier

Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 

Dernier (20)

Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 

The evolution of semantic technology evaluation in my own flesh (The 15 tips for technology evaluation)

  • 7. Evaluation goal
GQM paradigm: any software measurement activity should be preceded by:
1. the identification of a software engineering goal...
2. ... which leads to questions...
3. ... which in turn lead to actual metrics.
Goal: to improve the performance (latency) and the scalability of the methods provided by the ontology management APIs of ontology development tools.
Questions and their metrics:
•  What is the actual performance of the API methods? → Execution time of each method
•  Is the performance of the methods stable? → Variance of the execution times of each method
•  Are there any anomalies in the performance of the methods? → Percentage of execution times out of range in each method's sample
•  Do changes in a method's parameters affect its performance? → Execution time with parameter A vs. execution time with parameter B
•  Does tool load affect the performance of the methods? → Tool load versus execution time relationship
Metric: execution times of the methods of the API with different load factors.
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 7
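A minimal sketch (illustrative only, not part of the original framework) of how such a goal/question/metric decomposition can be kept as plain data, so that every metric stays traceable to the question and goal it answers:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Question:
        text: str
        metrics: List[str] = field(default_factory=list)

    @dataclass
    class Goal:
        text: str
        questions: List[Question] = field(default_factory=list)

    goal = Goal(
        "Improve the performance and scalability of the ontology management API methods",
        [
            Question("What is the actual performance of the API methods?",
                     ["Median execution time of each method"]),
            Question("Does tool load affect the performance of the methods?",
                     ["Execution times of the methods with different load factors"]),
        ],
    )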
  • 8. Evaluation data
•  Atomic operations of the ontology management API, e.g., insertConcept(String ontology, String concept): insertConcept, insertRelation, insertClassAttribute, insertInstanceAttribute, insertConstant, insertReasoningElement, insertInstance, updateConcept, updateRelation, updateClassAttribute, updateInstanceAttribute, updateConstant, updateReasoningElement, updateInstance, ... (72 methods)
•  Multiple benchmarks defined for each method according to changes in its parameters (128 benchmarks), e.g.:
-  benchmark1_1_08(N): "Inserts N concepts in 1 ontology"
-  benchmark1_1_09(N): "Inserts 1 concept in N ontologies"
•  Benchmarks parameterised according to the number of consecutive executions of the method
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 8
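A minimal sketch of what one of these parameterised benchmarks could look like; the `tool` object and its `create_ontology`/`insert_concept` methods are hypothetical stand-ins, since the actual WebODE API is not reproduced here:

    import time

    def benchmark1_1_08(tool, n):
        """Inserts N concepts in 1 ontology, recording the execution time of each call (in ms)."""
        times_ms = []
        tool.create_ontology("Ontology_1")                        # hypothetical call
        for i in range(1, n + 1):
            start = time.perf_counter()
            tool.insert_concept("Ontology_1", f"Concept_{i}")     # hypothetical call
            times_ms.append((time.perf_counter() - start) * 1000)
        return times_ms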
  • 9. Workload generator
•  Generates and inserts into the tool synthetic ontologies according to:
-  Load factor (X): defines the size of the ontology data
-  Ontology structure: depends on the benchmarks
Benchmark | Operation | Execution needs
benchmark1_1_08 | Inserts N concepts in an ontology | 1 ontology
benchmark1_1_09 | Inserts a concept in N ontologies | N ontologies
benchmark1_3_20 | Removes N concepts from an ontology | 1 ontology with N concepts
benchmark1_3_21 | Removes a concept from N ontologies | N ontologies with 1 concept
For executing all the benchmarks, the ontology structure includes the execution needs of all of them.
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 9
  • 10. Evaluation infrastructure
Components: Benchmark Suite Executor, Performance Benchmark Suite, Workload Generator, Measurement Data Library and Statistical Analyser, to be instantiated for each ontology development tool.
http://knowledgeweb.semanticweb.org/wpbs/
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 10
  • 11. Statistical analyser
The statistical analyser (BenchStats) processes with statistical software the raw measurements stored in the Measurement Data Library, e.g.:
-  benchmark1_1_08: 400 measurements (2134 ms, 2300 ms, 2242 ms, 2809 ms, ...)
-  benchmark1_1_09: 400 measurements (1399 ms, 2180 ms, ...)
-  benchmark1_3_20: 400 measurements (2032 ms, 1459 ms, ...)
and summarises them per benchmark:
Benchmark | Load | N | UQ | LQ | IQR | Median | % Outliers | Function
benchmark1_1_08 | 5000 | 400 | 60 | 60 | 0 | 60 | 1.25 | y=62.0-0.009x
benchmark1_1_09 | 5000 | 400 | 912 | 901 | 11 | 911 | 1.75 | y=910.25-0.003x
benchmark1_3_20 | 5000 | 400 | 160 | 150 | 10 | 150 | 1.25 | y=155.25-0.003x
benchmark1_3_21 | 5000 | 400 | 160 | 150 | 10 | 151 | 0.25 | y=154.96-0.001x
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 11
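A sketch of the per-benchmark summary computed by the statistical analyser (median, quartiles, IQR, percentage of outliers). The exact outlier rule used in BenchStats is not stated in the slides, so the usual 1.5 x IQR fences are assumed here:

    import statistics

    def summarise(times_ms):
        lq, median, uq = statistics.quantiles(times_ms, n=4)  # Q1, Q2 (median), Q3
        iqr = uq - lq
        low, high = lq - 1.5 * iqr, uq + 1.5 * iqr            # assumed outlier fences
        outliers = sum(1 for t in times_ms if t < low or t > high)
        return {"N": len(times_ms), "LQ": lq, "Median": median, "UQ": uq,
                "IQR": iqr, "% Outliers": 100.0 * outliers / len(times_ms)}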
  • 12. Result analysis - Latency
•  Metric for the execution time: the median of the execution times of a method → 8 methods with execution times > 800 ms (N=400, X=5000)
•  Metric for anomalies in the execution times: percentage of outliers in the execution times of a method → 2 methods with % outliers > 5% (N=400, X=5000)
•  Metric for the variability of the execution time: the interquartile range of the execution times of a method → 3 methods with IQR > 11 ms (N=400, X=5000)
•  Effect of changes in method parameters: comparison of the medians of the execution times of the benchmarks that use the same method → 5 methods with differences in execution times > 60 ms (N=400, X=5000)
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 12
  • 13. Result analysis - Scalability Effect of changes in WebODE’s load: Slope of the function estimated by simple linear regression of the medians of the execution times from a minimum load (X=500) to a maximum one (X=5000). 8 methods with slope>0.1 ms. N=400, X=500..5000 © Raúl García Castro Talk at IMATI-CNR. 15th October 2013 13
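A sketch of the scalability measure described above: the least-squares slope of the median execution time of a method against the load factor, over the loads X = 500..5000 used in the evaluation:

    def scalability_slope(loads, medians_ms):
        """Least-squares slope (ms per load unit) of median execution time vs. load factor."""
        n = len(loads)
        mean_x = sum(loads) / n
        mean_y = sum(medians_ms) / n
        num = sum((x - mean_x) * (y - mean_y) for x, y in zip(loads, medians_ms))
        den = sum((x - mean_x) ** 2 for x in loads)
        return num / den

    loads = [500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000]
    # medians_ms would hold the measured median execution time of one API method at each load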
  • 14. Limitations
•  Evaluating other tools is expensive: the whole infrastructure (Benchmark Suite Executor, Workload Generator, Performance Benchmark Suite, Measurement Data Library, Statistical Analyser) has to be instantiated for every ontology development tool
•  Analysis of results was difficult
-  The evaluation was executed 10 times with different load factors: 500, 1000, 1500, 2000, 2500, 3000, 3500, 4000, 4500, and 5000
-  128 benchmarks x 10 executions = 1280 files with results!
García-Castro R.; Gómez-Pérez A. "Guidelines for Benchmarking the Performance of Ontology Management APIs". 4th International Semantic Web Conference (ISWC2005), LNCS 3729. November 2005. Galway, Ireland.
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 14
  • 15. The 15 tips for technology evaluation
•  Know the technology
•  Support different types of technology
•  Automate the evaluation framework
•  Expect reproducibility
•  Beware of result analysis
•  Learn statistics
•  Plan for evaluation requirements
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 15
  • 16. Index •  •  •  •  •  Self-awareness Crawling (Graduation Project) Walking (Ph.D. Thesis) Cruising (Postdoctoral Research) Insight KHAAAAN! http://www.phdcomics.com/comics/archive.php?comicid=500 © Raúl García Castro Talk at IMATI-CNR. 15th October 2013 16
  • 17. Interoperability in the Semantic Web
•  Interoperability is the ability of Semantic Web technologies to interchange ontologies and use them
-  At the information level, not at the system level
-  In terms of knowledge reuse, not information integration
•  In the real world it is not feasible to use a single system or a single formalism
•  Different behaviours in interchanges between different formalisms: depending on the formalisms involved, an interchange may be lossless or lossy (e.g., a disjointness axiom between classes may only survive as an ad-hoc myDisjoint property)
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 17
  • 18. Evaluation goal To evaluate and improve the interoperability of Semantic Web technologies using RDF(S) and OWL as interchange languages © Raúl García Castro Talk at IMATI-CNR. 15th October 2013 18
  • 19. Evaluation workflow - Manual
•  Import: an ontology Oi in RDF(S)/OWL is imported into Tool X, producing Oi'
•  Export: an ontology Oi in Tool X is exported to RDF(S)/OWL, producing Oi'
•  In both cases the original and resulting ontologies are compared: Oi = Oi' + α - α' and Oi = Oi' + β - β'
•  Interoperability: Oi is interchanged from Tool X to Tool Y through RDF(S)/OWL, producing Oi'', with Oi = Oi'' + α - α' + β - β'
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 19
  • 20. Evaluation workflow - Automatic
Existing ontologies O1..On in the interchange language (RDF(S)/OWL) are used.
•  Step 1: Import + Export in Tool X: O1 → O1' → O1'', with O1 = O1'' + α - α'
•  Step 2: Import + Export in Tool Y: O1'' → O1''' → O1'''', with O1'' = O1'''' + β - β'
•  Interchange: O1 = O1'''' + α - α' + β - β'
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 20
  • 21. Evaluation data - OWL Lite Import Test Suite
•  Tests built from component combinations (e.g., subclass of class, subclass of restriction, value constraints, set operators, cardinality + object property, cardinality + datatype property) and RDF/XML syntax variants, e.g.:
<rdf:Description rdf:about="#class1"> <rdf:type rdf:resource="&rdfs;Class"/> </rdf:Description>  =  <rdfs:Class rdf:about="#class1"> </rdfs:Class>
Group | No.
Class hierarchies | 17
Class equivalences | 12
Classes defined with set operators | 2
Property hierarchies | 4
Properties with domain and range | 10
Relations between properties | 3
Global cardinality constraints and logical property characteristics | 5
Single individuals | 3
Named individuals and properties | 5
Anonymous individuals and properties | 3
Individual identity | 3
Syntax and abbreviation | 15
TOTAL | 82
David S.; García-Castro, R.; Gómez-Pérez, A. "Defining a Benchmark Suite for Evaluating the Import of OWL Lite Ontologies". Second International Workshop OWL: Experiences and Directions 2006 (OWL2006). November, 2006. Athens, Georgia, USA.
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 21
  • 22. Evaluation criteria
•  Execution informs about the correct execution:
-  OK: no execution problem
-  FAIL: some execution problem
-  Comparer Error (C.E.): comparer exception
-  Not Executed (N.E.): second step not executed
•  Information added or lost in terms of triples: Oi = Oi' + α - α'
•  Interchange informs whether the ontology has been interchanged correctly, with no addition or loss of information (Oi = Oi'?):
-  SAME if Execution is OK and Information added and Information lost are void
-  DIFFERENT if Execution is OK but Information added or Information lost are not void
-  NO if Execution is FAIL, N.E. or C.E.
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 22
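A simplified sketch of the triple-level comparison behind these criteria. The benchmarking infrastructure relied on external ontology comparers (KAON2 OWL Tools, RDFutils); here plain rdflib triple sets are used instead, which ignores subtleties such as blank-node renaming:

    from rdflib import Graph

    def compare(original_path, resulting_path):
        """Returns the Interchange verdict plus the information added/lost as sets of triples."""
        oi = set(Graph().parse(original_path, format="xml"))            # Oi, in RDF/XML
        oi_prime = set(Graph().parse(resulting_path, format="xml"))     # Oi'
        added = oi_prime - oi    # alpha: information added
        lost = oi - oi_prime     # alpha': information lost
        verdict = "SAME" if not added and not lost else "DIFFERENT"
        return verdict, added, lost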
  • 23. Evaluation campaigns
•  RDF(S) Interoperability Benchmarking: 6 tools (3 ontology development tools, 3 ontology repositories)
•  OWL Interoperability Benchmarking: 9 tools (5 ontology development tools, 3 ontology repositories, 1 ontology-based annotation tool); both frame-based and OWL-based tools (e.g., SemTalk) participated
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 23
  • 24. Evaluation infrastructure - IRIBA
http://knowledgeweb.semanticweb.org/iriba/
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 24
  • 25. Evaluation infrastructure - IBSE
Workflow: (1) describe benchmarks (the OWL Lite Import Benchmark Suite is described using a benchmark ontology), (2) execute benchmarks over the tools, storing execution results as instances of a result ontology, (3) generate reports (HTML, SVG).
•  Automatically executes experiments between all the tools
•  Allows configuring different execution parameters
•  Uses ontologies to represent benchmarks and results
•  Depends on external ontology comparers (KAON2 OWL Tools and RDFutils)
http://knowledgeweb.semanticweb.org/benchmarking_interoperability/ibse/
García-Castro, R.; Gómez-Pérez, A.; Prieto-González J. "IBSE: An OWL Interoperability Evaluation Infrastructure". Third International Workshop OWL: Experiences and Directions 2007 (OWL2007). June, 2007. Innsbruck, Austria.
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 25
  • 26. Evaluation results - Variability
•  High variability in evaluation results
-  Tool import/export: models and executes / does not model and executes / models and fails / does not model and fails / not executed
-  Ontology comparison: same information / more information / less information / tool fails / comparer fails / not valid ontology
•  Different perspectives for analysis:
-  Results per tool / pair of tools (e.g., matrices of interchange results between Jena, Protégé-OWL, SWI-Prolog, KAON2, GATE, SemTalk, WebODE and Protégé-Frames)
-  Results per component (e.g., classes and class hierarchies, datatype and object properties, instances)
-  Result evolution over time
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 26
  • 27. Evaluation results - Interoperability
Clear picture of the interoperability between different tools:
•  Low interoperability and few clusters of interoperable tools
•  Interoperability depends on:
-  Ontology translation (tool knowledge model)
-  Specification (development decisions)
-  Robustness (tool defects)
-  Tools participating in the interchange (each behaves differently)
•  Tools have improved
•  Involvement of tool developers is needed
-  Tool developers have been informed
-  Tool improvement is out of our scope
•  Results are expected to change
-  Continuous evaluation is needed
García-Castro, R.; Gómez-Pérez, A. "Interoperability results for Semantic Web technologies using OWL as the interchange language". Web Semantics: Science, Services and Agents in the World Wide Web. ISSN: 1570-8268. Elsevier. Volume 8, number 4. pp. 278-291. November 2010.
García-Castro, R.; Gómez-Pérez, A. "RDF(S) Interoperability Results for Semantic Web Technologies". International Journal of Software Engineering and Knowledge Engineering. ISSN: 0218-1940. Editor: Shi-Kuo Chang. Volume 19, number 8. pp. 1083-1108. December 2009.
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 27
  • 28. Benchmarking interoperability
Method for benchmarking interoperability:
•  Common for different Semantic Web technologies
•  Problem-focused instead of tool-focused
•  Manual vs automatic experiments:
-  It depends on the specific needs of the benchmarking
-  Automatic: cheaper, more flexible and extensible
-  Manual: higher quality of results
Resources for benchmarking interoperability:
•  All the benchmark suites, software and results are publicly available (RDF(S) Import, RDF(S) Export, RDF(S) Interoperability and OWL Lite Import benchmark suites; rdfsbs, IRIBA and IBSE software)
•  Independent of:
-  The interchange language
-  The input ontologies
García-Castro, R. "Benchmarking Semantic Web technology". Studies on the Semantic Web vol. 3. AKA Verlag – IOS Press. ISBN: 978-3-89838-622-7. January 2010.
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 28
  • 29. Limitations •  Number of results to analyse increased exponentially -  2168 executions in the RDF(S) benchmarking activity and -  6642 executions in the OWL one •  Hard to support and maintain different test data and tools •  Every tool to be evaluated had to be deployed in the same computer © Raúl García Castro Talk at IMATI-CNR. 15th October 2013 29
  • 30. The 15 tips for technology evaluation
•  Know the technology
•  Support different test data
•  Support different types of technology
•  Use machine-processable descriptions of evaluation resources
•  Automate the evaluation framework
•  Expect reproducibility
•  Beware of result analysis
•  Learn statistics
•  Plan for evaluation requirements
•  Organize (or join) evaluation campaigns
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 30
  • 31. Index •  •  •  •  •  Self-awareness Crawling (Graduation Project) Walking (Ph.D. Thesis) Cruising (Postdoctoral Research) Insight http://www.phdcomics.com/comics/archive.php?comicid=570 © Raúl García Castro Talk at IMATI-CNR. 15th October 2013 31
  • 32. The SEALS Project (RI-238975)
http://www.seals-project.eu/
Project Coordinator: Asunción Gómez Pérez <asun@fi.upm.es>
EC contribution: 3.500.000 €
Duration: June 2009 - June 2012
Partners: Universidad Politécnica de Madrid, Spain (Coordinator); University of Sheffield, UK; University of Mannheim, Germany; Forschungszentrum Informatik, Germany; University of Zurich, Switzerland; University of Innsbruck, Austria; STI International, Austria; Institut National de Recherche en Informatique et en Automatique, France; Open University, UK; Oxford University, UK
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 32
  • 33. Semantic technology evaluation @ SEALS
SEALS Platform, SEALS Evaluation Campaigns, SEALS Evaluation Services, SEALS Community
Wrigley S.; García-Castro R.; Nixon L. "Semantic Evaluation At Large Scale (SEALS)". 21st International World Wide Web Conference (WWW 2012). European projects track. pp. 299-302. Lyon, France. 16-20 April 2012.
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 33
  • 34. The SEALS entities
•  Tools: ontology engineering, storage and reasoning, ontology matching, semantic search, semantic web service
•  Evaluations
•  Test Data
•  Results: raw results and interpretations
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 34
  • 35. Structure of the SEALS entities
Each entity combines data (Java binaries, shell scripts, bundles, BPEL, ontologies) with metadata described using the SEALS ontologies, which supports entity discovery, validation and exploitation.
http://www.seals-project.eu/ontologies/
García-Castro R.; Esteban-Gutiérrez M.; Kerrigan M.; Grimm S. "An Ontology Model to Support the Automatic Evaluation of Software". 22nd International Conference on Software Engineering and Knowledge Engineering (SEKE 2010). pp. 129-134. Redwood City, USA. 1-3 July 2010.
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 35
  • 36. SEALS logical architecture
•  Users: evaluation organisers, technology providers and technology adopters (as well as software agents)
•  SEALS Portal and Runtime Evaluation Service, coordinated by the SEALS Service Manager
•  SEALS Repositories: Test Data Repository Service, Tools Repository Service, Results Repository Service, Evaluation Descriptions Repository Service
García-Castro R.; Esteban-Gutiérrez M.; Gómez-Pérez A. "Towards an Infrastructure for the Evaluation of Semantic Technologies". eChallenges e-2010 Conference (e-2010). pp. 1-8. Warsaw, Poland. 27-29 October 2010.
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 36
  • 37. Challenges
•  Tool heterogeneity: hardware requirements, software requirements
•  Reproducibility: ensure the execution environment offers the same initial status
Virtualization as a technology enabler: each processing/execution node runs the tool inside a virtual machine.
Virtualization solutions: VMWare Server 2.0.2, VMWare vSphere 4, Amazon EC2 (in progress)
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 37
  • 38. Evaluation campaign methodology
SEALS Methodology for Evaluation Campaigns:
•  SEALS-independent
•  Includes: actors, process, recommendations, alternatives, terms of participation, use rights
•  Phases: Initiation, Involvement, Preparation & Execution, Dissemination, Finalization
García Castro R.; Martin-Recuerda F.; Wrigley S. "SEALS. Deliverable 3.8 SEALS Methodology for Evaluation Campaigns v2". Technical Report. SEALS project. July 2011.
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 38
  • 39. Current SEALS evaluation services
•  Ontology engineering
-  Evaluations: conformance, interoperability, scalability
-  Test data: conformance & interoperability (RDF(S); OWL Lite, DL and Full; OWL 2 Expressive x3; OWL 2 Full); scalability (real-world, LUBM, real-world +, LUBM +)
•  Ontology reasoning
-  Evaluations: DL reasoning (classification, class satisfiability, ontology satisfiability, entailment, non-entailment, instance retrieval); RDF reasoning (conformance)
-  Test data: DL reasoning (Gardiner test suite, Wang et al. repository, versions of GALEN, ontologies from EU projects, instance retrieval test data); RDF reasoning (OWL 2 Full)
•  Ontology matching
-  Evaluations: matching accuracy, matching accuracy multilingual, scalability (ontology size, # CPU)
-  Test data: Benchmark, Anatomy, Conference, MultiFarm, Large Biomed (supported by SEALS)
•  Semantic search
-  Evaluations: search accuracy and efficiency (automated); usability and satisfaction (user-in-the-loop)
-  Test data: automated (EvoOnt, MusicBrainz from QALD-1); user-in-the-loop (Mooney, Mooney +)
•  Semantic web service
-  Evaluations: SWS discovery
-  Test data: OWLS-TC 4.0, SAWSDL-TC 3.0, WSMO-LITE-TC
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 39
  • 40. New evaluation data - Conformance and interoperability
•  OWL DL test suite: keyword-driven approach
-  Manual definition of tests in CSV/spreadsheet using a keyword library
-  A test suite generator (preprocessor + interpreter) expands the test suite definition script into the test suite ontologies and metadata
-  OWL2EG (http://knowledgeweb.semanticweb.org/benchmarking_interoperability/OWL2EG/)
•  OWL 2 test suite: automatically generate ontologies of increasing expressiveness
-  Using ontologies in the Web (ontology search + ontology module extraction)
-  Maximizing expressiveness (original, expressive and full-expressive test suites)
-  OWLDLGenerator (http://knowledgeweb.semanticweb.org/benchmarking_interoperability/OWLDLGenerator/)
García-Castro R.; Gómez-Pérez A. "A Keyword-driven Approach for Generating Ontology Language Conformance Test Data". Engineering Applications of Artificial Intelligence. ISSN: 0952-1976. Elsevier. Editor: B. Grabot.
Grangel-González I.; García-Castro R. "Automatic Conformance Test Data Generation Using Existing Ontologies in the Web". Second International Workshop on Evaluation of Semantic Technologies (IWEST 2012). 28 May 2012. Heraklion, Greece.
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 40
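A toy illustration of the keyword-driven idea: each row of a test definition names a keyword that expands into an ontology fragment. The actual OWL2EG keyword library and CSV format are not reproduced here; the keywords and Turtle templates below are invented for the example:

    # Invented keyword library: keyword -> Turtle template
    KEYWORDS = {
        "ClassSubsumption": "<#{sub}> rdfs:subClassOf <#{sup}> .",
        "ObjectProperty":   "<#{prop}> a owl:ObjectProperty .",
    }

    def expand(test_definition):
        """test_definition: list of (keyword, parameters) rows, e.g. read from a CSV spreadsheet."""
        return "\n".join(KEYWORDS[keyword].format(**params) for keyword, params in test_definition)

    print(expand([("ClassSubsumption", {"sub": "Class1", "sup": "Class2"}),
                  ("ObjectProperty", {"prop": "prop1"})]))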
  • 41. 1st Evaluation Campaign
•  Ontology engineering: Jena, Sesame, Protégé 4, Protégé OWL, NEON toolkit, OWL API
•  Reasoning: HermiT, jcel, FaCT++
•  Matching: AROMA, ASMOV, Falcon-AO, Lily, RiMOM, Mapso, CODI, AgreeMaker, Gerome*, Ef2Match
•  Semantic search: K-Search, Ginseng, NLP-Reduce, PowerAqua, Jena Arq
•  Semantic web service: 4 OWLS-MX variants
29 tools from 8 countries.
Nixon L.; García-Castro R.; Wrigley S.; Yatskevich M.; Trojahn-dos-Santos C.; Cabral L. "The state of semantic technology today – overview of the first SEALS evaluation campaigns". 7th International Conference on Semantic Systems (I-SEMANTICS2011). Graz, Austria. 7-9 September 2011.
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 41
  • 42. 2nd Evaluation Campaign
•  Ontology engineering: Jena, Sesame, Protégé 4, Protégé OWL, NeOn toolkit, OWL API
•  Reasoning: HermiT, jcel, FaCT++, WSReasoner
•  Matching: AgrMaker, Aroma, AUTOMSv2, CIDER, CODI, CSA, GOMMA, Hertuda, LDOA, Lily, LogMap, LogMapLt, MaasMtch, MapEVO, MapPSO, MapSSS, Optima, WeSeEMtch, YAM++
•  Semantic search: K-Search, Ginseng, NLP-Reduce, PowerAqua, Jena Arq v2.8.2, Jena Arq v2.9.0, rdfQuery v0.5.1, Semantic Crystal, Affective Graphs
•  Semantic web service: WSMO-LITE-OU, SAWSDL-OU, OWLS-URJC, OWLS-M0
41 tools from 13 countries.
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 42
  • 43. Evaluation services
•  Technology providers can register their own tools and test data and keep them updated
•  Existing evaluations and test data can be reused, or new ones can be defined
•  Evaluations are executed on the platform, and the resulting evaluation results can be exploited afterwards
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 43
  • 44. Quality model for semantic technologies
Tool / Measures | Raw Results | Interpretations | Quality Measures | Quality subcharacteristics
Ontology engineering tools | 7 | 20 | 8 | 6
Ontology matching tools | 1 | 4 | 4 | 2
Reasoning systems | 11 | 0 | 16 | 5
Semantic search tools | 12 | 8 | 18 | 7
Semantic web service tools | 5 | 9 | 10 | 2
Total | 34 | 41 | 55 | 17
Radulovic, F.; García-Castro, R. "Extending Software Quality Models - A Sample In The Domain of Semantic Technologies". 23rd International Conference on Software Engineering and Knowledge Engineering (SEKE2011). Miami, USA. July 2011.
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 44
  • 45. Semantic technology recommendation
Example request: "I need a robust ontology engineering tool and a semantic search tool with the highest precision."
User quality requirements are sent to the SEALS Platform, which uses the semantic technology quality model together with the Tools Repository Service and the Results Repository Service to produce a recommendation such as: "You should use Sesame v2.6.5 and Arq v2.9.0. The reason for this is... Alternatively, you can use ..."
Radulovic F.; García-Castro R. "Semantic Technology Recommendation Based on the Analytic Network Process". 24th Int. Conference on Software Engineering and Knowledge Engineering (SEKE 2012). Redwood City, CA, USA. 1-3 July 2012. 3rd Best Paper Award!
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 45
  • 46. You can use the SEALS Platform
•  The SEALS Platform facilitates:
-  Comparing tools under common settings
-  Reproducibility of evaluations
-  Reusing evaluation resources, completely or partially, or defining new ones
-  Managing evaluation resources using platform services
-  Computational resources for demanding evaluations
•  Don't start your evaluation from scratch!
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 46
  • 47. The 15 tips for technology evaluation
•  Know the technology
•  Support different test data
•  Facilitate test data definition
•  Support different types of technology
•  Define declarative evaluation workflows
•  Use machine-processable descriptions of evaluation resources
•  Automate the evaluation framework
•  Expect reproducibility
•  Beware of result analysis
•  Learn statistics
•  Plan for evaluation requirements
•  Use a quality model
•  Organize (or join) evaluation campaigns
•  Share evaluation resources
•  Exploit evaluation results
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 47
  • 48. Index •  •  •  •  •  Self-awareness Crawling (Graduation Project) Walking (Ph.D. Thesis) Cruising (Postdoctoral Research) Insight insight noun […] [mass noun] Psychiatry awareness by a mentally ill person that their mental experiences are not based in external reality. © Raúl García Castro Talk at IMATI-CNR. 15th October 2013 48
  • 49. Evolution towards maturity
Software Evaluation Technology Maturity Model (SET-MM): five maturity levels (Initial, Repeatable, Reusable, Integrated, Optimized) characterised along six themes: formalization of the evaluation workflow, software support to the evaluation, applicability to multiple software types, usability of test data, exploitability of results, and representativeness of participants. For example, at the Initial level the workflow is ad hoc and informally defined, evaluation is manual with no software support and involves a single team, whereas at the highest levels machine-processable evaluation resources are shared across federated evaluation infrastructures, results are combined for many software products of different types, and a whole community participates.
The evaluations presented in this talk (UPM-FBI; the RDF(S) and OWL interoperability benchmarking with their benchmark suites and the rdfsbs, IRIBA and IBSE software; the SEALS Platform) are positioned along these maturity levels.
García-Castro R. "SET-MM – A Software Evaluation Technology Maturity Model". 23rd International Conference on Software Engineering and Knowledge Engineering (SEKE2011). pp. 660-665. Miami Beach, USA. 7-9 July 2011.
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 49
  • 50. The 15 tips for technology evaluation
•  Know the technology
•  Support different test data
•  Facilitate test data definition
•  Support different types of technology
•  Define declarative evaluation workflows
•  Use machine-processable descriptions of evaluation resources
•  Automate the evaluation framework
•  Expect reproducibility
•  Beware of result analysis
•  Learn statistics
•  Plan for evaluation requirements
•  Use a quality model
•  Organize (or join) evaluation campaigns
•  Share evaluation resources
•  Exploit evaluation results
© Raúl García Castro Talk at IMATI-CNR. 15th October 2013 50
  • 51. Thank you for your attention! Speaker: Raúl García-Castro Talk at IMATI-CNR, October 15th, Genova, Italy