Domeo, Text Mining, UIMA and Clerezza
1. DOMEO ANNOTATION TOOLKIT AND TEXT MINING
CREATING, VISUALISING, CURATING AND SHARING TEXT MINING RESULTS
Paolo Ciccarese, PhD
paolo.ciccarese@gmail.com
January 30th 2012, W3C Scientific Discourse Call
2. The Domeo Annotation Toolkit is a collection of software components that allow users to create and share annotations of web documents and their fragments
It can export and exchange all annotations in Annotation Ontology (AO) RDF format
The Domeo client is the user interface for producing manual and semi-automatic annotations of HTML documents directly in your browser
http://annotationframework.org/
3. ANNOTATION ONTOLOGY
OWL vocabulary for representing and sharing annotations and semantic annotations of digital resources and their fragments:
Is orthogonal to the domain(s) of interest
http://purl.org/ao/home
Supports Stand-off annotation
Offers tools for identifying fragments
Designed with extension points
Defines basic annotation containers
Supports versioning
Tracks provenance
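To make stand-off annotation concrete, here is a minimal Python sketch that serializes one AO-style qualifier annotation as Turtle. The annotation never modifies the document: it points at the document and at a selector that identifies the fragment by its exact text plus the text around it. The vocabulary terms (ao:annotatesResource, ao:context, ao:hasTopic, aot:Qualifier, aos:PrefixPostfixSelector, aos:onDocument and the prefix/exact/postfix properties) and the namespace IRIs are recalled from the AO documentation and should be treated as assumptions to verify at http://purl.org/ao/; the function names and example IRIs are hypothetical.

```python
# Sketch: serialize one stand-off AO-style annotation as Turtle.
# ASSUMPTION: namespace IRIs and term names are recalled from the AO
# documentation (http://purl.org/ao/) and must be verified before use.

AO_PREFIXES = (
    "@prefix ao:  <http://purl.org/ao/core/> .\n"
    "@prefix aos: <http://purl.org/ao/selectors/> .\n"
    "@prefix aot: <http://purl.org/ao/types/> .\n\n"
)


def turtle_literal(text: str) -> str:
    """Quote a Python string as a Turtle string literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def qualifier_annotation_ttl(
    ann_iri: str,
    selector_iri: str,
    doc_iri: str,
    prefix: str,
    exact: str,
    postfix: str,
    topic_iri: str,
) -> str:
    """Return Turtle for a qualifier annotation (hypothetical helper).

    The annotation refers to the document and to a selector describing
    the fragment, so the document itself stays untouched (stand-off).
    """
    return (
        AO_PREFIXES
        + f"<{ann_iri}> a aot:Qualifier ;\n"
        + f"    ao:annotatesResource <{doc_iri}> ;\n"
        + f"    ao:context <{selector_iri}> ;\n"
        + f"    ao:hasTopic <{topic_iri}> .\n\n"
        + f"<{selector_iri}> a aos:PrefixPostfixSelector ;\n"
        + f"    aos:onDocument <{doc_iri}> ;\n"
        + f"    aos:prefix {turtle_literal(prefix)} ;\n"
        + f"    aos:exact {turtle_literal(exact)} ;\n"
        + f"    aos:postfix {turtle_literal(postfix)} .\n"
    )


if __name__ == "__main__":
    print(qualifier_annotation_ttl(
        "http://example.org/ann/1", "http://example.org/sel/1",
        "http://example.org/doc/42", "mutations in the ", "BRCA1", " gene",
        "http://example.org/concept/BRCA1",
    ))
```

Because the selector carries its own copy of the surrounding text, any consumer that can read the RDF can re-locate the fragment without access to the producer's internal offsets.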
4. DOMEO AND TEXT MINING SERVICES
Domeo can trigger text mining algorithms when they are available as web services
Software connectors have to be developed to translate the results into a suitable format
The results are displayed on the web documents
Users can record their feedback/judgment through customizable user interfaces
5. NCBO ANNOTATOR
http://www.bioontology.org/annotator-service
Web service that annotates textual metadata (e.g. journal abstracts) with relevant ontology concepts
The ontologies of interest can be preselected through one of the service's many parameters
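As an illustration of ontology preselection only, the Python sketch below builds a GET request URL that restricts annotation to a chosen set of ontologies. The endpoint URL and every parameter name (`text`, `ontologies`, `apikey`) are hypothetical placeholders, not the documented NCBO Annotator interface; check the service documentation for the real names before use.

```python
from typing import Sequence
from urllib.parse import urlencode

# HYPOTHETICAL endpoint and parameter names, for illustration only.
ANNOTATOR_URL = "https://annotator.example.org/annotate"


def build_annotator_request(text: str, ontologies: Sequence[str], api_key: str) -> str:
    """Build a GET URL asking the annotator to tag `text` using only the
    preselected ontologies (passed as comma-separated acronyms)."""
    if not ontologies:
        raise ValueError("preselect at least one ontology")
    params = {
        "text": text,
        "ontologies": ",".join(ontologies),
        "apikey": api_key,
    }
    # urlencode percent-encodes commas and turns spaces into '+'
    return f"{ANNOTATOR_URL}?{urlencode(params)}"
```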
6. DOMEO AND THE NCBO ANNOTATOR
http://www.bioontology.org/annotator-service
Domeo allows automatic and manual annotation with terms from selected ontologies managed by BioPortal
12. SOFTWARE CONNECTORS
At the current stage, for each text mining service we have to write a specific connector, which normally translates the offset and range of each result into a prefix and postfix (the text immediately before and after the annotated fragment)
And keep it up to date!
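The translation such a connector performs can be sketched in a few lines of Python: given the document text and a service result expressed as a character range, keep the matched text plus a fixed window of context on each side. The reverse function shows why this helps, since the fragment can be found again even when offsets shift after the page is re-rendered. The window size, the `TextQuote` type and both function names are illustrative assumptions, not Domeo's actual connector code.

```python
from typing import NamedTuple, Optional, Tuple


class TextQuote(NamedTuple):
    prefix: str   # text just before the fragment
    exact: str    # the fragment itself
    postfix: str  # text just after the fragment


def offsets_to_quote(text: str, start: int, end: int, window: int = 20) -> TextQuote:
    """Convert a [start, end) character range returned by a text mining
    service into prefix/exact/postfix, keeping `window` chars of context."""
    if not 0 <= start <= end <= len(text):
        raise ValueError(f"range {start}..{end} outside text of length {len(text)}")
    return TextQuote(
        prefix=text[max(0, start - window):start],
        exact=text[start:end],
        postfix=text[end:end + window],
    )


def quote_to_offsets(text: str, quote: TextQuote) -> Optional[Tuple[int, int]]:
    """Locate the fragment again (e.g. after offsets shifted); None if absent."""
    i = text.find(quote.prefix + quote.exact + quote.postfix)
    if i < 0:
        return None
    start = i + len(quote.prefix)
    return start, start + len(quote.exact)
```

If the combined prefix, exact and postfix string occurs more than once, this sketch returns the first occurrence; a larger window makes such collisions less likely.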
13. UIMA, CLEREZZA AND AO
OSS-BASED INFRASTRUCTURE FOR TEXT MINING OVER ONTOLOGIES
Tommaso Teofili and Paolo Ciccarese
tommaso@apache.org
14. APACHE UIMA
Architectural framework for UIM (Unstructured Information Management)
OASIS standard
Build, deploy and run text mining pipelines
Scaling capabilities for large volumes of data
NLP/TM algorithms wrapped as Analysis Engines (AEs)
http://uima.apache.org/
15. UIMA TYPES
Annotation domains are defined in type systems
Types and features are just declared
Existing type systems can be imported, exported and enhanced
Type systems ease data exchange between AEs
Two “main” types
TOP
Annotation
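The roles of TOP, Annotation and a shared type system can be illustrated with a plain-Python analogy. This is not the UIMA API (which is Java, with types declared in XML descriptors): a domain type is declared by subclassing Annotation, a toy CAS holds the document and its annotations, and two analysis-engine-like functions exchange data only through the shared type. `Gene`, `Cas`, `gene_tagger` and `gene_counter` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Type

# Plain-Python ANALOGY of UIMA's type hierarchy; this is NOT the UIMA API.


@dataclass
class Top:
    """Root of every type, like UIMA's TOP."""


@dataclass
class Annotation(Top):
    """A type anchored to a span of the document text."""
    begin: int
    end: int


@dataclass
class Gene(Annotation):
    """Hypothetical domain type declared in a type system."""
    symbol: str = ""


@dataclass
class Cas:
    """Toy stand-in for a CAS: the document text plus its annotations."""
    text: str
    annotations: List[Top] = field(default_factory=list)

    def select(self, kind: Type[Top]) -> List[Top]:
        return [a for a in self.annotations if isinstance(a, kind)]

    def covered_text(self, ann: Annotation) -> str:
        return self.text[ann.begin:ann.end]


def gene_tagger(cas: Cas, symbols: Sequence[str] = ("BRCA1", "TP53")) -> None:
    """AE 1: adds Gene annotations by naive exact string matching."""
    for sym in symbols:
        start = cas.text.find(sym)
        while start != -1:
            cas.annotations.append(Gene(start, start + len(sym), symbol=sym))
            start = cas.text.find(sym, start + len(sym))


def gene_counter(cas: Cas) -> Dict[str, int]:
    """AE 2: depends only on the shared Gene type, not on how AE 1 works."""
    counts: Dict[str, int] = {}
    for gene in cas.select(Gene):
        counts[gene.symbol] = counts.get(gene.symbol, 0) + 1
    return counts
```

The second engine could be swapped for any other consumer of `Gene` without touching the first, which is the data-exchange benefit the slide attributes to type systems.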
16. APACHE CLEREZZA
Service platform for linked data
OSGi-based
RDF API
RESTful Web Service Framework
Triplestore-independent
Integrated with Apache UIMA
http://incubator.apache.org/clerezza/
17. UIMA/CLEREZZA CONVENTION
Developers can create custom types and type systems
They need to manage URIs
Integration of services vs. ontology sharing
ClerezzaTypeSystem
  ClerezzaBaseAnnotation
    uri
  ClerezzaBaseEntity
    uri
    label (rdfs:label)
    references (annotations referring to this entity)
Service-specific annotation and entity types are defined by subclassing the above
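A plain-Python mirror of this convention may help: the two base types carry the fields listed on the slide, and service-specific types are obtained by subclassing them. The subclass names (`OntologyTermAnnotation`, `OntologyTermEntity`) and the `link` helper are hypothetical; in UIMA these would be type-system declarations rather than Python classes.

```python
from dataclasses import dataclass, field
from typing import List

# Plain-Python mirror of the ClerezzaTypeSystem convention on the slide.
# Base-type fields follow the slide; the subclasses are HYPOTHETICAL.


@dataclass
class ClerezzaBaseAnnotation:
    uri: str


@dataclass
class ClerezzaBaseEntity:
    uri: str
    label: str  # exported as rdfs:label
    # annotations referring to this entity
    references: List[ClerezzaBaseAnnotation] = field(default_factory=list)


@dataclass
class OntologyTermAnnotation(ClerezzaBaseAnnotation):
    """Hypothetical service-specific annotation type."""
    begin: int = 0
    end: int = 0


@dataclass
class OntologyTermEntity(ClerezzaBaseEntity):
    """Hypothetical service-specific entity type."""
    ontology: str = ""


def link(entity: ClerezzaBaseEntity, annotation: ClerezzaBaseAnnotation) -> None:
    """Record that `annotation` refers to `entity`; URIs keep links unique."""
    if all(a.uri != annotation.uri for a in entity.references):
        entity.references.append(annotation)
```

Because every type carries a URI, the same entity found by two services can be recognized as one node once the results are exported to RDF.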
22. CONVERSION STRATEGIES
UIMA annotations are stored inside the CAS (Common Analysis Structure)
Services “talk” to each other via web services + RDF
CAS-to-RDF mapping via Clerezza
Pluggable mapping strategies
Clerezza Default
AnnotationOntology
…
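The idea of pluggable CAS-to-RDF mapping strategies can be sketched as a registry of functions keyed by name, each turning one annotation into RDF triples. Only the name `ao` echoes the descriptor value shown in this deck; the `default` strategy's example.org vocabulary is a hypothetical stand-in for Clerezza's default mapping, and the AO term IRIs are assumptions to verify. Annotations are modeled as plain dicts rather than CAS feature structures.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Strategy = Callable[[dict], List[Triple]]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
STRATEGIES: Dict[str, Strategy] = {}


def register(name: str) -> Callable[[Strategy], Strategy]:
    """Decorator that plugs a mapping strategy into the registry."""
    def decorator(fn: Strategy) -> Strategy:
        STRATEGIES[name] = fn
        return fn
    return decorator


@register("default")
def default_mapping(ann: dict) -> List[Triple]:
    # HYPOTHETICAL vocabulary standing in for Clerezza's default mapping.
    s = ann["uri"]
    return [
        (s, "http://example.org/tm#begin", str(ann["begin"])),
        (s, "http://example.org/tm#end", str(ann["end"])),
        (s, "http://example.org/tm#entity", ann["entity"]),
    ]


@register("ao")
def ao_mapping(ann: dict) -> List[Triple]:
    # AO-flavoured mapping; term IRIs are ASSUMPTIONS to verify.
    s = ann["uri"]
    return [
        (s, RDF_TYPE, "http://purl.org/ao/types/Qualifier"),
        (s, "http://purl.org/ao/core/annotatesResource", ann["document"]),
        (s, "http://purl.org/ao/core/hasTopic", ann["entity"]),
    ]


def cas_to_rdf(annotations: List[dict], strategy: str = "default") -> List[Triple]:
    """Map every annotation with the strategy selected by name."""
    try:
        mapping = STRATEGIES[strategy]
    except KeyError:
        raise ValueError(f"unknown mapping strategy: {strategy!r}") from None
    return [t for ann in annotations for t in mapping(ann)]
```

Adding a strategy means registering one more function; callers only change the name they pass in.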
23. CONVERSION STRATEGIES
Mapping strategies can be changed via XML/Eclipse plugin
or directly in the descriptor:
<nameValuePair>
  <name>mappingStrategy</name>
  <value><string>ao</string></value>
</nameValuePair>
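Reading such a parameter back out of a descriptor is straightforward with the Python standard library. This sketch matches element names regardless of XML namespace, since full UIMA descriptors declare a default namespace while a bare fragment may not. The function name is hypothetical, and the sketch only handles `<string>` values.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def _local(tag: str) -> str:
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def read_string_param(descriptor_xml: str, name: str) -> Optional[str]:
    """Return the <string> value of the nameValuePair called `name`, or None."""
    root = ET.fromstring(descriptor_xml)
    for pair in root.iter():  # iter() also visits the root element itself
        if _local(pair.tag) != "nameValuePair":
            continue
        name_el = next((c for c in pair if _local(c.tag) == "name"), None)
        if name_el is None or (name_el.text or "").strip() != name:
            continue
        for value in pair.iter():
            if _local(value.tag) == "string":
                return (value.text or "").strip()
    return None
```

The returned name (for example `ao`) can then be used to pick the mapping strategy at runtime.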
26. DOMEO ANNOTATION TOOLKIT V.2
Domeo Annotation Toolkit v.2 is planned for the end of the first quarter of 2012
It will consist of a major refactoring to improve modularity and make writing plug-ins easier
It will include various new features and will be the first step towards a federated architecture
It will be open source!
27. DOMEO FEDERATION
We currently have two instances of the Domeo Toolkit, and the number of instances is going to increase
We need to define a clean architecture that supports communication between instances, or nodes
Instances should be able to access each other's annotations in multiple ways
28. DOMEO FEDERATION
[Figure: annotation flow among four Domeo nodes (Node 1 to Node 4), each with its own web client; nodes exchange annotations through a web service and through SPARQL queries against a triplestore.]
Example: DT3 retrieves annotations from DT1 through a web service and from DT2 through a SPARQL query against DT2's triplestore
29. SOFTWARE ANNOTATION ACCESS
Nodes can access the annotations of other nodes:
Through web services
  Annotations by user
  Annotations by group
  Annotations by document
  Annotations by corpus
  …
Through SPARQL queries, when a SPARQL endpoint is available
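For the SPARQL route, a node could build queries such as the ones below, which select annotations on a given document and, optionally, by a given creator. The AO property `ao:annotatesResource`, the PAV property `pav:createdBy` and their namespace IRIs are assumptions to verify against the AO and PAV documentation; queries by group or corpus would need additional modeling not shown here, and the function names are hypothetical.

```python
from typing import Optional

# ASSUMED vocabulary: verify the ao: and pav: IRIs and property names before use.
PREFIXES = (
    "PREFIX ao:  <http://purl.org/ao/core/>\n"
    "PREFIX pav: <http://purl.org/pav/>\n"
)

_FORBIDDEN = set('<>"{}|^`\\ \t\n')


def iri(value: str) -> str:
    """Wrap a value as a SPARQL IRI, rejecting characters an IRI cannot contain."""
    if not value or any(ch in _FORBIDDEN for ch in value):
        raise ValueError(f"not a usable IRI: {value!r}")
    return f"<{value}>"


def annotations_query(document: Optional[str] = None,
                      creator: Optional[str] = None,
                      limit: int = 100) -> str:
    """SELECT annotations, optionally restricted to a document and/or creator."""
    target = iri(document) if document else "?doc"
    patterns = [f"?ann ao:annotatesResource {target} ."]
    if creator:
        patterns.append(f"?ann pav:createdBy {iri(creator)} .")
    body = "\n  ".join(patterns)
    return (f"{PREFIXES}SELECT DISTINCT ?ann WHERE {{\n  {body}\n}}\n"
            f"LIMIT {int(limit)}")
```

Validating IRIs before interpolating them keeps a caller-supplied URL from breaking out of the query pattern.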
30. USERS ANNOTATION ACCESS
Users can export their own annotations in AO RDF:
  Annotations by document
  Annotations by corpus
  All of their annotations
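The three export scopes reduce to a simple selection over the user's own annotations before they are serialized to AO RDF. In this sketch a corpus is modeled as a set of document URIs, which is an assumption, and `StoredAnnotation` and `select_for_export` are hypothetical names.

```python
from dataclasses import dataclass
from typing import AbstractSet, Iterable, List, Optional


@dataclass(frozen=True)
class StoredAnnotation:
    uri: str
    creator: str
    document: str


def select_for_export(annotations: Iterable[StoredAnnotation], user: str,
                      document: Optional[str] = None,
                      corpus: Optional[AbstractSet[str]] = None) -> List[StoredAnnotation]:
    """Return the user's own annotations for one document, for one corpus
    (assumed: a set of document URIs) or, if neither is given, all of them."""
    if document is not None and corpus is not None:
        raise ValueError("choose a document or a corpus, not both")
    mine = [a for a in annotations if a.creator == user]
    if document is not None:
        return [a for a in mine if a.document == document]
    if corpus is not None:
        return [a for a in mine if a.document in corpus]
    return mine
```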
31. CURRENT DOMEO ARCHITECTURE
[Figure: Domeo web client sending annotation requests in AO-RDF to the Domeo annotation web services; server-side components: MySQL, user annotation export, text mining UI and connector, NCBO Annotator web service.]
32. DOMEO NODE ARCHITECTURE
> ACCESSING EXTERNAL ANNOTATION
[Figure: Domeo v.2 node (web client, annotation web services, MySQL, user annotation export, text mining UI and connector, NCBO Annotator web service) plus a triple store connector that retrieves AO-RDF annotations from other Domeo nodes and, via SPARQL, from external triplestores.]
33. DOMEO NODE ARCHITECTURE
> ADDING A SPARQL ENDPOINT
[Figure: the same Domeo v.2 node, now also exposing its own SPARQL triplestore alongside the triple store connector to other Domeo nodes and external triplestores.]
34. DOMEO NODE ARCHITECTURE
> TEXT MINING ALGORITHMS INTEGRATION
[Figure: the Domeo v.2 node with three text mining paths: a text mining connector to the NCBO Annotator web service; a Clerezza connector to a Clerezza web service running UIMA algorithms; and a text mining connector to a text mining library manager running text mining algorithms.]
35. DOMEO AND TEXT MINING
IN SUMMARY
Run algorithms within Domeo:
  By making the algorithms available through web services
  By integrating the algorithms, as libraries, within the Domeo architecture
Run algorithms separately and then:
  Load the results into a Domeo node through web services
  Store the results directly in the (or a) triplestore
  Store the results directly in the database
36. W3C COMMUNITY GROUP
OPEN ANNOTATION
Annotation Ontology (AO) and Open Annotation Collaboration (OAC) are merging
A unified model for representing and sharing annotations in RDF
http://www.w3.org/community/openannotation/
37. THANK YOU!
If you are interested in using, or contributing to, the Domeo Annotation Toolkit, follow our website
http://annotationframework.org or contact
paolo.ciccarese -at- gmail.com