DOMEO ANNOTATION TOOLKIT
AND TEXT MINING


CREATING, VISUALISING, CURATING AND SHARING
TEXT MINING RESULTS

Paolo Ciccarese, PhD
paolo.ciccarese@gmail.com


January 30th 2012, W3C Scientific Discourse Call
 The Domeo Annotation Toolkit is a collection of software
  components for creating and sharing annotations of web
  documents and their fragments
 It can export and exchange all annotations in
  Annotation Ontology (AO) RDF format
 The Domeo client is the user interface for producing
  manual and semi-automatic annotations of HTML
  documents directly in the browser


                              http://annotationframework.org/
ANNOTATION ONTOLOGY
   OWL vocabulary for representing and sharing
    annotations and semantic annotations of digital
    resources and their fragments:
       Is orthogonal to the domain(s) of interest
       Supports stand-off annotation
       Offers tools for identifying fragments
       Designed with extension points
       Defines basic annotation containers
       Supports versioning
       Tracks provenance

                                                     http://purl.org/ao/home
DOMEO AND TEXT MINING SERVICES
 Domeo can trigger text mining algorithms
  when they are available through web services
 Software connectors have to be developed to
  translate the results into a suitable format
 The results are displayed in the web documents

 Users can record their feedback/judgment through
  customizable user interfaces
NCBO ANNOTATOR




                                                            http://www.bioontology.org/annotator-service
 Web service that annotates textual metadata (e.g.
  journal abstract) with relevant ontology concepts
 The ontologies of interest can be preselected
  through one of the many parameters
DOMEO AND THE NCBO ANNOTATOR




                                                       http://www.bioontology.org/annotator-service
   Domeo supports automatic and manual annotation with
    terms from selected ontologies managed by
    BioPortal
RUNNING NCBO ANNOTATOR




 Additional text mining services
 will be listed here
NCBO ANNOTATOR RESULTS IN DOMEO




List of recognized
entities
RESULTS CURATION

                   Customizable
CUMULATIVE RESULTS CURATION
 One item only
 All instances with the same text match

 All instances, regardless of the text match
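The three curation scopes above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Result` record, the `Scope` names and the `curate` function are not part of Domeo. It also assumes that "all instances" means every occurrence of the same recognized concept, whatever text was matched.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Scope(Enum):
    """Hypothetical names for the three curation scopes on the slide."""
    ONE_ITEM = "one item only"
    SAME_TEXT = "all instances with the same text match"
    ALL_INSTANCES = "all instances regardless of the text match"


@dataclass
class Result:
    """Illustrative text mining result; not Domeo's data model."""
    concept: str              # URI of the recognized ontology term
    match: str                # text that triggered the recognition
    offset: int               # character offset in the document
    verdict: str = "pending"  # curator feedback


def curate(results: List[Result], selected: Result, verdict: str, scope: Scope) -> int:
    """Apply a curator verdict to the selected result and, depending on
    the scope, to its sibling results. Returns how many were updated."""
    if scope is Scope.ONE_ITEM:
        targets = [selected]
    elif scope is Scope.SAME_TEXT:
        targets = [r for r in results
                   if r.concept == selected.concept and r.match == selected.match]
    else:  # Scope.ALL_INSTANCES: same concept, any matched text (assumption)
        targets = [r for r in results if r.concept == selected.concept]
    for r in targets:
        r.verdict = verdict
    return len(targets)
```

A curator rejecting one false positive would use `Scope.ONE_ITEM`; rejecting a systematically wrong term would use one of the two cumulative scopes.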
SERIALIZATION IN AO/RDF
SOFTWARE CONNECTORS
At the current stage
 For each text mining service we have to write a
  specific connector, which typically translates offset
  and range into prefix and postfix
 And keep it up to date!
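As a rough illustration of what such a connector does, the Python sketch below turns an offset and a length reported by a text mining service into a prefix/exact/postfix selector, then finds the match again in a text whose offsets have shifted. The `TextSelector` class, the function names and the 20-character context window are assumptions made for this example, not Domeo's actual connector code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextSelector:
    """Hypothetical prefix/exact/postfix selector for a text fragment."""
    prefix: str
    exact: str
    postfix: str


def offset_to_selector(text: str, offset: int, length: int,
                       context: int = 20) -> TextSelector:
    """Translate an (offset, length) pair into a selector that carries
    up to `context` characters of text before and after the match."""
    if offset < 0 or length <= 0 or offset + length > len(text):
        raise ValueError("offset/length fall outside the text")
    start, end = offset, offset + length
    return TextSelector(
        prefix=text[max(0, start - context):start],
        exact=text[start:end],
        postfix=text[end:end + context],
    )


def locate(text: str, selector: TextSelector) -> int:
    """Return the offset of the selector's exact match in `text`,
    or -1 if the prefix + exact + postfix sequence is not found."""
    needle = selector.prefix + selector.exact + selector.postfix
    position = text.find(needle)
    return -1 if position < 0 else position + len(selector.prefix)
```

The surrounding context makes the selector independent of absolute character positions, so it can still be resolved when content is added before the annotated fragment.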
UIMA, CLEREZZA AND AO
OSS-BASED INFRASTRUCTURE FOR TEXT MINING OVER
ONTOLOGIES

Tommaso Teofili and Paolo Ciccarese
tommaso@apache.org
APACHE UIMA
 Architectural framework for UIM (Unstructured Information Management)
 OASIS standard

 Build, deploy and run text mining pipelines

 Scaling capabilities for large volumes of data

 NLP/TM algorithms wrapped as Analysis Engines




                                   http://uima.apache.org/
UIMA TYPES
 Annotation domains are defined in Typesystems
 Types and features are just declared

 Existing Typesystems can be
  imported/exported/enhanced
 Eases data exchange between AEs

 Two “main” types
   TOP
   Annotation
APACHE CLEREZZA
 Service platform for linked data
 OSGi-based

 RDF API

 RESTful Web Service Framework

 TripleStore independent

 Integrated with Apache UIMA




                          http://incubator.apache.org/clerezza/
UIMA/CLEREZZA CONVENTION
 Devs can create custom types / typesystems
 Need to manage URIs

 Integration of services vs ontology sharing

 ClerezzaTypeSystem
     ClerezzaBaseAnnotation
         uri
     ClerezzaBaseEntity
         uri
         label (rdfs:label)
         references (annotations referring to this entity)

     Service-specific annotation and entity types are defined
      by subclassing the above
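The convention above can be mirrored in a small Python sketch. It is only an analogy: real UIMA types are declared in type system descriptors rather than as Python classes. The `begin`/`end` fields assume that the base annotation carries the usual UIMA annotation span, and `GeneMention` and `Gene` are invented service-specific subtypes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClerezzaBaseAnnotation:
    """Base annotation: every annotation carries a URI."""
    uri: str
    begin: int = 0  # assumed: span start, as in UIMA's Annotation type
    end: int = 0    # assumed: span end


@dataclass
class ClerezzaBaseEntity:
    """Base entity: URI, rdfs:label and the annotations referring to it."""
    uri: str
    label: str = ""
    references: List[ClerezzaBaseAnnotation] = field(default_factory=list)


@dataclass
class GeneMention(ClerezzaBaseAnnotation):
    """Hypothetical service-specific annotation; inherits `uri`."""
    confidence: float = 0.0


@dataclass
class Gene(ClerezzaBaseEntity):
    """Hypothetical service-specific entity; inherits `uri`, `label`, `references`."""
    symbol: str = ""
```

Because the `uri` field lives in the base types, code that maps annotations to RDF can rely on it without knowing the service-specific subtype.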
CLEREZZABASEANNOTATION DESCRIPTOR
CLEREZZABASEENTITY DESCRIPTOR
BEFORE
AFTER (URI FIELD INHERITED)
CONVERSION STRATEGIES
 UIMA annotations stored inside CAS
 Services “talking” via webservices + RDF

 CAS to RDF mapping via Clerezza

 Pluggable mapping strategies
   Clerezza Default
   AnnotationOntology
   …
CONVERSION STRATEGIES
Change mapping strategies via XML/Eclipse plugin




Or in the descriptor directly
 <nameValuePair>
   <name>mappingStrategy</name>
   <value><string>ao</string></value>
 </nameValuePair>
CLEREZZA WEB SERVICES EXAMPLE
LOOKING AHEAD
DOMEO TOOLKIT V. 2

Paolo Ciccarese, PhD
DOMEO ANNOTATION TOOLKIT V.2
 Domeo Annotation Toolkit v.2 is planned for the end
  of the first quarter of 2012
 It will involve a major refactoring to improve
  modularity and make writing plug-ins easier
 It will include various new features and will be the
  first step towards a federated architecture
 It will be open source!
DOMEO FEDERATION
 We currently have two instances of the Domeo
  Toolkit and the number of instances is going to
  increase
 We need to define a clean architecture that
  supports communication between instances or
  nodes
 Instances should be able to access each other's
  annotations in multiple ways
DOMEO FEDERATION
[Figure: four Domeo nodes (Node 1 to Node 4) with Web Client and SPARQL access points. Legend: annotation flow, web service, triplestore.]

Ex: DT3 retrieves annotation from DT1 through a web service
and from DT2 through a SPARQL query against its triplestore
SOFTWARE ANNOTATION ACCESS
Nodes can access the annotations of other nodes through
 Web services
       Annotation by user
       Annotation by group
       Annotation by document
       Annotation by corpora
       …
 SPARQL queries, when a SPARQL endpoint is available
USERS ANNOTATION ACCESS
Users can export their own annotation in AO RDF
   Annotation by document
   Annotation by corpora
   All of the annotation
CURRENT DOMEO ARCHITECTURE
[Figure: current Domeo architecture. Labels: Domeo Web Client, AO-RDF, Annotation Web Services, Domeo, MySQL, User Annotation Export, UI, Text Mining Connector, NCBO Web Service, NCBO Annotator. Legend: request, annotation.]
DOMEO NODE ARCHITECTURE
> ACCESSING EXTERNAL ANNOTATION
[Figure: Domeo v.2 node. Labels: Other Domeo Node (callout 1), External Triplestore (callout 2), Domeo Web Client, AO-RDF, SPARQL, Annotation Web Services, Triple Store Connector, MySQL, User Annotation Export, UI, Text Mining Connector, NCBO Web Service, NCBO Annotator.]
DOMEO NODE ARCHITECTURE
> ADDING A SPARQL ENDPOINT
[Figure: Domeo v.2 node with its own SPARQL endpoint and triplestore. Labels: Other Domeo Node, External Triplestore, Domeo Web Client, AO-RDF, SPARQL, Annotation Web Services, Triple Store Connector, Triplestore, MySQL, User Annotation Export, UI, Text Mining Connector, NCBO Web Service, NCBO Annotator.]
DOMEO NODE ARCHITECTURE
> TEXT MINING ALGORITHMS INTEGRATION
[Figure: Domeo v.2 node with numbered callouts 1 to 4. Labels: Other Domeo Node, External Triplestore, Domeo Web Client, AO-RDF, SPARQL, Annotation Web Services, Triple Store Connector, Triplestore, MySQL, User Annotation Export, UI; Text Mining Connector with NCBO Web Service and NCBO Annotator; Clerezza Connector with Clerezza Web Service and UIMA Algorithm; Text Mining Connector with Text Mining Library Manager and Text Mining Algorithm.]
DOMEO AND TEXT MINING
IN SUMMARY
   Run algorithms within Domeo
     Making the algorithms available through web services
     Integrating the algorithms, as libraries, within the
      Domeo architecture
   Run algorithms separately and then
     Load the results into a Domeo node through web
      services
     Store the results directly in the (a) triplestore
     Store the results directly in the database
W3C COMMUNITY GROUP
OPEN ANNOTATION
 Annotation Ontology (AO) and Open Annotation
  Collaboration (OAC) are merging
 Unified model for representing and sharing
  annotation in RDF




                 http://www.w3.org/community/openannotation/
THANK YOU!
If you are interested in using or contributing to
the Domeo Annotation Toolkit, visit our website
http://annotationframework.org or contact
paolo.ciccarese -at- gmail.com

 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 

Domeo, Text Mining, UIMA and Clerezza

  • 1. DOMEO ANNOTATION TOOLKIT AND TEXT MINING: CREATING, VISUALISING, CURATING AND SHARING TEXT MINING RESULTS. Paolo Ciccarese, PhD (paolo.ciccarese@gmail.com). January 30th 2012, W3C Scientific Discourse Call
  • 2.
      - The Domeo Annotation Toolkit is a collection of software components for creating and sharing annotations of web documents and their fragments
      - It can export and exchange all annotations in Annotation Ontology (AO) RDF format
      - The Domeo client is the user interface for producing manual and semi-automatic annotations of HTML documents directly in the browser
      - http://annotationframework.org/
  • 3. ANNOTATION ONTOLOGY
      - OWL vocabulary for representing and sharing annotation and semantic annotation of digital resources and their fragments:
          - Is orthogonal to the domain(s) of interest
          - Supports stand-off annotation
          - Offers tools for identifying fragments
          - Designed with extension points
          - Defines basic annotation containers
          - Supports versioning
          - Tracks provenance
      - http://purl.org/ao/home
  • 4. DOMEO AND TEXT MINING SERVICES
      - Domeo can trigger text mining algorithms when they are available through web services
      - Software connectors have to be developed to translate the results into a suitable format
      - The results are displayed in the web documents
      - Users can record their feedback/judgment through customizable user interfaces
  • 5. NCBO ANNOTATOR
      - Web service that annotates textual metadata (e.g. a journal abstract) with relevant ontology concepts
      - The ontologies of interest can be preselected as one of the many parameters
      - http://www.bioontology.org/annotator-service
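As a rough illustration of preselecting ontologies through a request parameter, the sketch below only builds a request URL and sends nothing. The endpoint and the parameter names (`text`, `ontologies`, `apikey`) are assumptions based on the present-day BioPortal REST API, which may differ from the 2012 service shown in the slides; `build_annotator_request` is a hypothetical helper, not part of Domeo.

```python
from typing import Iterable
from urllib.parse import urlencode

# Assumed endpoint of the current BioPortal Annotator; check the NCBO docs.
ANNOTATOR_URL = "https://data.bioontology.org/annotator"


def build_annotator_request(text: str, ontologies: Iterable[str], api_key: str) -> str:
    """Build (but do not send) an Annotator request URL.

    `ontologies` restricts the results to the listed ontology acronyms;
    an empty list leaves the choice to the service.
    """
    params = {"text": text, "apikey": api_key}
    selected = list(ontologies)
    if selected:
        params["ontologies"] = ",".join(selected)
    return f"{ANNOTATOR_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_annotator_request("amyloid beta", ["GO", "NCIT"], "YOUR-API-KEY"))
```

Keeping URL construction separate from the HTTP call makes the ontology selection easy to inspect before any request is made.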
  • 6. DOMEO AND THE NCBO ANNOTATOR
      - Domeo allows automatic and manual annotation with terms from selected ontologies managed by BioPortal
      - http://www.bioontology.org/annotator-service
  • 7. RUNNING NCBO ANNOTATOR: Additional text mining services will be listed here
  • 8. NCBO ANNOTATOR RESULTS IN DOMEO: List of recognized entities
  • 9. RESULTS CURATION: Customizable
  • 10. CUMULATIVE RESULTS CURATION
      - One item only
      - All instances with the same text match
      - All instances, regardless of the text match
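The three curation scopes on this slide could be modelled roughly as below. This is a sketch under stated assumptions, not Domeo's implementation: `Result`, `Scope` and `curate` are hypothetical names, and "all instances" is interpreted here as all hits that resolve to the same recognized entity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Scope(Enum):
    ONE_ITEM = "one item only"
    SAME_TEXT = "all instances with the same text match"
    ANY_TEXT = "all instances regardless of the text match"


@dataclass
class Result:
    """Hypothetical text mining hit: matched text, recognized entity, judgment."""
    text: str
    entity: str
    judgment: Optional[str] = None


def curate(results: List[Result], target: Result, judgment: str, scope: Scope) -> int:
    """Record a judgment on every result covered by the scope; return how many."""
    def covered(r: Result) -> bool:
        if scope is Scope.ONE_ITEM:
            return r is target                      # identity, not equality
        if r.entity != target.entity:
            return False
        return scope is Scope.ANY_TEXT or r.text == target.text

    touched = [r for r in results if covered(r)]
    for r in touched:
        r.judgment = judgment
    return len(touched)
```

With two hits on "APP", one on "amyloid precursor protein" (same entity) and one on an unrelated entity, the three scopes would touch one, two and three results respectively.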
  • 12. SOFTWARE CONNECTORS At the current stage:
      - For each text mining service we have to write a specific connector, which normally translates offset and range into prefix and postfix
      - And keep it up to date!
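The translation from offset and range to prefix and postfix can be sketched as follows. This is a minimal Python illustration, not the Domeo connector code; the names `PrefixPostfixSelector`, `offset_to_prefix_postfix` and `locate`, and the default 20-character context window, are assumptions made for the example.

```python
from typing import NamedTuple


class PrefixPostfixSelector(NamedTuple):
    """Hypothetical container: the matched text plus its surrounding context."""
    prefix: str
    exact: str
    postfix: str


def offset_to_prefix_postfix(text: str, offset: int, length: int,
                             context: int = 20) -> PrefixPostfixSelector:
    """Convert an (offset, length) hit into a prefix/exact/postfix selector."""
    if offset < 0 or length <= 0 or offset + length > len(text):
        raise ValueError("offset/length fall outside the text")
    end = offset + length
    return PrefixPostfixSelector(
        prefix=text[max(0, offset - context):offset],
        exact=text[offset:end],
        postfix=text[end:end + context],
    )


def locate(text: str, sel: PrefixPostfixSelector) -> int:
    """Find the selector again in a text; return the match offset or -1."""
    hit = text.find(sel.prefix + sel.exact + sel.postfix)
    return -1 if hit < 0 else hit + len(sel.prefix)
```

A selector of this kind does not depend on absolute character positions, so it can still be resolved when the same passage appears at a different offset, for example after the page markup around it has changed.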
  • 13. UIMA, CLEREZZA AND AO: OSS BASED INFRASTRUCTURE FOR TEXT MINING OVER ONTOLOGIES. Tommaso Teofili and Paolo Ciccarese (tommaso@apache.org)
  • 14. APACHE UIMA
      - Architectural framework for UIM (unstructured information management)
      - OASIS standard
      - Build, deploy and run text mining pipelines
      - Scaling capabilities for large volumes of data
      - NLP/TM algorithms wrapped as Analysis Engines
      - http://uima.apache.org/
  • 15. UIMA TYPES
      - Defining the annotation domain in Typesystems
      - Types and features are just declared
      - Existing Typesystems can be imported/exported/enhanced
      - Ease data exchange between AEs
      - Two “main” types
          - TOP
          - Annotation
  • 16. APACHE CLEREZZA
      - Service platform for linked data
      - OSGi-based
      - RDF API
      - RESTful Web Service Framework
      - TripleStore independent
      - Integrated with Apache UIMA
      - http://incubator.apache.org/clerezza/
  • 17. UIMA/CLEREZZA CONVENTION
      - Developers can create custom types / typesystems
      - Need to manage URIs
      - Integration of services vs ontology sharing
      - ClerezzaTypeSystem
          - ClerezzaBaseAnnotation
              - uri
          - ClerezzaBaseEntity
              - uri
              - label (rdfs:label)
              - references (annotations referring to this entity)
      - Service-specific annotation and entity types are defined by subclassing the above
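The convention above can be pictured as a small class hierarchy. Real UIMA type systems are declared in XML descriptors, so the Python dataclasses below are only a mirror of the structure named on the slide. The `begin`/`end` span fields and the `GeneAnnotation` subtype are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClerezzaBaseAnnotation:
    """Base annotation type: every annotation carries a URI."""
    uri: str
    begin: int = 0  # assumed span offsets, as on UIMA's built-in Annotation type
    end: int = 0


@dataclass
class ClerezzaBaseEntity:
    """Base entity type: URI, label and the annotations that refer to it."""
    uri: str
    label: str = ""  # corresponds to rdfs:label
    references: List[ClerezzaBaseAnnotation] = field(default_factory=list)


@dataclass
class GeneAnnotation(ClerezzaBaseAnnotation):
    """Hypothetical service-specific type; it inherits the uri field."""
    symbol: str = ""


if __name__ == "__main__":
    ann = GeneAnnotation(uri="urn:example:annotation/1", begin=17, end=20, symbol="APP")
    gene = ClerezzaBaseEntity(uri="urn:example:entity/APP", label="APP", references=[ann])
    print(gene)
```

Because the URI lives on the base types, any service-specific subtype can be mapped to RDF without extra per-type configuration.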
  • 21. AFTER (URI FIELD INHERITED)
  • 22. CONVERSION STRATEGIES
      - UIMA annotations stored inside CAS
      - Services “talking” via webservices + RDF
      - CAS to RDF mapping via Clerezza
      - Pluggable mapping strategies
          - Clerezza Default
          - AnnotationOntology
          - …
  • 23. CONVERSION STRATEGIES
      - Change mapping strategies via XML/Eclipse plugin
      - Or in the descriptor directly:
          <nameValuePair>
            <name>mappingStrategy</name>
            <value><string>ao</string></value>
          </nameValuePair>
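The idea of pluggable mapping strategies selected by a `mappingStrategy` value can be sketched with a small registry. This is not the Clerezza/UIMA code: the dictionary-based annotation, the `default` key, the `urn:example:` IRIs and the function names are hypothetical. The AO IRIs are given as I recall them and should be checked against the ontology.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Annotation = Dict[str, str]  # hypothetical stand-in for an annotation read from a CAS
Strategy = Callable[[Annotation], List[Triple]]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
AO = "http://purl.org/ao/"  # assumed AO namespace


def default_strategy(a: Annotation) -> List[Triple]:
    """Illustrative generic mapping: only type the annotation."""
    return [(a["uri"], RDF_TYPE, "urn:example:Annotation")]


def ao_strategy(a: Annotation) -> List[Triple]:
    """Illustrative AO-style mapping: type plus the annotated document."""
    return [
        (a["uri"], RDF_TYPE, AO + "Annotation"),
        (a["uri"], AO + "annotatesResource", a["document"]),
    ]


STRATEGIES: Dict[str, Strategy] = {"default": default_strategy, "ao": ao_strategy}


def map_annotations(annotations: List[Annotation],
                    mapping_strategy: str = "default") -> List[Triple]:
    """Map annotations to triples with the strategy named in the configuration."""
    try:
        strategy = STRATEGIES[mapping_strategy]
    except KeyError:
        raise ValueError(f"unknown mappingStrategy: {mapping_strategy!r}") from None
    return [t for a in annotations for t in strategy(a)]
```

Adding a new target vocabulary then means registering one more function, while the pipeline itself stays unchanged.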
  • 25. LOOKING AHEAD: DOMEO TOOLKIT V. 2. Paolo Ciccarese, PhD
  • 26. DOMEO ANNOTATION TOOLKIT V.2
      - Domeo Annotation Toolkit v.2 is planned for the end of the first quarter of 2012
      - It will consist of a major refactoring to improve modularity and make writing plug-ins easier
      - It will include various new features and will be the first step towards a federated architecture
      - It will be open source!
  • 27. DOMEO FEDERATION
      - We currently have two instances of the Domeo Toolkit and the number of instances is going to increase
      - We need to define a clean architecture that supports communication between instances or nodes
      - Instances should be able to access each other's annotations in multiple ways
  • 28. DOMEO FEDERATION
      - [Diagram: annotation flow among Domeo Nodes 1 to 4 and their web clients; labels: Web Service, Triplestore, SPARQL]
      - Example: DT3 retrieves annotation from DT1 through a web service and from DT2 through a SPARQL query against its triplestore
  • 29. SOFTWARE ANNOTATION ACCESS Nodes can access annotations of other nodes through:
      - Web Services
          - Annotation by User
          - Annotation by Group
          - Annotation by Document
          - Annotation by Corpora
          - …
      - SPARQL queries, when a SPARQL end-point is available
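The two access routes listed above can be sketched as follows: a node always offers web services and, optionally, a SPARQL endpoint, and an "annotation by document" lookup over SPARQL is just a query string. The `Node` class, both function names and the example URLs are hypothetical. The AO terms `ao:Annotation` and `ao:annotatesResource` are given as I recall them, so treat them as assumptions to verify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical description of a remote Domeo node."""
    name: str
    web_service_url: str
    sparql_endpoint: Optional[str] = None


def available_channels(node: Node) -> List[str]:
    """Web services are always listed; SPARQL only when an endpoint exists."""
    channels = ["web service"]
    if node.sparql_endpoint:
        channels.append("sparql")
    return channels


def annotations_by_document_query(document_uri: str, limit: int = 100) -> str:
    """Build a SPARQL query for the annotations of one document."""
    # Reject characters that are not allowed inside a SPARQL IRI reference.
    if any(c in document_uri for c in '<>"{}|^`\\ \n\t'):
        raise ValueError("document_uri is not a safe IRI")
    return (
        "PREFIX ao: <http://purl.org/ao/>\n"
        "SELECT ?annotation WHERE {\n"
        "  ?annotation a ao:Annotation ;\n"
        f"              ao:annotatesResource <{document_uri}> .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    dt2 = Node("DT2", "http://example.org/dt2/ws", "http://example.org/dt2/sparql")
    print(available_channels(dt2))
    print(annotations_by_document_query("http://example.org/paper.html", limit=10))
```

The query string would be sent to the other node's SPARQL endpoint, while nodes without an endpoint would be reached through their web services instead.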
  • 30. USERS ANNOTATION ACCESS Users can export their own annotation in AO RDF:
      - Annotation by document
      - Annotation by corpora
      - All of the annotation
  • 31. CURRENT DOMEO ARCHITECTURE
      - [Architecture diagram. Labels include: Domeo Web Client, Request Annotation, AO-RDF, Annotation Web Services, Domeo, User, MySQL, Annotation Export, Text Mining Connector, UI, NCBO Web Service, NCBO Annotator]
  • 32. DOMEO NODE ARCHITECTURE > ACCESSING EXTERNAL ANNOTATION
      - [Architecture diagram with numbered steps 1 and 2. Labels added to the current architecture include: Other Domeo Node, External Triplestore, SPARQL, Triple Store Connector, Domeo v.2 Node]
  • 33. DOMEO NODE ARCHITECTURE > ADDING A SPARQL ENDPOINT
      - [Architecture diagram. Labels added to the previous diagram include: SPARQL, Triplestore]
  • 34. DOMEO NODE ARCHITECTURE > TEXT MINING ALGORITHMS INTEGRATION
      - [Architecture diagram with numbered steps 1 to 4. Labels added to the previous diagram include: Clerezza Connector, Clerezza Web Service, UIMA Algorithm, Text Mining Library Manager, Text Mining Algorithm]
  • 35. DOMEO AND TEXT MINING IN SUMMARY
      - Run algorithms within Domeo
          - Making the algorithms available through Web Services
          - Integrating the algorithms, as libraries, within the Domeo architecture
      - Run algorithms separately and then
          - Load the results into a Domeo node through web services
          - Store the results directly in the (a) triplestore
          - Store the results directly in the database
  • 36. W3C COMMUNITY GROUP OPEN ANNOTATION
      - Annotation Ontology (AO) and Open Annotation Collaboration (OAC) are merging
      - Unified model for representing and sharing annotation in RDF
      - http://www.w3.org/community/openannotation/
  • 37. THANK YOU! If you are interested in using, or contributing to, the Domeo Annotation Toolkit, follow our website http://annotationframework.org or contact paolo.ciccarese -at- gmail.com