SlideShare a Scribd company logo
1 of 29
Download to read offline
Semantically Enhanced
Quality Assurance in the
JURION Business Use Case
Dimitris Kontokostas, Christian Mader,
Christian Dirschl, Katja Eck, Michael Leuthold,
Jens Lehmann, Sebastian Hellmann
ESWC 2016
Overview
● Wolters Kluwers overview
● Use Case Tools
● Challenges
● Solutions
● Evaluation
● Future Work
ESWC 2016
Wolters Kluwers
Wolters Kluwer provides solutions to customers
in over 170 countries and provides content in at
least a dozen languages.
Focusing on legal, tax, finance and health
industries.
ESWC 2016
Wolters Kluwer Transformation
ESWC 2016
Wolters Kluwer Transformation
Quality
ESWC 2016
WKD in LOD2 project
ESWC 2016
ESWC 2016
WKD in the ALIGNED Project
ESWC 2016
RDF in the publishing industry
ESWC 2016
Use Case Tools
ESWC 2016
● TDDD: Test Driven (Data) Development
○ Methodology, definitions & Tools
● SPARQL
● Reusable unit tests for
○ vocabularies
○ datasets
○ applications
● Test Auto Generators
○ OWL
○ IBM Shapes
○ DSP (Dublin Core Set Profiles)
○ W3c Shapes (in progress)
● Open Source (Apache license)
● Stable tool, used in many research & industrial settings
http://rdfunit.aksw.org
ESWC 2016
https://www.poolparty.biz
● Commercial product developed by Semantic Web Company
● Thesauri development in a collaborative way
○ From scratch / by extraction of terms from a document corpus
● Compliance to the 5-star Open Data principles (RDF & SKOS)
● Automatically retrieve potential additional concepts for inclusion into the
thesauri by querying SPARQL endpoints (e.g. DBpedia)
● identify and link to related resources from local / remote projects
● Simple ontology editing (rdf:type, rdfs:subClassOf, rdfs:domain/range,...)
● Automated quality assurance mechanisms
○ Conformance to SKOS or a custom schema
○ Enforcement level of some quality metrics can be configured by the user so that it is, e.g.,
possible to get an alert if circular hierarchical relation
○ Check a taxonomy “as a whole” against a set of potential quality violations
ESWC 2016
Challenges
ESWC 2016
Metadata RDF Conversion Verification
Existing Infrastructure
● Platform Content Interface (PCI) ontology
○ proprietary schema that describes legal documents and metadata in OWL
● PCI revisions => verify data conforms to PCI
● Proprietary SOAP-based validation service
○ Package based validation => hard error detection
○ Asynchronous & complex web service => hard to use
○ Network dependency => potentially unstable
ESWC 2016
Metadata RDF Conversion Verification
Continuous & high quality triplification of semi-structured data is a common
problem in the information industry. Schema changes and enhancements are
routine tasks, but ensuring data quality is still very often purely manual effort.
So any automation will support a lot of real-life use cases in different domains.
Goal: Based on the schema, test cases should automatically be created, which
are run on a regular basis against the data that needs to be transformed. The
errors detected will lead to refinements and changes of the XSLT scripts and
sometimes also to schema changes, which impose again new automatically
created test cases
ESWC 2016
ESWC 2016
RDFUnit / JUnit Integration
ESWC 2016
Quality Control in Thesaurus Management
● WKD develops multiple controlled vocabularies for annotating documents (e.g.,
court decision, labour law,...) using PoolParty
● Interconnected to each other
● Consistency and quality must be ensured over all vocabularies
● Various quality issues, e.g.,
○ Duplicates
○ Links to deprecated (deleted) concepts
○ Unresolvable links
● Up to now curated manually in deployed system, regular errors in production
versions
ESWC 2016
Quality Control in Thesaurus Management
The creation and maintenance of knowledge models is gaining importance in the
Web of Data. These tasks are increasingly being executed by SME’s in the domain,
not in knowledge modelling and IT as such. Therefore, better automatic support of
these processes will directly help achieving quality and efficiency gains.
● Automated quality checks over multiple vocabularies
● Improved notifications: email on changes performed by users
● Additional statistics on, e.g, vocabulary dependencies, changes, etc
ESWC 2016
Vocabulary link validation (PoolParty)
● Uses project metadata to identify linked vocabularies
● Link is invalid if target concept is either deprecated or deleted
● Creates a report for human curators
● Vocabulary repair still manual process
Quality Control in Thesaurus Management
ESWC 2016
Results & Evaluation
The analysis is based on measured metrics and the qualitative feedback of
experts and users.
Participants of the evaluation study were selected from WKD staff in the fields of
software development and data development. There were seven participants in
total: four involved in the expert evaluation and three content experts involved in
the usability/interview evaluation.
● Productivity
● Quality
● Agility
ESWC 2016
Productivity (RDFUnit)
● Total time for quality checks and error detection
● The time need for manual interaction.
What we measured:
● 1ms to 50ms per single test (depending on the document / ontology size)
○ as close to real-time as possible, currently a couple of minutes
● Quality checks can be triggered by manual execution, but they are always
verified automatically by the CI build system
● A total of 44.000 tests with a total duration of 11 minutes
○ may scale-up easily when parallelized or clustered
ESWC 2016
Quality (RDFUnit)
What kind of errors can be detected and is categorization possible?
● Experts concluded that it is helpful to spot errors introduced by changes,
since issues spotted in this way can be assumed to point to really existing
errors; the causes of which can be identified and addressed
● Successful tests are less significant as we are not yet able to evaluate
whether and how the measurements taken correspond to target measures and
these tests do not point to concrete errors.
○ Coverage & other metrics needed
ESWC 2016
Agility (RDFUnit)
… time to include new requirements
● Including new constraints or adapting existing constraints works by adding
new reference documents to the input dataset to make the test environment
as representative as possible.
● The process of generating tests and testing is fully automated, it adapts very
easily to changed parameters.
● Adding more documents to the input dataset increases the total runtime
ESWC 2016
Productivity (PoolParty)
● The number of checked links
● The number of violations
● The total time
What we measured:
The presentation of the results was well understood. In general, the tool was
received well by the experts, which was reflected by their feedback in the
interviews.
ESWC 2016
Quality (PoolParty)
● No false broken link detection
● Prototype still lacks some usability.
ESWC 2016
Agility (PoolParty)
… integration, configuration time and extension
● Very useful for getting an overview
● cases it is desired to limit the link lookups and adapt the way links to external
datasets are detected
○ Use custom base URI or regular expression-based techniques
● Re-configuration is possible but recompiling the application might be needed
○ Plans to delegate this process to unified views
ESWC 2016
Future Work
● Error analysis (statistics, time to fix an issue, regressions)
● Test coverage and better metrics
● Improve the UI of the Link Validation tool
● Provide more advanced settings
● Inter-repository Link Validation
ESWC 2016
Thank You!
Questions ?
(You might want to) take a look at…
RDF and XML Interoperability W3c Community group
https://www.w3.org/community/rax/

More Related Content

Viewers also liked

DBpedia i18n - Amsterdam Meeting (30/01/2014)
DBpedia i18n - Amsterdam Meeting (30/01/2014)DBpedia i18n - Amsterdam Meeting (30/01/2014)
DBpedia i18n - Amsterdam Meeting (30/01/2014)Dimitris Kontokostas
 
DBpedia+ / DBpedia meeting in Dublin
DBpedia+ / DBpedia meeting in DublinDBpedia+ / DBpedia meeting in Dublin
DBpedia+ / DBpedia meeting in DublinDimitris Kontokostas
 
8th DBpedia meeting / California 2016
8th DBpedia meeting /  California 20168th DBpedia meeting /  California 2016
8th DBpedia meeting / California 2016Dimitris Kontokostas
 
Assessing and Refining Mappings to RDF to Improve Dataset Quality
Assessing and Refining Mappings to RDF to Improve Dataset QualityAssessing and Refining Mappings to RDF to Improve Dataset Quality
Assessing and Refining Mappings to RDF to Improve Dataset Qualityandimou
 
Human-Aware Sensor Network Ontology: Semantic Support for Empirical Data Coll...
Human-Aware Sensor Network Ontology: Semantic Support for Empirical Data Coll...Human-Aware Sensor Network Ontology: Semantic Support for Empirical Data Coll...
Human-Aware Sensor Network Ontology: Semantic Support for Empirical Data Coll...Paulo Pinheiro
 
Contextual Data Collection for Smart Cities
Contextual Data Collection for Smart CitiesContextual Data Collection for Smart Cities
Contextual Data Collection for Smart CitiesHenrique O. Santos
 
RMLEditor: A Graph-based Mapping Editor for Linked Data Mappings
RMLEditor: A Graph-based Mapping Editor for Linked Data MappingsRMLEditor: A Graph-based Mapping Editor for Linked Data Mappings
RMLEditor: A Graph-based Mapping Editor for Linked Data MappingsPieter Heyvaert
 
TEDx Navesink 2015: to be AND not to be - Quantum Intelligence
TEDx Navesink 2015: to be AND not to be - Quantum IntelligenceTEDx Navesink 2015: to be AND not to be - Quantum Intelligence
TEDx Navesink 2015: to be AND not to be - Quantum IntelligenceLora Aroyo
 
CrowdTruth @VU Faculty Colloquium (June 2015)
CrowdTruth @VU Faculty Colloquium (June 2015)CrowdTruth @VU Faculty Colloquium (June 2015)
CrowdTruth @VU Faculty Colloquium (June 2015)Lora Aroyo
 
Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities pr...
Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities pr...Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities pr...
Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities pr...Lora Aroyo
 
Semantic Support for Complex Ecosystem Research Environments
Semantic Support for Complex Ecosystem Research EnvironmentsSemantic Support for Complex Ecosystem Research Environments
Semantic Support for Complex Ecosystem Research EnvironmentsHenrique O. Santos
 

Viewers also liked (15)

DBpedia i18n - Amsterdam Meeting (30/01/2014)
DBpedia i18n - Amsterdam Meeting (30/01/2014)DBpedia i18n - Amsterdam Meeting (30/01/2014)
DBpedia i18n - Amsterdam Meeting (30/01/2014)
 
DBpedia+ / DBpedia meeting in Dublin
DBpedia+ / DBpedia meeting in DublinDBpedia+ / DBpedia meeting in Dublin
DBpedia+ / DBpedia meeting in Dublin
 
8th DBpedia meeting / California 2016
8th DBpedia meeting /  California 20168th DBpedia meeting /  California 2016
8th DBpedia meeting / California 2016
 
Assessing and Refining Mappings to RDF to Improve Dataset Quality
Assessing and Refining Mappings to RDF to Improve Dataset QualityAssessing and Refining Mappings to RDF to Improve Dataset Quality
Assessing and Refining Mappings to RDF to Improve Dataset Quality
 
VOLT - ESWC 2016
VOLT - ESWC 2016VOLT - ESWC 2016
VOLT - ESWC 2016
 
cyclades eswc2016
cyclades eswc2016cyclades eswc2016
cyclades eswc2016
 
Human-Aware Sensor Network Ontology: Semantic Support for Empirical Data Coll...
Human-Aware Sensor Network Ontology: Semantic Support for Empirical Data Coll...Human-Aware Sensor Network Ontology: Semantic Support for Empirical Data Coll...
Human-Aware Sensor Network Ontology: Semantic Support for Empirical Data Coll...
 
Assessing the performance of RDF Engines: Discussing RDF Benchmarks
Assessing the performance of RDF Engines: Discussing RDF Benchmarks Assessing the performance of RDF Engines: Discussing RDF Benchmarks
Assessing the performance of RDF Engines: Discussing RDF Benchmarks
 
Contextual Data Collection for Smart Cities
Contextual Data Collection for Smart CitiesContextual Data Collection for Smart Cities
Contextual Data Collection for Smart Cities
 
RMLEditor: A Graph-based Mapping Editor for Linked Data Mappings
RMLEditor: A Graph-based Mapping Editor for Linked Data MappingsRMLEditor: A Graph-based Mapping Editor for Linked Data Mappings
RMLEditor: A Graph-based Mapping Editor for Linked Data Mappings
 
TEDx Navesink 2015: to be AND not to be - Quantum Intelligence
TEDx Navesink 2015: to be AND not to be - Quantum IntelligenceTEDx Navesink 2015: to be AND not to be - Quantum Intelligence
TEDx Navesink 2015: to be AND not to be - Quantum Intelligence
 
CrowdTruth @VU Faculty Colloquium (June 2015)
CrowdTruth @VU Faculty Colloquium (June 2015)CrowdTruth @VU Faculty Colloquium (June 2015)
CrowdTruth @VU Faculty Colloquium (June 2015)
 
Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities pr...
Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities pr...Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities pr...
Crowds & Niches Teaching Machines to Diagnose: NLeSC Kick off eHumanities pr...
 
Wither OWL
Wither OWLWither OWL
Wither OWL
 
Semantic Support for Complex Ecosystem Research Environments
Semantic Support for Complex Ecosystem Research EnvironmentsSemantic Support for Complex Ecosystem Research Environments
Semantic Support for Complex Ecosystem Research Environments
 

Similar to Semantically enhanced quality assurance in the jurion business use case

Presentation ADEQUATe Project: Workshop on Quality Assessment and Improvement...
Presentation ADEQUATe Project: Workshop on Quality Assessment and Improvement...Presentation ADEQUATe Project: Workshop on Quality Assessment and Improvement...
Presentation ADEQUATe Project: Workshop on Quality Assessment and Improvement...Martin Kaltenböck
 
Dineshkumar_Automation tester_6.5years
Dineshkumar_Automation tester_6.5yearsDineshkumar_Automation tester_6.5years
Dineshkumar_Automation tester_6.5yearsdineshkumar selvaraj
 
QA is not quality
QA is not qualityQA is not quality
QA is not qualityAlex Wilson
 
Agile Testing Analytics
Agile Testing AnalyticsAgile Testing Analytics
Agile Testing AnalyticsQASymphony
 
Triantafyllia Voulibasi
Triantafyllia VoulibasiTriantafyllia Voulibasi
Triantafyllia VoulibasiISSEL
 
SharadchandraPawar 4 Years Manual and Web service Testinng
SharadchandraPawar  4 Years Manual and Web service TestinngSharadchandraPawar  4 Years Manual and Web service Testinng
SharadchandraPawar 4 Years Manual and Web service TestinngSharad Pawar
 
Semantic Validation: Enforcing Kafka Data Quality Through Schema-Driven Verif...
Semantic Validation: Enforcing Kafka Data Quality Through Schema-Driven Verif...Semantic Validation: Enforcing Kafka Data Quality Through Schema-Driven Verif...
Semantic Validation: Enforcing Kafka Data Quality Through Schema-Driven Verif...HostedbyConfluent
 
Listen to Your Machines: DevOps Analytics for Better Feedback Loops
Listen to Your Machines: DevOps Analytics for Better Feedback LoopsListen to Your Machines: DevOps Analytics for Better Feedback Loops
Listen to Your Machines: DevOps Analytics for Better Feedback LoopsSplunk
 
Solo Requisitos 2008 - 07 Upc
Solo Requisitos 2008 - 07 UpcSolo Requisitos 2008 - 07 Upc
Solo Requisitos 2008 - 07 UpcPepe
 
StarWest 2019 - End to end testing: Stupid or Legit?
StarWest 2019 - End to end testing: Stupid or Legit?StarWest 2019 - End to end testing: Stupid or Legit?
StarWest 2019 - End to end testing: Stupid or Legit?mabl
 
Jonathan Dunn Resume
Jonathan Dunn ResumeJonathan Dunn Resume
Jonathan Dunn ResumeJonathan Dunn
 
CodeScience Webinar - Automated Testing for Your Salesforce App — Tips and Tr...
CodeScience Webinar - Automated Testing for Your Salesforce App — Tips and Tr...CodeScience Webinar - Automated Testing for Your Salesforce App — Tips and Tr...
CodeScience Webinar - Automated Testing for Your Salesforce App — Tips and Tr...CodeScience
 

Similar to Semantically enhanced quality assurance in the jurion business use case (20)

Presentation ADEQUATe Project: Workshop on Quality Assessment and Improvement...
Presentation ADEQUATe Project: Workshop on Quality Assessment and Improvement...Presentation ADEQUATe Project: Workshop on Quality Assessment and Improvement...
Presentation ADEQUATe Project: Workshop on Quality Assessment and Improvement...
 
Dineshkumar_Automation tester_6.5years
Dineshkumar_Automation tester_6.5yearsDineshkumar_Automation tester_6.5years
Dineshkumar_Automation tester_6.5years
 
QA is not quality
QA is not qualityQA is not quality
QA is not quality
 
Agile Testing Analytics
Agile Testing AnalyticsAgile Testing Analytics
Agile Testing Analytics
 
Santosh_Nayak_CV
Santosh_Nayak_CVSantosh_Nayak_CV
Santosh_Nayak_CV
 
Triantafyllia Voulibasi
Triantafyllia VoulibasiTriantafyllia Voulibasi
Triantafyllia Voulibasi
 
SharadchandraPawar 4 Years Manual and Web service Testinng
SharadchandraPawar  4 Years Manual and Web service TestinngSharadchandraPawar  4 Years Manual and Web service Testinng
SharadchandraPawar 4 Years Manual and Web service Testinng
 
Semantic Validation: Enforcing Kafka Data Quality Through Schema-Driven Verif...
Semantic Validation: Enforcing Kafka Data Quality Through Schema-Driven Verif...Semantic Validation: Enforcing Kafka Data Quality Through Schema-Driven Verif...
Semantic Validation: Enforcing Kafka Data Quality Through Schema-Driven Verif...
 
Shuchi_Agrawal
Shuchi_AgrawalShuchi_Agrawal
Shuchi_Agrawal
 
Umesh Resume
Umesh ResumeUmesh Resume
Umesh Resume
 
Listen to Your Machines: DevOps Analytics for Better Feedback Loops
Listen to Your Machines: DevOps Analytics for Better Feedback LoopsListen to Your Machines: DevOps Analytics for Better Feedback Loops
Listen to Your Machines: DevOps Analytics for Better Feedback Loops
 
Solo Requisitos 2008 - 07 Upc
Solo Requisitos 2008 - 07 UpcSolo Requisitos 2008 - 07 Upc
Solo Requisitos 2008 - 07 Upc
 
StarWest 2019 - End to end testing: Stupid or Legit?
StarWest 2019 - End to end testing: Stupid or Legit?StarWest 2019 - End to end testing: Stupid or Legit?
StarWest 2019 - End to end testing: Stupid or Legit?
 
Jonathan Dunn Resume
Jonathan Dunn ResumeJonathan Dunn Resume
Jonathan Dunn Resume
 
TMNKSSAI
TMNKSSAITMNKSSAI
TMNKSSAI
 
TR.Bhavani_Inf
TR.Bhavani_InfTR.Bhavani_Inf
TR.Bhavani_Inf
 
Ravi_Nelluri_QA
Ravi_Nelluri_QARavi_Nelluri_QA
Ravi_Nelluri_QA
 
Ignatius Prasad Guntupalli
Ignatius Prasad GuntupalliIgnatius Prasad Guntupalli
Ignatius Prasad Guntupalli
 
CodeScience Webinar - Automated Testing for Your Salesforce App — Tips and Tr...
CodeScience Webinar - Automated Testing for Your Salesforce App — Tips and Tr...CodeScience Webinar - Automated Testing for Your Salesforce App — Tips and Tr...
CodeScience Webinar - Automated Testing for Your Salesforce App — Tips and Tr...
 
Denker SQA Résumé
Denker SQA RésuméDenker SQA Résumé
Denker SQA Résumé
 

Recently uploaded

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 

Recently uploaded (20)

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

Semantically enhanced quality assurance in the jurion business use case

  • 1. Semantically Enhanced Quality Assurance in the JURION Business Use Case Dimitris Kontokostas, Christian Mader, Christian Dirschl, Katja Eck, Michael Leuthold, Jens Lehmann, Sebastian Hellmann
  • 2. ESWC 2016 Overview ● Wolters Kluwers overview ● Use Case Tools ● Challenges ● Solutions ● Evaluation ● Future Work
  • 3. ESWC 2016 Wolters Kluwers Wolters Kluwer provides solutions to customers in over 170 countries and provides content in at least a dozen languages. Focusing on legal, tax, finance and health industries.
  • 4. ESWC 2016 Wolters Kluwer Transformation
  • 5. ESWC 2016 Wolters Kluwer Transformation Quality
  • 6. ESWC 2016 WKD in LOD2 project
  • 8. ESWC 2016 WKD in the ALIGNED Project
  • 9. ESWC 2016 RDF in the publishing industry
  • 11. ESWC 2016 ● TDDD: Test Driven (Data) Development ○ Methodology, definitions & Tools ● SPARQL ● Reusable unit tests for ○ vocabularies ○ datasets ○ applications ● Test Auto Generators ○ OWL ○ IBM Shapes ○ DSP (Dublin Core Set Profiles) ○ W3c Shapes (in progress) ● Open Source (Apache license) ● Stable tool, used in many research & industrial settings http://rdfunit.aksw.org
  • 12. ESWC 2016 https://www.poolparty.biz ● Commercial product developed by Semantic Web Company ● Thesauri development in a collaborative way ○ From scratch / by extraction of terms from a document corpus ● Compliance to the 5-star Open Data principles (RDF & SKOS) ● Automatically retrieve potential additional concepts for inclusion into the thesauri by querying SPARQL endpoints (e.g. DBpedia) ● identify and link to related resources from local / remote projects ● Simple ontology editing (rdf:type, rdfs:subClassOf, rdfs:domain/range,...) ● Automated quality assurance mechanisms ○ Conformance to SKOS or a custom schema ○ Enforcement level of some quality metrics can be configured by the user so that it is, e.g., possible to get an alert if circular hierarchical relation ○ Check a taxonomy “as a whole” against a set of potential quality violations
  • 14. ESWC 2016 Metadata RDF Conversion Verification Existing Infrastructure ● Platform Content Interface (PCI) ontology ○ proprietary schema that describes legal documents and metadata in OWL ● PCI revisions => verify data conforms to PCI ● Proprietary SOAP-based validation service ○ Package based validation => hard error detection ○ Asynchronous & complex web service => hard to use ○ Network dependency => potentially unstable
  • 15. ESWC 2016 Metadata RDF Conversion Verification Continuous & high quality triplification of semi-structured data is a common problem in the information industry. Schema changes and enhancements are routine tasks, but ensuring data quality is still very often purely manual effort. So any automation will support a lot of real-life use cases in different domains. Goal: Based on the schema, test cases should automatically be created, which are run on a regular basis against the data that needs to be transformed. The errors detected will lead to refinements and changes of the XSLT scripts and sometimes also to schema changes, which impose again new automatically created test cases
  • 17. ESWC 2016 RDFUnit / JUnit Integration
  • 18. ESWC 2016 Quality Control in Thesaurus Management ● WKD develops multiple controlled vocabularies for annotating documents (e.g., court decision, labour law,...) using PoolParty ● Interconnected to each other ● Consistency and quality must be ensured over all vocabularies ● Various quality issues, e.g., ○ Duplicates ○ Links to deprecated (deleted) concepts ○ Unresolvable links ● Up to now curated manually in deployed system, regular errors in production versions
  • 19. ESWC 2016 Quality Control in Thesaurus Management The creation and maintenance of knowledge models is gaining importance in the Web of Data. These tasks are increasingly being executed by SME’s in the domain, not in knowledge modelling and IT as such. Therefore, better automatic support of these processes will directly help achieving quality and efficiency gains. ● Automated quality checks over multiple vocabularies ● Improved notifications: email on changes performed by users ● Additional statistics on, e.g, vocabulary dependencies, changes, etc
  • 20. ESWC 2016 Vocabulary link validation (PoolParty) ● Uses project metadata to identify linked vocabularies ● Link is invalid if target concept is either deprecated or deleted ● Creates a report for human curators ● Vocabulary repair still manual process Quality Control in Thesaurus Management
  • 21. ESWC 2016 Results & Evaluation The analysis is based on measured metrics and the qualitative feedback of experts and users. Participants of the evaluation study were selected from WKD staff in the fields of software development and data development. There were seven participants in total: four involved in the expert evaluation and three content experts involved in the usability/interview evaluation. ● Productivity ● Quality ● Agility
  • 22. ESWC 2016 Productivity (RDFUnit) ● Total time for quality checks and error detection ● The time need for manual interaction. What we measured: ● 1ms to 50ms per single test (depending on the document / ontology size) ○ as close to real-time as possible, currently a couple of minutes ● Quality checks can be triggered by manual execution, but they are always verified automatically by the CI build system ● A total of 44.000 tests with a total duration of 11 minutes ○ may scale-up easily when parallelized or clustered
  • 23. ESWC 2016 Quality (RDFUnit) What kind of errors can be detected and is categorization possible? ● Experts concluded that it is helpful to spot errors introduced by changes, since issues spotted in this way can be assumed to point to really existing errors; the causes of which can be identified and addressed ● Successful tests are less significant as we are not yet able to evaluate whether and how the measurements taken correspond to target measures and these tests do not point to concrete errors. ○ Coverage & other metrics needed
  • 24. ESWC 2016 Agility (RDFUnit) … time to include new requirements ● Including new constraints or adapting existing constraints works by adding new reference documents to the input dataset to make the test environment as representative as possible. ● The process of generating tests and testing is fully automated, it adapts very easily to changed parameters. ● Adding more documents to the input dataset increases the total runtime
  • 25. ESWC 2016 Productivity (PoolParty) ● The number of checked links ● The number of violations ● The total time What we measured: The presentation of the results was well understood. In general, the tool was received well by the experts, which was reflected by their feedback in the interviews.
  • 26. ESWC 2016 Quality (PoolParty) ● No false broken link detection ● Prototype still lacks some usability.
  • 27. ESWC 2016 Agility (PoolParty) … integration, configuration time and extension ● Very useful for getting an overview ● cases it is desired to limit the link lookups and adapt the way links to external datasets are detected ○ Use custom base URI or regular expression-based techniques ● Re-configuration is possible but recompiling the application might be needed ○ Plans to delegate this process to unified views
  • 28. ESWC 2016 Future Work ● Error analysis (statistics, time to fix an issue, regressions) ● Test coverage and better metrics ● Improve the UI of the Link Validation tool ● Provide more advanced settings ● Inter-repository Link Validation
  • 29. ESWC 2016 Thank You! Questions ? (You might want to) take a look at… RDF and XML Interoperability W3c Community group https://www.w3.org/community/rax/